Installation

Requirements

The HUPAN toolkit integrates more than 10 software tools, most of which can be supplied to the main program when the corresponding tool is selected. This makes it convenient to work on supercomputers or clusters. To run the main program of the HUPAN toolkit, you only need the following software and packages.

R 3.1 or later

R is used for visualization and statistical tests in the HUPAN toolkit. Please install R first and make sure that R and Rscript are on your PATH.

Download R here.

R packages

Several R packages are needed, including ggplot2, reshape2 and ape. Follow the "Install necessary R packages" step of the installation procedures, or install the packages yourself.
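If you install the packages yourself, it can help to check which of them R can already load. The following is a minimal sketch, not part of the HUPAN toolkit: the variable name `r_check` is hypothetical, and the CRAN mirror URL in the comment is an assumption (any mirror should work).

```shell
# R code that reports which of the three packages named above cannot be loaded.
r_check='pkgs <- c("ggplot2", "reshape2", "ape")
ok <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
if (all(ok)) cat("all packages available\n") else cat("missing:", pkgs[!ok], "\n")'

# Run the check only if Rscript is available on PATH.
if command -v Rscript >/dev/null 2>&1; then
    Rscript -e "$r_check"
else
    echo "Rscript not found on PATH; install R first."
fi

# To install a missing package by hand (mirror URL is an assumption):
#   Rscript -e 'install.packages("ape", repos = "https://cloud.r-project.org")'
```

The check only loads package namespaces and installs nothing, so it is safe to run repeatedly.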

Perl
g++
Python 2.5, 2.6 or 2.7 (required by QUAST)
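
Before installing, you may want to confirm that the required programs are reachable from your shell. The sketch below assumes the usual command names (`R`, `Rscript`, `perl`, `g++`, `python`); the helper `check_tools` is hypothetical and not part of the HUPAN toolkit. It does not verify version numbers.

```shell
# Report whether each given command is on PATH.
# Returns 0 only if every command was found.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Tools named in the requirements above (command names are assumptions).
check_tools R Rscript perl g++ python || echo "Install the missing tools before continuing."
```

On some systems the Python 2 interpreter is installed as `python2` rather than `python`; adjust the list accordingly.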

Installation procedures

Download the HUPAN toolkit here.
Uncompress the HUPAN toolkit package.
```
tar zxvf HUPAN-vXX.XX.tar.gz
```
Install necessary R packages.
```
cd HUPAN-vXX.XX
Rscript installRPac
```
Compile the source code.
```
make
```
You will find the executable files (ccov, bam2cov, hupan and others) in the bin/ directory.
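
To confirm that the build produced the executables mentioned above, a quick check such as the following can be run from the HUPAN-vXX.XX directory. This is a sketch only: the helper `check_built` is hypothetical, and it tests just the three file names listed in the text, not the other executables.

```shell
# Check that the three executables named above exist and are executable
# in the given directory. Returns 0 only if all three are present.
check_built() {
    bin_dir=$1
    rc=0
    for exe in ccov bam2cov hupan; do
        if [ -x "$bin_dir/$exe" ]; then
            echo "ok:      $bin_dir/$exe"
        else
            echo "missing: $bin_dir/$exe"
            rc=1
        fi
    done
    return "$rc"
}

check_built bin || echo "Some executables are missing; check the make output for errors."
```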

Add bin/ to PATH and add lib/ to LD_LIBRARY_PATH.

To do this, replace /path/to/HUPAN in the following lines with your installation path and append the lines to the end of the file ~/.bash_profile:

export PATH=$PATH:/path/to/HUPAN/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/path/to/HUPAN/lib/
export PERL5LIB=$PERL5LIB:/path/to/HUPAN/lib/
source /path/to/HUPAN/hupan_cmd.sh

and run

source ~/.bash_profile
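
After sourcing the profile, you can check that the three variables contain the HUPAN directories. The sketch below is an assumption-laden illustration: `path_has` and `report` are hypothetical helpers, `/path/to/HUPAN` is the placeholder from the text, and the comparison is an exact string match on each colon-separated entry.

```shell
# Succeeds if directory $1 is one of the colon-separated entries of list $2.
path_has() {
    case ":$2:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# report NAME DIR LIST: print whether DIR is an entry of LIST.
report() {
    if path_has "$2" "$3"; then
        echo "$1: ok"
    else
        echo "$1: $2 not found"
    fi
}

hupan_home=/path/to/HUPAN   # placeholder from the text; replace with your path

report PATH "$hupan_home/bin" "$PATH"
report LD_LIBRARY_PATH "$hupan_home/lib/" "${LD_LIBRARY_PATH:-}"
report PERL5LIB "$hupan_home/lib/" "${PERL5LIB:-}"
```

If a variable is reported as not found, open a new login shell or check that the lines were appended to the profile file your shell actually reads.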

Test whether the HUPAN toolkit was installed successfully

hupan

If you see the following content, congratulations! The HUPAN toolkit has been installed successfully. If not, check whether all the requirements are satisfied, or contact the authors for help.

Usage: hupan <command> ...

Available commands:
        qualSta         View the overall sequencing quality of a large number of files
        trim            Trim or filter low-quality reads parallelly
        alignRead       Map reads to a reference parallelly
        sam2bam         Covert alignments (.sam) to sorted .bam files
        bamSta          Statistics of parallel mapping
        assemble        Assemble reads parallelly
        alignContig     Align assembly results to a referenece parallelly
        extractSeq      Extract contigs parallelly
        assemSta        Statistics of parallel assembly
        getUnalnCtg     Extract the unaligned contigs from nucmer alignment (processed by quast)
        rmRedundant     Remove redundant contigs of a fasta file
        pTpG            Get the longest transcripts to represent genes
        geneCov         Calculate gene body coverage and CDS coverage
        geneExist       Determine gene presence-absence based on gene body coverage and CDS coverage
        subSample       Select subset of samples from gene PAV profile
        gFamExist       Determine gene family presence-absence based on gene presence-absence
        bam2bed         Calculate genome region presence-absence from .bam
        fastaSta        Calculate statistics of fasta file
        sim             Simulation and plot of the pan-genome and the core genome
        getTaxClass     Obtain the taxonomic classification of sequences
        rmCtm           Detect and discard the potentail contamination
        blastAlign      Align sequences to target sequence by blast
        simSeq          Simulate and plot the total size of novel sequences
        splitSeq        Split sequence file into multiple small size files
        genePre         Ab initio gene predict combining with RNA and protein evidence
        mergeNovGene    Merge maker result from multiple maker result files
        filterNovGene   Filter the novel precited genes.