Requirements

The HUPAN toolkit integrates more than 10 software tools. Most of them can be supplied to the main program when the specific tool is selected, which is convenient for users working on supercomputers or clusters. To run the main program of the HUPAN toolkit, you only need the following software and packages.

  1. R 3.1 or later

    R is used for visualization and statistical tests in the HUPAN toolkit. Please install R first and make sure that R and Rscript are in your PATH.

    Download R here.

  2. R packages

    Several R packages are needed, including ggplot2, reshape2 and ape. Follow step 3 of the installation procedures below, or install the packages yourself.

  3. Perl
  4. g++
  5. Python 2.5, 2.6 or 2.7 (required by QUAST)
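
Before installing, it can help to confirm that the required tools are visible in your PATH. The following is a minimal sketch, not part of the HUPAN toolkit: `check_tools` is a hypothetical helper, and the interpreter name `python` is an assumption (on some systems Python 2.x is installed as `python2`).

```shell
#!/bin/sh
# Report which of the given commands are missing from PATH.
# Returns 0 if all are found, 1 otherwise.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "MISSING: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Tools named in the requirements list above. "python" is an assumed
# command name for Python 2.x; adjust it if your system uses "python2".
check_tools R Rscript perl g++ python || echo "Install the missing tools before continuing."
```

This only checks that each command can be found. It does not verify version numbers, so you may still want to run `R --version` and `python --version` by hand.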

Installation procedures

  1. Download the HUPAN toolkit here.

  2. Uncompress the HUPAN toolkit package.

    tar zxvf HUPAN-vXX.XX.tar.gz
  3. Install necessary R packages.

    cd HUPAN-vXX.XX
    Rscript installRPac
  4. Compile the source code.

    make

    You will find the executable files, including ccov, bam2cov and hupan, in the bin/ directory.

  5. Add bin/ to PATH and add lib/ to LD_LIBRARY_PATH.

    To do this, replace /path/to/HUPAN in the following lines with your installation path and append the lines to the end of ~/.bash_profile:

    export PATH=$PATH:/path/to/HUPAN/bin
    export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/path/to/HUPAN/lib/
    export PERL5LIB=$PERL5LIB:/path/to/HUPAN/lib/
    source /path/to/HUPAN/hupan_cmd.sh

    Then run:

    source ~/.bash_profile
  6. Test whether the HUPAN toolkit is installed successfully.

    hupan

    If you see the following output, the HUPAN toolkit is installed successfully. If not, check that all the requirements are satisfied, or contact the authors for help.

    Usage: hupan <command> ...
    
    Available commands:
            qualSta         View the overall sequencing quality of a large number of files
            trim            Trim or filter low-quality reads parallelly
            alignRead       Map reads to a reference parallelly
            sam2bam         Covert alignments (.sam) to sorted .bam files
            bamSta          Statistics of parallel mapping
            assemble        Assemble reads parallelly
            alignContig     Align assembly results to a referenece parallelly
            extractSeq      Extract contigs parallelly
            assemSta        Statistics of parallel assembly
            getUnalnCtg     Extract the unaligned contigs from nucmer alignment (processed by quast)
            rmRedundant     Remove redundant contigs of a fasta file
            pTpG            Get the longest transcripts to represent genes
            geneCov         Calculate gene body coverage and CDS coverage
            geneExist       Determine gene presence-absence based on gene body coverage and CDS coverage
            subSample       Select subset of samples from gene PAV profile
            gFamExist       Determine gene family presence-absence based on gene presence-absence
            bam2bed         Calculate genome region presence-absence from .bam
            fastaSta        Calculate statistics of fasta file
            sim             Simulation and plot of the pan-genome and the core genome
            getTaxClass     Obtain the taxonomic classification of sequences
            rmCtm           Detect and discard the potentail contamination
            blastAlign      Align sequences to target sequence by blast
            simSeq          Simulate and plot the total size of novel sequences
            splitSeq        Split sequence file into multiple small size files
            genePre         Ab initio gene predict combining with RNA and protein evidence
            mergeNovGene    Merge maker result from multiple maker result files
            filterNovGene   Filter the novel precited genes.
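
If `hupan` does not print the usage text above, the usual causes are a missing PATH or LD_LIBRARY_PATH entry, or R packages that were not installed. The sketch below checks these under stated assumptions: `path_contains` and `HUPAN_HOME` are hypothetical names introduced only for this example, and the entries are compared literally, so they must be spelled as in your ~/.bash_profile.

```shell
#!/bin/sh
# Return 0 if the colon-separated list $1 contains the exact entry $2.
path_contains() {
    case ":$1:" in
        *":$2:"*) return 0 ;;
        *)        return 1 ;;
    esac
}

# HUPAN_HOME is a hypothetical variable used only in this sketch;
# set it to the directory where you unpacked the toolkit.
HUPAN_HOME=${HUPAN_HOME:-/path/to/HUPAN}

# Entries are matched as written in step 5 (bin without, lib/ with a trailing slash).
path_contains "$PATH" "$HUPAN_HOME/bin" || echo "bin/ is not in PATH"
path_contains "${LD_LIBRARY_PATH:-}" "$HUPAN_HOME/lib/" || echo "lib/ is not in LD_LIBRARY_PATH"
command -v hupan >/dev/null 2>&1 || echo "hupan was not found in PATH"

# Report any of the three required R packages that R cannot find.
if command -v Rscript >/dev/null 2>&1; then
    Rscript -e 'for (p in c("ggplot2", "reshape2", "ape")) if (!requireNamespace(p, quietly = TRUE)) cat("R package not installed:", p, "\n")'
fi
```

For example, running it as `HUPAN_HOME=$HOME/HUPAN-vXX.XX sh check_hupan.sh` (the script name is arbitrary) prints one line per problem found and prints nothing for checks that pass.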