The HUPAN toolkit integrates more than 10 software tools, most of which can be supplied to the main program when the corresponding tool is selected. This is convenient for users working on supercomputers or clusters. To run the main program of the HUPAN toolkit, you need only a few programs and packages.
R is used for visualization and statistical tests in the HUPAN toolkit. Please install R first and make sure that R and Rscript are in your PATH.
Download R here.
Several R packages are needed, including ggplot2, reshape2 and ape. Follow Installation step 3, or install the packages yourself.
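The requirements above can be checked before installing. The following is a minimal sketch, not part of the HUPAN distribution: the helper name `missing_tools` is hypothetical, and the package check relies only on R's standard `requireNamespace` function.

```shell
# Print the names of the given commands that cannot be found on PATH.
# (missing_tools is an illustrative helper, not a HUPAN command.)
missing_tools() {
    missing=""
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            missing="$missing $tool"
        fi
    done
    # Strip the leading space before printing
    echo "${missing# }"
}

not_found=$(missing_tools R Rscript)
if [ -n "$not_found" ]; then
    echo "Not found on PATH: $not_found"
else
    # Report any of the three required R packages that are not installed
    Rscript -e 'for (p in c("ggplot2", "reshape2", "ape")) if (!requireNamespace(p, quietly = TRUE)) cat("Missing R package:", p, "\n")'
fi
```

If nothing is reported, R, Rscript and the three packages are available.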
Download the HUPAN toolkit here.
Uncompress the HUPAN toolkit package.
tar zxvf HUPAN-vXX.XX.tar.gz
Install necessary R packages.
cd HUPAN-vXX.XX
Rscript installRPac
Compile the source codes.
make
You will find the executable files ccov, bam2cov, hupan and others in the bin/ directory.
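To confirm that compilation produced the executables named above, a check along the following lines can be run from the top of the uncompressed HUPAN directory. This is a sketch only: `missing_executables` is a hypothetical helper, and only the three file names mentioned in this guide are checked.

```shell
# Print each NAME that is missing or not executable in DIR.
# Usage: missing_executables DIR NAME...
missing_executables() {
    dir=$1
    shift
    for name in "$@"; do
        if [ ! -x "$dir/$name" ]; then
            echo "$name"
        fi
    done
}

missing=$(missing_executables bin ccov bam2cov hupan)
if [ -z "$missing" ]; then
    echo "ccov, bam2cov and hupan are present in bin/"
else
    echo "Missing or not executable in bin/:" $missing
fi
```

If any name is reported, re-run make and inspect its output for compilation errors.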
Add bin/ to PATH and add lib/ to LD_LIBRARY_PATH.
To do this, replace /path/to/HUPAN in the following text with your installation path and append the text to the end of the file ~/.bash_profile:
export PATH=$PATH:/path/to/HUPAN/bin:
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/path/to/HUPAN/lib/:
export PERL5LIB=$PERL5LIB:/path/to/HUPAN/lib/:
source /path/to/HUPAN/hupan_cmd.sh
and run
source ~/.bash_profile
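After sourcing the profile, you can confirm that the new entries took effect. The snippet below is a sketch under assumptions: `path_contains` is a hypothetical helper, and `HUPAN_HOME` is a placeholder variable introduced here for illustration that must be set to your real installation path.

```shell
# Succeed if DIR is one of the colon-separated entries in LIST.
# Usage: path_contains DIR LIST
path_contains() {
    case ":$2:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

HUPAN_HOME=/path/to/HUPAN   # placeholder: replace with your install location

if path_contains "$HUPAN_HOME/bin" "$PATH"; then
    echo "bin/ is on PATH"
else
    echo "bin/ is not on PATH; check ~/.bash_profile"
fi

# command -v prints the resolved location of hupan when it can be found
command -v hupan || echo "hupan not found on PATH"
```

The match is on the exact directory string, so `/path/to/HUPAN/bin` and `/path/to/HUPAN/bin/` count as different entries.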
Test whether the HUPAN toolkit is installed successfully.
hupan
If you see the following content, congratulations! The HUPAN toolkit is successfully installed. If not, check that all the requirements are satisfied, or contact the authors for help.
Usage: hupan <command> ...

Available commands:
    qualSta        View the overall sequencing quality of a large number of files
    trim           Trim or filter low-quality reads parallelly
    alignRead      Map reads to a reference parallelly
    sam2bam        Covert alignments (.sam) to sorted .bam files
    bamSta         Statistics of parallel mapping
    assemble       Assemble reads parallelly
    alignContig    Align assembly results to a referenece parallelly
    extractSeq     Extract contigs parallelly
    assemSta       Statistics of parallel assembly
    getUnalnCtg    Extract the unaligned contigs from nucmer alignment (processed by quast)
    rmRedundant    Remove redundant contigs of a fasta file
    pTpG           Get the longest transcripts to represent genes
    geneCov        Calculate gene body coverage and CDS coverage
    geneExist      Determine gene presence-absence based on gene body coverage and CDS coverage
    subSample      Select subset of samples from gene PAV profile
    gFamExist      Determine gene family presence-absence based on gene presence-absence
    bam2bed        Calculate genome region presence-absence from .bam
    fastaSta       Calculate statistics of fasta file
    sim            Simulation and plot of the pan-genome and the core genome
    getTaxClass    Obtain the taxonomic classification of sequences
    rmCtm          Detect and discard the potentail contamination
    blastAlign     Align sequences to target sequence by blast
    simSeq         Simulate and plot the total size of novel sequences
    splitSeq       Split sequence file into multiple small size files
    genePre        Ab initio gene predict combining with RNA and protein evidence
    mergeNovGene   Merge maker result from multiple maker result files
    filterNovGene  Filter the novel precited genes.