The human reference genome is still incomplete, especially in population-specific and individual-specific regions, which may have important functions. This motivates building a pan-genome for a human population. Previously, we developed EUPAN, a "map-to-pan" strategy for eukaryotic pan-genome analysis. However, because individual human genomes are large, EUPAN is not suitable for pan-genome analyses involving hundreds of individual genomes. Here, we present an improved tool, HUPAN (HUman Pan-genome ANalysis), for human pan-genome analysis.
We developed the HUPAN strategy primarily on 185 deeply sequenced and 90 assembled Han Chinese genomes. HUPAN utilizes all the well-conceived strategies of EUPAN and, in addition, introduces a number of distinct improvements.
Human genome studies typically involve large data sets and a variety of software tools, and they require careful parameter selection. Therefore, the HUPAN toolbox provides three types of tools: 1) a single-machine version; 2) an LSF version, for supercomputers running the LSF system, where "bsub" is used to submit jobs; and 3) a SLURM version, for supercomputers running the SLURM system, where "sbatch" is used to submit jobs.
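To illustrate how the three tool types differ only in how a job is launched, here is a minimal Python sketch. It is not HUPAN's own code: the names LAUNCHERS and build_submit_command, the tool-type keys, and the script name job.sh are all hypothetical, and real LSF or SLURM submissions usually need site-specific options (queue, memory, run time) that are omitted here.

```python
# Hypothetical mapping from tool type to the command prefix used to launch a job.
# "bsub" and "sbatch" are the submit commands named in the text above; running
# the script directly with "bash" for the single-machine case is an assumption.
LAUNCHERS = {
    "single": ["bash"],         # run directly on the local machine
    "lsf": ["bsub", "bash"],    # hand the command to the LSF scheduler
    "slurm": ["sbatch"],        # hand the batch script to the SLURM scheduler
}


def build_submit_command(tool_type, script_path):
    """Return the argument list that would launch script_path for a tool type."""
    if tool_type not in LAUNCHERS:
        known = ", ".join(sorted(LAUNCHERS))
        raise ValueError(f"unknown tool type {tool_type!r}; expected one of: {known}")
    return LAUNCHERS[tool_type] + [script_path]


if __name__ == "__main__":
    # Print the command for each tool type without executing anything.
    for tool_type in LAUNCHERS:
        print(tool_type, "->", " ".join(build_submit_command(tool_type, "job.sh")))
```

The function only builds the argument list, so it can be inspected or logged before being passed to something like subprocess.run on a machine where the scheduler is installed.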
HUPAN is free for non-commercial use (CC BY-NC 4.0). For commercial use, please contact the authors.
Zhongqu Duan, Yuyang Qiao, Jinyuan Lu, Huimin Lu, Wenmin Zhang, Fazhe Yan, Chen Sun, Zhiqiang Hu, Zhen Zhang, Guichao Li, Hongzhuan Chen, Zhen Xiang, Zhenggang Zhu, Hongyu Zhao, Yingyan Yu, Chaochun Wei, HUPAN: a pan-genome analysis pipeline for human genomes, Genome Biology, 2019, 20:149