Download the latest version of HUPAN toolkit here.
The HUPAN toolkit integrates many third-party software tools, and an incompatible version of any of them may cause errors. We therefore provide the recommended versions of these tools in a single package.
Download these tools here (281M).
Download the ExampleData (2.7G) and its md5 checksum.
The simulated data set is used to compare two assembly strategies (assembling all reads versus assembling only the unmapped reads) and to optimize the parameter settings of sga.
Download simdata_R1.fq.gz and simdata_R2.fq.gz, then check them with md5sum against simdata_R1.fq.gz.md5 and simdata_R2.fq.gz.md5.
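On systems without md5sum, the same check can be done in Python. The sketch below is not part of HUPAN; it assumes each `.md5` file follows the usual md5sum layout (`<hex digest>  <file name>`) and that the files sit in the current directory.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def expected_md5(md5_path):
    """Return the digest from an md5sum-style file (assumed: '<digest>  <name>')."""
    text = Path(md5_path).read_text().strip()
    return text.split()[0].lower()


def verify(data_path, md5_path):
    """True if the file's MD5 matches the digest recorded in md5_path."""
    return md5_of_file(data_path) == expected_md5(md5_path)


if __name__ == "__main__":
    for name in ("simdata_R1.fq.gz", "simdata_R2.fq.gz"):
        checksum = name + ".md5"
        if not Path(name).exists() or not Path(checksum).exists():
            print(f"{name}: data or checksum file not found, skipping")
            continue
        print(f"{name}: {'OK' if verify(name, checksum) else 'FAILED'}")
```

Reading in fixed-size chunks keeps memory use flat even for multi-gigabyte files such as the ExampleData archive.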
The raw sequencing data and assembled contigs are available here.
To access these data sets, please apply following the instructions in the Readme document and send a signed DATA USE AGREEMENT.
The non-reference sequences derived from the 185 newly sequenced genomes can be downloaded here, along with their md5 checksum.
The assembled genomes of 90 Han Chinese individuals were downloaded from Deep whole-genome sequencing of 90 Han Chinese genomes.
The non-reference sequences derived from the 90 assembled Han Chinese genomes are released as 90genomes_novel_sequences.tar.gz, along with its md5 checksum.
The pan-genome sequences are available as HanChinesePan.fa.gz (md5 checksum: HanChinesePan.fa.gz.md5).
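After downloading, a quick sanity check is to count the sequences and total length in the compressed FASTA. This is a minimal standard-library sketch, not a HUPAN command; it assumes HanChinesePan.fa.gz is an ordinary gzip-compressed FASTA file and makes no assumption about the header format.

```python
import gzip
from pathlib import Path


def read_fasta(path):
    """Yield (header, sequence) pairs from a plain or gzip-compressed FASTA file."""
    opener = gzip.open if str(path).endswith(".gz") else open
    header, chunks = None, []
    with opener(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # A new record starts: emit the previous one, if any.
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            else:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def summarize(path):
    """Return (number of sequences, total length in bp)."""
    count = total = 0
    for _, seq in read_fasta(path):
        count += 1
        total += len(seq)
    return count, total


if __name__ == "__main__":
    target = Path("HanChinesePan.fa.gz")
    if target.exists():
        n, bp = summarize(target)
        print(f"{target}: {n} sequences, {bp} bp")
    else:
        print(f"{target} not found")
```

The generator streams one record at a time, so the whole pan-genome never needs to be held in memory.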
The novel predicted genes from 185 newly sequenced Han Chinese genomes:
    Download sequences in FASTA format: genome, transcript, protein
    Download genome annotation in GFF format
The novel predicted genes from 90 assembled Han Chinese genomes:
    Download sequences in FASTA format: genome, transcript, protein
    Download genome annotation in GFF format
The 188 non-redundant genes from 275 Han Chinese genomes that are missing from the GRCh38 primary assembly, patch sequences, and alternative loci:
    Download sequences in FASTA format: genome, transcript, protein
    Download genome annotation in GFF format
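To get a first overview of any of the annotation files above, one option is to count features per type (column 3). The sketch below assumes the files are tab-separated GFF3 with `key=value` attributes; that layout is an assumption, and it does not URL-decode attribute values. The script name `count_gff.py` is hypothetical.

```python
import sys
from collections import Counter


def parse_gff_line(line):
    """Split one GFF3 data line into a dict; return None for comments, blanks, or malformed lines."""
    if not line.strip() or line.startswith("#"):
        return None
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        return None
    seqid, source, ftype, start, end, score, strand, phase, attrs = fields
    attributes = {}
    for item in attrs.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attributes[key.strip()] = value.strip()
    return {
        "seqid": seqid,
        "type": ftype,
        "start": int(start),
        "end": int(end),
        "strand": strand,
        "attributes": attributes,
    }


def count_feature_types(lines):
    """Count features per type, e.g. gene, mRNA, exon, CDS."""
    counts = Counter()
    for line in lines:
        record = parse_gff_line(line)
        if record is not None:
            counts[record["type"]] += 1
    return counts


if __name__ == "__main__":
    if len(sys.argv) == 2:
        with open(sys.argv[1]) as handle:
            for ftype, n in sorted(count_feature_types(handle).items()):
                print(f"{ftype}\t{n}")
    else:
        print("usage: python count_gff.py ANNOTATION.gff")
```

Comparing the `gene` count against the number of records in the matching protein FASTA file is a simple way to confirm that the annotation and sequence downloads belong together.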