HUPAN toolkit

Download the latest version of HUPAN toolkit here.

Tools required by HUPAN toolkit

The HUPAN toolkit integrates many software tools, and an incompatible version of any of them may cause errors. We therefore provide the recommended versions of these tools in a single package.

Download these tools here (281M).

Example data

Download the ExampleData (2.7G) and its md5 checksum.

Simulation data

The simulation data set is used to compare two assembly strategies, assembling all reads versus assembling only unmapped reads, and to optimize the parameter settings of sga.

Download simdata_R1.fq.gz and simdata_R2.fq.gz. Then check them with md5sum: simdata_R1.fq.gz.md5, simdata_R2.fq.gz.md5.
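
Where md5sum is not available, the same check can be sketched in Python. This is a minimal example, not part of the HUPAN toolkit; it assumes each .md5 file follows the usual md5sum layout (hex digest first, then the file name) and that the files sit in the current directory. The function names are hypothetical.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def expected_md5(md5_path):
    """Read the recorded digest; assumes the md5sum layout '<digest>  <file name>'."""
    return Path(md5_path).read_text().split()[0].lower()


def verify(data_path, md5_path):
    """Return True when the computed digest matches the recorded one."""
    return md5_of_file(data_path) == expected_md5(md5_path)


if __name__ == "__main__":
    # File names are taken from the text above; the ".md5" suffix is an assumption.
    for name in ("simdata_R1.fq.gz", "simdata_R2.fq.gz"):
        data, md5_file = Path(name), Path(name + ".md5")
        if not (data.exists() and md5_file.exists()):
            print(f"{name}: skipped (file or checksum not found)")
            continue
        print(f"{name}: {'OK' if verify(data, md5_file) else 'FAILED'}")
```

Reading in chunks keeps memory use flat for multi-gigabyte downloads such as the example data.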

Whole genome sequencing of 185 Han Chinese individuals

The raw sequencing data and assembled contigs are available here.

Please apply for the data as described in the Readme document, and send a signed DATA USE AGREEMENT to access the data sets.

The non-reference sequences derived from the 185 newly sequenced genomes can be downloaded from here, together with the md5.

90 Han Chinese genomes

The assembled genomes of 90 Han Chinese individuals were downloaded from Deep whole-genome sequencing of 90 Han Chinese genomes.

The non-reference sequences derived from the 90 assembled Han Chinese genomes are released as 90genomes_novel_sequences.tar.gz and md5.

Han Chinese pan-genome sequences

The pan-genome sequences are available as HanChinesePan.fa.gz, with checksum HanChinesePan.fa.gz.md5.
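
After downloading, a quick sanity check is to count the records and bases in the compressed FASTA file. The sketch below is illustrative only; it assumes a standard gzip-compressed FASTA file in the current directory, and the function name is hypothetical.

```python
import gzip
from pathlib import Path


def fasta_summary(path):
    """Count records and total sequence characters in a gzip-compressed FASTA file."""
    n_records = 0
    n_bases = 0
    with gzip.open(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # ignore blank lines
            if line.startswith(">"):
                n_records += 1  # header line starts a new record
            else:
                n_bases += len(line)  # sequence line
    return n_records, n_bases


if __name__ == "__main__":
    fasta = Path("HanChinesePan.fa.gz")  # file name taken from the text above
    if fasta.exists():
        records, bases = fasta_summary(fasta)
        print(f"{fasta}: {records} sequences, {bases} bases")
    else:
        print(f"{fasta}: not found")
```

The file is streamed line by line, so the full pan-genome never has to be held in memory.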

Novel predicted genes

The novel predicted genes from 185 newly sequenced Han Chinese genomes:

    Download sequences in FASTA format: genome, transcript, protein

    Download genome annotation in: GFF format

The novel predicted genes from 90 assembled Han Chinese genomes:

    Download sequences in FASTA format: genome, transcript, protein

    Download genome annotation in: GFF format

The 188 non-redundant genes from 275 Han Chinese genomes that are missing from the GRCh38 primary assembly, patch sequences and alternative loci:

    Download sequences in FASTA format: genome, transcript, protein

    Download genome annotation in: GFF format