Because a single reference genome cannot represent the full genomic diversity of a species, pangenomes are increasingly used to capture the genetic information of all its individuals. Graph-based pangenomes are becoming more popular than linear pangenomes because the graph model stores more comprehensive information about variations, including their locations and structures. However, existing graph-based pangenome browsers are designed only for multiple sequence alignments and cannot visualize multiple genome annotations simultaneously.
Here we present PPanG, a precise pangenome browser that combines linear and graph-based pangenomes. We use the rice pangenome as an example to demonstrate it.
The PPanG visualization is available at https://cgm.sjtu.edu.cn/PPanG/visualization/.
The PPanG homepage is at https://cgm.sjtu.edu.cn/PPanG/.
PPanG is built on SequenceTubeMap and JBrowse2; we modified their functions and added many new features for visualizing precise pangenome annotations. Nine rice genomes with high-quality sequences and annotations are provided by default as potential reference genomes, and any individual genome can be selected as the reference.
PPanG is also designed to be user-friendly and easy to use for other regions of interest. Only the following two steps are needed:
There are three ways to specify the target region:
2a. Select the gene from “Built-in Gene” if your gene of interest is listed there.
2b. Enter the MSU7 gene ID in “Region” (e.g. “LOC_Os01g66200”) if the gene of interest is annotated in MSU7;
the browser will jump to the target region of this gene automatically.
2c. Select the path and enter in “Region” either a coordinate range (e.g. “IRGSP-1.0.chr01:1-100”) or a start position
and a distance (e.g. “IRGSP-1.0.chr01:1+100”) to specify any custom region. Any sample can be used as the
reference in this step.
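To make the two “Region” formats concrete, here is a minimal Python sketch that splits such a string into a path name, a start, and an end. It is not PPanG's own parser: the names `Region` and `parse_region` are hypothetical, and it assumes that `start+distance` means the region ends at `start + distance`. Strings without a coordinate part, such as an MSU7 gene ID, are returned as `None` so that they can be handled separately.

```python
import re
from typing import NamedTuple, Optional


class Region(NamedTuple):
    """A parsed region (hypothetical helper type, not part of PPanG)."""
    path: str
    start: int
    end: int


# <path>:<start>-<end>  or  <path>:<start>+<distance>
# The path itself may contain '-' and '.', e.g. "IRGSP-1.0.chr01".
_REGION_RE = re.compile(r"^(?P<path>.+):(?P<start>\d+)(?P<op>[-+])(?P<value>\d+)$")


def parse_region(text: str) -> Optional[Region]:
    """Parse a coordinate region; return None for anything else (e.g. a gene ID)."""
    match = _REGION_RE.match(text.strip())
    if match is None:
        return None
    start = int(match.group("start"))
    value = int(match.group("value"))
    # Assumption: "+" gives a distance measured from the start position.
    end = value if match.group("op") == "-" else start + value
    if end < start:
        raise ValueError(f"end {end} is before start {start} in {text!r}")
    return Region(match.group("path"), start, end)


if __name__ == "__main__":
    for example in ("IRGSP-1.0.chr01:1-100", "IRGSP-1.0.chr01:1+100", "LOC_Os01g66200"):
        print(example, "->", parse_region(example))
```

The regular expression anchors on the last colon followed by digits, so hyphens and dots inside the path name do not interfere with the coordinate part.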
Note: Occasionally, if a gene overlaps an adjacent gene in this region, the elements of both genes are drawn in gray and cannot be distinguished. In this case, select the genes that are not of interest to hide them.
git clone git@github.com:SJTU-CGM/PPanG.git
cd PPanG/
npm install # or yarn install
tabix must also be available in your PATH; it can be installed with sudo apt install tabix or built from source.
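A quick way to confirm that tabix is visible on your PATH before starting the browser is sketched below. The helper name `find_tool` is hypothetical; the check relies only on Python's standard `shutil.which`.

```python
import shutil
from typing import Optional


def find_tool(name: str = "tabix") -> Optional[str]:
    """Return the full path of an executable found on PATH, or None if absent."""
    return shutil.which(name)


if __name__ == "__main__":
    location = find_tool()
    if location is None:
        print("tabix was not found on PATH; install it before running PPanG")
    else:
        print(f"tabix found at {location}")
```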
The SequenceTubeMap view is configured in src/config.json. dataPath should be set to your own data folder (in PPanG, dataPath is riceData/), and DATA_SOURCES should correspond to the xg files in your dataPath. The reference is set in reference (alias and annotation simply stay the same as name if there is no alias). bedFile is only available if vg chunks are pre-processed; otherwise it should be removed. Other configuration details are described in SequenceTubeMap.
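When editing DATA_SOURCES, it can help to list the xg files that are actually present in the data folder. The sketch below reads only the dataPath key from the configuration file and makes no assumption about the structure of DATA_SOURCES entries. The function name `list_xg_files` is hypothetical, and resolving dataPath relative to the repository root (`base_dir`) is an assumption.

```python
import json
from pathlib import Path
from typing import List


def list_xg_files(config_path: str, base_dir: str = ".") -> List[str]:
    """Return the sorted names of the *.xg files found in the configured dataPath."""
    config = json.loads(Path(config_path).read_text(encoding="utf-8"))
    # Assumption: dataPath is interpreted relative to base_dir (the repository root).
    data_dir = Path(base_dir) / config["dataPath"]
    return sorted(p.name for p in data_dir.glob("*.xg"))


if __name__ == "__main__":
    # Run from the repository root; compare the output with DATA_SOURCES by hand.
    for name in list_xg_files("src/config.json"):
        print(name)
```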
All genomes and GFF3 annotations must be provided in bgzip-compressed, indexed form (*.fasta.gz, *.fasta.gz.gzi, *.fasta.gz.fai, *.gff.gz, *.gff.gz.tbi) in the jbrowse/ folder.
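The file list above can be checked mechanically. The following sketch scans a folder for every *.fasta.gz and *.gff.gz file and reports which companion index files are missing. The function name `missing_index_files` is hypothetical, and the script only checks that the files exist, not that they are valid.

```python
from pathlib import Path
from typing import List

# Index files expected next to each data file, following the list above.
REQUIRED_INDEXES = {
    ".fasta.gz": [".gzi", ".fai"],
    ".gff.gz": [".tbi"],
}


def missing_index_files(folder: str) -> List[str]:
    """Return the names of index files that are absent from the folder."""
    root = Path(folder)
    missing: List[str] = []
    for data_suffix, index_suffixes in REQUIRED_INDEXES.items():
        for data_file in sorted(root.glob(f"*{data_suffix}")):
            for index_suffix in index_suffixes:
                companion = data_file.with_name(data_file.name + index_suffix)
                if not companion.exists():
                    missing.append(companion.name)
    return missing


if __name__ == "__main__":
    problems = missing_index_files("jbrowse")
    print("all index files present" if not problems else "\n".join(problems))
```

Such indexes are typically produced with `samtools faidx` on the bgzip-compressed FASTA (which writes the .fai and .gzi files) and `tabix -p gff` on the sorted, bgzip-compressed GFF3 (which writes the .tbi file).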