PPanG: a precise pangenome browser combining linear and graph-based pan-genomes

1. Introduction

Because a single reference genome cannot represent the full genomic diversity of a species, the pangenome has been increasingly adopted to capture the genetic information of all individuals in that species. Graph-based pangenomes are now gaining popularity over linear pangenomes because the graph model stores more comprehensive information about variations, including their locations and structures. However, existing graph-based pangenome browsers are designed only for multiple sequence alignments and cannot display annotations from multiple genomes simultaneously.

Here we report PPanG, a precise pangenome browser that combines linear and graph-based pangenomes. We use the rice pangenome as an example to demonstrate it.

The PPanG visualization is available at https://cgm.sjtu.edu.cn/PPanG/visualization/.

The PPanG homepage is at https://cgm.sjtu.edu.cn/PPanG/.

2. User guide

PPanG is built on SequenceTubeMap and JBrowse2; we modified their functions and added many new features for visualizing precise pangenome annotations. Nine rice genomes with high-quality sequences and annotations are provided by default as potential reference genomes, and any individual genome can be selected as the reference.

PPanG is also designed to be user-friendly, so browsing any region of interest takes only the following two steps:

1. Select the target chromosome under “Data”.

2. Provide the target region in “Region” and click the “Go” button.

There are three ways to specify the target region:

2a. Select the gene from “Built-in Gene” if your gene of interest is listed there.

2b. Enter the MSU7 gene ID in “Region” (e.g. “LOC_Os01g66200”) if the gene of interest is annotated in MSU7; the browser will jump to the region of this gene automatically.

2c. Select a path and enter in “Region” either a coordinate range (e.g. “IRGSP-1.0.chr01:1-100”) or a start position and a distance (e.g. “IRGSP-1.0.chr01:1+100”) to specify any custom region. Any sample can serve as the reference in this step.
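To make the accepted “Region” formats concrete, the following Python sketch classifies an input string as a gene ID, a coordinate range, or a start position plus distance. It is only an illustration and is not part of PPanG: the function name parse_region, the Region tuple, and the regular expressions are hypothetical, and the gene ID pattern is an assumption generalized from the single example “LOC_Os01g66200”.

```python
import re
from typing import NamedTuple, Optional


class Region(NamedTuple):
    """Hypothetical container for a parsed "Region" string."""
    kind: str                     # "gene", "range", or "offset"
    name: str                     # gene ID, or path name for coordinates
    start: Optional[int] = None   # start position (coordinates only)
    value: Optional[int] = None   # end for "range", distance for "offset"


# Assumed shape of an MSU7 gene ID, based on the example LOC_Os01g66200.
_GENE = re.compile(r"^LOC_Os\d{2}g\d{5}$")

# <path>:<start>-<end> or <path>:<start>+<distance>.
# The path itself may contain "-" and "." (e.g. IRGSP-1.0.chr01),
# so only the part after the last ":" is treated as coordinates.
_COORD = re.compile(r"^(?P<path>.+):(?P<start>\d+)(?P<sep>[-+])(?P<value>\d+)$")


def parse_region(text: str) -> Region:
    """Classify a region string without contacting the browser."""
    text = text.strip()
    if _GENE.match(text):
        return Region("gene", text)
    m = _COORD.match(text)
    if m is None:
        raise ValueError(f"unrecognized region: {text!r}")
    start, value = int(m["start"]), int(m["value"])
    kind = "range" if m["sep"] == "-" else "offset"
    if kind == "range" and value < start:
        raise ValueError(f"end precedes start in {text!r}")
    return Region(kind, m["path"], start, value)


if __name__ == "__main__":
    for example in ("LOC_Os01g66200",
                    "IRGSP-1.0.chr01:1-100",
                    "IRGSP-1.0.chr01:1+100"):
        print(example, "->", parse_region(example))
```

The sketch keeps the distance of the “+” form as entered rather than converting it to an end coordinate, because how the browser resolves the end position is not specified here.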

Note: Occasionally, if a gene overlaps an adjacent gene in the region, the elements of both genes are drawn in gray and cannot be distinguished. In this case, select the genes that are not of interest to hide them.

3. Run PPanG for your own data