Pangenome analysis reveals structural variation associated with seed size and weight traits in peanut

Introduction

Peanut (Arachis hypogaea L.) is an important oilseed and food legume crop, and seed size and weight are critical traits for domestication and breeding. This page provides the code and result data for the article (mirror URL).

Code

The main pipelines and custom scripts have been uploaded to GitHub.

Link: https://github.com/SJTU-CGM/PeanutPan

Result data

Genome sequences of newly sequenced peanut accessions

Sample | Genome sequence | Gene annotation
Adu | Adu.fa.gz | Adu.gff3.gz
Amon | Amon.fa.gz | Amon.gff3.gz
H16-5 | H16-5.fa.gz | H16-5.gff3.gz
mH8 | mH8.fa.gz | mH8.gff3.gz
NDH108 | NDH108.fa.gz | NDH108.gff3.gz
ZP06 | ZP06.fa.gz | ZP06.gff3.gz
  • Genome sequence: Long reads (ONT ultra-long / PacBio HiFi) were corrected and assembled using NextDenovo. The resulting contigs were polished with Racon and NextPolish using the long reads, then clustered, ordered, and oriented onto chromosomes using LACHESIS.
  • Gene annotation: Genes were predicted using GeMoMa, PASA, Augustus, and EVidenceModeler, combining de novo, transcript-based, and homologous-protein-based strategies.
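
After downloading one of the genome sequence files listed above, a quick sanity check is to summarize its sequence lengths. The following is a minimal sketch, not part of the published pipeline: it assumes the file is a standard gzip-compressed FASTA, and the function names `fasta_lengths` and `n50` are hypothetical helpers introduced here for illustration.

```python
import gzip
from typing import Dict, List


def fasta_lengths(path: str) -> Dict[str, int]:
    """Return {sequence_name: length} for a gzip-compressed FASTA file."""
    lengths: Dict[str, int] = {}
    name = None
    with gzip.open(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # Use the first whitespace-delimited token as the sequence name.
                name = line[1:].split()[0]
                lengths[name] = 0
            elif name is not None:
                lengths[name] += len(line)
    return lengths


def n50(lengths: List[int]) -> int:
    """Length of the sequence at which the running total, taken from the
    longest sequence downward, first reaches half of the total length."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0
```

For example, `fasta_lengths("Adu.fa.gz")` would return one entry per chromosome or scaffold in the Adu assembly, and passing `list(result.values())` to `n50` gives a rough contiguity summary.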

Pangenome construction of peanut accessions

Dataset | File type | Description | File link
SVAss | Raw VCF | Variants from SVAss; SVs from 8 peanut accessions merged with SURVIVOR | P8.SVAss.SURVIVOR.vcf.gz
SVAssRead | Raw VCF | Variants from SVAssRead; SVs from 8 peanut accessions merged with SURVIVOR | P8.SVAssRead.SURVIVOR.vcf.gz
MC | Raw VCF | Variants from MC [Minigraph-Cactus], constructed directly from 8 peanut accessions, including small variants | P8.MC.raw.vcf.gz
MC | Raw GFA | Variation graph from MC [Minigraph-Cactus], constructed directly from 8 peanut accessions, including small variants | P8.MC.raw.gfa.gz
SVAss | Processed VCF | PanGenie-ready variants from SVAss, prepared with the PanGenie preparation pipeline and annotated with VCFanno | P8.SVAss.PangenieReady.VCFanno.vcf.gz
SVAssRead | Processed VCF | PanGenie-ready variants from SVAssRead, prepared with the PanGenie preparation pipeline and annotated with VCFanno | P8.SVAssRead.PangenieReady.VCFanno.vcf.gz
MC | Processed VCF | PanGenie-ready variants from MC, prepared with the PanGenie preparation pipeline and annotated with VCFanno | P8.MC.PangenieReady.VCFanno.vcf.gz
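
To get a first overview of one of the VCF files above, the records can be tallied by structural variant type. This is a minimal sketch under stated assumptions: it assumes each record carries an `SVTYPE` key in its INFO column (common for SV callers and merged SV sets, but not confirmed here for every file), and `count_sv_types` is a hypothetical helper name. Records without `SVTYPE` are counted as `UNKNOWN`.

```python
import gzip
from collections import Counter


def count_sv_types(path: str) -> Counter:
    """Count VCF records per SVTYPE value in a gzip-compressed VCF."""
    counts: Counter = Counter()
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line.startswith("#"):
                continue  # skip meta-information and header lines
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 8:
                continue  # not a complete VCF record
            # INFO is column 8: semicolon-separated key=value pairs and flags.
            info = dict(
                item.split("=", 1)
                for item in fields[7].split(";")
                if "=" in item
            )
            # Assumption: the SV type is stored under the SVTYPE key.
            counts[info.get("SVTYPE", "UNKNOWN")] += 1
    return counts
```

Calling `count_sv_types("P8.SVAss.SURVIVOR.vcf.gz")` would return a `Counter` keyed by type. For the MC files, which include small variants, many records may fall under `UNKNOWN` if they carry no `SVTYPE` annotation.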

Variants and genotypes of approximately 269 resequenced peanut accessions

Variant type | Sub-genome | Filtered variant file
SNP | A | AA_merge.snp.MM05MAF005.vcf.gz
SNP | B | BB_merge.snp.MM05MAF005.vcf.gz
SNP | A&B (tetraploid only) | AABB_merge.snp.MM05MAF005.vcf.gz
SV | A&B (tetraploid only) | AABB230.SVAssRead.evg.c3.force0.vcf.gz
Indel & SV (variant length >= 10 bp) | A&B (tetraploid only) | AABB230.MCmt10bp.evg.c3.force0.vcf.gz
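
For readers who want to re-check or tighten the site-level filtering of these genotype files, the sketch below computes the missing-genotype rate and minor allele frequency for a single VCF record. It is an illustration only, not the authors' filtering code. The default thresholds (missing rate at most 0.5, MAF at least 0.05) are an assumed reading of the `MM05MAF005` tag in the file names, which the source does not define, and `site_stats` and `passes_filter` are hypothetical helper names.

```python
import re
from typing import Tuple


def site_stats(vcf_line: str) -> Tuple[float, float]:
    """Return (missing_rate, minor_allele_frequency) for one VCF record line."""
    fields = vcf_line.rstrip("\n").split("\t")
    gt_index = fields[8].split(":").index("GT")  # position of GT in FORMAT
    called = missing = alt_alleles = total_alleles = 0
    for sample in fields[9:]:
        genotype = sample.split(":")[gt_index]
        alleles = re.split(r"[/|]", genotype)  # handles unphased and phased calls
        if "." in alleles:
            missing += 1
            continue
        called += 1
        total_alleles += len(alleles)
        # All non-reference alleles are pooled, a simplification for multi-allelic sites.
        alt_alleles += sum(1 for allele in alleles if allele != "0")
    n_samples = called + missing
    missing_rate = missing / n_samples if n_samples else 1.0
    alt_freq = alt_alleles / total_alleles if total_alleles else 0.0
    return missing_rate, min(alt_freq, 1.0 - alt_freq)


def passes_filter(vcf_line: str, max_missing: float = 0.5, min_maf: float = 0.05) -> bool:
    """Assumed thresholds: keep sites with missing rate <= 0.5 and MAF >= 0.05."""
    missing_rate, maf = site_stats(vcf_line)
    return missing_rate <= max_missing and maf >= min_maf
```

In practice these functions would be applied to each non-header line of a file such as `AABB_merge.snp.MM05MAF005.vcf.gz`, opened with `gzip.open(path, "rt")`.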

Contact Information

Hongzhang Xue: xuehzh95@sjtu.edu.cn
Chaochun Wei: ccwei@sjtu.edu.cn
Dongmei Yin: yindm@henau.edu.cn


Copyright © 2025
The laboratory of computational genomics and metagenomics in Shanghai Jiao Tong University &
HAU Peanut Team, College of Agronomy, Henan Agricultural University.
All Rights Reserved.