PaSS: Sequencing simulator for PacBio Sequencing

PaSS is a fast, high-fidelity sequencing simulator for PacBio sequencing. It is intended to facilitate the evaluation and development of new analysis tools for PacBio sequencing data.

System requirements

Linux operating system with at least 1 GB of memory; Perl and gcc are required.

Installation

Download the tarball here.
Uncompress PaSS.tar.gz.

tar xzvf PaSS.tar.gz

Compile the source code.

gcc PaSS.c -o PaSS -lm -lpthread

Simulate PacBio multi-pass sequencing reads

Generate the index file of the target genome

perl pacbio_mkindex.pl E.coli/ecoli_ref.fa ./

After this step, two files, percentage.txt and index (which contain information about the target genome), are written to the current directory. Both are used in the simulation stage that follows.
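Before starting the simulation, it can help to confirm that both files were actually written. The following is a minimal shell sketch, assuming the files land in the directory passed as the second argument to pacbio_mkindex.pl; the function name check_index_outputs is hypothetical and not part of PaSS.

```shell
# Hypothetical helper (not part of PaSS): verify that the two files
# produced by pacbio_mkindex.pl exist and are non-empty in a directory.
check_index_outputs() {
    dir="${1:-.}"                      # directory to inspect, default: current
    for f in percentage.txt index; do
        if [ ! -s "$dir/$f" ]; then    # -s: file exists and has size > 0
            echo "missing or empty: $dir/$f" >&2
            return 1
        fi
    done
    echo "index files present in $dir"
}
```

Calling check_index_outputs ./ right after the indexing step returns a non-zero status if either file is missing, so it can gate the simulation command in a script.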

Simulation

./PaSS -list percentage.txt -index index -m pacbio_RS -c sim.config -r 1000 -t 4 -o out

Parameters: 
-list      percentage.txt
-index     index file
-m         sequencer to simulate: pacbio_RS or pacbio_sequel
-c         the profile generated in the error model stage.
           sim.config is the profile of the example dataset.
           Three profiles are provided, for E. coli, C. elegans and A. thaliana respectively.
-r         number of reads to simulate
-t         number of threads to use; the default is 1.
-o         name of the output file
-d         if '-d' is set, the ground truth of the simulation is output as well.
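The options above can be combined in a small wrapper that checks the -m value before anything is run. This is only a sketch: the function pass_cmd and its argument order are hypothetical, it prints the command line rather than executing it, and it always appends -d for illustration. Only flags listed above are used.

```shell
# Hypothetical wrapper (not part of PaSS): build a PaSS command line and
# print it, rejecting sequencer names other than the two listed above.
# Arguments: sequencer, profile, number of reads, [threads, default 1]
pass_cmd() {
    model="$1"; config="$2"; reads="$3"; threads="${4:-1}"
    case "$model" in
        pacbio_RS|pacbio_sequel) ;;    # the two values accepted by -m
        *) echo "unknown sequencer: $model" >&2; return 1 ;;
    esac
    # -d requests the ground truth in addition to the simulated reads
    echo "./PaSS -list percentage.txt -index index -m $model -c $config -r $reads -t $threads -o out -d"
}
```

For example, pass_cmd pacbio_RS sim.config 1000 4 prints the simulation command shown earlier with -d added; the printed line can be inspected and then pasted into the shell.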

Estimate the error model from the real PacBio sequencing data

Align the sequencing reads to the reference genome with blasr.

blasr real.fastq reference.fasta --allowAdjacentIndels --hitPolicy randombest --out real.blasr -m 0

Generate the profile with "run.pl".
```
perl run.pl example/example.fq example/example.blasr RS/sequel
```
parameter1: the real PacBio sequencing data.

parameter2: the alignment results for the real data.

parameter3: the version of the sequencer, either RS or sequel. If the sequencer is RS, the distribution of quality values is included in the model.

The output is sim.config.
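The two steps of this stage can be chained in one script. The sketch below is an assumption-laden illustration rather than a shipped tool: run and estimate_model are hypothetical names, the alignment is written to real.blasr as in the blasr command above, and setting DRY_RUN=1 prints the commands instead of executing them.

```shell
# Hypothetical driver (not shipped with PaSS) chaining alignment and
# profile generation. With DRY_RUN=1 the commands are only printed.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Arguments: reads (FASTQ), reference (FASTA), sequencer (RS or sequel)
estimate_model() {
    reads="$1"; ref="$2"; sequencer="$3"
    case "$sequencer" in
        RS|sequel) ;;                  # the two versions run.pl accepts
        *) echo "sequencer must be RS or sequel" >&2; return 1 ;;
    esac
    # Step 1: align the real reads to the reference with blasr
    run blasr "$reads" "$ref" --allowAdjacentIndels --hitPolicy randombest --out real.blasr -m 0 &&
    # Step 2: build the profile from the reads and their alignments
    run perl run.pl "$reads" real.blasr "$sequencer"
}
```

Running estimate_model real.fastq reference.fasta RS with DRY_RUN=1 set first is a cheap way to review both command lines before committing to a long alignment.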

Datasets

Real PacBio sequencing datasets and their alignment results can be downloaded here.

Use & Citation

Please cite the following paper if you use PaSS (which can be considered NeSSM 2.0).

Zhang W., Jia B. and Wei C., "PaSS: a sequencing simulator for PacBio sequencing", BMC Bioinformatics 2019, 20:352.

Contact Information

Wenmin Zhang: Melody091835@163.com
Chaochun Wei: ccwei@sjtu.edu.cn