PaSS is a fast, high-fidelity sequencing simulator for PacBio sequencing. It facilitates the evaluation and development of new analysis tools for PacBio sequencing data.
Linux operating system with 1 GB of memory or more; Perl and gcc are required.
Download the tarball here.
tar xzvf PaSS.tar.gz
gcc PaSS.c -o PaSS -lm -lpthread
perl pacbio_mkindex.pl E.coli/ecoli_ref.fa ./
After this step, the current directory contains two files, percentage.txt and index (which hold information about the target genome). Both are used in the simulation stage that follows.
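Before starting the simulation it can help to confirm that the indexing step really produced both files. The following is a minimal sketch, not part of PaSS: the function name check_index_files is hypothetical, and treating an empty file as a failure is an assumption.

```shell
# Hypothetical helper (not shipped with PaSS): succeeds only if both
# outputs of pacbio_mkindex.pl exist in the given directory.
# Assumption: an empty file is treated as a failed indexing run.
check_index_files() {
    dir=${1:-.}
    for f in percentage.txt index; do
        if [ ! -s "$dir/$f" ]; then
            echo "missing or empty: $dir/$f" >&2
            return 1
        fi
    done
    return 0
}

# Check the current directory, where the indexing command above writes.
if check_index_files .; then
    echo "index files found; ready to simulate"
else
    echo "run pacbio_mkindex.pl first"
fi
```

Pass a different directory as the first argument if pacbio_mkindex.pl was given an output path other than ./.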
./PaSS -list percentage.txt -index index -m pacbio_RS -c sim.config -r 1000 -t 4 -o out
Parameters:
-list percentage.txt, produced by pacbio_mkindex.pl
-index the index file, produced by pacbio_mkindex.pl
-m pacbio_RS or pacbio_sequel, the sequencer to simulate
-c the profile generated in the error model stage. sim.config is the profile of the example dataset. Three profiles are provided, for E.coli, C.elegans and A.thaliana respectively.
-r number of reads to simulate
-t number of threads to use; the default is 1
-o name of the output file
-d if '-d' is set, the ground truth of the simulation is written out as well
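The parameters above can be combined in a small wrapper. This is only a sketch: build_pass_cmd is a hypothetical helper, it prints the command instead of running it, and it assumes percentage.txt and index sit in the current directory as in the example.

```shell
# Hypothetical helper: prints (does not run) a PaSS command line.
# Arguments: sequencer, profile, number of reads, threads, output name,
# and "yes" to append -d (ground truth output).
build_pass_cmd() {
    sequencer=$1; profile=$2; reads=$3; threads=$4; out=$5; truth=$6
    # Only the two sequencers listed for -m are accepted.
    case "$sequencer" in
        pacbio_RS|pacbio_sequel) ;;
        *) echo "unknown sequencer: $sequencer" >&2; return 1 ;;
    esac
    cmd="./PaSS -list percentage.txt -index index -m $sequencer -c $profile -r $reads -t $threads -o $out"
    if [ "$truth" = "yes" ]; then
        cmd="$cmd -d"
    fi
    printf '%s\n' "$cmd"
}

# Dry run: 1000 RS reads on 4 threads, with the ground truth requested.
build_pass_cmd pacbio_RS sim.config 1000 4 out yes
```

Printing the command first makes it easy to inspect before running it, for example by piping the output to sh once it looks right.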
blasr real.fastq reference.fasta --allowAdjacentIndels --hitPolicy randombest --out real.blasr -m 0
perl run.pl example/example.fq example/example.blasr RS/sequel
parameter1: real PacBio sequencing data.
parameter2: alignment results of the real data.
parameter3: the version of the sequencer, RS or sequel. If the sequencer is RS, the distribution of quality values is included in the model.
The output is sim.config.
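The two error model commands can be chained so that the alignment file produced by blasr is the one handed to run.pl. The sketch below only prints the commands; model_cmds is a hypothetical helper, and naming the alignment file after the reads file (real.fastq gives real.blasr) is an assumption taken from the example above.

```shell
# Hypothetical helper: prints the two commands of the error model stage.
# Arguments: real reads file, reference FASTA, sequencer (RS or sequel).
model_cmds() {
    reads=$1; reference=$2; sequencer=$3
    case "$sequencer" in
        RS|sequel) ;;
        *) echo "sequencer must be RS or sequel" >&2; return 1 ;;
    esac
    # Assumption: alignment file is the reads file name with a .blasr suffix.
    aln="${reads%.*}.blasr"
    printf '%s\n' "blasr $reads $reference --allowAdjacentIndels --hitPolicy randombest --out $aln -m 0"
    printf '%s\n' "perl run.pl $reads $aln $sequencer"
}

# Dry run for an RS dataset.
model_cmds real.fastq reference.fasta RS
```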
Real PacBio sequencing datasets and their alignment results can be downloaded here.
Wenmin Zhang: Melody091835@163.com
Chaochun Wei: ccwei@sjtu.edu.cn