Dr. Chaochun Wei, Department of Bioinformatics and Biostatistics

CTCM: Classifying Transcripts from Coral Metatranscriptome

I. Introduction
    Constructing complete reference transcriptomes for coral and Symbiodinium is an important step in analyzing the coral community. CTCM classifies transcripts assembled from a coral metatranscriptome as coral or Symbiodinium based on alignments. CTCM has been evaluated on simulated and real datasets, and it generates more sensitive and specific classifications for coral and Symbiodinium than existing methods.

II. System requirements
    CTCM currently runs on the Linux operating system and requires only Perl.
    CTCM takes transcript alignments as input; these are generated with NCBI BLAST+, which can be downloaded from ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/.
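
Both requirements can be checked from the shell before installing. The following is a minimal sketch: the helper name missing_tools is hypothetical and not part of CTCM, and it assumes that blastn is the BLAST+ executable used to produce the alignments.

```shell
# Print the name of every listed tool that is not found on PATH.
# An empty result means all listed tools are available.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Usage (prints nothing when both tools are installed):
#   missing_tools perl blastn
```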

III. Installation
1. Download:
Download the tarball of the CTCM system (CTCM.tar.gz).
2. Decompress the files:
tar -zxvf CTCM.tar.gz

IV. Run the system
1. Classification:
Command:
perl CTCM.pl --coral <Blastn alignments> --sym <Blastn alignments> --microbe <Megablast alignments> --out <output> [options]*

Arguments:
--coral <Blastn alignments>
alignment files of the transcripts against the coral database
multiple files are separated by commas
--sym <Blastn alignments>
alignment files of the transcripts against the Symbiodinium database
multiple files are separated by commas
--microbe <Megablast alignments>
alignment files of the transcripts against the microbe database
multiple files are separated by commas
--out <output>
output file
--score_coral <score threshold>
alignment score threshold for classifying a transcript as coral
the higher the threshold, the lower the sensitivity and the higher the specificity
default: 45
--score_sym <score threshold>
alignment score threshold for classifying a transcript as Symbiodinium
the higher the threshold, the lower the sensitivity and the higher the specificity
default: 51
--help|--h
print help information
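
To get a feel for how a score threshold trades sensitivity for specificity, the number of transcripts that would pass a given threshold can be counted directly from an alignment file. This is only a sketch under stated assumptions: it treats the alignment score as the bit score in column 12 of the default BLAST tabular output with comment lines (-outfmt 7), which this document does not confirm, and count_queries_at_or_above is a hypothetical helper. It is not part of CTCM and does not describe how CTCM classifies transcripts.

```shell
# Read BLAST -outfmt 7 alignments on stdin and print the number of distinct
# query transcripts with at least one hit whose bit score is >= the threshold.
# Assumption: column 12 (bit score) is the score the thresholds refer to.
count_queries_at_or_above() {
    awk -F '\t' -v min="$1" '
        /^#/ { next }                                  # skip comment lines
        NF >= 12 && ($12 + 0) >= (min + 0) { seen[$1] = 1 }
        END { n = 0; for (q in seen) n++; print n }
    '
}

# Usage:
#   gzip -dc example/coral.blastn.gz | count_queries_at_or_above 45
```

Running the helper with a higher threshold on the same file shows how many transcripts a stricter cutoff would exclude under this assumption.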

Example:
perl CTCM.pl --coral example/coral.blastn.gz --sym example/symbiodinium.blastn.gz --microbe example/microbe.megablast.gz --out classification.txt
The resulting classification.txt file contains 494 coral transcripts and 279 Symbiodinium transcripts. The remaining transcripts may be from microbes.
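
When the alignments for one database are split across several files, they are passed as a single comma-separated value. The following is a small sketch of building that value in the shell. The helper join_by_comma is hypothetical, and the file names in the usage comment are placeholders rather than files shipped with CTCM.

```shell
# Join all arguments into one comma-separated string.
join_by_comma() {
    local IFS=,
    printf '%s\n' "$*"
}

# Usage (placeholder file names):
#   perl CTCM.pl --coral "$(join_by_comma coral.part1.blastn.gz coral.part2.blastn.gz)" \
#       --sym sym.blastn.gz --microbe microbe.megablast.gz --out classification.txt
```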

2. Alignments:
    CTCM requires transcript alignments, which users must generate themselves with NCBI BLAST+. When aligning transcripts against the coral or Symbiodinium database, use the arguments "-task blastn -outfmt 7 -max_target_seqs 1"; when aligning transcripts against the microbe database, use "-outfmt 7 -max_target_seqs 1".
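
The two alignment runs can be wrapped as shell functions. This is a sketch, not a script distributed with CTCM: the function names, the query file and the database names are hypothetical. It uses -max_target_seqs, which is the BLAST+ option for limiting the number of reported subject sequences. It also relies on megablast being the default task of blastn, so the microbe run sets no -task.

```shell
# Both helpers write BLAST tabular output with comment lines (-outfmt 7) to stdout.
# $1: FASTA file of assembled transcripts, $2: BLAST nucleotide database name.

# For the coral and Symbiodinium databases.
align_to_reference() {
    blastn -task blastn -query "$1" -db "$2" -outfmt 7 -max_target_seqs 1
}

# For the microbe database (default blastn task, i.e. megablast).
align_to_microbe() {
    blastn -query "$1" -db "$2" -outfmt 7 -max_target_seqs 1
}

# Usage (placeholder names):
#   align_to_reference transcripts.fa coral_db | gzip > coral.blastn.gz
#   align_to_microbe transcripts.fa microbe_db | gzip > microbe.megablast.gz
```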

    The CTCM system provides a script, microbe-db.pl, for constructing the microbe database from the Nt database. Its usage is:
Command:
perl microbe-db.pl --nt <Nt fasta> --out <output> --tax <accession2taxid file> [options]*

Arguments:
--nt <Nt fasta>
the Nt FASTA file downloaded from ftp://ftp.ncbi.nlm.nih.gov/blast/db/FASTA/; it may be gzipped
--out <output>
the output file
--tax <accession2taxid file>
the nucl_gb.accession2taxid file downloaded from ftp://ftp.ncbi.nih.gov/pub/taxonomy/accession2taxid/
--dir <directory of *.dmp>
the directory containing names.dmp and nodes.dmp; these files are in the taxdmp archive downloaded from ftp://ftp.ncbi.nih.gov/pub/taxonomy
default: current directory
--help|--h
the usage information
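
A possible end-to-end invocation is sketched below. It rests on assumptions: that the output of microbe-db.pl is a nucleotide FASTA file that makeblastdb can index (this document does not state the output format), and that all file names are placeholders. The wrapper build_microbe_db is hypothetical and not part of CTCM.

```shell
# build_microbe_db NT_FASTA ACCESSION2TAXID DMP_DIR OUT_FASTA
# Runs microbe-db.pl, then indexes its output as a BLAST nucleotide database.
# Assumption: OUT_FASTA is FASTA; the database is named after it, minus the extension.
build_microbe_db() {
    perl microbe-db.pl --nt "$1" --tax "$2" --dir "$3" --out "$4" &&
        makeblastdb -in "$4" -dbtype nucl -out "${4%.*}"
}

# Usage (placeholder file names):
#   build_microbe_db nt.gz nucl_gb.accession2taxid ./taxdmp microbe.fa
```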

V. Transcripts assembled from simulation datasets
    Six simulated metatranscriptome datasets were created for developing and testing CTCM. The transcripts assembled by Trinity from these simulated datasets can be downloaded:
Transcripts from S0 simulation dataset
Transcripts from S10 simulation dataset
Transcripts from S20 simulation dataset
Transcripts from S30 simulation dataset
Transcripts from S40 simulation dataset
Transcripts from S50 simulation dataset

Contact:
If you have any questions, please feel free to contact us.
Ben Jia: chenmodexiaoxi@126.com
Chaochun Wei: ccwei@sjtu.edu.cn


Please send your comments or bug reports to Dr. Wei.

 

©2017 Chaochun Wei