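Note: all of these tools have limited applicability, and some were designed only for animal transcripts, so be sure to test them on ground-truth data before use. I want to predict transcripts for a plant, so I can test the tools on Arabidopsis thaliana, which is already well annotated. (The test results were quite surprising.)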
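A minimal sketch of the ground-truth check described above, assuming you already have two hypothetical dictionaries mapping transcript ID to "coding"/"noncoding": one built from a well-annotated reference (for example Arabidopsis), and one from a tool's predictions. Reporting sensitivity and specificity separately, not just overall accuracy, shows whether a tool fails mainly on one class. The function name and label strings are illustrative, not part of any tool.

```python
def evaluate_predictions(predicted, truth, positive="coding"):
    """Compare predicted labels with ground-truth labels.

    predicted, truth: dicts mapping transcript ID -> label string.
    Only IDs present in both dicts are scored (hypothetical input format).
    """
    tp = fp = tn = fn = 0
    for tid, true_label in truth.items():
        if tid not in predicted:
            continue  # transcript missing from the prediction output
        pred_is_pos = predicted[tid] == positive
        true_is_pos = true_label == positive
        if pred_is_pos and true_is_pos:
            tp += 1
        elif pred_is_pos:
            fp += 1  # true noncoding called coding
        elif true_is_pos:
            fn += 1  # true coding called noncoding
        else:
            tn += 1

    def ratio(a, b):
        return a / b if b else float("nan")

    return {
        "tp": tp, "fp": fp, "tn": tn, "fn": fn,
        "sensitivity": ratio(tp, tp + fn),  # share of true coding transcripts recovered
        "specificity": ratio(tn, tn + fp),  # share of true noncoding transcripts recovered
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
    }


if __name__ == "__main__":
    truth = {"t1": "coding", "t2": "coding", "t3": "noncoding", "t4": "noncoding"}
    pred = {"t1": "coding", "t2": "noncoding", "t3": "noncoding", "t4": "noncoding"}
    print(evaluate_predictions(pred, truth))
```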
CPC
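(A familiar name: it turns out it was developed by Ge Gao and Liping Wei at Peking University.)
Only while searching for the paper did I find that CPC2 had already been released in 2017.
CPC can be used online.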
a Support Vector Machine-based classifier, named Coding Potential Calculator (CPC), to assess the protein-coding potential of a transcript based on six biologically meaningful sequence features.
Coding Potential Calculator distinguishes protein-coding from non-coding RNAs based on the sequence features of the input transcripts. Our preliminary performance assessment suggests the CPC can reliably discriminate the coding and non-coding transcripts with ~98% accuracy. We provide an online version of CPC here.
It claims about 98% accuracy.
bin/run_predict.sh (input_seq) (result_in_table) (working_dir) (result_evidence)
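If you run the local version from a script, a small wrapper keeps the four positional arguments in the documented order. This is a sketch: `cpc_home` and the example paths are hypothetical, and pre-creating the working directory is an assumption (harmless, but not something the CPC docs quoted here require).

```python
import subprocess
from pathlib import Path


def build_cpc_command(cpc_home, input_seq, result_table, working_dir, result_evidence):
    """Return the argv list for bin/run_predict.sh in its documented argument order."""
    script = Path(cpc_home) / "bin" / "run_predict.sh"
    return [str(script), str(input_seq), str(result_table),
            str(working_dir), str(result_evidence)]


def run_cpc(cpc_home, input_seq, result_table, working_dir, result_evidence):
    """Run CPC locally; raises CalledProcessError if the script exits non-zero."""
    Path(working_dir).mkdir(parents=True, exist_ok=True)  # assumption: safe to pre-create
    cmd = build_cpc_command(cpc_home, input_seq, result_table, working_dir, result_evidence)
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical install location and file names.
    print(build_cpc_command("/opt/cpc", "transcripts.fa", "result.txt", "work", "evidence"))
```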
CPC RESULTS (The first column is the input sequence ID; the second column is the input sequence length; the third column is the coding status; and the fourth column is the coding potential score (the "distance" to the SVM classification hyper-plane in the feature space).)
AF282387 528 coding 3.32462
Tsix_mus 4300 noncoding -1.30047
HOMO EVIDENCE
ORF EVIDENCE
AF282387 ORF_FRAMEFINDER 4 529 99.43 109.41 Full
Tsix_mus ORF_FRAMEFINDER 4077 4206 3.00 27.50 Full
FRAME FINDER
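To compare CPC calls against ground truth you first need the result table in a usable form. A minimal parser sketch for the four-column CPC RESULTS table (ID, length, status, score), assuming whitespace- or tab-separated columns with no header line; the class and function names are hypothetical. It rejects rows that do not have exactly four fields rather than guessing.

```python
from dataclasses import dataclass


@dataclass
class CpcResult:
    seq_id: str
    length: int   # input sequence length
    status: str   # "coding" or "noncoding"
    score: float  # coding potential score ("distance" to the SVM hyper-plane)


def parse_cpc_table(text):
    """Parse CPC's result table (assumed: 4 whitespace-separated columns, no header)."""
    results = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 4:
            raise ValueError(f"line {line_no}: expected 4 columns, got {len(fields)}")
        seq_id, length, status, score = fields
        results.append(CpcResult(seq_id, int(length), status, float(score)))
    return results


if __name__ == "__main__":
    example = "AF282387\t528\tcoding\t3.32462\nTsix_mus\t4300\tnoncoding\t-1.30047\n"
    for r in parse_cpc_table(example):
        print(r.seq_id, r.length, r.status, r.score)
```

A `{r.seq_id: r.status for r in results}` dictionary is then a convenient shape for comparing against reference labels.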