重新写了一个由reads_counts转FPKM矩阵的脚本,之前的那一般只适用于18个样本的,这里更新了一下,没有样本限制了。
还是分为3步:
grep "exon" genome.gtf > genome_exon.gtf
python count_genelen_from_gft.py genome_exon.gtf gene.len
python Caculate_FPKM.py mapped_gene_number.txt gene.len raw_counts.matrix FPKM.matrix
第一步:把gtf文件中的exon抓取出来
第二步:计算每条基因的exon长度和
第三步:基于每个样本匹配到的reads总数、每个基因的长度、read_counts矩阵得到FPKM矩阵
第一步:
直接运行上面脚本的第一行就行了,记得更改输入的gtf的文件的文件名为你的文件名。
gtf范例
Morimp01GS001 AUGUSTUS start_codon 47156 47158 . - 0 gene_id "MIM04M26Gene00002"; transcript_id "MIM04M26Gene00002.t1"; gene_name "MIM04M26Gene00002";
Morimp01GS001 AUGUSTUS exon 47073 47158 . - . gene_id "MIM04M26Gene00002"; transcript_id "MIM04M26Gene00002.t1"; gene_name "MIM04M26Gene00002";
Morimp01GS001 AUGUSTUS CDS 47073 47158 0.6 - 0 gene_id "MIM04M26Gene00002"; transcript_id "MIM04M26Gene00002.t1"; gene_name "MIM04M26Gene00002";
Morimp01GS001 AUGUSTUS exon 46887 46999 . - . gene_id "MIM04M26Gene00002"; transcript_id "MIM04M26Gene00002.t1"; gene_name "MIM04M26Gene00002";
Morimp01GS001 AUGUSTUS CDS 46887 46999 0.92 - 1 gene_id "MIM04M26Gene00002"; transcript_id "MIM04M26Gene00002.t1"; gene_name "MIM04M26Gene00002";
Morimp01GS001 AUGUSTUS exon 46644 46784 . - . gene_id "MIM04M26Gene00002"; transcript_id "MIM04M26Gene00002.t1"; gene_name "MIM04M26Gene00002";
Morimp01GS001 AUGUSTUS CDS 46644 46784 0.91 - 2 gene_id "MIM04M26Gene00002"; transcript_id "MIM04M26Gene00002.t1"; gene_name "MIM04M26Gene00002";
Morimp01GS001 AUGUSTUS exon 45970 46559 . - . gene_id "MIM04M26Gene00002"; transcript_id "MIM04M26Gene00002.t1"; gene_name "MIM04M26Gene00002";
Morimp01GS001 AUGUSTUS CDS 45970 46559 0.92 - 2 gene_id "MIM04M26Gene00002"; transcript_id "MIM04M26Gene00002.t1"; gene_name "MIM04M26Gene00002";
Morimp01GS001 AUGUSTUS stop_codon 45970 45972 . - 0 gene_id "MIM04M26Gene00002"; transcript_id "MIM04M26Gene00002.t1"; gene_name "MIM04M26Gene00002";
Morimp01GS001 AUGUSTUS start_codon 54027 54029 . + 0 gene_id "MIM04M26Gene00003"; transcript_id "MIM04M26Gene00003.t1"; gene_name "MIM04M26Gene00003";
Morimp01GS001 AUGUSTUS exon 54027 54291 . + . gene_id "MIM04M26Gene00003"; transcript_id "MIM04M26Gene00003.t1"; gene_name "MIM04M26Gene00003";
Morimp01GS001 AUGUSTUS CDS 54027 54291 0.59 + 0 gene_id "MIM04M26Gene00003"; transcript_id "MIM04M26Gene00003.t1"; gene_name "MIM04M26Gene00003";
Morimp01GS001 AUGUSTUS exon 54361 54689 . + . gene_id "MIM04M26Gene00003";