For the latest version of this post, see https://blog.youkuaiyun.com/qq_26012913/article/details/111939262?spm=1001.2014.3001.5501
First, extract the exon records from the GTF file:
awk -F'\t' '$3 == "exon"' genome.gtf > genome_exon.gtf
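A substring search for "exon" also matches lines where the word only appears in the attribute column (for example a CDS record carrying an exon_number attribute), so it is safer to test the feature column itself. Below is a minimal Python sketch of that filter. The function name `exon_lines` and the sample records are made up for illustration; the only assumption about the input is the usual tab-separated GTF layout with the feature type in column 3.

```python
def exon_lines(lines):
    """Yield GTF records whose feature column (column 3) is exactly 'exon'."""
    for line in lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        # A GTF record has 9 tab-separated columns; column 3 is the feature type.
        if len(fields) >= 9 and fields[2] == "exon":
            yield line


# Made-up sample: the CDS record mentions "exon_number" but is not an exon.
sample = [
    'chr1\tsrc\texon\t100\t200\t.\t+\t.\tgene_id "g1"; transcript_id "t1";\n',
    'chr1\tsrc\tCDS\t120\t180\t.\t+\t0\tgene_id "g1"; exon_number "1";\n',
]
kept = list(exon_lines(sample))
print(len(kept))
```

On a real file the same function can be fed an open file handle, e.g. `exon_lines(open("genome.gtf"))`, and the yielded lines written to genome_exon.gtf.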
Then, for each gene in genome_exon.gtf, sum the lengths of its exons to get the gene length we want:
python count_genelen_from_gft.py genome_exon.gtf gene.len
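To make the summation concrete, here is a small, self-contained sketch of the same idea. It is not the author's script: it assumes each record carries a `gene_id "..."` attribute in column 9 and looks it up with a regular expression, and the names `GENE_ID` and `sum_exon_lengths` are hypothetical. Each exon's length is taken as end - start + 1, because GTF coordinates are 1-based and inclusive.

```python
import re
from collections import defaultdict

# Assumption: the gene identifier is stored as gene_id "..." in column 9.
GENE_ID = re.compile(r'gene_id "([^"]+)"')


def sum_exon_lengths(lines):
    """Return {gene_id: summed exon length} for exon-only GTF records."""
    totals = defaultdict(int)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        match = GENE_ID.search(fields[8])
        if match is None:
            continue  # no gene_id attribute on this record
        start, end = int(fields[3]), int(fields[4])
        # Coordinates are 1-based and inclusive, hence the +1.
        totals[match.group(1)] += abs(end - start) + 1
    return dict(totals)


# Made-up sample records for illustration.
sample = [
    'chr1\tsrc\texon\t100\t200\t.\t+\t.\tgene_id "g1"; transcript_id "t1";\n',
    'chr1\tsrc\texon\t300\t350\t.\t+\t.\tgene_id "g1"; transcript_id "t1";\n',
    'chr1\tsrc\texon\t10\t19\t.\t-\t.\tgene_id "g2"; transcript_id "t2";\n',
]
lengths = sum_exon_lengths(sample)
print(lengths)
```

Note that a plain sum counts an exon once per record, so an exon shared by several transcripts of the same gene contributes more than once. Whether that is acceptable depends on how the gene length will be used.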
The code of count_genelen_from_gft.py is as follows:
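import sys,re
file1 = sys.argv[1]  # input: exon-only GTF
file2 = sys.argv[2]  # output: gene length table
f1 = open(file1,'r')
f2 = open(file2,'w')
flag = "fuck"  # sentinel: no gene has been seen yet
exon = []  # exon lengths collected for the current gene
for i in f1:
    # The second-to-last piece after splitting on '"' is the value of the
    # last quoted attribute on the line, used here as the gene key.
    a = i.split("\"")
    if flag == a[-2]:
        # Same gene as the previous line: record this exon's length.
        pos = i.split("\t")
        exon.append(abs(int(pos[4])-int(pos[3]))+1)
    elif flag == "fuck":
        # First line of the file: remember its gene key.
        flag = a[-2]
        pos = i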