Associative replacement across multiple text files

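This post shows how to read two files in Python, one containing annotation lines and the other a BLAST report, and then replace each ID in the BLAST report with the full matching line from the annotation file. It is a basic text-processing task suited to simple data-matching scenarios.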


【Question】

I have a file with annotations in the following format:

XS-5236245.2_hypothetical_protein

and a tab delimited blast report with only the accession id in the second column:

transcript1 XS-5236245.2 94.3 35 0 245 356 789 896 1e-230 6.3

I want to replace the accession_id from the blast report with the whole line from the annotations file when there is a match. This is my attempt and as you can see I use very basic python. If you give me a more complex solution I would appreciate some explanation. Thank you for your help.

Linu

#!/usr/bin/python
import sys

# input1 = sys.argv[1] --> file with annotations
# input2 = sys.argv[2] --> file with blast report
# output = sys.argv[3] --> modified blast report with annotations

f1 = open(sys.argv[1],"r")
f2 = open(sys.argv[2],"r")
f3 = open(sys.argv[3],"w")

# open and read line by line:
for line in f1:
    # break line by '_'
    splitline = line.split("_")
    # define search_id as the first element of the line
    searchid = splitline[0]
    # open blast report and read line by line
    for row in f2:
        # split columns by tab separator
        col = row.split("\t")
        # define target_id as the content of the second column
        targetid = col[1]
        # when target_id matches search_id replace content with the whole line
        if searchid == targetid:
            f3.write(targetid.replace(searchid, splitline))
        else:
            pass
f1.close()
f2.close()
f3.close()

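【Answer】

Python, like Perl, has limited built-in support for structured computation, so this kind of code is awkward to write. SPL handles the task more conveniently: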

	A
1	=file("annotations.txt").import().derive(left(_1,pos(_1,"_")-1):key)
2	=file("blastreport.txt").import()
3	>A2.switch(_2,A1:key)
4	=A2.new(_1,_2._1,_3,_4,_5,_6,_7,_8,_9,_10,_11)
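For readers who want to stay in Python, here is a minimal sketch of the same join. The attempt in the question has three problems: the inner loop over f2 is exhausted after the first annotation line, so later annotations never match; str.replace is given a list (splitline) where it needs a string; and only the ID is written, not the rest of the row. The sketch below loads the annotations into a dictionary first and then makes a single pass over the BLAST report. The function names load_annotations and annotate_blast are hypothetical. It also assumes that the accession ID is everything before the first underscore, and that rows without a match should be written out unchanged.

```python
import sys


def load_annotations(lines):
    """Map each accession ID (text before the first '_') to its full line."""
    annotations = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        key = line.split("_", 1)[0]
        annotations[key] = line
    return annotations


def annotate_blast(blast_lines, annotations):
    """Yield BLAST rows with column 2 replaced by the matching annotation."""
    for row in blast_lines:
        cols = row.rstrip("\n").split("\t")
        if len(cols) > 1:
            # Assumption: keep the original ID when no annotation matches.
            cols[1] = annotations.get(cols[1], cols[1])
        yield "\t".join(cols)


def main(argv):
    annotation_path, blast_path, output_path = argv[1:4]
    with open(annotation_path) as f:
        annotations = load_annotations(f)
    with open(blast_path) as blast, open(output_path, "w") as out:
        for row in annotate_blast(blast, annotations):
            out.write(row + "\n")


if __name__ == "__main__":
    if len(sys.argv) == 4:
        main(sys.argv)
    else:
        print("usage: script.py ANNOTATIONS BLAST_REPORT OUTPUT", file=sys.stderr)
```

The dictionary lookup avoids rescanning the BLAST report for each annotation, so each file is read only once. If real accession IDs can contain an underscore, the key extraction in load_annotations would need a different rule.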
