NCBI EST Library Format Conversion

msw521sg

License: CC 4.0 BY-SA

Categories: Bioinformatics, Python. Tags: genomics-bioinformatics, python

Article link: https://blog.youkuaiyun.com/msw521sg/article/details/52562221

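This article presents a Python script that processes a file from the NCBI EST database and converts it to a specific format. The script reads the raw file, extracts the necessary metadata, and reorganizes the entries under a LIBEST prefix.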

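#!/usr/bin/env python3
# -*- coding: utf-8 -*-
with open('1.txt', 'r') as f:
    entry_lines = set()  # line numbers that hold library entries
    counts = []          # entry count taken from each header line
    # First pass: a line without 'Lib' is treated as a header whose
    # second-to-last field is the number of entry lines that follow it.
    for num, line in enumerate(f):
        if 'Lib' not in line:
            count = int(line.strip().split()[-2])
            counts.append(count)
            entry_lines.update(range(num + 1, num + count + 1))
    f.seek(0, 0)
    print(sum(counts))   # total number of entries expected
    tissue = ''
    # Second pass: remember the tissue named by the latest header and
    # print one tab-separated record per entry line.
    for num, line in enumerate(f):
        if 'Lib' not in line:
            tissue = line.strip().split()[0]
        if num in entry_lines:
            fields = line.split()
            n = int(fields[0][4:])  # drop the 4-character prefix before the numeric ID
            # Zero-pad the ID to six digits. The original if/elif chain gave the
            # same result for 3- to 5-digit IDs but under-padded IDs below 100.
            print('LIBEST_%06d\t%s\t%s\t%s' % (n, tissue, fields[-1], ' '.join(fields[1:-1])))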
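The same conversion can also be sketched as a single-pass function, which avoids reading the file twice and is easier to test. This is only an illustration, not the author's code: the function name `convert`, the blank-line handling, and the sample lines are all invented here, and the six-digit zero padding is an assumption about the intended ID format. The real layout of `1.txt` may differ.

```python
def convert(lines):
    """Yield tab-separated LIBEST records from header and entry lines.

    Assumed layout (hypothetical): a header line without 'Lib' names the
    tissue in its first field and the entry count in its second-to-last
    field; each entry line starts with a 4-character prefix plus a
    numeric ID and ends with a numeric field.
    """
    remaining = 0  # entry lines still expected under the current header
    tissue = ''
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines (an addition; the script above does not)
        if 'Lib' not in line:
            tissue = fields[0]
            remaining = int(fields[-2])
        elif remaining > 0:
            remaining -= 1
            lib_id = int(fields[0][4:])  # strip the 4-character prefix
            yield 'LIBEST_%06d\t%s\t%s\t%s' % (
                lib_id, tissue, fields[-1], ' '.join(fields[1:-1]))


# Hypothetical sample input, invented purely for illustration.
sample = [
    "brain 2 libraries\n",
    "Lib.1234 example library A 5021\n",
    "Lib.567 example library B 88\n",
]

for record in convert(sample):
    print(record)
```

To run it on a real file, pass the open file object directly, for example `convert(open('1.txt'))`, since the function only needs an iterable of lines.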