[Rosalind] Open Reading Frames: collecting the functions I have used before

Rosalind problem: Open Reading Frames (problem ID: ORF)

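This problem mainly reuses a few functions from earlier problems: reverse complement, transcription, and translation. All of them are fairly simple. I never wrote separate posts for those earlier exercises, so since this one problem draws on several programs I had already written, I took the chance to tidy them into functions and collect them in the code below.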

Problem

Either strand of a DNA double helix can serve as the coding strand for RNA transcription. Hence, a given DNA string implies six total reading frames, or ways in which the same region of DNA can be translated into amino acids: three reading frames result from reading the string itself, whereas three more result from reading its reverse complement.
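
To make the six reading frames concrete, here is a minimal sketch that lists them for a short DNA string. The names `reverse_complement` and `six_frames` and the toy input are illustrative assumptions, not part of the Rosalind problem or of the code later in this post.

```python
def reverse_complement(dna):
    # Reverse the string, then swap each base for its complement.
    return dna[::-1].translate(str.maketrans('ACGT', 'TGCA'))


def six_frames(dna):
    # Three offsets on the string itself, three on its reverse complement.
    frames = []
    for strand in (dna, reverse_complement(dna)):
        for offset in range(3):
            frames.append(strand[offset:])
    return frames


for frame in six_frames('ATGCCGTAA'):
    print(frame)
# ATGCCGTAA
# TGCCGTAA
# GCCGTAA
# TTACGGCAT
# TACGGCAT
# ACGGCAT
```

Each frame is then read three bases at a time from its first character.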

An open reading frame (ORF) is one which starts from the start codon and ends by stop codon, without any other stop codons in between. Thus, a candidate protein string is derived by translating an open reading frame into amino acids until a stop codon is reached.
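
One way to read this definition is as a pattern: a start codon, then as few whole codons as possible, then the first in-frame stop codon. The sketch below expresses that with a regular expression on a single DNA strand. It is an alternative to the loop-based `orf` function later in this post, and `find_orfs` and the toy input are hypothetical names chosen for illustration.

```python
import re

# Lookahead so that overlapping ORFs (nested start codons) are all reported.
# The lazy *? stops at the first in-frame stop codon after the start.
ORF_PATTERN = re.compile(r'(?=(ATG(?:[ACGT]{3})*?(?:TAA|TAG|TGA)))')


def find_orfs(strand):
    # Return every ORF on this strand, start codon through stop codon.
    return [match.group(1) for match in ORF_PATTERN.finditer(strand)]


print(find_orfs('ATGAAAATGCCCTAAG'))
# ['ATGAAAATGCCCTAA', 'ATGCCCTAA']
```

The two results share the same stop codon but begin at different start codons, which is why the sample output contains both MGMTPRLGLESLLE and MTPRLGLESLLE. A start codon with no in-frame stop after it yields nothing.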

Given: A DNA string s of length at most 1 kbp in FASTA format.

Return: Every distinct candidate protein string that can be translated from ORFs of s. Strings can be returned in any order.
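
The code in this post pastes the DNA string in directly, but the dataset arrives in FASTA format. A minimal sketch of reading it, assuming the usual layout of a `>` label line followed by one or more sequence lines, could look like this. The function name `read_fasta` is hypothetical, and the demo text is just the first 20 bases of the sample dataset wrapped over two lines.

```python
def read_fasta(text):
    # Map each record label to its sequence, joining wrapped sequence lines.
    records = {}
    label = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith('>'):
            label = line[1:]
            records[label] = []
        elif label is not None:
            records[label].append(line)
    return {name: ''.join(parts) for name, parts in records.items()}


demo = """>Rosalind_99
AGCCATGTAG
CTAACTCAGG
"""
print(read_fasta(demo))
# {'Rosalind_99': 'AGCCATGTAGCTAACTCAGG'}
```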

Sample Dataset

>Rosalind_99
AGCCATGTAGCTAACTCAGGTTACATGGGGATGACCCCGCGACTTGGATTAGAGTCTCTTTTGGAATAAGCCTGAATGATCCGAGTAGCATCTCAG

Sample Output

MLLGSFRLIPKETLIQVAGSSPCNLS
M
MGMTPRLGLESLLE
MTPRLGLESLLE
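
As a quick end-to-end check, the sketch below runs a compact solver on the sample dataset and compares the result with the four strings above. It is not the code from this post: it works on DNA codons directly and builds the codon table from the standard genetic code written in T, C, A, G order, with `*` marking stop codons. All names in it are assumptions made for this example.

```python
BASES = 'TCAG'
AMINO = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG'
# Standard genetic code on DNA codons, '*' meaning stop.
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def reverse_complement(dna):
    return dna[::-1].translate(str.maketrans('ACGT', 'TGCA'))


def candidate_proteins(dna):
    # Collect distinct proteins from every ATG that reaches an in-frame stop.
    proteins = set()
    for strand in (dna, reverse_complement(dna)):
        for start in range(len(strand) - 2):
            if strand[start:start + 3] != 'ATG':
                continue
            protein = []
            for i in range(start, len(strand) - 2, 3):
                amino_acid = CODON_TABLE[strand[i:i + 3]]
                if amino_acid == '*':
                    proteins.add(''.join(protein))
                    break
                protein.append(amino_acid)
    return proteins


sample = ('AGCCATGTAGCTAACTCAGGTTACATGGGGATGACCCCGCGACTTGGATTAGAGTCTCTTTTGG'
          'AATAAGCCTGAATGATCCGAGTAGCATCTCAG')
expected = {'MLLGSFRLIPKETLIQVAGSSPCNLS', 'M', 'MGMTPRLGLESLLE', 'MTPRLGLESLLE'}
print(candidate_proteins(sample) == expected)
```

Using a set here means that two different ORFs translating to the same protein are reported only once, which is what "every distinct candidate protein string" asks for.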

My code

The code this time is simple and easy to follow, so there is not much to explain. Each function does one straightforward job.

'''
Rosalind Problems: [ORF] Open Reading Frames
'''


def rev_comp(fa):
    '''
    Return two RNA strings for the DNA string fa: the transcript of its
    reverse complement, and the transcript of fa itself (T replaced by U).
    '''
    # Reverse the DNA and complement each base, writing U instead of T.
    fa_transcript = fa[::-1].translate(str.maketrans('ATCG', 'UAGC'))
    # Reverse-complementing that RNA again gives fa itself with U for T.
    fa_rc = fa.replace('T', 'U')
    return fa_transcript, fa_rc

def find_all(s,substring):
    '''
    Find every occurrence of substring in s, including overlapping ones.
    Returns a list of start indexes, or -1 if there is no match.
    '''
    index_list = []
    index = s.find(substring)
    while index != -1: #find() returns -1 if there is no match.
        index_list.append(index)
        index = s.find(substring, index+1)
    #mimic the return rule of find()
    if len(index_list) > 0:
        return index_list
    else:
        return -1

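def orf(mrna):
    '''
    Return the nucleotide sequence of every ORF in mrna: from an AUG up to,
    but not including, the first in-frame stop codon.
    '''
    start_codon = 'AUG'
    stop_codon = ['UAA', 'UAG', 'UGA']
    out = []
    # Try every position as a start codon; this covers all three frames.
    for start in range(len(mrna) - 2):
        if mrna[start:start+3] != start_codon:
            continue
        # Walk forward codon by codon until a stop codon closes the frame.
        # A start codon with no stop codon after it produces nothing.
        for i in range(start, len(mrna) - 2, 3):
            if mrna[i:i+3] in stop_codon:
                out.append(mrna[start:i])
                break
    return out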

def translate(rnaseq):
    codon_table = { 'UUU': 'F', 'CUU': 'L', 'AUU': 'I', 'GUU': 'V', \
                    'UUC': 'F', 'CUC': 'L', 'AUC': 'I', 'GUC': 'V', \
                    'UUA': 'L', 'CUA': 'L', 'AUA': 'I', 'GUA': 'V', \
                    'UUG': 'L', 'CUG': 'L', 'AUG': 'M', 'GUG': 'V', \
                    'UCU': 'S', 'CCU': 'P', 'ACU': 'T', 'GCU': 'A', \
                    'UCC': 'S', 'CCC': 'P', 'ACC': 'T', 'GCC': 'A', \
                    'UCA': 'S', 'CCA': 'P', 'ACA': 'T', 'GCA': 'A', \
                    'UCG': 'S', 'CCG': 'P', 'ACG': 'T', 'GCG': 'A', \
                    'UAU': 'Y', 'CAU': 'H', 'AAU': 'N', 'GAU': 'D', \
                    'UAC': 'Y', 'CAC': 'H', 'AAC': 'N', 'GAC': 'D', \
                    'UAA': 'Stop', 'CAA': 'Q', 'AAA': 'K', 'GAA': 'E', \
                    'UAG': 'Stop', 'CAG': 'Q', 'AAG': 'K', 'GAG': 'E', \
                    'UGU': 'C', 'CGU': 'R', 'AGU': 'S', 'GGU': 'G', \
                    'UGC': 'C', 'CGC': 'R', 'AGC': 'S', 'GGC': 'G', \
                    'UGA': 'Stop', 'CGA': 'R', 'AGA': 'R', 'GGA': 'G', \
                    'UGG': 'W', 'CGG': 'R', 'AGG': 'R', 'GGG': 'G'}
    proseq = []
    # Translate codon by codon and stop at the first stop codon.
    for i in range(0, len(rnaseq), 3):
        amino_acid = codon_table[rnaseq[i:i+3]]
        if amino_acid == 'Stop':
            break
        proseq.append(amino_acid)
    return ''.join(proseq)


dna = 'TATACATCACTCCAGGCATCAGAAAATCATGAGAAAGTCTGTGCGCGTAGCGAGAAGGTAGGCTCATTTGTTACCCTTGGACAACTACTGCCGCGTCTGGGCCTCCAAATCGGCTGGTCTTTTTCAGCTCCGTCTTAGGTATCGCGAAATGGACGGGAGGACCATAACTTACCTCCTCTTCTTTTGGCAGTCAGGCTATGACCACGTTTTGTCGGTTACAGATCACCTACCGCGGCGTAACACTGGTGCATATAGCTTGGTTGGGTTGCCTCTCCGCCTTCTCTGACTGGCGAGTGTACGGTAGGAACGCCGGTTCAATTGCATGCTCTGACCTTCTCAGGTAGAATTTCCAGACGAGTTGACAGACTCATCGTTACGCGGGCGGCGGTTCCAAAGCTCCTTACTAGAGATAGACAAGCGCCTAAATGGTTGCTTCCCGAGACGTTCATTAGCTAATGAACGTCTCGGGAAGCAACCATCATATCGATCCCGTGAATCCCTGCCCGTATGCCCCACAGGATAAGGATACACCAGTGACTGAACCTCTGCAATAGTCAGAGATCAGGGTGCTCTTTCATAGCTAATAGCTAGGCCGCGTACTTTAAGTTGTAACACTAACTGCTATGTGGTGAGCTTGAACGCGCGAAGCTGCCCCACAAGATGAAATATGGCCTTCGGAAAGATCACATTCTTGACCTCTGGGGTGTCACTTAAAATTGGCGAAGGTCGGAAAACTCTTTCTATTGCCCGCAAGGCTAAATGGTTCCAACCCCGATGTGTATTTCTCAAACTTTTCAGGTTTTTCTGAGTTACGAACAAGGGCTCGAGCGTGGGAATAGTTTAAATGAACTGTAGATTGAAGTATCGCAAGGAGGAAGTATTCTCTATCAGACGCTTGGTCACG'
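mrna1, mrna2 = rev_comp(dna)
# Translate every ORF from both strands and keep each distinct protein once,
# in order of first appearance. Comparing proteins rather than nucleotide
# sequences matters: different ORFs can translate to the same protein.
proteins = []
for seq in orf(mrna1) + orf(mrna2):
    protein = translate(seq)
    if protein not in proteins:
        proteins.append(protein)
for protein in proteins:
    print(protein)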

Input sequence

TATACATCACTCCAGGCATCAGAAAATCATGAGAAAGTCTGTGCGCGTAGCGAGAAGGTAGGCTCATTTGTTACCCTTGGACAACTACTGCCGCGTCTGGGCCTCCAAATCGGCTGGTCTTTTTCAGCTCCGTCTTAGGTATCGCGAAATGGACGGGAGGACCATAACTTACCTCCTCTTCTTTTGGCAGTCAGGCTATGACCACGTTTTGTCGGTTACAGATCACCTACCGCGGCGTAACACTGGTGCATATAGCTTGGTTGGGTTGCCTCTCCGCCTTCTCTGACTGGCGAGTGTACGGTAGGAACGCCGGTTCAATTGCATGCTCTGACCTTCTCAGGTAGAATTTCCAGACGAGTTGACAGACTCATCGTTACGCGGGCGGCGGTTCCAAAGCTCCTTACTAGAGATAGACAAGCGCCTAAATGGTTGCTTCCCGAGACGTTCATTAGCTAATGAACGTCTCGGGAAGCAACCATCATATCGATCCCGTGAATCCCTGCCCGTATGCCCCACAGGATAAGGATACACCAGTGACTGAACCTCTGCAATAGTCAGAGATCAGGGTGCTCTTTCATAGCTAATAGCTAGGCCGCGTACTTTAAGTTGTAACACTAACTGCTATGTGGTGAGCTTGAACGCGCGAAGCTGCCCCACAAGATGAAATATGGCCTTCGGAAAGATCACATTCTTGACCTCTGGGGTGTCACTTAAAATTGGCGAAGGTCGGAAAACTCTTTCTATTGCCCGCAAGGCTAAATGGTTCCAACCCCGATGTGTATTTCTCAAACTTTTCAGGTTTTTCTGAGTTACGAACAAGGGCTCGAGCGTGGGAATAGTTTAAATGAACTGTAGATTGAAGTATCGCAAGGAGGAAGTATTCTCTATCAGACGCTTGGTCACG

My output
M
MKEHPDL
MMVASRDVH
MVASRDVH
MNVSGSNHLGACLSLVRSFGTAARVTMSLSTRLEILPEKVRACN
MSLSTRLEILPEKVRACN
MQLNRRSYRTLASQRRRRGNPTKLYAPVLRRGR
MHQCYAAVGDL
MVLPSISRYLRRS
MIF
MRKSVRVARR
MDGRTITYLLFFWQSGYDHVLSVTDHLPRRNTGAYSLVGLPLRLL
MTTFCRLQITYRGVTLVHIAWLGCLSAFSDWRVYGRNAGSIACSDLLR
ML
MNVSGSNHHIDPVNPCPYAPQDKDTPVTEPLQ
MPHRIRIHQ
MW
MKYGLRKDHILDLWGVT
MAFGKITFLTSGVSLKIGEGRKTLSIARKAKWFQPRCVFLKLFRFF
MVPTPMCISQTFQVFLSYEQGLERGNSLNEL
MCISQTFQVFLSYEQGLERGNSLNEL

This post was first published on my own blog at https://emmettoncloud.cn/archives/48 on October 5, 2020, and has since been moved to CSDN.
