[LeetCode] Repeated DNA Sequences

License: CC 4.0 BY-SA

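This post presents a function that finds all repeated substrings of length 10 in a DNA molecule. It counts the occurrences of each substring in a hash table, which lets it identify repeated DNA sequences efficiently and return every substring that occurs more than once.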

All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.

Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.

For example,

Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT",

Return:
["AAAAACCCCC", "CCCCCAAAAA"].
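
Before any encoding is introduced, the counting idea can be sketched with the substrings themselves as hash-table keys. This is only a minimal illustration, not the solution from this post: the function name `findRepeatedNaive` is hypothetical, and reporting each repeat on its second sighting is a choice made here for the sketch.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper (not from the original post): counts every 10-letter
// window, using the substring itself as the hash-table key.
std::vector<std::string> findRepeatedNaive(const std::string& s) {
    const std::size_t kLen = 10;
    std::vector<std::string> result;
    if (s.size() < kLen) return result;  // no full window exists

    std::unordered_map<std::string, int> counts;
    for (std::size_t i = 0; i + kLen <= s.size(); ++i) {
        std::string window = s.substr(i, kLen);
        // Record a window the second time it is seen, so each repeated
        // sequence appears in the result exactly once.
        if (++counts[window] == 2) result.push_back(window);
    }
    return result;
}
```

On the example string above, this sketch should yield the two sequences listed in the expected answer. Its cost is that every key is a 10-character string, which is what an integer encoding avoids.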

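I wrote my own hash and dehash functions to convert each 10-character string to an int and back. Each letter becomes a base-5 digit (A=1, C=2, G=3, T=4), so a 10-letter string maps to a value below 5^10 = 9,765,625, which fits comfortably in a 32-bit int. Because no digit is 0, decoding always recovers all 10 letters.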

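#include <string>
#include <unordered_map>
#include <vector>
using namespace std;

class Solution {
public:
    vector<string> findRepeatedDnaSequences(string s) {
        unordered_map<int,int> CountMap;  // encoded substring -> occurrence count
        vector<string> ret;
        if(s.size()<10) return ret;
        // Count every 10-letter window by its integer encoding.
        for(size_t i=0; i+10<=s.size(); ++i){
            int temp = SToInt(s,i);
            CountMap[temp]++;
        }
        // Decode the windows that were seen more than once.
        for(auto i=CountMap.begin(); i!=CountMap.end(); ++i){
            if(i->second>1)
                ret.push_back(dehash(i->first));
        }
        return ret;
    }
    // Encode s[start..start+9] as a base-5 number with A=1, C=2, G=3, T=4.
    int SToInt(const string &s,size_t start){
        int num = 0;
        for(size_t i=start; i<start+10; ++i){
            int temp = 0;
            if(s[i]=='A') temp = 1;
            if(s[i]=='C') temp = 2;
            if(s[i]=='G') temp = 3;
            if(s[i]=='T') temp = 4;
            num = 5*num + temp;
        }
        return num;
    }
    // Inverse of SToInt: peel off base-5 digits from the least significant end
    // and prepend the matching letter.
    string dehash(int num){
        string ret;
        while(num>0){
            int last = num%5;
            num = num/5;
            if(last==1) ret = 'A'+ret;
            if(last==2) ret = 'C'+ret;
            if(last==3) ret = 'G'+ret;
            if(last==4) ret = 'T'+ret;
        }
        return ret;
    }
};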
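
A common variation on the solution above, sketched here as an alternative rather than as the method used in this post, is to pack each nucleotide into 2 bits and update the key incrementally as the window slides, instead of re-encoding all 10 letters at every position. The names `nucleotideCode` and `findRepeatedRolling` are hypothetical, and the sketch assumes the input contains only A, C, G, and T.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper: maps a nucleotide to a 2-bit code.
// Assumes the input contains only 'A', 'C', 'G', 'T'.
static std::uint32_t nucleotideCode(char c) {
    switch (c) {
        case 'A': return 0;
        case 'C': return 1;
        case 'G': return 2;
        default:  return 3;  // treated as 'T'
    }
}

// Hypothetical rolling variant: 10 letters * 2 bits = 20 bits per window.
std::vector<std::string> findRepeatedRolling(const std::string& s) {
    const std::size_t kLen = 10;
    const std::uint32_t kMask = (1u << (2 * kLen)) - 1;  // keeps the low 20 bits
    std::vector<std::string> result;
    if (s.size() < kLen) return result;

    std::unordered_map<std::uint32_t, int> counts;
    std::uint32_t key = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        // Shift in the new letter; the mask drops the letter that left the window.
        key = ((key << 2) | nucleotideCode(s[i])) & kMask;
        if (i + 1 < kLen) continue;  // window is not full yet
        // On the second sighting, copy the window straight from the input,
        // so no decoding step is needed.
        if (++counts[key] == 2) result.push_back(s.substr(i + 1 - kLen, kLen));
    }
    return result;
}
```

Since every window has a fixed length of 10, the code 0 for A is unambiguous here, and the substring is read back from the input rather than decoded from the key.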