All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.
Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.
For example,
Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT", Return: ["AAAAACCCCC", "CCCCCAAAAA"].
I wrote my own hash and dehash functions to convert each 10-character string to an int and back.
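#include <string>
#include <unordered_map>
#include <vector>
using namespace std;

class Solution {
public:
    vector<string> findRepeatedDnaSequences(string s) {
        unordered_map<int,int> CountMap;  // encoded window -> number of occurrences
        vector<string> ret;
        if(s.size()<10) return ret;
        // Count every 10-letter window by its integer code.
        for(int i=0; i+10<=static_cast<int>(s.size()); ++i){
            int temp = SToInt(s,i);
            CountMap[temp]++;
        }
        // Decode each code seen more than once (map iteration order is unspecified).
        for(auto i=CountMap.begin(); i!=CountMap.end(); ++i){
            if(i->second>1)
                ret.push_back(dehash(i->first));
        }
        return ret;
    }

    // Encode s[start..start+9] as a base-5 number with digits A=1, C=2, G=3, T=4.
    // The largest code is 5^10 - 1 = 9765624, which fits in an int.
    int SToInt(const string &s,int start){
        int num = 0;
        for(int i=start; i<=start+9; ++i){
            int temp = 0;
            if(s[i]=='A') temp = 1;
            if(s[i]=='C') temp = 2;
            if(s[i]=='G') temp = 3;
            if(s[i]=='T') temp = 4;
            num = 5*num + temp;
        }
        return num;
    }

    // Decode a base-5 code back to its string. No digit is 0, so the loop
    // recovers all 10 letters before num reaches 0.
    string dehash(int num){
        string ret;
        while(num>0){
            int last = num%5;
            num = num/5;
            if(last==1) ret = 'A'+ret;
            if(last==2) ret = 'C'+ret;
            if(last==3) ret = 'G'+ret;
            if(last==4) ret = 'T'+ret;
        }
        return ret;
    }
};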
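The solution above re-encodes all 10 letters for every window. As a possible variation (not the method used above), the code can be updated incrementally: shift the previous code, append the new letter, and mask off the letter that left the window. The sketch below assumes a 2-bit encoding (A=0, C=1, G=2, T=3) instead of the base-5 digits, and the names `nucleotideCode` and `findRepeatedRolling` are hypothetical. Because A maps to 0 here, a `dehash`-style loop that stops at 0 would no longer work, so the sketch copies the substring out of `s` instead of decoding.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper: maps a nucleotide to a 2-bit code (A=0, C=1, G=2, T=3).
// Any other character is treated like 'A', which is an assumption of this sketch.
static std::uint32_t nucleotideCode(char c) {
    switch (c) {
        case 'C': return 1;
        case 'G': return 2;
        case 'T': return 3;
        default:  return 0;
    }
}

// Returns every 10-letter substring that occurs more than once, ordered by
// the position at which its second occurrence is seen.
std::vector<std::string> findRepeatedRolling(const std::string& s) {
    const std::size_t kLen = 10;
    const std::uint32_t kMask = (1u << (2 * kLen)) - 1;  // keep the low 20 bits
    std::vector<std::string> result;
    if (s.size() < kLen) return result;

    std::unordered_map<std::uint32_t, int> count;
    std::uint32_t key = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        // Slide the window: push the new letter in, drop the oldest one.
        key = ((key << 2) | nucleotideCode(s[i])) & kMask;
        if (i + 1 < kLen) continue;  // window not yet full
        // Report a window exactly once, when its count reaches 2.
        if (++count[key] == 2) {
            result.push_back(s.substr(i + 1 - kLen, kLen));
        }
    }
    return result;
}
```

Called on the example string "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT", this sketch should return "AAAAACCCCC" followed by "CCCCCAAAAA", since their second occurrences start at positions 10 and 16 respectively.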
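This article presents a function that finds all repeated substrings of length 10 in a DNA molecule. It uses a hash table to count the occurrences of each substring, then returns every substring that occurs more than once.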
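For cross-checking the integer-coded version, a plain baseline that keys the hash table on the substrings themselves can be handy. This is only a minimal sketch: the name `findRepeatedBaseline` is hypothetical, and sorting the result is an added assumption so that the output order is deterministic (the problem statement does not require any particular order).

```cpp
#include <algorithm>
#include <string>
#include <unordered_map>
#include <vector>

// Counts every 10-letter window directly as a string key, then returns the
// windows that occur more than once, sorted lexicographically.
std::vector<std::string> findRepeatedBaseline(const std::string& s) {
    std::vector<std::string> result;
    if (s.size() < 10) return result;

    std::unordered_map<std::string, int> count;
    for (std::size_t i = 0; i + 10 <= s.size(); ++i) {
        ++count[s.substr(i, 10)];
    }
    for (const auto& entry : count) {
        if (entry.second > 1) result.push_back(entry.first);
    }
    std::sort(result.begin(), result.end());
    return result;
}
```

The baseline stores a 10-character string per distinct window, whereas the integer coding stores one int, which is the main reason to prefer the encoded key on long inputs.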