All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: “ACGAATTCCG”. When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.
Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.
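Binary representation + hash table
Each letter is encoded in 3 bits, so ten letters need a 30-bit binary number; a hash table keyed on these values (at most 2^30 distinct keys) records the windows already seen.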
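#include <map>
#include <string>
#include <vector>
using namespace std;

class Solution {
public:
    vector<string> findRepeatedDnaSequences(string s) {
        vector<string> str;
        if (s.size() < 10) return str;  // no 10-letter window exists
        map<unsigned int, int> m;       // 30-bit window key -> occurrence count
        unsigned int cur = 0;           // unsigned, so the left shift cannot overflow a signed int
        size_t i;
        // Encode the first window. The low 3 bits of each letter's ASCII code
        // are distinct: A -> 1, C -> 3, G -> 7, T -> 4.
        for (i = 0; i < 10; i++) {
            cur <<= 3;
            cur |= (s[i] & 7);
        }
        m[cur] = 1;
        // Slide the window one letter at a time, keeping only the low 30 bits.
        for (i = 10; i < s.size(); i++) {
            cur <<= 3;
            cur |= (s[i] & 7);
            cur &= 0x3fffffff;
            if (m.find(cur) != m.end()) {
                // Report a sequence only the second time it is seen.
                if (m[cur] == 1) str.push_back(s.substr(i - 9, 10));
                m[cur]++;
            } else {
                m[cur] = 1;
            }
        }
        return str;
    }
};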
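The `s[i] & 7` expression in the solution works because the low three bits of the ASCII codes for A, C, G, and T are all different. The following is a minimal sketch of that encoding on its own; the helper names `encodeBase` and `packWindow` are hypothetical and not part of the original solution.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Low three bits of the ASCII code:
// 'A' (0x41) -> 1, 'C' (0x43) -> 3, 'G' (0x47) -> 7, 'T' (0x54) -> 4.
inline std::uint32_t encodeBase(char c) {
    return static_cast<std::uint32_t>(c) & 7u;
}

// Hypothetical helper: pack the 10 letters starting at `start`
// into a single key that fits in 30 bits.
inline std::uint32_t packWindow(const std::string& s, std::size_t start) {
    assert(start + 10 <= s.size());
    std::uint32_t key = 0;
    for (std::size_t i = 0; i < 10; ++i) {
        key = (key << 3) | encodeBase(s[start + i]);
    }
    return key;  // always below 2^30
}
```

Because the four codes are distinct, two 10-letter windows over A, C, G, and T get the same key only when they are the same sequence, which is why the solution can compare keys instead of strings.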
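This article presents a way to find the 10-letter sequences that repeat in a DNA molecule using a binary representation and a hash table. Each nucleotide is converted to a 3-bit number, and a hash table keyed on the resulting 30-bit values (at most 2^30 distinct keys) records each window as the string is scanned once, so every repeated sequence is identified without comparing substrings directly.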
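One way to sanity-check the bit-packed approach is to compare its output against a plainer version that counts the substrings themselves. The sketch below is such a reference version, not the method described above; the function name `findRepeatedNaive` is hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Reference version: count every 10-letter substring directly.
// It stores whole strings as keys, so it uses more memory per entry
// than the 30-bit encoding, but it is easy to verify by inspection.
std::vector<std::string> findRepeatedNaive(const std::string& s) {
    std::vector<std::string> result;
    if (s.size() < 10) return result;  // no 10-letter window exists
    std::unordered_map<std::string, int> seen;
    for (std::size_t i = 0; i + 10 <= s.size(); ++i) {
        std::string window = s.substr(i, 10);
        // Report a window only on its second sighting, so each repeat appears once.
        if (++seen[window] == 2) result.push_back(window);
    }
    return result;
}
```

For the input "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT" this returns "AAAAACCCCC" followed by "CCCCCAAAAA", which should match what the bit-packed solution reports for the same input.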