Problem
All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: “ACGAATTCCG”. When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.
Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.
For example,
Given s = “AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT”, return: [“AAAAACCCCC”, “CCCCCAAAAA”].
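Analysis
The most direct solution stores every 10-letter substring in a map and counts occurrences, but it is relatively slow. A faster alternative uses bit manipulation: encode A, T, C, and G as 0, 1, 2, and 3, so each character takes only 2 bits and a 10-character string fits in just 20 bits. Each window can then be represented by a single integer, and sliding the window by one character only requires dropping the top 2 bits and appending the 2 bits of the new character.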
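To make the 2-bit encoding concrete, here is a minimal sketch that packs a short DNA string into an integer. The helper names `baseCode` and `encode` are hypothetical and chosen only for illustration; the code assumes the input contains nothing but A, C, G, and T.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: maps one nucleotide to its 2-bit code,
// using the same assignment as the analysis (A=0, T=1, C=2, G=3).
inline std::uint32_t baseCode(char c) {
    switch (c) {
        case 'A': return 0;
        case 'T': return 1;
        case 'C': return 2;
        default:  return 3;  // 'G'; assumes the input holds only A, C, G, T
    }
}

// Hypothetical helper: packs a DNA string into an integer, 2 bits per letter,
// with the first letter in the highest-order position.
// Assumes at most 16 letters so the result fits in 32 bits.
inline std::uint32_t encode(const std::string& kmer) {
    std::uint32_t code = 0;
    for (char c : kmer) {
        code = (code << 2) | baseCode(c);
    }
    return code;
}
```

Under this encoding, `encode("AAAAACCCCC")` is 682 (binary 1010101010, since the five leading A's contribute only zeros), and every 10-letter window maps to a value below 2^20 = 1048576.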
Code
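#include <string>
#include <unordered_map>
#include <vector>
using namespace std;

class Solution {
public:
    vector<string> findRepeatedDnaSequences(string s) {
        vector<string> res;
        if (s.size() < 10) return res;  // no 10-letter window exists
        // 2-bit code for each nucleotide
        unordered_map<char,int> m;
        m['A']=0, m['T']=1, m['C']=2, m['G']=3;
        unordered_map<int,int> count;   // encoded window -> number of occurrences
        // mask keeps the low 18 bits (the last 9 letters), so the oldest
        // letter is dropped before the window is shifted left
        int mask = 0x3ffff, num = 0;
        // preload the first 9 letters
        for (int i = 0; i < 9; i++) {
            num = num << 2;
            num |= m[s[i]];
        }
        num = num << 2;                 // make room for the 10th letter
        for (size_t i = 9; i < s.size(); i++) {
            num |= m[s[i]];
            count[num] += 1;
            if (count[num] == 2) {
                // the second occurrence means the window repeats; testing with ==
                // ensures each sequence is added to res only once
                res.push_back(s.substr(i - 9, 10));
            }
            num = (mask & num) << 2;
        }
        return res;
    }
};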
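For comparison, the map-based approach mentioned earlier can be sketched as follows. This is only an illustrative baseline, not part of the original solution; the function name `findRepeatedByMap` is hypothetical. It keys the map on the substring text itself, so every window costs a 10-character copy and a string hash, which is presumably where the extra running time comes from.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical baseline: counts every 10-letter window by its text.
std::vector<std::string> findRepeatedByMap(const std::string& s) {
    std::vector<std::string> res;
    if (s.size() < 10) return res;  // no 10-letter window exists
    std::unordered_map<std::string, int> count;
    for (std::size_t i = 0; i + 10 <= s.size(); ++i) {
        std::string window = s.substr(i, 10);
        // Report a window exactly once, on its second occurrence.
        if (++count[window] == 2) {
            res.push_back(window);
        }
    }
    return res;
}
```

Because it reports each window on its second occurrence, just like the bit-packed version, the two should return the same sequences in the same order, which makes this baseline handy for cross-checking on small inputs.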