Problem
All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.
Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.
For example,
Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT", Return: ["AAAAACCCCC", "CCCCCAAAAA"].
Solution
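The most direct approach is to scan the string from start to end and use an unordered_map to count every 10-letter substring, then report the ones that occur more than once.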
class Solution {
public:
    vector<string> findRepeatedDnaSequences(string s) {
        const int N = s.size();
        // Count how often each 10-letter substring occurs.
        unordered_map<string,int> mp;
        for (int i = 0; i < N - 9; i++) {
            string str = s.substr(i, 10);
            mp[str]++;
        }
        // Collect the substrings seen more than once (order is unspecified).
        vector<string> rst;
        for (auto it = mp.begin(); it != mp.end(); it++) {
            if (it->second > 1) rst.push_back(it->first);
        }
        return rst;
    }
};
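A second approach maps each of A, C, G, and T to a number and turns every 10-letter substring into an integer.
A maps to 0, C to 1, G to 2, and T to 3.
Each letter takes two bits and a sequence has 10 letters, so one sequence fits in 20 bits; the mask is (1 << 20) - 1 (the parentheses matter, because 1 << 20 - 1 parses as 1 << 19).
For each new character, shift the current value left by two bits, apply the mask to drop the oldest letter, and OR in the new letter's code.
This relies on a few bit-manipulation tricks.
A good explanation is here: https://leetcode.com/discuss/74330/20-ms-solution-c-with-explanation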
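As a rough illustration of the encoding described above, the sketch below packs a 10-letter window into 20 bits and then slides it by one letter. The helper names baseCode, encodeWindow, slideWindow, and kMask are hypothetical and are not part of the solution in these notes; the input is assumed to contain only A, C, G, and T.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Map a nucleotide to its 2-bit code: A=0, C=1, G=2, T=3.
// Assumption: the input contains only the letters A, C, G, T.
inline std::uint32_t baseCode(char c) {
    switch (c) {
        case 'A': return 0;
        case 'C': return 1;
        case 'G': return 2;
        default:  return 3;  // 'T'
    }
}

// Keeps only the low 20 bits (10 letters x 2 bits each).
constexpr std::uint32_t kMask = (1u << 20) - 1;

// Encode exactly 10 letters of s, starting at position pos.
inline std::uint32_t encodeWindow(const std::string& s, std::size_t pos) {
    assert(pos + 10 <= s.size());
    std::uint32_t code = 0;
    for (std::size_t i = 0; i < 10; ++i) {
        code = (code << 2) | baseCode(s[pos + i]);
    }
    return code;
}

// Drop the oldest letter of the window and append the letter `next`.
inline std::uint32_t slideWindow(std::uint32_t code, char next) {
    return ((code << 2) & kMask) | baseCode(next);
}
```

For instance, encodeWindow("AAAAACCCCC", 0) gives 341 (binary 0101010101), and sliding that window over one more 'A' gives the same value as encoding "AAAACCCCCA" directly.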
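class Solution {
public:
    vector<string> findRepeatedDnaSequences(string s) {
        // Fewer than 11 characters cannot contain two 10-letter windows.
        if (s.size() < 11) return vector<string>();
        // Lookup table indexed by character code. It must cover 'T' (84),
        // so a 4-element array is too small.
        char lib[128] = {0};
        lib['A'] = 0; lib['C'] = 1; lib['G'] = 2; lib['T'] = 3;
        unordered_map<int,int> mp;
        vector<string> rst;
        int curNum = 0, mask = (1 << 20) - 1;
        for (int i = 0; i < (int)s.size(); i++) {
            // Shift in the new letter and keep only the last 10 letters (20 bits).
            curNum = ((curNum << 2) & mask) | lib[s[i]];
            if (i >= 9) {
                // Report a sequence exactly once, on its second occurrence.
                if (mp[curNum]++ == 1) {
                    rst.push_back(s.substr(i - 9, 10));
                }
            }
        }
        return rst;
    }
};
The original version declared char lib[4] and passed with a local compiler but failed on LeetCode. Indexing a 4-element array with 'A' (65), 'C' (67), 'G' (71), or 'T' (84) writes out of bounds, which is undefined behavior: it can appear to work locally, and LeetCode's judge most likely rejects it for that reason. Sizing the table to 128 entries avoids the problem.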
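Because every encoded window is a 20-bit number, the hash map can in principle be swapped for a flat table with 2^20 entries. The following is only a sketch of that variant, not the method from the linked discussion; the function name findRepeatedDnaFlat and the three-state table are assumptions made for illustration, and the input is assumed to contain only A, C, G, and T.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Sketch: same rolling 20-bit code as above, but the per-sequence state is
// kept in a flat table indexed by the code instead of in an unordered_map.
std::vector<std::string> findRepeatedDnaFlat(const std::string& s) {
    std::vector<std::string> result;
    if (s.size() < 11) return result;  // no room for two 10-letter windows

    // Lookup table covering every unsigned char value; 'A' stays 0.
    std::uint8_t code[256] = {};
    code['C'] = 1; code['G'] = 2; code['T'] = 3;

    const std::uint32_t mask = (1u << 20) - 1;
    // Per code: 0 = unseen, 1 = seen once, 2 = already reported.
    std::vector<std::uint8_t> seen(1u << 20, 0);

    std::uint32_t cur = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        cur = ((cur << 2) & mask) | code[static_cast<unsigned char>(s[i])];
        if (i < 9) continue;  // window not yet full
        std::uint8_t& state = seen[cur];
        if (state == 1) result.push_back(s.substr(i - 9, 10));
        if (state < 2) ++state;
    }
    return result;
}
```

The table costs 1 MiB regardless of input length, so it only pays off for long inputs; for short strings the unordered_map version uses less memory.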