All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.
Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.
For example, given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT", return ["AAAAACCCCC", "CCCCCAAAAA"].
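The task is to find every length-10 substring that appears more than once in the given string. Comparing the substrings directly as strings is likely to time out, so each substring should be encoded as a number instead. Because the string contains only the four characters A, C, G, and T, map them to 0, 1, 2, and 3, so each character needs only 2 bits. Reading the 10 codes of a window as base-4 digits gives every 10-character window a unique integer between 0 and 4^10 - 1 = 1,048,575, which fits comfortably in an int. Compute this value for each window and check whether it has already been seen exactly once; if so, the window is a repeat and goes into the result. Checking for exactly one earlier occurrence ensures each repeated sequence is reported only once.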
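To make the encoding step concrete, here is a minimal sketch that encodes a single 10-character window. The helper names `nucleotideCode` and `encodeWindow` are hypothetical, and the sketch assumes the input contains only A, C, G, and T.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: map a nucleotide to its 2-bit code (A=0, C=1, G=2, T=3).
// Assumes the input contains only 'A', 'C', 'G', 'T'.
int nucleotideCode(char c) {
    switch (c) {
        case 'A': return 0;
        case 'C': return 1;
        case 'G': return 2;
        default:  return 3;  // 'T'
    }
}

// Hypothetical helper: encode the 10-letter window starting at `start`
// as a base-4 number. Each character contributes 2 bits, so the result
// lies in [0, 4^10 - 1] = [0, 1048575] and fits easily in an int.
int encodeWindow(const std::string& s, std::size_t start) {
    assert(start + 10 <= s.size());  // the window must lie inside s
    int code = 0;
    for (std::size_t j = start; j < start + 10; ++j) {
        code = code * 4 + nucleotideCode(s[j]);
    }
    return code;
}
```

For example, "AAAAAAAAAA" encodes to 0 and "TTTTTTTTTT" encodes to 1048575, the largest possible value. Two windows get the same code only if they are the same string.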
Code:
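#include <string>
#include <unordered_map>
#include <vector>
using namespace std;

class Solution {
public:
    vector<string> findRepeatedDnaSequences(string s) {
        vector<string> res;
        // With fewer than 11 characters there is at most one 10-letter window,
        // so nothing can repeat.
        if (s.size() < 11) return res;
        // 2-bit code for each nucleotide: A=0, C=1, G=2, T=3.
        int char_map[256] = {0};
        char_map['A'] = 0;
        char_map['C'] = 1;
        char_map['G'] = 2;
        char_map['T'] = 3;
        // Encoded window -> number of times it has been seen so far.
        unordered_map<int, int> nums;
        for (size_t i = 0; i + 10 <= s.size(); ++i)
        {
            // Read the 10 codes as base-4 digits. The result is at most
            // 4^10 - 1 = 1048575, so it fits in an int (a base-10 reading
            // would not: "TTTTTTTTTT" would give 3333333333).
            int num = 0;
            for (size_t j = i; j < i + 10; ++j)
            {
                num = num * 4 + char_map[(unsigned char)s[j]];
            }
            // Add the window only on its second occurrence, so each
            // repeated sequence appears in the result exactly once.
            if (nums[num]++ == 1)
            {
                res.push_back(s.substr(i, 10));
            }
        }
        return res;
    }
};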
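The solution above re-encodes all 10 characters for every window, roughly 10n steps in total. With a 2-bit code per character, the code of the next window can instead be derived from the previous one: shift left by 2 bits, OR in the new character, and mask off everything above the lowest 20 bits. Below is a minimal sketch of this rolling variant; the free function `findRepeatedDnaSequencesRolling` is a hypothetical name, not part of the original solution, and the input is assumed to contain only A, C, G, and T.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical rolling-encoding variant (a free function, not the original
// Solution class). Uses the same 2-bit codes: A=0, C=1, G=2, T=3.
std::vector<std::string> findRepeatedDnaSequencesRolling(const std::string& s) {
    std::vector<std::string> res;
    if (s.size() < 11) return res;  // at most one window, so no repeats

    int code_of[256] = {0};
    code_of['A'] = 0;
    code_of['C'] = 1;
    code_of['G'] = 2;
    code_of['T'] = 3;

    const int kMask = (1 << 20) - 1;    // keep only the last 10 characters (20 bits)
    std::unordered_map<int, int> seen;  // encoded window -> occurrence count
    int code = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        char c = s[i];
        // Debug-mode check of the assumption that the input is pure DNA.
        assert(c == 'A' || c == 'C' || c == 'G' || c == 'T');
        // Shift in the new character's 2 bits; the mask drops the oldest one.
        code = ((code << 2) | code_of[static_cast<unsigned char>(c)]) & kMask;
        if (i >= 9) {
            // `code` now encodes the window s[i-9 .. i].
            if (seen[code]++ == 1) {
                res.push_back(s.substr(i - 9, 10));
            }
        }
    }
    return res;
}
```

On the example input "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT" this sketch returns ["AAAAACCCCC", "CCCCCAAAAA"], matching the expected output in the problem statement. The hash map still stores at most one entry per distinct window, but each step now does constant work instead of re-reading 10 characters.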