All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.
Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.
For example,
Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT", Return: ["AAAAACCCCC", "CCCCCAAAAA"].
使用indexOf方法查找string位置的方法果断超时了
非常简短的方法,用到两个Set
public List<String> findRepeatedDnaSequences(String s) {
Set seen = new HashSet(), repeated = new HashSet();
for (int i = 0; i + 9 < s.length(); i++) {
String ten = s.substring(i, i + 10);
if (!seen.add(ten))
repeated.add(ten);
}
return new ArrayList(repeated);
}
还有巧妙地用到条件使用位操作
public List<String> findRepeatedDnaSequences(String s) {
Set<Integer> words = new HashSet<>();
Set<Integer> doubleWords = new HashSet<>();
List<String> rv = new ArrayList<>();
char[] map = new char[26];
//map['A' - 'A'] = 0;
map['C' - 'A'] = 1;
map['G' - 'A'] = 2;
map['T' - 'A'] = 3;
for(int i = 0; i < s.length() - 9; i++) {
int v = 0;
for(int j = i; j < i + 10; j++) {
v <<= 2;
v |= map[s.charAt(j) - 'A'];
}
if(!words.add(v) && doubleWords.add(v)) {
rv.add(s.substring(i, i + 10));
}
}
return rv;
}