Problem link
Problem description
All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.
Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.
Example
Input: s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"
Output: ["AAAAACCCCC", "CCCCCAAAAA"]
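As a reference point, and to make explicit that overlapping occurrences count (11 consecutive A's contain "AAAAAAAAAA" twice, at positions 0 and 1), here is a brute-force sketch. It is not one of the two solutions in this post, and the class name RepeatedDnaBruteForce and method find are hypothetical. For each window it calls String.indexOf(window, i + 1) to look for a later occurrence, so it takes roughly O(n^2) time.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical brute-force reference, not one of the solutions in this post.
// For every 10-letter window, search for another occurrence starting after it.
class RepeatedDnaBruteForce {
    static List<String> find(String s) {
        // LinkedHashSet drops duplicates and keeps the order in which windows were found
        Set<String> result = new LinkedHashSet<>();
        for (int i = 0; i + 10 <= s.length(); i++) {
            String window = s.substring(i, i + 10);
            // indexOf(str, fromIndex) also finds occurrences that overlap this window
            if (s.indexOf(window, i + 1) != -1) {
                result.add(window);
            }
        }
        return new ArrayList<>(result);
    }

    public static void main(String[] args) {
        System.out.println(find("AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT")); // [AAAAACCCCC, CCCCCAAAAA]
        System.out.println(find("AAAAAAAAAAA")); // [AAAAAAAAAA]
    }
}
```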
Code 1
import java.util.*;

class Solution {
    public List<String> findRepeatedDnaSequences(String s) {
        if (s == null || s.length() < 10) {
            return new ArrayList<String>();
        }
        // Count how many times each 10-letter window occurs
        Map<String, Integer> map = new HashMap<String, Integer>();
        for (int i = 0; i + 10 <= s.length(); i++) {
            String temp = s.substring(i, i + 10);
            map.put(temp, map.getOrDefault(temp, 0) + 1);
        }
        // Collect the windows that occur more than once
        List<String> ans = new ArrayList<String>();
        for (Map.Entry<String, Integer> entry : map.entrySet()) {
            if (entry.getValue() > 1) {
                ans.add(entry.getKey());
            }
        }
        return ans;
    }
}
Runtime: 75 ms
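Code 1 makes two passes: one to count every window, and one over the map to pick out the repeated ones. A minimal sketch of a single-pass alternative, assuming the input is non-null: two HashSets record the windows seen at least once and the windows already reported, so each repeated sequence is added exactly once, at its second occurrence. RepeatedDnaTwoSets and find are hypothetical names; this is a variant, not the original author's code.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical single-pass variant of Code 1 (assumes s is non-null).
class RepeatedDnaTwoSets {
    static List<String> find(String s) {
        Set<String> seen = new HashSet<>();     // windows met at least once
        Set<String> repeated = new HashSet<>(); // windows already added to ans
        List<String> ans = new ArrayList<>();
        for (int i = 0; i + 10 <= s.length(); i++) {
            String window = s.substring(i, i + 10);
            // Set.add returns false when the element is already present
            if (!seen.add(window) && repeated.add(window)) {
                ans.add(window); // reached only on the second occurrence
            }
        }
        return ans;
    }

    public static void main(String[] args) {
        System.out.println(find("AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"));
    }
}
```

Because Set.add both tests membership and inserts, each window costs one lookup per set, and the answer is built during the scan instead of in a second loop over map entries.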
Code 2
import java.util.*;

class Solution {
    public List<String> findRepeatedDnaSequences(String s) {
        if (null == s || s.length() < 10) {
            return new ArrayList<String>();
        }
        List<String> ans = new ArrayList<String>();
        // Map each nucleotide to a 2-bit code
        int[] map = new int[256];
        map['A'] = 0; // 00
        map['T'] = 1; // 01
        map['C'] = 2; // 10
        map['G'] = 3; // 11
        int hash = 0;
        int mask = 0xFFFFF; // keep only the low 20 bits
        // Encode the first window
        for (int i = 0; i < 10; i++) {
            hash = ((hash << 2) | map[s.charAt(i)]) & mask;
        }
        // A 10-letter window uses 2 bits per letter, i.e. 20 bits, so there are 2^20 possible hashes
        BitSet seen = new BitSet(1 << 20); // windows seen at least once
        BitSet more = new BitSet(1 << 20); // windows already added to ans
        seen.set(hash);
        int length = s.length();
        for (int i = 10; i < length; i++) {
            // Slide the window: shift in s[i]; the mask drops the letter at i - 10
            hash = ((hash << 2) | map[s.charAt(i)]) & mask;
            if (seen.get(hash)) {
                if (!more.get(hash)) {
                    more.set(hash);
                    ans.add(s.substring(i - 9, i + 1));
                }
            } else {
                seen.set(hash);
            }
        }
        return ans;
    }
}
Runtime: 9 ms
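Code 2 combines a sliding window with bit manipulation, which is why it is much faster. Each nucleotide is encoded in 2 bits, so a 10-letter window fits exactly in a 20-bit integer and distinct windows always get distinct hashes. Moving the window one position costs a single shift, OR, and mask instead of building a new 10-character substring, and two BitSets of 2^20 bits replace a HashMap keyed by strings. A substring is created only when a repeated window is found for the first time.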
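To see the encoding at work, the sketch below (hypothetical class DnaRollingHashDemo, using the same A/T/C/G to 00/01/10/11 mapping as Code 2) encodes the first window of the example and then rolls the hash forward, checking at every step that the rolled value equals a fresh encoding of the current window. For "AAAAACCCCC" the five A's contribute zeros and the five C's give 1010101010 in the low 10 bits, i.e. 682.

```java
// Hypothetical demo of the 2-bit encoding and rolling update used in Code 2.
class DnaRollingHashDemo {
    // Same mapping as Code 2: A=00, T=01, C=10, G=11
    static int code(char c) {
        switch (c) {
            case 'A': return 0;
            case 'T': return 1;
            case 'C': return 2;
            case 'G': return 3;
            default: throw new IllegalArgumentException("not a nucleotide: " + c);
        }
    }

    // Encode a 10-letter window from scratch, two bits per letter
    static int encode(String window) {
        int h = 0;
        for (int i = 0; i < window.length(); i++) {
            h = (h << 2) | code(window.charAt(i));
        }
        return h;
    }

    // Rolling update: shift in the new letter, mask off the letter that left the window
    static int roll(int h, char next) {
        return ((h << 2) | code(next)) & 0xFFFFF;
    }

    public static void main(String[] args) {
        String s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT";
        int h = encode(s.substring(0, 10));
        System.out.println(h); // 682 for "AAAAACCCCC"
        for (int i = 10; i < s.length(); i++) {
            h = roll(h, s.charAt(i));
            // The rolled hash must match a fresh encoding of window s[i-9..i]
            if (h != encode(s.substring(i - 9, i + 1))) {
                throw new AssertionError("rolling hash mismatch at " + i);
            }
        }
    }
}
```

Equal windows therefore always produce equal hashes, and because 10 letters at 2 bits each use exactly 20 bits, unequal windows never collide, which is what lets Code 2 use the hash directly as a BitSet index.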