All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.
Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.
Example:
Input: s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT" Output: ["AAAAACCCCC", "CCCCCAAAAA"]
题目大意:
找到字符串中连续出现1次以上的子串,子串的长度为10。
解法:
使用一个map保存出现字符串以及对应的次数,如果出现2次及以上,加入到结果中。
java:
class Solution {
public List<String> findRepeatedDnaSequences(String s) {
HashMap<String,Integer>t=new HashMap();
List<String>res=new ArrayList<String>();
int start=0,i=10;
for(;i<=s.length();start++,i++){
if(t.containsKey(s.substring(start,i))&&t.get(s.substring(start,i))==1){
int count=t.get(s.substring(start,i));
t.put(s.substring(start,i),++count);
res.add(s.substring(start,i));
}
else if(t.containsKey(s.substring(start,i))){
int count=t.get(s.substring(start,i));
t.put(s.substring(start,i),++count);
}
else t.put(s.substring(start,i),1);
}
return res;
}
}
想复杂了,可以使用两个set进行存储,一个存字符串的所有不重复子串,一个存出现两次及以上的子串。
class Solution {
public List<String> findRepeatedDnaSequences(String s) {
Set<String>seen=new HashSet<>(),repeat=new HashSet<>();
for(int i=0;i+9<s.length();i++){
if(!seen.add(s.substring(i,i+10)))
repeat.add(s.substring(i,i+10));
}
return new ArrayList<>(repeat);
}
}
最佳的方法:
使用位图法(Java中自带BitSet,BitMap类)。主要是提高空间效率
class Solution {
public List<String> findRepeatedDnaSequences(String s) {
int len = s.length(), mod = 1 << 20, mask = 0xFFFFF, n = 0;
if (len < 11) return new ArrayList<>();
// 'A' -> 0 'C' -> 1 'G' -> 2 'T' -> 3
int[] ints = new int['T' + 1];
ints['C'] = 1; ints['G'] = 2; ints['T'] = 3;
List<String> list = new ArrayList<>();
BitSet seen = new BitSet(mod), added = new BitSet(mod);
for (int i = 0; i < len; i++) {
n = (n << 2) & mask | ints[s.charAt(i)]; // 获得连续10个字符串的哈希码 (基于上述的字符串->整数映射)
if (i >= 9)
if (seen.get(n)) { // 如果seen中在n位置上已有数字 则返回true 否则返回false
if (!added.get(n)) {
added.set(n);
list.add(s.substring(i - 9, i + 1));
}
} else seen.set(n);
}
return list;
}
}