All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.
Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.
For example, given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT", return ["AAAAACCCCC", "CCCCCAAAAA"].
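import java.util.HashSet;
import java.util.LinkedList;
import java.util.List;

public class Solution {
    // Packs a DNA string into an int, 2 bits per nucleotide (20 bits for 10 letters).
    public int hashcode(String s){
        int res = 0;
        for(int i=0; i<s.length(); i++){
            res = res << 2 | hash(s.charAt(i));
        }
        return res;
    }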
    public List<String> findRepeatedDnaSequences(String s) {
        List<String> list = new LinkedList<String>();
        // A string of 10 or fewer letters cannot contain a repeated 10-letter sequence.
        if(s == null || s.length() <= 10){
            return list;
        }
        // Encoded values of every 10-letter window seen so far.
        HashSet<Integer> set = new HashSet<Integer>();
        for(int i=0; i<=s.length() - 10; i++){
            String tmp = s.substring(i, i + 10);
            int hash = hashcode(tmp);
            if(set.contains(hash) && !list.contains(tmp)){ // Remember to check that the string is not already in the list
                list.add(tmp);
            }else{
                set.add(hash);
            }
        }
        return list;
    }
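    // Maps a nucleotide to a 2-bit code: A=0, C=1, G=2, T=3.
    public static int hash(char c){
        if(c == 'A'){
            return 0;
        }else if(c == 'C'){
            return 1;
        }else if(c == 'G'){
            return 2;
        }else if(c == 'T'){
            return 3;
        }
        // Any other character falls back to 0, the same code as 'A'.
        return 0;
    }
}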
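The characters are converted to integers to save space: each nucleotide takes 2 bits, so a 10-letter sequence fits in 20 bits of a single int, and the set stores these ints rather than the substrings themselves.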
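Because every window shares nine letters with the one before it, the same 2-bit encoding can also be updated incrementally instead of being recomputed for each substring. The following is a minimal sketch of that variation, not the solution above: the class name `RollingDnaSketch`, the methods `encode` and `findRepeated`, and the second set `reported` (which stands in for the `list.contains` check) are all hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: same 2-bit encoding, but the window code is rolled forward.
class RollingDnaSketch {
    private static final int WINDOW = 10;
    // 2 bits per nucleotide, so a window occupies the low 20 bits.
    private static final int MASK = (1 << (2 * WINDOW)) - 1;

    // Same mapping as hash(char) in the solution: A=0, C=1, G=2, T=3.
    static int encode(char c) {
        switch (c) {
            case 'C': return 1;
            case 'G': return 2;
            case 'T': return 3;
            default:  return 0; // 'A', and any unexpected character
        }
    }

    static List<String> findRepeated(String s) {
        List<String> result = new ArrayList<>();
        if (s == null || s.length() <= WINDOW) {
            return result;
        }
        Set<Integer> seen = new HashSet<>();     // codes of windows seen so far
        Set<Integer> reported = new HashSet<>(); // codes already added to result
        int code = 0;
        for (int i = 0; i < s.length(); i++) {
            // Shift in the new nucleotide; the mask drops the one that left the window.
            code = ((code << 2) | encode(s.charAt(i))) & MASK;
            if (i < WINDOW - 1) {
                continue; // the first full window ends at index WINDOW - 1
            }
            // Set.add returns false when the element is already present.
            if (!seen.add(code) && reported.add(code)) {
                result.add(s.substring(i - WINDOW + 1, i + 1));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(findRepeated("AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"));
    }
}
```

In this sketch each window costs a constant number of integer operations, and a substring is only created for sequences that are actually reported.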