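Recently, while preparing for interviews, I reviewed the KMP (Knuth-Morris-Pratt) algorithm. KMP is an algorithm designed to find a specified pattern in a given string. Compared with the naive matching algorithm, which needs O(n*m) time in the worst case, KMP runs in O(n+m) time, where n is the length of the string and m is the length of the pattern.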
Let's get down to the details of this algorithm.
Assume we search for the pattern P = (abc) in the string S = (abababc). In the traditional matching algorithm, we keep a variable named i and compare the elements of P, from P[0] to P[2], with the elements of S, from S[i] to S[i+2], one by one until every element matches. When a comparison fails, i is increased by 1 and the process above is repeated until i reaches the end of S or a match is returned.
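To make the traditional procedure concrete, here is a minimal sketch of it in Java. The class name NaiveSearch and the method name search are my own placeholders, not part of any library, and the sketch uses 0-based indices.

```java
// Hypothetical helper class, used only to illustrate the naive approach.
class NaiveSearch {
    // Returns the index of the first occurrence of P in S, or -1 if there is none.
    static int search(String P, String S) {
        int n = S.length();
        int m = P.length();
        for (int i = 0; i + m <= n; ++i) {
            int j = 0;
            // Compare P[0..m-1] with S[i..i+m-1] one element at a time.
            while (j < m && S.charAt(i + j) == P.charAt(j)) {
                ++j;
            }
            if (j == m) {
                return i; // every element matched
            }
            // Mismatch: the loop header advances i by exactly 1.
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(search("abc", "abababc")); // prints 4
    }
}
```

Every failed attempt throws away everything learned from the elements that did match, which is where the O(n*m) worst case comes from.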
So, how could we improve the algorithm above? If you examine the whole procedure carefully, you will find that increasing i by only 1 is too cautious. Suppose the comparison fails at position j of S. At that point we already know that S[i:j] equals P[0:j-i], so the amount by which i can safely advance depends only on P and can be calculated in advance.
Therefore, the difference between KMP and the naive matching algorithm (NMA) is how the variable i is shifted. KMP sets up a function named PREFIX_COMPUTE that calculates, for each position of P, the length of the longest proper prefix of P that is also a suffix of the part of P ending at that position. When a comparison fails, KMP shifts by an amount derived from the value returned by PREFIX_COMPUTE instead of always shifting by 1. The most difficult part of KMP is the PREFIX_COMPUTE function.
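To see what this prefix-and-suffix length looks like before worrying about how to compute it efficiently, here is a small brute-force sketch. The names BorderDemo and longestBorder are hypothetical, and this is deliberately the slow, obvious definition rather than the way KMP itself computes the values.

```java
// Hypothetical helper class: brute-force version of the quantity PREFIX_COMPUTE is after.
class BorderDemo {
    // Length of the longest proper prefix of s that is also a suffix of s.
    static int longestBorder(String s) {
        for (int len = s.length() - 1; len > 0; --len) {
            // Does the prefix of length len equal the suffix of length len?
            if (s.startsWith(s.substring(s.length() - len))) {
                return len;
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        String P = "ababc";
        // Prints one line per prefix of P: a -> 0, ab -> 0, aba -> 1, abab -> 2, ababc -> 0
        for (int j = 0; j < P.length(); ++j) {
            String head = P.substring(0, j + 1);
            System.out.println(head + " -> " + longestBorder(head));
        }
    }
}
```

For example, if four elements (abab) have matched before a failure, the value 2 says that the last two matched elements (ab) already form the start of P, so the search can jump ahead by 4 - 2 = 2 positions instead of 1.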
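Suppose we are computing the PREFIX_COMPUTE result at position j of P, and a variable named k holds the result for position j-1. The first step is to check whether P[j] = P[k+1] holds. If it does not, iterate k = PREFIX_COMPUTE(k) until the equation holds or k cannot fall back any further. Then, if the equation holds, update k to k+1; the result of PREFIX_COMPUTE(j) is k. Note that in the code k is the index of the last character of the matching prefix (its length minus 1), so -1 means that there is no such prefix.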
The Java code is as follows:
public class KMP {
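    /**
     * Computes the prefix function of P.
     * res[j] is the index of the last character of the longest proper prefix
     * of P[0..j] that is also a suffix of P[0..j], or -1 if there is none.
     */
    private int[] PREFIX_COMPUTE(String P)
    {
        int len = P.length();
        int[] res = new int[len];
        if (len == 0)
        {
            return res;
        }
        int k = -1;
        res[0] = k;
        for (int j = 1; j < len; ++j)
        {
            // compute the prefix function at position j
            // iterate: fall back while P[j] does not extend the current prefix;
            // k >= 0 lets k fall back all the way to -1 (no prefix)
            while (k >= 0 && P.charAt(j) != P.charAt(k + 1))
            {
                k = res[k];
            }
            // check: extend the prefix by one character if possible
            if (P.charAt(k + 1) == P.charAt(j))
            {
                k++;
            }
            res[j] = k;
        }
        return res;
    }

    /**
     * Returns the index of the first occurrence of P in S, or -1 if P does not occur.
     */
    private int kmp_search(String P, String S)
    {
        if (P.length() == 0)
        {
            return 0;
        }
        int[] prefix = PREFIX_COMPUTE(P);
        int k = -1; // index in P of the last matched character
        for (int i = 0; i < S.length(); ++i)
        {
            // on a mismatch, fall back in P; i never moves backwards
            while (k >= 0 && S.charAt(i) != P.charAt(k + 1))
            {
                k = prefix[k];
            }
            if (S.charAt(i) == P.charAt(k + 1))
            {
                k++;
            }
            if (k == P.length() - 1)
            {
                return i - k; // start index of the match
            }
        }
        return -1;
    }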
    public static void main(String[] args) {
        KMP kmp = new KMP();
        String S = "123456ab123aba123";
        String P = "aba";
        int res = kmp.kmp_search(P, S);
        System.out.println(res); // index of the first match, or -1
    }
}