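The KMP (Knuth-Morris-Pratt) algorithm is an improved string-matching algorithm. Its key idea is to define a next function from the given pattern string sub. The next function captures how the pattern partially matches itself.
1. Implementation
(1) The algorithm for computing the next function is as follows: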
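void getNext(char sub[], int next[]) {
    int i = -1, j = 0,          /* i: candidate prefix length (-1 means restart); j: position in sub */
        len = strlen(sub);
    next[0] = -1;
    while (j < len - 1) {
        if (i == -1 || sub[j] == sub[i]) {
            j++;
            i++;
            next[j] = i;        /* sub[0..i-1] equals sub[j-i..j-1] */
        } else {
            i = next[i];        /* fall back to a shorter prefix */
        }
    } //while
}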
The value of next[j] is the length of the longest proper suffix of sub[0..j-1] that is also a prefix of sub[0..j-1], for example:
01234567
abaabcac
| |
i j
The next[] array is defined as follows:
1) next[j] = -1, if j = 0
2) next[j] = max { i | 0 < i < j, sub[0..i-1] = sub[j-i..j-1] }, if this set is non-empty
3) next[j] = 0, otherwise
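A further improvement: suppose sub[j] equals sub[i], where i is the value that would be assigned to next[j]. If sub[j] is compared with some character of the main string and does not match,
then that character of the main string does not match sub[i] either, so there is no point in comparing sub[i] with it.
In other words, instead of setting next[j] = i, set next[j] = next[i] directly.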
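The following example computes next[j] directly from the basic definition of next[] (without the improvement above), for 0 <= j <= strlen(sub), using a pattern that begins with ababcaba. Note that getNext only fills next[0] through next[strlen(sub) - 1]; the last row is included to illustrate the definition.
Initially: next[j] = -1 when j = 0
           next[j] = 0 when j = 1
j = 2: sub[0..j-1] = ab, so next[j] = 0
j = 3: sub[0..j-1] = aba, so next[j] = 1
j = 4: sub[0..j-1] = abab, so next[j] = 2
j = 5: sub[0..j-1] = ababc, so next[j] = 0
j = 6: sub[0..j-1] = ababca, so next[j] = 1
j = 7: sub[0..j-1] = ababcab, so next[j] = 2
j = 8: sub[0..j-1] = ababcaba, so next[j] = 3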
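(2) The KMP algorithm is as follows:
int KMP(char str[], char sub[]) {
    int i = 0,                  /* i ranges from 0 to strlen(str) - 1 */
        j = 0,                  /* j ranges from 0 to strlen(sub) - 1 */
        lenstr = strlen(str),
        lensub = strlen(sub);
    if (lensub == 0)            /* empty pattern: report a match at 0 and avoid a zero-length next[] */
        return 0;
    int next[lensub];
    getNext(sub, next);         // call getNext to compute next[]
    while (i < lenstr && j < lensub) {
        if (j == -1 || str[i] == sub[j]) {  // keep comparing the following characters
            i++;
            j++;
        } else {                // shift the pattern to the right
            j = next[j];
        }
    } //while
    if (j >= lensub)
        return (i - lensub);    // index in str where the match starts
    else
        return -1;              // no match
}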
(3) The test code is as follows:
#include <stdio.h>
#include <string.h>
#include "../sharedSource/KMP.c"
int main() {
    char *str = "acabaabaabcacaabc", *sub = "abaabcac";
    //           0123456789                  01234567
    int pos;
    pos = KMP(str, sub);
    printf("index: 0123456789\n");
    printf("       ||||||||||\n");
    printf("str  : %s\n", str);
    printf("sub  : %s\n", sub);
    printf("\nmatched position(index): %d\n", pos);
    return 0;
}
Output: