KMP之next[]的求值-附传统0下标开始的next[]源码

最新推荐文章于 2023-10-30 09:02:59 发布

转载最新推荐文章于 2023-10-30 09:02:59 发布 · 491 阅读

文章标签：

#string #c #算法 #null

C/C++ 专栏收录该内容

1243 篇文章

订阅专栏

本文详细解析了KMP算法中next数组的计算过程及原理，并提供了具体的实例演示与源码实现，帮助读者深入理解next数组如何辅助模式串匹配。

KMP之next[]的求值-附传统0下标开始的next[]源码

- KMP算法是在已知模式串next[]值的基础上执行的
- next[]值仅取决于模式串本身

------------------------------------------------------------
              next[]的定义:
next[j]=
=0 当 j=1 时
=max{k|1<k<j且'P(1)...P(k-1)'='P(j-k+1)...P(j-1)'}当此集合非空时
=1 其它情况
-------------------------------------------------------------
下面以一个例子解释上面的定义
设有:
--------------------------------
   j : 1 2 3 4 5 6 7 8
string: a b a a b c a c
--------------------------------
求出:next[j]=?
首先,next[1]=0
--------------------------------
   j   : 1 2 3 4 5 6 7 8
string : a b a a b c a c
next[j]: 0
--------------------------------
求next[j+1]=? --(j=1)
1<k<j, j=1,
so, next[2]=1
--------------------------------
   j   : 1 2 3 4 5 6 7 8
string : a b a a b c a c
next[j]: 0 1
--------------------------------
求next[3]=next[j+1]? --(j=2)
因:next[2]=1,又,P(1)!=P(2),又next[1]=0
所以next[3]=1
--------------------------------
   j   : 1 2 3 4 5 6 7 8
string : a b a a b c a c
next[j]: 0 1 1
--------------------------------
求next[4]=?
因next[3]=1,P(3)=P(1)
所以next[4]=next[3]+1=2
--------------------------------
   j   : 1 2 3 4 5 6 7 8
string : a b a a b c a c
next[j]: 0 1 1 2
--------------------------------
next[5]=?
next[4]=2 => P(4)!=P(2) ,
next[2]=1 => P(4) =P(1) ,
next[5]=next[2]+1=2
--------------------------------
   j   : 1 2 3 4 5 6 7 8
string : a b a a b c a c
next[j]: 0 1 1 2 2
--------------------------------
next[6]=?
next[5]=2 => P(5)=P(2)
next[6]=next[5]+1=3
--------------------------------
   j   : 1 2 3 4 5 6 7 8
string : a b a a b c a c
next[j]: 0 1 1 2 2 3
--------------------------------
next[7]=?
next[6]=3 ==> P(6)!=P(3),
next[3]=1 ==> P(6)!=P(1),
又next[1]=0
so,next[7]=1
--------------------------------
   j   : 1 2 3 4 5 6 7 8
string : a b a a b c a c
next[j]: 0 1 1 2 2 3 1
--------------------------------
next[8]=?
next[7]=1 ==> P(7)=P(1)
so,next[8]=next[7]+1=2
--------------------------------
   j   : 1 2 3 4 5 6 7 8
string : a b a a b c a c
next[j]: 0 1 1 2 2 3 1 2
--------------------------------

##################################################
##################################################
可以看出,求next[]的过程,只要确定了next[1]=0后,
剩下的就可以依次递推出来了

由此可得算法,
但相对上述有所改动.
原因是:在C种传统上数组下标是从0开始的,
因此,得:
      5
      6 int *
      7 get_next( const char *const str )
      8 {
      9     int *next,len=strlen(str),i,j;
     10     assert(NULL!=(next=(int *)malloc(sizeof(int)*(len))));
     11     for(next[0]=-1,i=1,j=-1; i<len; ++i ){
     12         for(j=i-1;j!=-1&&str[j]!=str[next[j]];j=next[j])
     13             ;
     14         next[i]=(j!=-1&&str[j]==str[next[j]])?next[j]+1:0;
     15     }
     16     return next;
     17 }
     18 int
     19 main(int argc, char **argv)
     20 {
     21     const char a[]="abcabaa";
     22     const char b[]="abcabaabcaabaabcabaabcabaabaabcaabaabaabaabcacabaa";
     23     int * next1=get_next(a);
     24     int i,j;
     25     for(i=0;i<strlen(a);++i){
     26         printf("%d/t",next1[i]);
     27     }
     28     putchar('/n');
     29     for(i=0,j=0;i<strlen(b);){
     30         if(j==strlen(a)){
     31             printf("%d/n",i-strlen(a));
     32             j=0;
     33         }
     34         j=(-1==j)?(++i,0):a[j]==b[i]?(++i,j+1):next1[j];
     35     }
     36     exit(0);
     37 }

上述程序的运行结果为:
-1 0 0 0 1 2 0
0
13

第一行,为所得next[],细心的朋友也看出来了,我还没有free(next),不过在exit后,系统会做的,:))
第2行和第3行,表示在和模式串匹配的源串中的开头位置

PS:这只是一个简单的手记,目的是为了能让自己记住并能简单应用,过几天有时间再写个完整的KMP给大家.
如果我理解的有不对的地方,希望大家指出,这DD我从早上看起,头确实有点晕,呆会正好睡的香 *^_^*

再此基础上的nextval[]

     18 int *
     19 get_nextval( const char *const str,int *const next )
     20 {
     21     int i,j,len=strlen(str);
     22     for(i=1; i<len; ++i){
     23     /* j=next[i];*/
     24         if((j=next[i])!=0&&str[i]==str[j])
     25             next[i]=next[j];
     26     }
     27     return next;
     28 }

PS:我的模式匹配用了这个,没出问题,代码应该是对的,
由于平常用next[]很少,改进后的nextval[]更好,这下面的
就是我写在程序里的一段:

      7 /******************* static functions *****************/
      8 static int     *
      9 _get_nextval(const char * const t_str)
     10 {
     11     int *next, len=strlen(t_str),i,j;
     12     assert(NULL != (next=(int *)malloc(sizeof(int)*(len))));
     13
     14     for(next[0]=-1,i=1,j=-1; i<len; ++i){
     15         for(j=i-1; j!=-1&&t_str[j]!=t_str[next[j]]; j=next[j]);
     16         next[i]=(j!=-1&&t_str[j]==t_str[next[j]])?next[j]+1:0;
     17     }
     18     for(i=1;i<len;++i){
     19         if((j=next[i])!=0&&t_str[i]==t_str[j])
     20             next[i]=next[j];
     21     }
     22     return next;
     23 }