KMP Algorithm Explained - 优快云 Blog

Article link: https://blog.youkuaiyun.com/Joe19310/article/details/76687134

Problem Description

The French author Georges Perec (1936–1982) once wrote a book, La disparition, without the letter 'e'. He was a member of the Oulipo group. A quote from the book:

Tout avait l’air normal, mais tout s’affirmait faux. Tout avait l’air normal, d’abord, puis surgissait l’inhumain, l’affolant. Il aurait voulu savoir où s’articulait l’association qui l’unissait au roman : sur son tapis, assaillant à tout instant son imagination, l’intuition d’un tabou, la vision d’un mal obscur, d’un quoi vacant, d’un non-dit : la vision, l’avision d’un oubli commandant tout, où s’abolissait la raison : tout avait l’air normal mais…

Perec would probably have scored high (or rather, low) in the following contest. People are asked to write a perhaps even meaningful text on some subject with as few occurrences of a given “word” as possible. Our task is to provide the jury with a program that counts these occurrences, in order to obtain a ranking of the competitors. These competitors often write very long texts with nonsense meaning; a sequence of 500,000 consecutive 'T's is not unusual. And they never use spaces.

So we want to quickly find out how often a word, i.e., a given string, occurs in a text. More formally: given the alphabet {'A', 'B', 'C', …, 'Z'} and two finite strings over that alphabet, a word W and a text T, count the number of occurrences of W in T. All the consecutive characters of W must exactly match consecutive characters of T. Occurrences may overlap.

Input

The first line of the input file contains a single number: the number of test cases to follow. Each test case has the following format:

One line with the word W, a string over {'A', 'B', 'C', …, 'Z'}, with 1 ≤ |W| ≤ 10,000 (here |W| denotes the length of the string W).
One line with the text T, a string over {'A', 'B', 'C', …, 'Z'}, with |W| ≤ |T| ≤ 1,000,000.

Output

For every test case in the input file, the output should contain a single number, on a single line: the number of occurrences of the word W in the text T.

Sample Input

3
BAPC
BAPC
AZA
AZAZAZA
VERDI
AVERDXIVYERDIAN

Sample Output

1
3
0

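#include <cstdio>
#include <cstring>

const int MAX = 1000010;   // |T| <= 1,000,000, plus room for the terminator
char p[MAX], text[MAX];    // pattern (the word W) and the text T
int pre[MAX];              // prefix function of the pattern
int n, m;                  // n = |T|, m = |W|

// Compute the prefix function of p: pre[i] is the length of the longest
// proper prefix of p[0..i] that is also a suffix of p[0..i].
void compute_prefix()
{
    int k = 0;
    pre[0] = 0;
    for (int i = 1; i < m; i++) {
        while (k > 0 && p[k] != p[i])
            k = pre[k - 1];
        if (p[k] == p[i])
            k++;
        pre[i] = k;
    }
}

// Count the occurrences of p in text; overlapping occurrences are included.
int KMP()
{
    int sum = 0;
    compute_prefix();
    int q = 0;                   // number of pattern characters matched so far
    for (int i = 0; i < n; i++) {
        while (q > 0 && p[q] != text[i])
            q = pre[q - 1];
        if (p[q] == text[i])
            q++;
        if (q == m) {
            sum++;               // count this match
            q = pre[q - 1];      // fall back and look for the next match
        }
    }
    return sum;
}

int main()
{
    int T;
    if (scanf("%d", &T) != 1)
        return 0;
    while (T--) {
        // The field width keeps the input inside the MAX-sized buffers.
        if (scanf("%1000009s %1000009s", p, text) != 2)
            break;
        n = (int)strlen(text);
        m = (int)strlen(p);
        printf("%d\n", KMP());
    }
    return 0;
}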

KMP algorithm template:

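#include <cstdio>
#include <cstring>
#include <iostream>

using namespace std;

const int N = 100;    // maximum length of the text
const int M = 20;     // maximum length of the pattern
char t[N + 1];        // +1 for the terminating '\0'
char p[M + 1];
int pre[M + 1];       // prefix function

// Code for arrays indexed from 0
// Compute the prefix function of pattern p
void compute_prefix(const char* p, int m) {
    int k = 0;
    pre[0] = 0; // indices start at 0
    for (int i = 1; i < m; ++i) {
        while (k > 0 && p[k] != p[i])
            k = pre[k - 1];
        if (p[k] == p[i])
            k++;
        pre[i] = k;
    }
}

void kmp(const char* text, const char* p, int n, int m) {
    compute_prefix(p, m);   // compute the prefix function of pattern p
    int q = 0;              // index into pattern p
    for (int i = 0; i < n; ++i) {
        // scan the text from left to right, starting at i = 0; i never moves backwards
        while (q > 0 && p[q] != text[i])
            // on a mismatch, the pattern index falls back; the previous
            // position still matches, i.e. text[i-1] == p[q-1]
            q = pre[q - 1];
        if (p[q] == text[i])  // the pattern character matches the text character
            q++;
        if (q == m) { // every character of pattern p has been matched
            cout << "s=" << i - m + 1 << endl;    // print the match position
            q = pre[q - 1];         // look for the next match
        }
    }
}

int main()
{
    // The field widths match N and M, so the input cannot overflow the buffers.
    if (scanf("%100s%20s", t, p) != 2)
        return 1;
    int n = (int)strlen(t);
    int m = (int)strlen(p);

    kmp(t, p, n, m);

    for (int x = 0; x < m; ++x)
        cout << "pre[" << x << "]=" << pre[x] << endl;
    return 0;
}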