Problem Description
The French author Georges Perec (1936–1982) once wrote a book, La disparition, without the letter 'e'. He was a member of the Oulipo group. A quote from the book:
Tout avait l’air normal, mais tout s’affirmait faux. Tout avait l’air normal, d’abord, puis surgissait l’inhumain, l’affolant. Il aurait voulu savoir où s’articulait l’association qui l’unissait au roman : sur son tapis, assaillant à tout instant son imagination, l’intuition d’un tabou, la vision d’un mal obscur, d’un quoi vacant, d’un non-dit : la vision, l’avision d’un oubli commandant tout, où s’abolissait la raison : tout avait l’air normal mais…
Perec would probably have scored high (or rather, low) in the following contest. People are asked to write a perhaps even meaningful text on some subject with as few occurrences of a given “word” as possible. Our task is to provide the jury with a program that counts these occurrences, in order to obtain a ranking of the competitors. These competitors often write very long texts with nonsense meaning; a sequence of 500,000 consecutive 'T's is not unusual. And they never use spaces.
So we want to quickly find out how often a word, i.e., a given string, occurs in a text. More formally: given the alphabet {'A', 'B', 'C', …, 'Z'} and two finite strings over that alphabet, a word W and a text T, count the number of occurrences of W in T. All the consecutive characters of W must exactly match consecutive characters of T. Occurrences may overlap.
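To make the counting rule concrete, here is a minimal brute-force sketch that tries every start position in T and counts overlapping matches. The name count_naive is made up for this illustration and is not part of the problem. With |W| up to 10,000 and |T| up to 1,000,000 this approach can need on the order of 10^10 character comparisons in the worst case, so it is only meant as a reference for checking a faster solution on small inputs.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Brute-force reference: count occurrences of w in t, overlaps included.
// count_naive is a hypothetical helper name used only for this sketch.
int count_naive(const std::string& w, const std::string& t) {
    assert(!w.empty());                 // the problem guarantees 1 <= |W|
    if (w.size() > t.size()) return 0;  // defensive: W cannot fit in T
    int count = 0;
    for (std::size_t s = 0; s + w.size() <= t.size(); ++s) {
        // Compare W against the window of T that starts at position s.
        if (t.compare(s, w.size(), w) == 0) ++count;
    }
    return count;
}
```

For W = AZA and T = AZAZAZA the matches start at positions 0, 2 and 4, so the count is 3 even though neighbouring matches share a letter.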
Input
The first line of the input file contains a single number: the number of test cases to follow. Each test case has the following format:
One line with the word W, a string over {'A', 'B', 'C', …, 'Z'}, with 1 ≤ |W| ≤ 10,000 (here |W| denotes the length of the string W).
One line with the text T, a string over {'A', 'B', 'C', …, 'Z'}, with |W| ≤ |T| ≤ 1,000,000.
Output
For every test case in the input file, the output should contain a single number, on a single line: the number of occurrences of the word W in the text T.
Sample Input
3
BAPC
BAPC
AZA
AZAZAZA
VERDI
AVERDXIVYERDIAN
Sample Output
1
3
0
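#include<cstdio>
#include<iostream>
#include<cstring>
using namespace std;
const int MAX=1000010;
char p[MAX],text[MAX]; // p: the word W (pattern), text: the text T
int pre[MAX],sum;      // pre: prefix function of p, sum: number of matches
int n,m;               // n = |T|, m = |W|
// Compute the prefix function of the pattern p.
void compute_preflx()
{
    int k=0;
    pre[0]=0;
    for(int i=1;i<m;i++){
        while(k>0&&p[k]!=p[i])
            k=pre[k-1];
        if(p[k]==p[i])
            k++;
        pre[i]=k;
    }
}
// Scan the text once and count every occurrence of p, overlaps included.
int KMP()
{
    sum=0;
    compute_preflx();
    int q=0; // number of pattern characters matched so far
    for(int i=0;i<n;i++){
        while(q>0&&p[q]!=text[i])
            q=pre[q-1];
        if(p[q]==text[i]) q++;
        if(q==m){
            sum++;      // count this match
            q=pre[q-1]; // fall back to look for the next (possibly overlapping) match
        }
    }
    return sum;
}
int main(){
    int T;
    cin>>T;
    while(T--)
    {
        scanf("%s %s",p,text); // the word W comes first, then the text T
        n=strlen(text);
        m=strlen(p);
        printf("%d\n",KMP());
    }
    return 0;
}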
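The solution relies on the prefix function: pre[i] is the length of the longest proper prefix of p[0..i] that is also a suffix of it. As a small sketch of what that array looks like, the version below computes it for a std::string and returns it as a vector. The name prefix_function and the use of std::string and std::vector are choices made for this illustration; the loop itself follows compute_preflx above.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Returns pre, where pre[i] is the length of the longest proper prefix of
// p[0..i] that is also a suffix of p[0..i].
// prefix_function is a hypothetical name used only for this sketch.
std::vector<int> prefix_function(const std::string& p) {
    std::vector<int> pre(p.size(), 0);
    int k = 0;  // length of the border currently being extended
    for (std::size_t i = 1; i < p.size(); ++i) {
        // Shrink the border until p[i] can extend it, or it becomes empty.
        while (k > 0 && p[k] != p[i]) k = pre[k - 1];
        if (p[k] == p[i]) ++k;
        pre[i] = k;
    }
    return pre;
}
```

For the pattern AZA this gives 0, 0, 1. That final 1 is why the matcher sets q=pre[q-1] after a full match: the trailing A of one occurrence is kept as the first matched character of the next, which is how AZAZAZA yields 3 occurrences.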
KMP algorithm template:
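#include<iostream>
#include<cstring>
#include<cstdio>
#include<cstdlib>
using namespace std;
const int N = 100; // size of the text buffer (includes the terminating '\0')
const int M = 20;  // size of the pattern buffer (includes the terminating '\0')
char t[N];
char p[M];
int pre[M+1]; // prefix function
// All arrays in this code are 0-indexed.
// Compute the prefix function of pattern p.
void compute_preflx(char * p,int m){
    int k = 0;
    pre[0] = 0; // start from index 0
    for(int i = 1; i < m; ++i){
        while(k > 0 and p[k] != p[i])
            k = pre[k-1];
        if(p[k] == p[i])
            k++;
        pre[i] = k;
    }
}
void kmp(char * text,char * p, int n,int m){
    compute_preflx(p,m); // compute the prefix function of pattern p
    int q = 0; // index into pattern p
    for(int i = 0; i<n; ++i){
        // Scan the text from left to right; i starts at 0 and never moves backward.
        // On a mismatch the pattern index has to fall back. The previous
        // position matched, i.e. text[i-1] == p[q-1].
        while(q >0 and p[q] != text[i])
            q = pre[q-1];
        if(p[q] == text[i]) // the pattern character matches the text character
            q++;
        if(q == m){ // every character of pattern p has matched
            cout<<"s="<<i-m+1<<endl; // print the position of the match
            q = pre[q-1]; // look for the next match
        }
    }
}
int main()
{
    // Read the text first, then the pattern. The field widths keep the
    // input inside t[N] and p[M].
    scanf("%99s%19s",t,p);
    int n = strlen(t), m = strlen(p);
    kmp(t,p,n,m);
    for(int x=0;x<m;++x)
        cout<<"pre["<<x<<"]=:"<<pre[x]<<" ,"<<endl;
    return 0;
}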
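The template prints match positions and works on fixed-size char buffers. As a possible variation, the sketch below applies the same matching loop to std::string and collects the start positions in a vector, so no buffer size has to be chosen in advance. The name kmp_positions and the choice to return an empty result for an empty pattern are assumptions made for this illustration.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Returns the 0-based start positions of every occurrence of p in text,
// overlapping occurrences included.
// kmp_positions is a hypothetical name used only for this sketch.
std::vector<std::size_t> kmp_positions(const std::string& text,
                                       const std::string& p) {
    std::vector<std::size_t> positions;
    if (p.empty() || p.size() > text.size()) return positions;

    // Prefix function of the pattern, as in compute_preflx.
    std::vector<std::size_t> pre(p.size(), 0);
    std::size_t k = 0;
    for (std::size_t i = 1; i < p.size(); ++i) {
        while (k > 0 && p[k] != p[i]) k = pre[k - 1];
        if (p[k] == p[i]) ++k;
        pre[i] = k;
    }

    // Matching loop: i only moves forward, q falls back on a mismatch.
    std::size_t q = 0;
    for (std::size_t i = 0; i < text.size(); ++i) {
        while (q > 0 && p[q] != text[i]) q = pre[q - 1];
        if (p[q] == text[i]) ++q;
        if (q == p.size()) {
            positions.push_back(i + 1 - p.size());  // start of this match
            q = pre[q - 1];                         // allow overlapping matches
        }
    }
    return positions;
}
```

Calling kmp_positions("AZAZAZA", "AZA") returns the positions 0, 2 and 4, and the size of the returned vector is the count that the contest problem asks for.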