KMP Algorithm

License: CC 4.0 BY-SA

Main idea:

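We compare two strings: the main string S, where i marks the current comparison position, and the pattern T, where j marks the current comparison position. When a mismatch occurs, i never moves backwards; only j changes.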
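For contrast, here is a minimal sketch of the naive (brute-force) matcher that KMP improves on. The function name naive_search is hypothetical and not from the original post; it returns the 0-based index of the first match or -1, and on a mismatch it effectively moves the main-string position back, which is exactly what KMP avoids.

```c
#include <string.h>

/* Naive matcher, shown only for contrast with KMP (hypothetical helper).
 * After a mismatch the next attempt restarts at start + 1, so the position
 * in s (start + j) moves backwards and characters of s may be compared
 * many times: O(slen * tlen) in the worst case.
 * Returns the 0-based index of the first occurrence of t in s, or -1. */
int naive_search(const char *s, const char *t)
{
    int slen = (int)strlen(s);
    int tlen = (int)strlen(t);
    for (int start = 0; start + tlen <= slen; ++start) {
        int j = 0;
        while (j < tlen && s[start + j] == t[j])
            ++j;                 /* extend the current attempt */
        if (j == tlen)
            return start;        /* every character of t matched */
        /* mismatch: fall through and retry from start + 1 */
    }
    return -1;
}
```

For example, naive_search("ababcabab", "abc") returns 2 after first trying start positions 0 and 1.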

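Clearly, for the search to be efficient, j should fall back to a position from which nothing in S has to be compared again, i.e. a position where S[i-j..i-1] = T[0..j-1] still holds for the new value of j. To know exactly where j should fall back to, we use the next table.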

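For each prefix T[0..k-1] of the pattern T, the next table stores the length of the longest element of the intersection of its set of proper prefixes and its set of proper suffixes, i.e. the length of the longest proper prefix that is also a suffix.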

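For example, for "ababa" the proper prefixes are {"a", "ab", "aba", "abab"} and the proper suffixes are {"a", "ba", "aba", "baba"}. Their intersection is {"a", "aba"}, whose longest element "aba" has length 3.

The reason this works: when S[i] and T[j] mismatch, we already know that S[i-j..i-1] = T[0..j-1]. So j should fall back to m, the length of the longest proper prefix of T[0..j-1] that is also a suffix of T[0..j-1] (and therefore also a suffix of the already matched S[0..i-1]). That value is stored in next[j], where j is the position in the pattern T at which the mismatch occurred.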
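To make the definition concrete, here is a brute-force sketch that computes the longest common prefix/suffix length directly from the definition. It is only an illustration (quadratic, not how the next table is actually built), and the helper name longest_border is hypothetical.

```c
#include <string.h>

/* Hypothetical helper illustrating the definition only (O(n^2)):
 * returns the length of the longest proper prefix of str[0..n-1] that is
 * also a suffix of it, i.e. the longest element of the intersection of
 * the proper-prefix set and the proper-suffix set. */
int longest_border(const char *str, int n)
{
    for (int len = n - 1; len > 0; --len) {          /* try longest first */
        if (strncmp(str, str + n - len, (size_t)len) == 0)
            return len;                               /* prefix == suffix */
    }
    return 0;                                         /* no non-empty border */
}
```

Calling longest_border("ababa", 5) returns 3, matching the "aba" example.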

So how do we compute the next array?

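By convention next[0] = -1. We match the pattern T against itself, starting with i = 0 and j = -1, so that j trails i by one position. When j == -1 or T[i] == T[j], both i and j advance, and the new j is the length of the longest common prefix/suffix of T[0..i-1] (for the advanced i), so next[i] = j. When T[i] != T[j], j must fall back to the length of the longest proper prefix of T[0..j-1] that is also its suffix, so j = next[j]. Since each iteration fills in next[i] and j is always less than i, next[j] has already been computed when we need it.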

Computing the next table

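// Builds next[0..tlen] for the pattern t. The globals t, tlen and next are
// declared elsewhere; next needs room for tlen + 1 entries.
// next[k] = length of the longest proper prefix of t[0..k-1] that is also a suffix.
void getNext()
{
	int i=0;
	int j=-1;      // j trails i; j == -1 means "start again from t[0]"
	next[0]=-1;
	while(i<tlen)
	{
		if(j==-1||t[i]==t[j])
		{
			++i;
			++j;
			next[i]=j;   // border length of t[0..i-1]
		}
		else
		{
		    j=next[j];   // fall back; next[j] is already known because j < i
		}
	}
}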
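A self-contained variant of getNext() that takes the pattern and the output array as parameters instead of the globals t, tlen and next. The names build_next and nxt are assumptions for illustration; nxt must hold tlen + 1 ints.

```c
/* Hypothetical parameterized version of getNext().
 * Fills nxt[0..tlen]: nxt[0] = -1 is the sentinel, and nxt[k] is the
 * length of the longest proper prefix of t[0..k-1] that is also a suffix. */
void build_next(const char *t, int tlen, int *nxt)
{
    int i = 0, j = -1;
    nxt[0] = -1;
    while (i < tlen) {
        if (j == -1 || t[i] == t[j]) {
            ++i;
            ++j;
            nxt[i] = j;      /* border length of t[0..i-1] */
        } else {
            j = nxt[j];      /* j < i, so nxt[j] is already known */
        }
    }
}
```

For "ababa" it fills nxt with -1, 0, 0, 1, 2, 3; the last entry is the length of the "aba" border from the example.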

The KMP search

int KMP()
{
	int i=0;      // position in the main string s
	int j=0;      // position in the pattern t
	int count=0;  // only used by the counting variants

	getNext();    // build the next table for t

	while(i<slen&&j<tlen)
	{
		if(j==-1||s[i]==t[j])
		{
			++i;
			++j;
		}
		else j=next[j];   // mismatch: i stays put, only j falls back
	}
	if(j==tlen)
	{
		return i-j;   // match found: 0-based index where t starts in s
	}
	return -1;        // no match
}
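Putting getNext() and KMP() together, here is a self-contained sketch that takes C strings instead of the globals s, t, slen and tlen. The name kmp_search is hypothetical, and allocation failure is simply reported as -1 for brevity.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical self-contained KMP search with the same return convention
 * as KMP(): 0-based index of the first occurrence of t in s, or -1. */
int kmp_search(const char *s, const char *t)
{
    int slen = (int)strlen(s), tlen = (int)strlen(t);
    int *nxt = malloc(((size_t)tlen + 1) * sizeof *nxt);
    if (nxt == NULL)
        return -1;                      /* allocation failed */

    /* build the next table (same loop as getNext) */
    int i = 0, j = -1;
    nxt[0] = -1;
    while (i < tlen) {
        if (j == -1 || t[i] == t[j]) { ++i; ++j; nxt[i] = j; }
        else j = nxt[j];
    }

    /* scan s: i never moves backwards, only j falls back */
    i = 0;
    j = 0;
    while (i < slen && j < tlen) {
        if (j == -1 || s[i] == t[j]) { ++i; ++j; }
        else j = nxt[j];
    }
    free(nxt);
    return j == tlen ? i - j : -1;
}
```

kmp_search("hello world", "world") returns 6, and one-character inputs such as kmp_search("a", "a") return 0 without any special-case branch.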

Variants

剪花布条 (Cutting Cloth Strips) HDU - 2087

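A cloth strip has some patterns on it, and there is also a ready-made small trim strip that has some patterns too. Given the cloth strip and the trim strip, compute the maximum number of trim strips that can be cut out of the cloth strip.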

Input

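The input contains several test cases, each a pair consisting of a cloth strip and a trim strip. Both strips are written with visible ASCII characters, and there are as many kinds of pattern as there are visible ASCII characters. Neither strip is longer than 1000 characters. When the # character is encountered, stop processing.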

Output

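Output the maximum number of trim strips that can be cut from the cloth strip; if not even one can be cut, simply output 0. Print each result on its own line.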

Sample Input

abcde a3
aaaaaa  aa
#

Sample Output

0
3

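What is different here is that we must not only match the pattern T against the main string S, but also count how many non-overlapping occurrences of T the string S contains. So after finding a match we do not exit the loop but keep searching: each time a match is found, increment the counter count and reset the pattern index j to 0, leaving the current main-string position i unchanged, then continue comparing. This amounts to deleting S[0..i-1] and matching T against the remaining part of S as a new main string.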

Changed part of the code

int KMP()
{
	// omitted: the same setup as KMP() above (i, j, count, getNext())

	while(i<slen)
	{
		if(j==-1||s[i]==t[j])
		{
			++i;
			++j;
		}
		else j=next[j];
        if(j==tlen) // full match: count it, then reset j to 0 so the next match cannot overlap this one
        {
            count++;
            j=0;
        }
	}
	return count;
}
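A self-contained sketch of the HDU 2087 counting loop with the omitted setup filled in. The function name count_nonoverlapping is hypothetical, and input parsing (reading pairs until #) is left out.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: counts non-overlapping occurrences of t in s by
 * resetting j to 0 after every full match (greedy, leftmost first). */
int count_nonoverlapping(const char *s, const char *t)
{
    int slen = (int)strlen(s), tlen = (int)strlen(t);
    if (tlen == 0)
        return 0;                       /* empty pattern: nothing to cut */
    int *nxt = malloc(((size_t)tlen + 1) * sizeof *nxt);
    if (nxt == NULL)
        return 0;                       /* allocation failed */

    int i = 0, j = -1;
    nxt[0] = -1;
    while (i < tlen) {                  /* same loop as getNext() */
        if (j == -1 || t[i] == t[j]) { ++i; ++j; nxt[i] = j; }
        else j = nxt[j];
    }

    int count = 0;
    i = 0;
    j = 0;
    while (i < slen) {
        if (j == -1 || s[i] == t[j]) { ++i; ++j; }
        else j = nxt[j];
        if (j == tlen) {                /* full match: cut it out */
            ++count;
            j = 0;                      /* restart t right after the cut */
        }
    }
    free(nxt);
    return count;
}
```

With the sample input, count_nonoverlapping("abcde", "a3") returns 0 and count_nonoverlapping("aaaaaa", "aa") returns 3.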

Number Sequence HDU - 1711

Given two sequences of numbers : a[1], a[2], ...... , a[N], and b[1], b[2], ...... , b[M] (1 <= M <= 10000, 1 <= N <= 1000000). Your task is to find a number K which make a[K] = b[1], a[K + 1] = b[2], ...... , a[K + M - 1] = b[M]. If there are more than one K exist, output the smallest one.

Input

The first line of input is a number T which indicate the number of cases. Each case contains three lines. The first line is two numbers N and M (1 <= M <= 10000, 1 <= N <= 1000000). The second line contains N integers which indicate a[1], a[2], ...... , a[N]. The third line contains M integers which indicate b[1], b[2], ...... , b[M]. All integers are in the range of [-1000000, 1000000].

Output

For each test case, you should output one line which only contain K described above. If no such K exists, output -1 instead.

Sample Input

2
13 5
1 2 1 2 3 1 2 3 1 3 2 1 2
1 2 3 1 3
13 5
1 2 1 2 3 1 2 3 1 3 2 1 2
1 2 3 2 1

Sample Output

6
-1

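This problem asks for the position at which the pattern first matches the main sequence: if there are several matches, output the smallest position, and if there is none, output -1. So we can stop searching as soon as a match is found, because only the smallest result is needed, and that is the position of the first match. When a match completes, i points one past its last element and j equals the pattern length tlen, so the match starts at 0-based index i - tlen. Because the problem numbers elements from 1, the answer is pos = i - j + 1 (with j == tlen).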

Changed code

int kmp()
{
    // omitted: same setup as KMP() above (s and t hold integers in this problem)
    while(i<slen)
    {
        if(j==-1||s[i]==t[j])
        {
            i++;
            j++;
        }
        else
            j=next[j];

        if(j==tlen)// first full match: return its 1-based starting position in the main sequence
            return i-j+1;
    }
    return -1;
}
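A self-contained sketch of the HDU 1711 search on int arrays. The function name first_match_1based is hypothetical, reading the input with scanf is left out, and the code assumes m >= 1 as the problem guarantees.

```c
#include <stdlib.h>

/* Hypothetical helper: returns the smallest 1-based K with
 * a[K..K+m-1] == b[1..m], or -1 if there is none.
 * a has n elements and b has m elements (m >= 1), both stored 0-based. */
int first_match_1based(const int *a, int n, const int *b, int m)
{
    if (m <= 0)
        return -1;                      /* outside the problem's constraints */
    int *nxt = malloc(((size_t)m + 1) * sizeof *nxt);
    if (nxt == NULL)
        return -1;                      /* allocation failed */

    int i = 0, j = -1;
    nxt[0] = -1;
    while (i < m) {                     /* getNext() on the int pattern b */
        if (j == -1 || b[i] == b[j]) { ++i; ++j; nxt[i] = j; }
        else j = nxt[j];
    }

    int result = -1;
    i = 0;
    j = 0;
    while (i < n) {
        if (j == -1 || a[i] == b[j]) { ++i; ++j; }
        else j = nxt[j];
        if (j == m) {                   /* first complete match */
            result = i - j + 1;         /* 0-based start i - j, plus 1 */
            break;
        }
    }
    free(nxt);
    return result;
}
```

On the two sample cases it returns 6 and -1 respectively.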

Oulipo HDU - 1686

The French author Georges Perec (1936–1982) once wrote a book, La disparition, without the letter 'e'. He was a member of the Oulipo group. A quote from the book:

Tout avait Pair normal, mais tout s’affirmait faux. Tout avait Fair normal, d’abord, puis surgissait l’inhumain, l’affolant. Il aurait voulu savoir où s’articulait l’association qui l’unissait au roman : stir son tapis, assaillant à tout instant son imagination, l’intuition d’un tabou, la vision d’un mal obscur, d’un quoi vacant, d’un non-dit : la vision, l’avision d’un oubli commandant tout, où s’abolissait la raison : tout avait l’air normal mais…

Perec would probably have scored high (or rather, low) in the following contest. People are asked to write a perhaps even meaningful text on some subject with as few occurrences of a given “word” as possible. Our task is to provide the jury with a program that counts these occurrences, in order to obtain a ranking of the competitors. These competitors often write very long texts with nonsense meaning; a sequence of 500,000 consecutive 'T's is not unusual. And they never use spaces.

So we want to quickly find out how often a word, i.e., a given string, occurs in a text. More formally: given the alphabet {'A', 'B', 'C', …, 'Z'} and two finite strings over that alphabet, a word W and a text T, count the number of occurrences of W in T. All the consecutive characters of W must exactly match consecutive characters of T. Occurrences may overlap.

Input

The first line of the input file contains a single number: the number of test cases to follow. Each test case has the following format:

One line with the word W, a string over {'A', 'B', 'C', …, 'Z'}, with 1 ≤ |W| ≤ 10,000 (here |W| denotes the length of the string W).
One line with the text T, a string over {'A', 'B', 'C', …, 'Z'}, with |W| ≤ |T| ≤ 1,000,000.

Output

For every test case in the input file, the output should contain a single number, on a single line: the number of occurrences of the word W in the text T.

Sample Input

3
BAPC
BAPC
AZA
AZAZAZA
VERDI
AVERDXIVYERDIAN

Sample Output

1
3
0

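This problem asks how many (possibly overlapping) occurrences of the pattern the main string contains. So after finding a match we only increment the counter, then handle the position as if a mismatch had occurred (j = next[j]) and keep scanning forward. This uses next[tlen], which getNext() fills in because its loop runs while i < tlen.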

Changed code

int kmp()
{
    // omitted: same setup as KMP() above
    while(i<slen)
    {
        if(j==-1||s[i]==t[j])
        {
            i++;
            j++;
        }
        else
            j=next[j];

        if(j==tlen)
        {
            count++;
            j=next[j];// continue as if s[i]!=t[j]: keep the longest border so overlapping matches are found
        }
    }
    return count;
}
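A self-contained sketch of the HDU 1686 overlapping count. The name count_overlapping is hypothetical and input reading is omitted; note that it reads nxt[tlen], so the table is allocated with tlen + 1 entries.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: counts possibly overlapping occurrences of w in
 * text. After a full match it sets j = nxt[tlen] instead of 0. */
int count_overlapping(const char *text, const char *w)
{
    int slen = (int)strlen(text), tlen = (int)strlen(w);
    if (tlen == 0)
        return 0;                       /* empty word: not counted */
    int *nxt = malloc(((size_t)tlen + 1) * sizeof *nxt);
    if (nxt == NULL)
        return 0;                       /* allocation failed */

    int i = 0, j = -1;
    nxt[0] = -1;
    while (i < tlen) {                  /* same loop as getNext(); fills nxt[tlen] */
        if (j == -1 || w[i] == w[j]) { ++i; ++j; nxt[i] = j; }
        else j = nxt[j];
    }

    int count = 0;
    i = 0;
    j = 0;
    while (i < slen) {
        if (j == -1 || text[i] == w[j]) { ++i; ++j; }
        else j = nxt[j];
        if (j == tlen) {
            ++count;
            j = nxt[j];                 /* keep the longest border: overlaps allowed */
        }
    }
    free(nxt);
    return count;
}
```

On the sample input it returns 1, 3 and 0.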

Cyclic Nacklace HDU - 3746

CC always becomes very depressed at the end of this month, he has checked his credit card yesterday, without any surprise, there are only 99.9 yuan left. he is too distressed and thinking about how to tide over the last days. Being inspired by the entrepreneurial spirit of "HDU CakeMan", he wants to sell some little things to make money. Of course, this is not an easy task.

As Christmas is around the corner, Boys are busy in choosing christmas presents to send to their girlfriends. It is believed that chain bracelet is a good choice. However, Things are not always so simple, as is known to everyone, girl's fond of the colorful decoration to make bracelet appears vivid and lively, meanwhile they want to display their mature side as college students. after CC understands the girls demands, he intends to sell the chain bracelet called CharmBracelet. The CharmBracelet is made up with colorful pearls to show girls' lively, and the most important thing is that it must be connected by a cyclic chain which means the color of pearls are cyclic connected from the left to right. And the cyclic count must be more than one. If you connect the leftmost pearl and the rightmost pearl of such chain, you can make a CharmBracelet. Just like the picture below, this CharmBracelet's cycle is 9 and its cyclic count is 2:

Now CC has brought in some ordinary bracelet chains, he wants to buy minimum number of pearls to make CharmBracelets so that he can save more money. but when remaking the bracelet, he can only add color pearls to the left end and right end of the chain, that is to say, adding to the middle is forbidden.
CC is satisfied with his ideas and ask you for help.

Input

The first line of the input is a single integer T ( 0 < T <= 100 ) which means the number of test cases.
Each test case contains only one line describe the original ordinary chain to be remade. Each character in the string stands for one pearl and there are 26 kinds of pearls being described by 'a' ~'z' characters. The length of the string Len: ( 3 <= Len <= 100000 ).

Output

For each case, you are required to output the minimum count of pearls added to make a CharmBracelet.

Sample Input

3
aaa
abca
abcde

Sample Output

0
2
5

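This problem asks for the minimum number of pearls to add so that the chain becomes at least two complete repetitions of some cycle; the key quantity is the length of the string S once its longest common prefix/suffix is removed. next[l] holds the length of the longest proper prefix of the whole string (length l) that is also a suffix, so b = l - next[l] is the length of its shortest period, i.e. one cycle. Because the longest common prefix and suffix may overlap, b does not have to divide l, so the cases are handled separately. If next[l] == 0, then b == l: there is no shorter cycle, and a whole copy of the chain (l pearls) must be added. If b < l and l % b == 0, the chain already consists of l / b >= 2 complete cycles, so nothing needs to be added. Otherwise the last cycle is incomplete and b - l % b pearls are needed; since l = b + next[l], l % b equals next[l] % b, which is what the code computes.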

Changed code

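int main(void)
{
        // omitted: input reading; l is the length of the chain stored in t (l == tlen)
        getNext();
        b=l-next[l];    // b: length of the shortest cycle (chain length minus its longest border)
        if(b!=l&&l%b==0)    // b==l would mean there is no common prefix/suffix
        {
            b=0;        // already at least two complete cycles: nothing to add
        }
        else
        {
            b=b-next[l]%b;    // pearls needed to complete the last cycle (l when b==l)
        }
        printf("%d\n",b);
}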
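A self-contained sketch of the HDU 3746 computation. The name pearls_to_add is hypothetical, input reading is omitted, and the chain is assumed non-empty (the problem guarantees 3 <= Len). It uses l % period, which equals next[l] % b in the code above because l = b + next[l].

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: minimum number of pearls to add at the ends so the
 * chain becomes at least two complete copies of some cycle. */
int pearls_to_add(const char *chain)
{
    int l = (int)strlen(chain);
    if (l == 0)
        return -1;                      /* outside the problem's constraints */
    int *nxt = malloc(((size_t)l + 1) * sizeof *nxt);
    if (nxt == NULL)
        return -1;                      /* allocation failed */

    int i = 0, j = -1;
    nxt[0] = -1;
    while (i < l) {                     /* same loop as getNext() */
        if (j == -1 || chain[i] == chain[j]) { ++i; ++j; nxt[i] = j; }
        else j = nxt[j];
    }
    int period = l - nxt[l];            /* shortest period of the chain */
    free(nxt);

    if (period != l && l % period == 0)
        return 0;                       /* already >= 2 complete cycles */
    return period - l % period;         /* complete the last cycle (l if no border) */
}
```

On the sample input it returns 0, 2 and 5.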