Long Long Message (longest common substring via suffix array)

永夜莫明

License: CC 4.0 BY-SA

Category: Suffix Array

Article link: https://blog.youkuaiyun.com/qq_42936517/article/details/89645410

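This article presents an algorithm for finding the longest common contiguous substring of two long strings, which gives the length of the longest possible original text in the messages the little cat sent to his mother.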

Problem link: https://vjudge.net/problem/POJ-2774

The little cat is majoring in physics in the capital of Byterland. A piece of sad news comes to him these days: his mother is getting ill. Being worried about spending so much on railway tickets (Byterland is such a big country, and he has to spend 16 hours on train to his hometown), he decided only to send SMS with his mother.

The little cat lives in an unrich family, so he frequently comes to the mobile service center, to check how much money he has spent on SMS. Yesterday, the computer of service center was broken, and printed two very long messages. The brilliant little cat soon found out:

1. All characters in messages are lowercase Latin letters, without punctuations and spaces.
2. All SMS has been appended to each other – (i+1)-th SMS comes directly after the i-th one – that is why those two messages are quite long.
3. His own SMS has been appended together, but possibly a great many redundancy characters appear leftwards and rightwards due to the broken computer.
E.g: if his SMS is “motheriloveyou”, either long message printed by that machine, would possibly be one of “hahamotheriloveyou”, “motheriloveyoureally”, “motheriloveyouornot”, “bbbmotheriloveyouaaa”, etc.
4. For these broken issues, the little cat has printed his original text twice (so there appears two very long messages). Even though the original text remains the same in two printed messages, the redundancy characters on both sides would be possibly different.

You are given those two very long messages, and you have to output the length of the longest possible original text written by the little cat.

Background:
The SMS in Byterland mobile service are charging in dollars-per-byte. That is why the little cat is worrying about how long could the longest original text be.

Why ask you to write a program? There are four reasons:
1. The little cat is so busy these days with physics lessons;
2. The little cat wants to keep what he said to his mother secret;
3. POJ is such a great Online Judge;
4. The little cat wants to earn some money from POJ, and try to persuade his mother to see the doctor :(

Input

Two strings with lowercase letters on two of the input lines individually. Number of characters in each one will never exceed 100000.

Output

A single line with a single integer number – what is the maximum length of the original text written by the little cat.

Sample Input

yeshowmuchiloveyoumydearmotherreallyicannotbelieveit
yeaphowmuchiloveyoumydearmother

Sample Output

27

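The task is to find the longest common contiguous substring of the two strings. Join them into one string with a separator that occurs in neither (here '$'), then build the sa and height arrays for the joined string. Scan adjacent entries of sa: whenever one of sa[i-1] and sa[i] starts in the first string and the other starts in the second, height[i] is a candidate, and the answer is the largest candidate. This is enough because sa holds the suffixes in lexicographic order: the longest common prefix of any two suffixes is the minimum of the height values between them, so the best cross-string pair can always be found among adjacent entries.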
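To make the adjacent-pair idea concrete, here is a minimal sketch that sorts the suffixes naively instead of building a real suffix array. The function name `lcsBySortedSuffixes` is hypothetical and not part of the original solution. It assumes neither input contains '$', and it is meant for short strings only, since the naive sort is far too slow for inputs of 100000 characters.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper for illustration: the same adjacent-pair scan as in the
// explanation above, but with a naive suffix sort so the idea is easy to follow.
// Assumes neither input contains '$'. Only suitable for short strings.
int lcsBySortedSuffixes(const std::string& a, const std::string& b) {
    const std::string t = a + '$' + b;            // joined string with separator
    const int n = static_cast<int>(t.size());
    const int len = static_cast<int>(a.size());   // index of the separator

    // Start positions of all suffixes, sorted lexicographically.
    std::vector<int> sa(n);
    for (int i = 0; i < n; ++i) sa[i] = i;
    std::sort(sa.begin(), sa.end(), [&](int x, int y) {
        return t.compare(x, std::string::npos, t, y, std::string::npos) < 0;
    });

    int best = 0;
    for (int i = 1; i < n; ++i) {
        const int p = sa[i - 1], q = sa[i];
        if (p == len || q == len) continue;       // skip the separator suffix
        if ((p < len) == (q < len)) continue;     // both start in the same string
        int k = 0;                                // LCP of this adjacent pair
        while (p + k < n && q + k < n && t[p + k] == t[q + k]) ++k;
        best = std::max(best, k);
    }
    return best;
}
```

The common prefix can never run across the separator, because a suffix that starts in the second string contains no '$'. The full solution replaces the naive sort with the doubling construction and reads the pairwise prefix lengths from the height array instead of recomputing them.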

#include<cstdio>
#include<cstring>
#include<algorithm> 
using namespace std;
const int MAXN=2e5+10;

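int sa[MAXN];//sa[i] = start position of the i-th smallest suffix of s
             //(suffixes sorted in ascending lexicographic order)
int t1[MAXN],t2[MAXN],c[MAXN];//scratch arrays for build_sa; no initialization needed
//named rnk rather than rank: with "using namespace std", rank can clash with std::rank
int rnk[MAXN],height[MAXN];
//The input is s[0..n-1], of length n, with every value less than m.
//Every s[i] except s[n-1] must be greater than 0, and s[n-1]=0 (sentinel).
//On return the result is stored in sa.
void build_sa(int s[],int n,int m)
{
    int i,j,p,*x=t1,*y=t2;
    //first pass: counting sort on single characters;
    //if the values in s are large, use a comparison sort here instead
    for(i=0;i<m;i++)c[i]=0;
    for(i=0;i<n;i++)c[x[i]=s[i]]++;
    for(i=1;i<m;i++)c[i]+=c[i-1];
    for(i=n-1;i>=0;i--)sa[--c[x[i]]]=i;
    for(j=1;j<=n;j<<=1)
    {
        p=0;
        //order by the second key directly from the current sa
        for(i=n-j;i<n;i++)y[p++]=i;//the last j suffixes have an empty second key, which sorts first
        for(i=0;i<n;i++)if(sa[i]>=j)y[p++]=sa[i]-j;
        //y now lists the suffixes ordered by second key
        //stable counting sort by first key
        for(i=0;i<m;i++)c[i]=0;
        for(i=0;i<n;i++)c[x[y[i]]]++;
        for(i=1;i<m;i++)c[i]+=c[i-1];
        for(i=n-1;i>=0;i--)sa[--c[x[y[i]]]]=y[i];
        //compute the new ranks x from sa and the old ranks (now in y)
        swap(x,y);
        p=1;x[sa[0]]=0;
        for(i=1;i<n;i++)
            x[sa[i]]=y[sa[i-1]]==y[sa[i]] && y[sa[i-1]+j]==y[sa[i]+j]?p-1:p++;
        if(p>=n)break;//all ranks are distinct, so sorting is complete
        m=p;//number of distinct ranks, the bound for the next counting sort
    }
}
//height[i] = length of the longest common prefix of suffixes sa[i-1] and sa[i].
//Here n is the length without the sentinel, so sa has n+1 entries.
void getHeight(int s[],int n)
{
    int i,j,k=0;
    for(i=0;i<=n;i++)rnk[sa[i]]=i;
    for(i=0;i<n;i++)
    {
        if(k)k--;//the match length drops by at most 1 when moving to the next suffix
        j=sa[rnk[i]-1];//rnk[i]>=1 because the sentinel suffix always has rank 0
        while(s[i+k]==s[j+k])k++;
        height[rnk[i]]=k;
    }
}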
 
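char str[MAXN];
int s[MAXN];
 
int main()
{
    scanf("%s",str);
    int len=strlen(str);//length of the first string
    str[len]='$';//separator: sorts below every lowercase letter and occurs in neither string
    scanf("%s",str+len+1);
    int n=strlen(str);//total length: first string + separator + second string
    for(int i=0;i<=n;i++) s[i]=str[i];//s[n]=0 serves as the sentinel
    build_sa(s,n+1,128);
    getHeight(s,n);
    int ans=0;
    //sa[0] is the sentinel and sa[1] is the separator, so the real suffixes have ranks 2..n;
    //the loop must include i=n, otherwise the last adjacent pair is skipped
    for(int i=2;i<=n;i++){
        //count height[i] only when the two adjacent suffixes start in different strings
        if((sa[i-1]<len)!=(sa[i]<len)) ans=max(ans,height[i]);
    }
    printf("%d\n",ans);
    return 0;
}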
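
For testing on short inputs, a simple dynamic-programming checker can be compared against the suffix-array program. This is a sketch of the textbook O(|a|·|b|) approach, not part of the original solution, and `lcsByDp` is a hypothetical name. With up to 100000 characters per string it would need on the order of 10^10 steps, so it is useful only as a cross-check.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical reference checker: cur[j] holds the length of the longest
// common suffix of a[0..i-1] and b[0..j-1]; the answer is the largest value seen.
int lcsByDp(const std::string& a, const std::string& b) {
    std::vector<int> prev(b.size() + 1, 0), cur(b.size() + 1, 0);
    int best = 0;
    for (std::size_t i = 1; i <= a.size(); ++i) {
        for (std::size_t j = 1; j <= b.size(); ++j) {
            // Extend the diagonal match, or reset to 0 on a mismatch.
            cur[j] = (a[i - 1] == b[j - 1]) ? prev[j - 1] + 1 : 0;
            best = std::max(best, cur[j]);
        }
        std::swap(prev, cur);  // the row just filled becomes the previous row
    }
    return best;
}
```

Inputs such as a single shared letter ("z" and "z") are worth including in such a comparison, since the match then sits in the last adjacent pair of the suffix array.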