Description
The little cat is majoring in physics in the capital of Byterland. A piece of sad news has reached him recently: his mother has fallen ill. Worried about spending so much on railway tickets (Byterland is such a big country, and it takes 16 hours by train to reach his hometown), he decided to keep in touch with his mother only by SMS.
The little cat's family is not well off, so he frequently visits the mobile service center to check how much money he has spent on SMS. Yesterday the service center's computer broke down and printed two very long messages. The brilliant little cat soon found out:
1. All characters in the messages are lowercase Latin letters, with no punctuation or spaces.
2. All SMS messages have been appended to one another: the (i+1)-th SMS comes directly after the i-th one, which is why those two messages are quite long.
3. His own SMS messages have been appended together, but a great many redundant characters may appear to the left and right because of the broken computer.
E.g., if his SMS is “motheriloveyou”, either long message printed by that machine could be one of “hahamotheriloveyou”, “motheriloveyoureally”, “motheriloveyouornot”, “bbbmotheriloveyouaaa”, etc.
4. Because of these faults, the little cat has printed his original text twice (so two very long messages appear). Even though the original text is the same in both printed messages, the redundant characters on either side may differ.
You are given those two very long messages, and you have to output the length of the longest possible original text written by the little cat.
Background:
SMS in the Byterland mobile service is charged in dollars per byte. That is why the little cat is worried about how long the longest original text could be.
Why ask you to write a program? There are four reasons:
1. The little cat is so busy these days with physics lessons;
2. The little cat wants to keep what he said to his mother secret;
3. POJ is such a great Online Judge;
4. The little cat wants to earn some money from POJ, and try to persuade his mother to see the doctor :(
Input
Two strings of lowercase letters, one per input line. The number of characters in each will never exceed 100000.
Output
A single line with a single integer: the maximum length of the original text written by the little cat.
Sample Input
yeshowmuchiloveyoumydearmotherreallyicannotbelieveit
yeaphowmuchiloveyoumydearmother
Sample Output
27
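Heh, all that rambling and there is really not much to it.
Given two strings A and B, find their longest common substring.
Algorithm analysis: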
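Every substring of a string is a prefix of one of that string's suffixes. Finding the longest common substring of A and B is therefore equivalent to finding the maximum, over all suffixes of A and all suffixes of B, of their longest common prefix. Enumerating every pair of suffixes of A and B is clearly inefficient. Since we need the longest common prefix between a suffix of A and a suffix of B, first write the second string after the first, separated by a character that appears in neither, and then build the suffix array of this new string. Now look at the suffix array of the new string and see whether a pattern emerges. See Figure 8.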
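To make the "prefix of a suffix" reformulation concrete, here is a minimal sketch of the inefficient version that tries every pair of suffixes. The helper names `common_prefix` and `lcs_by_suffix_pairs` are made up for this illustration and are not part of the solution in this post. It is far too slow for strings of length 100000 and is only meant for checking small cases by hand.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: length of the longest common prefix of the suffix of a
// starting at i and the suffix of b starting at j (0-based positions).
std::size_t common_prefix(const std::string& a, std::size_t i,
                          const std::string& b, std::size_t j) {
    assert(i <= a.size() && j <= b.size());  // positions must lie inside the strings
    std::size_t k = 0;
    while (i + k < a.size() && j + k < b.size() && a[i + k] == b[j + k]) ++k;
    return k;
}

// Hypothetical helper: longest common substring as the best common prefix
// over all (suffix of a, suffix of b) pairs. Cubic time in the worst case.
std::size_t lcs_by_suffix_pairs(const std::string& a, const std::string& b) {
    std::size_t best = 0;
    for (std::size_t i = 0; i < a.size(); ++i)
        for (std::size_t j = 0; j < b.size(); ++j)
            best = std::max(best, common_prefix(a, i, b, j));
    return best;
}
```

On the sample input this returns 27, the length of "howmuchiloveyoumydearmother". The suffix array approach reaches the same value without comparing every pair.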
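So is the maximum of all the height values the answer? Not necessarily! The two suffixes may lie in the same string, so height[i] only counts when suffix(sa[i-1]) and suffix(sa[i]) are not two suffixes of the same original string. The maximum among those values is the answer. Let |A| and |B| denote the lengths of strings A and B. Building the suffix array and the height array of the new string takes O(|A|+|B|) time (with a linear-time construction; the prefix-doubling code below takes O((|A|+|B|) log(|A|+|B|))). Taking the maximum height over suffixes that are adjacent in rank but come from different strings is also O(|A|+|B|), so the whole approach is O(|A|+|B|). This already matches the lower bound on the time complexity, which shows that it is an excellent algorithm.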
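The filter on height can be sketched in a few lines if the suffix array is built naively. This is an illustration under stated assumptions, not the solution in this post: the function name `lcs_suffix_array` and the default separator `'#'` are made up here, positions are 0-based, and the suffix array comes from a plain `std::sort` on suffixes, which is fine for small tests but much slower than prefix doubling on large inputs.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <string>
#include <vector>

// Hypothetical helper: longest common substring of a and b through the suffix
// array of a + sep + b. Assumes sep occurs in neither string.
int lcs_suffix_array(const std::string& a, const std::string& b, char sep = '#') {
    assert(a.find(sep) == std::string::npos && b.find(sep) == std::string::npos);
    const std::string s = a + sep + b;
    const int n = static_cast<int>(s.size());
    const int n1 = static_cast<int>(a.size());  // positions < n1 belong to a

    // Naive suffix array: sort start positions by the suffixes they begin.
    std::vector<int> sa(n);
    std::iota(sa.begin(), sa.end(), 0);
    std::sort(sa.begin(), sa.end(), [&](int x, int y) {
        return s.compare(x, std::string::npos, s, y, std::string::npos) < 0;
    });

    // rank[p] = index of the suffix starting at p inside sa.
    std::vector<int> rank(n);
    for (int i = 0; i < n; ++i) rank[sa[i]] = i;

    // height[r] = longest common prefix of suffixes sa[r-1] and sa[r],
    // computed in text order so the previous match length can be reused.
    std::vector<int> height(n, 0);
    for (int p = 0, k = 0; p < n; ++p) {
        if (rank[p] == 0) { k = 0; continue; }
        const int q = sa[rank[p] - 1];
        while (p + k < n && q + k < n && s[p + k] == s[q + k]) ++k;
        height[rank[p]] = k;
        if (k > 0) --k;
    }

    // Keep height[r] only when the two neighbours start in different strings.
    int best = 0;
    for (int r = 1; r < n; ++r) {
        const bool prev_in_a = sa[r - 1] < n1;
        const bool cur_in_a = sa[r] < n1;
        if (prev_in_a != cur_in_a) best = std::max(best, height[r]);
    }
    return best;
}
```

The separator is what makes the simple "different sides" test sufficient: a common prefix can never run past it, so any height between a suffix of A and a suffix of B is automatically a substring of both.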
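Happily, this is a suffix array problem again; the only extra step is checking that each common prefix really pairs a suffix from one string with a suffix from the other.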
#include<cstdio>
#include<cstring>
#include<algorithm>
#include<cmath>
#include<iostream>
#include<cstdlib>
using namespace std;
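char a[210000],b[210000];
//wr: scratch keys, mc: rank array, JS: counting-sort buckets, sa: suffix array,
//y: suffixes ordered by second key, height: LCP of suffixes adjacent in rank
int wr[210000],mc[210000],JS[210000],sa[210000],y[210000],height[210000];
//true if the suffixes at k1 and k2 have the same rank for both halves of their first 2*ln characters
bool cmp(int k1,int k2,int ln){
    return wr[k1]==wr[k2]&&wr[k1+ln]==wr[k2+ln];
}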
void get_sa(int n,int m){//build the suffix array SA (1-indexed, prefix doubling)
    int i,k,p,ln;
    for(i=1;i<=n;i++)mc[i]=a[i];
    //a: the original string; mc: the rank array
    for(i=0;i<=m;i++)JS[i]=0;
    for(i=1;i<=n;i++)JS[mc[i]]++;
    for(i=1;i<=m;i++)JS[i]+=JS[i-1];
    for(i=n;i>=1;i--)sa[JS[mc[i]]--]=i;
    //the four statements above are a radix sort on the first character (see the flash animation if this is unclear)
    ln=1;p=0;
    //ln is the current substring length; p counts how many distinct substrings there are
    while(p<n){
        for(k=0,i=n-ln+1;i<=n;i++)y[++k]=i;
        for(i=1;i<=n;i++)if(sa[i]-ln>0)y[++k]=sa[i]-ln;
        for(i=1;i<=n;i++)wr[i]=mc[y[i]];
        //y holds the result of sorting by the second key
        //wr holds the mc values in that second-key order
        //next: sort by the first key
        for(i=0;i<=m;i++)JS[i]=0;
        for(i=1;i<=n;i++)JS[wr[i]]++;
        for(i=1;i<=m;i++)JS[i]+=JS[i-1];
        for(i=n;i>=1;i--)sa[JS[wr[i]]--]=y[i];
        memcpy(wr,mc,sizeof(wr));
        p=1;mc[sa[1]]=1;
        for(i=2;i<=n;i++){
            if(!cmp(sa[i],sa[i-1],ln))p++;
            mc[sa[i]]=p;
        }
        //mc now holds the new ranks
        m=p;ln*=2;
    }
    a[0]=0;sa[0]=0;
}
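void get_he(int n){//height[r] = LCP of the suffixes ranked r-1 and r
    int i,j,k=0;
    for(i=1;i<=n;i++){
        j=sa[mc[i]-1];//suffix ranked just before suffix i (0 when i has rank 1)
        if(k)k--;//the LCP can shrink by at most 1 when moving from suffix i-1 to suffix i
        while(a[j+k]==a[i+k]) k++;
        height[mc[i]]=k;
    }
}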
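int main(){
    scanf("%s%s",a+1,b+1);
    int n1=strlen(a+1),n2=strlen(b+1);
    //join the two strings with a separator that occurs in neither,
    //so that no common prefix can run across the boundary
    a[n1+1]='$';
    for(int i=1;i<=n2;i++)a[n1+1+i]=b[i];
    int n=n1+n2+1;
    get_sa(n,256);
    get_he(n);
    int ans=0;
    for(int i=2;i<=n;i++){
        //sa[i-1] and sa[i] are adjacent in rank; height[i] counts only
        //when one suffix starts in the first string and the other does not
        if((sa[i-1]<=n1)!=(sa[i]<=n1)){
            if(height[i]>ans)ans=height[i];
        }
    }
    printf("%d\n",ans);
    return 0;
}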