CUGBACM Class of '21 Summer Topic Training #6: Strings

Article link: https://blog.youkuaiyun.com/weixin_62599885/article/details/131495024

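This article covers several string-processing algorithms: counting how many small trim pieces can be cut from a strip of patterned cloth, Manacher's algorithm for the longest palindromic substring, string hashing for counting distinct strings, KMP string matching, and a basic application of the AC automaton. All of them are standard tools for string problems.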



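A - Cutting Patterned Cloth

You have a strip of patterned cloth and a small, ready-to-use trim piece that also carries a pattern. Given the cloth strip and the trim piece, compute the maximum number of trim pieces that can be cut out of the cloth strip.

Input

The input contains several data sets, each a pair consisting of a cloth strip and a trim piece. Both are written in visible ASCII characters, so there are as many possible pattern symbols as there are visible ASCII characters. Neither the cloth strip nor the trim piece is longer than 1000 characters. Processing stops when the # character is encountered.

Output

For each data set, output the maximum number of trim pieces that can be cut from the cloth strip; if none can be cut, simply output 0. Each result goes on its own line.

Sample

Input:
abcde a3
aaaaaa aa
#

Output:
0
3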

#include<iostream>
#include<cstring>
#include<string>
using namespace std;
const int N =  1e5;
string lch,ch;
int n, m;



int nextv[N], nextval[N];

void getnext()
{
	int k = -1,pos=0;
	nextv[0] = -1;
	while (pos < m-1)
	{
		if (k == -1 || ch[k] == ch[pos])
		{
		//	nextv[pos] = k;
			k++;
			pos++;
			nextv[pos] = k;
		}
		else
		{
			k = nextv[k];
		}
	}
}




void getans()
{
	int i = 0, j = 0,ans=0;
	while (i < n )
	{
		if (j == -1 || lch[i] == ch[j])
		{
			
			i++;
			j++;

		}
		else
		{
			j = nextv[j];
		}
		if (j == m)
		{
			// A whole piece has been cut out. Restart from 0 so that pieces never overlap
			// (the original relied on nextv[m] being 0 after memset).
			j = 0;
			ans++;
		}
	}
	cout<<ans<<endl;
}


void printnxt()
{
	for (int i = 0; i < m; i++)
	{
		cout << nextv[i]+1<<" ";
	}
}
int main()
{
	
	while(cin >> lch&&lch!="#")
	{
	memset(nextv,0,sizeof nextv);
	cin>>ch;
	//cout << lch << endl;
	//cout << ch << endl;
	n = lch.length();
	m = ch.length();
	getnext();

	getans();
}
//	printnxt();
}
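
As a cross-check for the KMP solution above, here is a minimal sketch that counts non-overlapping pieces with `std::string::find`. The function name `countCuts` is hypothetical, and the greedy rule (always cut the leftmost occurrence, then continue right after it) is my reading of the problem, not the author's code.

```cpp
#include <string>

// Counts how many non-overlapping copies of `piece` can be cut from `cloth`.
// Greedy: take the leftmost occurrence, then keep searching right after it.
int countCuts(const std::string& cloth, const std::string& piece) {
    if (piece.empty()) return 0;  // guard: an empty piece would loop forever
    int count = 0;
    std::size_t pos = cloth.find(piece);
    while (pos != std::string::npos) {
        ++count;
        pos = cloth.find(piece, pos + piece.size());  // skip past the cut piece
    }
    return count;
}
```

With strings of at most 1000 characters this simple search is fast enough, so it works well as a reference when testing the KMP version.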

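C - Manacher's Algorithm

Description

Given a string $S$ consisting only of lowercase English letters a, b, c, …, y, z, find the length of the longest palindromic substring of $S$.

The length of the string is $n$.

Input

A single line containing the string $S$ of lowercase English letters a, b, c, …, y, z.

Output

A single integer, the answer.

Sample 1

Input:
aaa

Output:
3

Hint

$1 \le n \le 1.1 \times 10^7$.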

#include<iostream>
#include<cstring>
#include<string>
#include<algorithm>

using namespace std;
const int N =  11e6+5;
char str[N<<1];
int p[N<<1];
int n, m;


void printstr(int n)
{
	for (int i = 1; i <= n; i++)
	{
		cout << str[i] << " ";
	}
}

int main()
{
	string s1;
	cin >> s1;
	int le = s1.length();
	for (int i = 1; i <= 2*le; i++)
	{
		str[i] = '#';
		i++;
		str[i ] = s1[i/2 - 1];

	}
	int n = 2 * le + 1;
	str[2 * le + 1] = '#';
	str[0] = '@';
	//printstr(2 * le + 1);

	int mx=0, id=1, pmx=0, pos=1;
	p[1] = 1;
	for (int i = 1; i <= n; i++)
	{
		if (mx < i)
		{
			p[i] = 1;
		}
		else
		{
			p[i] = min(p[2 * id - i], mx - i);
		}
		while (i-p[i]>=0&&str[i - p[i]] == str[i + p[i]]) p[i]++;
		
		if (p[i]+i > mx)
		{
			mx = p[i] + i;
			id = i;
		}
		if (p[i] > pmx)
		{
			pmx = p[i];
			pos = i;
		}
	}

	cout << pmx-1;


}
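
For testing the Manacher solution above on small inputs, a brute-force reference can help. The sketch below expands around every center in $O(n^2)$ time; the name `longestPalindromeBrute` is hypothetical, and it is far too slow for the full limit of $1.1 \times 10^7$ characters.

```cpp
#include <algorithm>
#include <string>

// Brute-force reference: tries every center (n character centers and
// n - 1 gaps between characters) and expands while both ends match.
int longestPalindromeBrute(const std::string& s) {
    int n = static_cast<int>(s.size());
    int best = 0;
    for (int center = 0; center < 2 * n - 1; ++center) {
        int left = center / 2;
        int right = left + center % 2;  // equal to left for odd-length palindromes
        while (left >= 0 && right < n && s[left] == s[right]) {
            --left;
            ++right;
        }
        best = std::max(best, right - left - 1);  // length of the matched range
    }
    return best;
}
```

Comparing its output with the Manacher program on random short strings is a cheap way to catch off-by-one errors in the `p` array.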

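D - String Hashing

Description

As the title says: given $N$ strings (the $i$-th string has length $M_i$ and contains digits and upper- and lowercase letters, case-sensitive), find how many distinct strings there are among the $N$ strings.

Friendly reminder: if you really want to practice hashing, please stay honest and actually use hashing.

Input

The first line contains an integer $N$, the number of strings.

Each of the next $N$ lines contains one of the given strings.

Output

Output one line with a single integer, the number of distinct strings.

Sample 1

Input:
5
abc
aaaa
abc
abcc
12345

Output:
4

Hint

For 30% of the data: $N \le 10$, $M_i \approx 6$, $M_{max} \le 15$.

For 70% of the data: $N \le 1000$, $M_i \approx 100$, $M_{max} \le 150$.

For 100% of the data: $N \le 10000$, $M_i \approx 1000$, $M_{max} \le 1500$.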

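As I understand it, a hash is something like a function: you put a value in and it gives a value back. The output is the hash value. A hash value is usually easier to store (smaller) or to compare than the original.

String hashing is then easy to understand: it is a function that turns a string into an integer, and it should, as far as possible, give each string a unique hash value.

There are many kinds of string hashes, but informatics contests essentially use only one, called "BKDR Hash".

Its main idea is to pick a suitable base and treat each character of the string as one digit of a large number. Comparing such large numbers is no cheaper than comparing the strings themselves (comparing arbitrary-precision numbers is also $O(n)$). But if we take the number modulo some value and treat equal remainders as equal originals, we can compare in $O(1)$ at the cost of a small error rate.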
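
The idea can be sketched in a few lines. The function name `polyHash` and its parameters are hypothetical; it uses the raw character code as the digit, as the solutions later in this post do.

```cpp
#include <cstdint>
#include <string>

// Treats the string as a number written in base `base` and keeps only the
// remainder modulo `mod`. Equal strings always get equal hashes; different
// strings collide only with small probability.
std::uint64_t polyHash(const std::string& s, std::uint64_t base, std::uint64_t mod) {
    std::uint64_t h = 0;
    for (unsigned char c : s) {
        h = (h * base + c) % mod;  // assumes mod * base fits in 64 bits
    }
    return h;
}
```

For example, with base 131 the string "ab" hashes to 97 * 131 + 98 = 12805 as long as the modulus is larger than that.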

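So which base should we choose?

First, do not map any character to the number 0. If a were mapped to 0, for instance, ab and b could not be told apart from the hash alone (you could additionally compare the string lengths, but not mapping any character to 0 is simpler and has no side effects). In general, mapping a-z to 1-26 works well.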
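
A small sketch makes the problem with 0 concrete. The helper `hashWithOffset` is hypothetical: it hashes a lowercase string in base 27 and maps 'a' to `offset`, so `offset = 0` reproduces the bad mapping and `offset = 1` the recommended one.

```cpp
#include <cstdint>
#include <string>

// Hashes a lowercase string in base 27, mapping 'a' to `offset`,
// 'b' to `offset + 1`, and so on. Use offset 0 or 1 only, so that
// every digit stays below the base.
std::uint64_t hashWithOffset(const std::string& s, std::uint64_t offset) {
    const std::uint64_t base = 27;
    const std::uint64_t mod = 1000000007ULL;  // illustrative modulus
    std::uint64_t h = 0;
    for (char c : s) {
        h = (h * base + static_cast<std::uint64_t>(c - 'a') + offset) % mod;
    }
    return h;
}
```

With `offset = 0`, both "ab" and "b" hash to 1, because the leading a contributes nothing. With `offset = 1`, "ab" hashes to 1 * 27 + 2 = 29 and "b" hashes to 2.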

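The choice of base is actually very free: it must be larger than the largest number any character maps to, and it should not share a prime factor with the modulus (otherwise why take a modulus at all). For a problem whose character set is a to z, for example, 27, 233, and 19260817 are all fine.

Choosing the modulus (prefer a prime whenever possible):

In the vast majority of cases, do not use a single modulus on the order of $10^9$, because even random data will then produce hash collisions: by the birthday paradox, roughly $\sqrt{10^9} \approx 3 \times 10^4$ random strings are enough to make at least one pair with equal hash values likely (see BZOJ 3098 Hash Killer II).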
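
To put rough numbers on the birthday paradox, here is a sketch using the standard approximation $p \approx 1 - e^{-n(n-1)/(2m)}$ for $n$ values drawn uniformly from $m$ possibilities. The function name `collisionProbability` is hypothetical, and real hash values are only approximately uniform.

```cpp
#include <cmath>

// Birthday-bound approximation: probability that at least two of n
// uniformly random hash values in [0, m) coincide.
double collisionProbability(double n, double m) {
    // expm1 keeps precision when the probability is tiny
    return -std::expm1(-n * (n - 1.0) / (2.0 * m));
}
```

Under this estimate, $10^4$ strings with a modulus near $10^9$ already collide in roughly 5% of runs, and $10^5$ strings collide almost surely, while a modulus near $10^{18}$ makes the probability negligible at these sizes.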

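The safest approach is to pick two primes on the order of $10^9$ and declare two strings equal only if their hashes agree modulo both. The constant factor is slightly larger and the code is a bit harder to write, but there is currently no known way to break this approach (other than tightening the time limit until it times out) (see BZOJ 3099 Hash Killer III).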
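
A minimal sketch of the two-modulus idea, applied to counting distinct strings. The function name `countDistinct` is hypothetical, and the moduli $10^9 + 7$ and $998244353$ are two well-known primes that I picked for illustration; they are not the moduli used in the author's solution.

```cpp
#include <cstdint>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Counts distinct strings by storing a pair of hashes per string.
// Two strings are treated as equal only if both hashes agree.
int countDistinct(const std::vector<std::string>& words) {
    const std::uint64_t base = 131;
    const std::uint64_t mod1 = 1000000007ULL;  // assumed prime near 1e9
    const std::uint64_t mod2 = 998244353ULL;   // assumed prime near 1e9
    std::set<std::pair<std::uint64_t, std::uint64_t>> seen;
    for (const std::string& w : words) {
        std::uint64_t h1 = 0, h2 = 0;
        for (unsigned char c : w) {
            h1 = (h1 * base + c) % mod1;
            h2 = (h2 * base + c) % mod2;
        }
        seen.insert({h1, h2});  // duplicates of an existing pair are ignored
    }
    return static_cast<int>(seen.size());
}
```

Using a `std::set` of pairs avoids the need for a custom comparator; sorting an array of pairs and counting changes between neighbours would work equally well.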

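If you have memorized, or can find during the contest, a prime on the order of $10^{18}$ (Miller-Rabin), that is also fairly reliable. It is mainly useful when you worry that the previous approach will be too slow and that the next one will be hacked.

The lazy approach is to use unsigned long long directly without taking any modulus by hand: on overflow the value is automatically reduced modulo $2^{64}$. If the problem setter is kind, this will not be hacked either, but it is entirely hackable; for how, see BZOJ 3097 Hash Killer I.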

The code follows.

Natural-overflow hash (scores 100)

#include <cstdio>
#include <cstring>
#include <algorithm>
using namespace std;
typedef unsigned long long ull;
ull base=131;
ull a[10010];
char s[10010];
int n,ans=1;
ull hashs(char s[])
{
    int len=strlen(s);
    ull ans=0;
    for (int i=0;i<len;i++)
        ans=ans*base+(ull)s[i];
    return ans; // unsigned overflow already reduces the value modulo 2^64; masking to 31 bits would only add collisions
}
int main()
{
    scanf("%d",&n);
    for (int i=1;i<=n;i++)
    {
        scanf("%s",s);
        a[i]=hashs(s);
    }
    sort(a+1,a+n+1);
    for (int i=2;i<=n;i++)
        if (a[i]!=a[i-1])
            ans++;
    printf("%d\n",ans);
}

Single-modulus hash (scores 80)

#include <cstdio>
#include <cstring>
#include <algorithm>
using namespace std;
typedef unsigned long long ull;
ull base=131;
ull a[10010];
char s[10010];
int n,ans=1;
ull mod=19260817;
ull hashs(char s[])
{
    int len=strlen(s);
    ull ans=0;
    for (int i=0;i<len;i++)
        ans=(ans*base+(ull)s[i])%mod;
    return ans;
}
int main()
{
    scanf("%d",&n);
    for (int i=1;i<=n;i++)
    {
        scanf("%s",s);
        a[i]=hashs(s);
    }
    sort(a+1,a+n+1);
    for (int i=2;i<=n;i++)
        if (a[i]!=a[i-1])
            ans++;
    printf("%d\n",ans);
}

Double hash (scores 100)

#include <cstdio>
#include <cstring>
#include <algorithm>
using namespace std;
typedef unsigned long long ull;
ull base=131;
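struct HashPair // renamed from `data`, which clashes with std::data under `using namespace std` in C++17
{
    ull x,y;
}a[10010];
char s[10010];
int n,ans=1;
ull mod1=19260817;
ull mod2=19660813;
ull hash1(char s[])
{
    int len=strlen(s);
    ull ans=0;
    for (int i=0;i<len;i++)
        ans=(ans*base+(ull)s[i])%mod1;
    return ans;
}
ull hash2(char s[])
{
    int len=strlen(s);
    ull ans=0;
    for (int i=0;i<len;i++)
        ans=(ans*base+(ull)s[i])%mod2;
    return ans;
}
bool comp(HashPair a,HashPair b)
{
    // order by both components so that equal pairs always end up adjacent
    if (a.x!=b.x) return a.x<b.x;
    return a.y<b.y;
}
int main()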
{
    scanf("%d",&n);
    for (int i=1;i<=n;i++)
    {
        scanf("%s",s);
        a[i].x=hash1(s);
        a[i].y=hash2(s);
    }
    sort(a+1,a+n+1,comp);
    for (int i=2;i<=n;i++)
        if (a[i].x!=a[i-1].x || a[i-1].y!=a[i].y)
            ans++;
    printf("%d\n",ans);
}

Hash with a single prime on the order of 10^18 (scores 100)

#include <cstdio>
#include <cstring>
#include <algorithm>
using namespace std;
typedef unsigned long long ull;
ull base=131;
ull a[10010];
char s[10010];
int n,ans=1;
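ull mod=212370440130137957ll; // a prime!
ull hashs(char s[])
{
    int len=strlen(s);
    ull ans=0;
    for (int i=0;i<len;i++)
        // ans*base can exceed 2^64 here, so widen to 128 bits (a GCC/Clang extension) before reducing
        ans=(ull)(((unsigned __int128)ans*base+(ull)s[i])%mod);
    return ans;
}
int main()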
{
    scanf("%d",&n);
    for (int i=1;i<=n;i++)
    {
        scanf("%s",s);
        a[i]=hashs(s);
    }
    sort(a+1,a+n+1);
    for (int i=2;i<=n;i++)
        if (a[i]!=a[i-1])
            ans++;
    printf("%d\n",ans);
}

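G - KMP String Matching

Description

Given two strings $s_1$ and $s_2$, if the substring of $s_1$ over the interval $[l, r]$ is identical to $s_2$, we say that $s_2$ occurs in $s_1$ at position $l$.
Find all positions at which $s_2$ occurs in $s_1$.

A border of a string $s$ is a substring $t$ of $s$, other than $s$ itself, that is both a prefix and a suffix of $s$.
For $s_2$, you must also find, for each of its prefixes $s'$, the length of its longest border $t'$.

Input

The first line contains a string, $s_1$.
The second line contains a string, $s_2$.

Output

First output several lines, each with one integer: the positions at which $s_2$ occurs in $s_1$, in increasing order.
The last line contains $|s_2|$ integers; the $i$-th integer is the length of the longest border of the prefix of $s_2$ of length $i$.

Sample 1

Input:
ABABABC
ABA

Output:
1
3
0 0 1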
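
The border lengths requested on the last output line are the values of the classic prefix function. The sketch below is a 0-indexed formulation with a hypothetical name `borderLengths`; it is a standard variant, not the author's implementation, which uses a next array with a -1 sentinel.

```cpp
#include <string>
#include <vector>

// border[i] = length of the longest border of the prefix s[0..i].
std::vector<int> borderLengths(const std::string& s) {
    int n = static_cast<int>(s.size());
    std::vector<int> border(n, 0);
    for (int i = 1; i < n; ++i) {
        int k = border[i - 1];                              // best border of the previous prefix
        while (k > 0 && s[i] != s[k]) k = border[k - 1];    // fall back to shorter borders
        if (s[i] == s[k]) ++k;                              // extend the border by one character
        border[i] = k;
    }
    return border;
}
```

For the pattern ABA from the sample, this yields 0 0 1, matching the expected last line.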

#include<cstdio>
#include<cstring>
#include<iostream>
#include<algorithm>
using namespace std;
int n,k,len1,len2;
int next1[1000001];
char s1[1000001];
char s2[1000001];
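inline void get_next() // build the next array
{ // next1[i] is the length of the longest border (common prefix and suffix) of s2[0..i-1]
    int t1=0,t2;
    next1[0]=t2=-1;
    while(t1<len2) 
        if(t2==-1 || s2[t1]==s2[t2]) // same matching step as in KMP itself
            next1[++t1]=++t2;
        else t2=next1[t2]; // mismatch: fall back
} 


inline void KMP() // KMP matching
{ 
    int t1=0,t2=0; // start matching at position 0
    while(t1<len1) // stop at the end of the text
    { 
        if(t2==-1 || s1[t1]==s2[t2]) // characters match: advance both pointers
            t1++,t2++;
        else t2=next1[t2]; // mismatch: fall back
        if(t2==len2) printf("%d\n",t1-len2+1),t2=next1[t2]; // t2==len2 means a full match; t1-len2+1 is the 1-based start position
    } // after a full match, t2 falls back to next1[t2]
} 
int main(){ 
    scanf("%s",s1);
    scanf("%s",s2);
    len1=strlen(s1);
    len2=strlen(s2);
    get_next();
    KMP();
    for(int i=1;i<=len2;++i) 
        printf("%d ",next1[i]); // print the border length of each prefix
    return 0;
}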

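J - AC Automaton (Simple Version)

Description

Given $n$ pattern strings $s_i$ and a text string $t$, find how many distinct pattern strings occur in the text.
Two pattern strings are distinct if and only if their indices differ.

Input

The first line contains an integer $n$, the number of pattern strings.
Lines $2$ through $(n+1)$ each contain one string; the string on line $(i+1)$ is the pattern $s_i$ with index $i$.
The last line contains the text string $t$.

Output

Output a single line with one integer, the answer.

Sample 1

Input:
3
a
aa
aa
aaa

Output:
3

Sample 2

Input:
4
a
ab
ac
abc
abcd

Output:
3

Sample 3

Input:
2
a
aa
aa

Output:
2

Hint

Explanation of Sample 1

$s_2$ and $s_3$ have different indices, so each contributes once to the answer.

Explanation of Sample 2

$s_1$, $s_2$, and $s_4$ all occur in the string abcd.

Constraints

For 50% of the data, $n = 1$.
For 100% of the data, $1 \le n \le 10^6$, $1 \le |t| \le 10^6$, and $1 \le \sum_{i=1}^{n} |s_i| \le 10^6$. $s_i$ and $t$ contain only lowercase letters.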

#include<iostream>
#include<cstring>
#include<string>
#include<queue>
using namespace std;

const int N = 1e6 + 5;

int trie[N][26];
int fail[N];
int ctword[N], ct = 0;


void insert(string s)
{
	int root = 0, next;
	for (int i = 0; i <s.size(); i++)
	{
		next = s[i] - 'a';
		if (!trie[root][next])
			trie[root][next] = ++ct;
		root = trie[root][next];	
	}
	ctword[root]++;
}

void getFail()
{
	queue<int> q;
	for (int i = 0; i < 26; i++)
	{
		int v = trie[0][i];
		if (v)
		{
			fail[v] = 0;
			q.push(v);
		}
	}

	while (!q.empty())
	{
		int root = q.front();
		q.pop();
		for (int i = 0; i < 26; i++)
		{
			int v = trie[root][i];
			if (v)
			{
				q.push(v);
				fail[v] = trie[fail[root]][i];
			//	if (!fail[v])
			//	{
			//		fail[v] = 1;
			//	}
			}
			else
			{
				trie[root][i] = trie[fail[root]][i];
			}
		}



	}
}


int query(string s)
{
	int ans = 0,pos=0;
	for (int i = 0; i < s.size(); i++)
	{
		pos = trie[pos][s[i] - 'a'];

		for (int j = pos; ctword[j] != -1 && j != 0; j = fail[j])
		{
			ans += ctword[j];
			ctword[j] = -1;
		}
	//	pos = root;
	}

	return ans;

}


int main()
{
	int n;
	cin >> n;
	string s;
	while (n--)
	{
		cin >> s;

		insert(s);
	}
	cin >> s;
	getFail();
	int ans = query(s);
	cout << ans;

}
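
For small inputs, the automaton above can be checked against a direct search. The sketch below is a hypothetical reference function `countPresentPatterns` that tests each pattern separately with `std::string::find`; duplicate patterns are counted separately because they have different indices. It is far too slow for the full constraints.

```cpp
#include <string>
#include <vector>

// Brute-force reference: a pattern counts once if it occurs anywhere in
// the text; identical patterns with different indices each count.
int countPresentPatterns(const std::vector<std::string>& patterns,
                         const std::string& text) {
    int present = 0;
    for (const std::string& p : patterns) {
        if (text.find(p) != std::string::npos) ++present;
    }
    return present;
}
```

On the three samples this returns 3, 3, and 2, so it can serve as an oracle when stress-testing the AC automaton on random short strings.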

T - A + B for you again

Generally speaking, there are a lot of problems about string processing. Now you encounter another such problem. Given two strings, such as “asdf” and “sdfg”, the result of adding them is “asdfg”, because “sdf” is both the tail substring of “asdf” and the head substring of “sdfg”. However, the result is “asdfghjk” when you add “asdf” and “ghjk”. The result must be the shortest possible string first and the lexicographically smallest second; the same rules apply to all other additions.

Input

For each case, there are two strings (the characters are chosen only from ‘a’ to ‘z’). Neither string is empty, and neither is longer than 10^5 characters.

Output

Print the ultimate string by the book.

Sample

Input:
asdf sdfg
asdf ghjk

Output:
asdfg
asdfghjk


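Idea: find the longest common prefix and suffix, that is, the longest suffix of one string that is also a prefix of the other.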
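
One way to sketch this idea is with the prefix function of `b + '#' + a`, whose last value is the longest suffix of `a` that is also a prefix of `b`. The names `overlap` and `addStrings` are hypothetical, and the separator '#' is safe only because the inputs are lowercase letters. This is an alternative formulation, not the author's code.

```cpp
#include <string>
#include <vector>

// Length of the longest suffix of `a` that is also a prefix of `b`.
int overlap(const std::string& a, const std::string& b) {
    // '#' never occurs in the lowercase inputs, so no border can cross it.
    std::string t = b + '#' + a;
    std::vector<int> pi(t.size(), 0);
    for (std::size_t i = 1; i < t.size(); ++i) {
        int k = pi[i - 1];
        while (k > 0 && t[i] != t[k]) k = pi[k - 1];
        if (t[i] == t[k]) ++k;
        pi[i] = k;
    }
    return pi.back();
}

// Shortest merged string; ties are broken by lexicographic order.
std::string addStrings(const std::string& a, const std::string& b) {
    std::string ab = a + b.substr(overlap(a, b));  // a first, then the rest of b
    std::string ba = b + a.substr(overlap(b, a));  // b first, then the rest of a
    if (ab.size() != ba.size()) return ab.size() < ba.size() ? ab : ba;
    return ab < ba ? ab : ba;
}
```

Building both candidates and comparing them directly keeps the tie-breaking rule explicit, at the cost of some extra memory.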

#include<stdio.h>
#include<string.h>
#include<iostream>
using namespace std;
int nxt[100010];
char s1[100010],s2[100010];
void getnext(char *str,int m)// preprocess the pattern str: nxt[i] is the longest border length of str[0..i]
{
	m=strlen(str);
	int k=0,i=1;
	nxt[0]=0;
	while(i<m)
	{
		if(str[i]==str[k])
		nxt[i++]=++k;
		else
		{
			if(k==0)
			nxt[i++]=0;
			else
			k=nxt[k-1];
		}
	}
}
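int kmp(char *str1,char *str2,int n,int m)// longest suffix of str1 that is also a prefix of str2
{
	n=strlen(str1);
	m=strlen(str2);
	getnext(str2,m);
	int i=0,j=0;
	while(i<n)// always scan str1 to its end
	{
		if(str1[i]==str2[j])
		{
			i++;
			j++;
			if(j==m&&i<n)// str2 fully matched before the end of str1: fall back and keep scanning
			j=nxt[m-1];
		}
		else
		{
			if(j==0)
			i++;
			else
			j=nxt[j-1];
		}
	}
	return j;// number of characters of str2 matched at the very end of str1
}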
int main()
{
	while(~scanf("%s%s",s1,s2))
	{
		int len1=strlen(s1),len2=strlen(s2),k1,k2;
		k1=kmp(s1,s2,len1,len2);// overlap between a suffix of s1 and a prefix of s2
		k2=kmp(s2,s1,len2,len1);// overlap between a suffix of s2 and a prefix of s1
		if(k1>k2)
		printf("%s%s\n",s1,s2+k1);
		else if(k2>k1)
		printf("%s%s\n",s2,s1+k2);
		else// equal overlaps: output in the lexicographically smaller order
		{
			if(strcmp(s1,s2)<0)
			printf("%s%s\n",s1,s2+k1);
			else
			printf("%s%s\n",s2,s1+k2);
		}
	}
	return 0;
}