（HDU）3336 - Count the string 【KMP】+【DP】

最新推荐文章于 2021-08-03 15:02:13 发布

原创最新推荐文章于 2021-08-03 15:02:13 发布 · 588 阅读

0 ·

CC 4.0 BY-SA版权

★HDOJ(杭电OJ) 同时被 3 个专栏收录

19 篇文章

订阅专栏

☆8.字符串算法

12 篇文章

订阅专栏

----KMP算法

6 篇文章

订阅专栏

本文介绍了一种高效的字符串匹配算法，通过KMP预处理和DP状态转移方程，避免了传统方法中的重复计算，显著提升了不同长度前缀匹配次数的计算效率。

Count the string

Time Limit : 2000/1000ms (Java/Other) Memory Limit : 32768/32768K (Java/Other)

Total Submission(s) : 9 Accepted Submission(s) : 8

Problem Description

It is well known that AekdyCoin is good at string problems as well as number theory problems. When given a string s, we can write down all the non-empty prefixes of this string. For example:
s: "abab"
The prefixes are: "a", "ab", "aba", "abab"
For each prefix, we can count the times it matches in s. So we can see that prefix "a" matches twice, "ab" matches twice too, "aba" matches once, and "abab" matches once. Now you are asked to calculate the sum of the match times for all the prefixes. For "abab", it is 2 + 2 + 1 + 1 = 6.
The answer may be very large, so output the answer mod 10007.

Input

The first line is a single integer T, indicating the number of test cases. For each case, the first line is an integer n (1 <= n <= 200000), which is the length of string s. A line follows giving the string s. The characters in the strings are all lower-case letters.

Output

For each case, output only one number: the sum of the match times for all the prefixes of s mod 10007.

Sample Input

1
4
abab

Sample Output

Author

foreverlin@HNU

题意很简单：不同长度[1...len]前缀最大重复次数和累加。mod 10007

一个自然的写法，枚举par串（结果是TLE，看来数据比较多）

#include <bits/stdc++.h>

using namespace std;

const int MAXN=200010;
char ori[MAXN],par[MAXN];
int nex[MAXN];

void preKMP(char s[])
{
    int p=0,q=-1; nex[0]=-1;
    while(s[p]){
        while(q!=-1 && s[p]!=s[q]) q=nex[q];
        nex[++p]=++q;
    }
}

int KMP(char a[],char b[])
{
    int p=0,q=0,l=strlen(a),ans=0;
    preKMP(a);
    while(b[p]){
        while(q!=-1 && a[q]!=b[p]) q=nex[q];
        ++p; ++q;
        if(q==l)
        {
            ans++;
            q=nex[q];
        }
    }
    return ans;
}

int main()
{
    int t,len;
    for(scanf("%d",&t);t;t--)
    {
        scanf("%d",&len);
        scanf("%s",ori);
        int res=0;
        for(int i=0;i<len;i++)
        {
            par[i]=ori[i];
            par[i+1]='\0';
            res+=KMP(par,ori);
            res%=10007;
        }
        printf("%d\n",res);
    }
    return 0;
}

根据next数组的性质，next[i]存放的是第i位可以移动的最大匹配。（这个值肯定是小于i的）

就如同推出next数组一样，从1开始，设dp[i]表示以str[i]结尾的字符串含有的前缀数量，

dp[i] = dp[next[i]] + 1，即长度小于i的前缀数量+长度为i的前缀（1）。

为什么是+1不是加别的，因为如果还存在一个重复子串，在next[i]位置到i位置之间，那么next[i]还会更大（矛盾）。

next数组的性质确保它是首尾端子串的最大匹配，所以后面不存在重复子串，除了长度为i的串本事。

回顾一下KMP和DP：

kmp思想：对字符串进行预处理，记录与当前位置i后缀相同的“最近”位置,用next[i]记录,
       　　　　保证 s[1 .. i] 中 s[i - next[i] + 1 .. i] 与 s[1 .. next[i]] 是相同的,
   　　　　　　以便在某处字符不匹配时，不用重新从头判断一遍，只要从对应的next[i]即可,
  　　　　　　  因为中间有部分与自身重叠，减少了不必要的判断，实现见代码。

  　　　　　　  例1：
        　　　　　　　　　　i :    1234567
      　　　　　　　   字符串 :    abcabab
             　　　　　next[i] :    0001212
   　　　　　　例2：
      　　　　　　　　　　  i :    12345678
      　　　　　　　　字符串 :    abababab
           　　　　　　 next[i] :    00123456

    　　　　　　例3：
       　　　　　　　　　　 i :     123456789
      　　　　　　　　字符串:     ababcdabd
          　　　　　　  next[i]:     001200120
  　　DP:
  　　　　　　dp[]数组：以 i 结尾的串中所有前缀的计数和
        　　　　状态转移方程： dp[i] = dp[next[i]] + 1;

     　　　　　　　　 i :     123456
    　　　　　　  字符串:    ababab
      　　　　　　dp[i]:    112233

　　　　比如当i等于5时，ababa中后面的aba与前面的aba重叠，　　

　　　　　　以第三个a为结尾的前缀总数对应于前面的aba的前缀数+1（ababa）

#include <bits/stdc++.h>

using namespace std;

const int MAXN=200010;
char ori[MAXN];
int nex[MAXN],dp[MAXN];

void preKMP(char s[])
{
    int p=0,q=-1; nex[0]=-1;
    while(s[p]){
        while(q!=-1 && s[p]!=s[q]) q=nex[q];
        nex[++p]=++q;
    }
}

int main()
{
    int t,len;
    for(scanf("%d",&t);t;t--)
    {
        scanf("%d",&len);
        scanf("%s",ori);
        int ans=0;
        preKMP(ori);
        memset(dp,0,sizeof(dp));
        for(int i=1;i<=len;i++)
        {
            dp[i]=dp[nex[i]]+1;
            ans+=dp[i]%10007;
        }
        printf("%d\n",ans%10007);
    }
    return 0;
}