Suffix Array: Milk Patterns

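This post describes a solution to POJ 3261, which asks for the longest substring of a sequence that occurs at least K times. The solution builds a suffix array and its height array, then uses them to search for the answer efficiently.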

POJ 3261

Description

Farmer John has noticed that the quality of milk given by his cows varies from day to day. On further investigation, he discovered that although he can't predict the quality of milk from one day to the next, there are some regular patterns in the daily milk quality.

To perform a rigorous study, he has invented a complex classification scheme by which each milk sample is recorded as an integer between 0 and 1,000,000 inclusive, and has recorded data from a single cow over N (1 ≤ N ≤ 20,000) days. He wishes to find the longest pattern of samples which repeats identically at least K (2 ≤ K ≤ N) times. This may include overlapping patterns -- 1 2 3 2 3 2 3 1 repeats 2 3 2 3 twice, for example.

Help Farmer John by finding the longest repeating subsequence in the sequence of samples. It is guaranteed that at least one subsequence is repeated at least K times.

Input

Line 1: Two space-separated integers: N and K
Lines 2..N+1: N integers, one per line, the quality of the milk on day i appears on the ith line.

Output

Line 1: One integer, the length of the longest pattern which occurs at least K times

Sample Input

8 2
1
2
3
2
3
2
3
1

Sample Output

4

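Problem: Given N and K, followed by N integers (N <= 20,000, each between 0 and 1,000,000 inclusive), find the length of the longest contiguous substring that occurs at least K times in the sequence (K >= 2, occurrences may overlap). At least one such substring is guaranteed to exist.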
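
To make the problem statement concrete, here is a minimal brute-force sketch. It is not the method used in this post; it is only a reference for checking answers on tiny inputs. The function name longestPatternBrute is hypothetical, and the approach (sort all substrings of each length and look for a run of K equal ones) is far too slow for N = 20,000.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper, for illustration only: returns the length of the longest
// contiguous pattern that occurs at least k times (overlaps allowed), or 0 if none.
int longestPatternBrute(const std::vector<int>& s, int k) {
    assert(k >= 2);  // the problem guarantees 2 <= K <= N
    const int n = static_cast<int>(s.size());
    // Try lengths from longest to shortest; the first hit is the answer.
    for (int len = n; len >= 1; --len) {
        // Collect every substring of this length.
        std::vector<std::vector<int>> subs;
        for (int i = 0; i + len <= n; ++i)
            subs.emplace_back(s.begin() + i, s.begin() + i + len);
        // After sorting, equal substrings are adjacent, so count run lengths.
        std::sort(subs.begin(), subs.end());
        int run = 1;
        for (int i = 1; i < static_cast<int>(subs.size()); ++i) {
            run = (subs[i] == subs[i - 1]) ? run + 1 : 1;
            if (run >= k) return len;
        }
    }
    return 0;
}
```

On the sample sequence 1 2 3 2 3 2 3 1 with K = 2 this returns 4, because 2 3 2 3 starts at positions 1 and 3 (0-based) and the two occurrences overlap.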

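Approach: binary search on the pattern length len. The question then becomes a decision problem: is there a substring of length at least len that occurs at least K times? To answer it, scan the height array in suffix-array order and split the sorted suffixes into groups, starting a new group wherever height[i] < len, so that every adjacent pair inside a group has height[i] >= len. This works because the longest common prefix of the suffixes ranked i and j (i < j) equals the minimum of height[i+1], height[i+2], ..., height[j]. Any two suffixes in the same group therefore share a common prefix of length at least len, and the first len elements are identical for every suffix in the group. Since every substring is a prefix of some suffix, the number of suffixes in a group is the number of occurrences (overlaps allowed) of that length-len substring. A length len is feasible exactly when some group contains at least K suffixes, and if len is feasible then so is every shorter length, which is what makes the binary search valid.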
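
The grouping idea can be sketched on its own, separately from the fast suffix-array construction. The version below is an illustration under simplifying assumptions: the suffix array and height array are built naively by direct comparison, no sentinel is appended, and all function names (buildSuffixArrayNaive, buildHeightNaive, existsPattern, longestPattern) are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// Naive suffix array: sort suffix start positions by comparing suffixes directly.
std::vector<int> buildSuffixArrayNaive(const std::vector<int>& s) {
    std::vector<int> sa(s.size());
    std::iota(sa.begin(), sa.end(), 0);
    std::sort(sa.begin(), sa.end(), [&s](int a, int b) {
        return std::lexicographical_compare(s.begin() + a, s.end(),
                                            s.begin() + b, s.end());
    });
    return sa;
}

// height[i] = longest common prefix of the suffixes sa[i-1] and sa[i]; height[0] = 0.
std::vector<int> buildHeightNaive(const std::vector<int>& s, const std::vector<int>& sa) {
    const int n = static_cast<int>(s.size());
    std::vector<int> height(n, 0);
    for (int i = 1; i < n; ++i) {
        int a = sa[i - 1], b = sa[i], p = 0;
        while (a + p < n && b + p < n && s[a + p] == s[b + p]) ++p;
        height[i] = p;
    }
    return height;
}

// Decision step: is there a group of at least k adjacent suffixes (in sorted
// order) whose pairwise-adjacent height values are all >= len?
bool existsPattern(const std::vector<int>& height, int len, int k) {
    int groupSize = 1;  // the current group always contains at least one suffix
    for (int i = 1; i < static_cast<int>(height.size()); ++i) {
        groupSize = (height[i] >= len) ? groupSize + 1 : 1;
        if (groupSize >= k) return true;
    }
    return false;
}

// Binary search for the largest feasible length; returns 0 if no length works.
int longestPattern(const std::vector<int>& s, int k) {
    assert(k >= 2);
    const std::vector<int> sa = buildSuffixArrayNaive(s);
    const std::vector<int> height = buildHeightNaive(s, sa);
    int lo = 0, hi = static_cast<int>(s.size());
    while (lo < hi) {
        int mid = lo + (hi - lo + 1) / 2;  // round up so the loop always shrinks
        if (existsPattern(height, mid, k)) lo = mid;
        else hi = mid - 1;
    }
    return lo;
}
```

For the sample sequence 1 2 3 2 3 2 3 1 this gives sa = 7 0 5 3 1 6 4 2 and height = 0 1 0 2 4 0 1 3. With len = 4 the only value >= 4 is height[4], which joins the suffixes starting at 3 and 1 into a group of size 2, so K = 2 is satisfied. With len = 5 no group has more than one suffix, so the answer is 4. The complete solution below does the same scan but builds the arrays with the doubling algorithm.
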
#include <iostream>
#include <algorithm>
#include <cstdio>
#include <cstring>
#include <map>
#define rep(i,n) for(int i = 0;i < n; i++)
using namespace std;
const int size=200205,INF=1<<30;
int rk[size],sa[size],height[size],w[size],wa[size],res[size];
int N,K;
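// Builds sa[] for w[0..len-1] by prefix doubling with counting sort.
// Every symbol in w must lie in [0, up).
void getSa (int len,int up) {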
    int *k = rk,*id = height,*r = res, *cnt = wa;
    rep(i,up) cnt[i] = 0;
    rep(i,len) cnt[k[i] = w[i]]++;
    rep(i,up) cnt[i+1] += cnt[i];
    for(int i = len - 1; i >= 0; i--) {
        sa[--cnt[k[i]]] = i;
    }
    int d = 1,p = 0;
    while(p < len){
        for(int i = len - d; i < len; i++) id[p++] = i;
        rep(i,len)  if(sa[i] >= d) id[p++] = sa[i] - d;
        rep(i,len) r[i] = k[id[i]];
        rep(i,up) cnt[i] = 0;
        rep(i,len) cnt[r[i]]++;
        rep(i,up) cnt[i+1] += cnt[i];
        for(int i = len - 1; i >= 0; i--) {
            sa[--cnt[r[i]]] = id[i];
        }
        swap(k,r);
        p = 0;
        k[sa[0]] = p++;
        rep(i,len-1) {
            if(sa[i]+d < len && sa[i+1]+d <len &&r[sa[i]] == r[sa[i+1]]&& r[sa[i]+d] == r[sa[i+1]+d])
                k[sa[i+1]] = p - 1;
            else k[sa[i+1]] = p++;
        }
        if(p >= len) return ;
        d *= 2,up = p, p = 0;
    }
}

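// Kasai-style construction: height[i] is the longest common prefix of the
// suffixes at sa[i-1] and sa[i]. Also fills rk[] with the rank of each suffix.
void getHeight(int len) {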
    rep(i,len) rk[sa[i]] = i;
    height[0] =  0;
    for(int i = 0,p = 0; i < len - 1; i++) {
        int j = sa[rk[i]-1];
        while(i+p < len&& j+p < len&& w[i+p] == w[j+p]) {
            p++;
        }
        height[rk[i]] = p;
        p = max(0,p - 1);
    }
}

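// Copies s into w, appends a 0 sentinel (smaller than every compressed value),
// then builds sa[] and height[]. Returns the length including the sentinel.
int getSuffix(int s[]) {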
    int len =N,up = 0;
    for(int i = 0; i < len; i++) {
        w[i] = s[i];
        up = max(up,w[i]);
    }
    w[len++] = 0;
    getSa(len,up+1);
    getHeight(len);
    return len;
}
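void solve() // binary search on the pattern length
{
    // Invariant: length lo is feasible (guaranteed by the problem), length hi is
    // not (a pattern of length N occurs only once, and K >= 2).
    int lo=1,hi=N,mid;
    for(;;)
    {
        mid = (lo + hi) / 2;
        if(mid==lo)
            break;
        int best=0,cnt=0; // cnt: number of suffixes in the current group
        for(int i=1;i<=N;i++)
        { // group consecutive ranks whose height[] is at least mid
            if(height[i]<mid)
            {
                if(cnt>best)
                    best=cnt;
                cnt=0;
            }
            else
            {
                if(!cnt)
                    cnt=2; // a group starts with the two suffixes sa[i-1] and sa[i]
                else
                    ++cnt;
            }
        }
        if(cnt > best)
            best = cnt;
        if(best >= K)
            lo = mid;
        else
            hi = mid;
    }
    printf("%d\n", mid);
}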
map<int,int>q;
int main()
{
    static int s[size],a[size]; // static: keeps about 1.6 MB of buffers off the stack
    while(scanf("%d%d",&N,&K)!=EOF)
    {
        for(int i=0;i<N;i++)
        {
            scanf("%d",&s[i]);
            a[i]=s[i];
        }
        sort(a,a+N);
        int pre=1,tot=1; // coordinate compression: map each distinct value to a small positive rank
        for(int i=0;i<N;i++)
        {
            if(a[i]==pre)
            {
                a[i]=tot;
                q[pre]=tot;
            }
            else
            {
                pre=a[i];
                a[i]=++tot;
                q[pre]=tot;
            }
        }
        for(int i=0;i<N;i++)
        {
            s[i]=q[s[i]];
        }
        getSuffix(s);
        solve();
    }
}
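
The discretization step in main() could also be written with std::sort, std::unique and std::lower_bound instead of a std::map. The sketch below is an alternative formulation, not the code used above; the function name compressValues is hypothetical. Ranks start at 1 so that 0 stays free for the sentinel that getSuffix appends.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper: maps each value to its 1-based rank among the distinct
// values, preserving relative order (smaller value -> smaller rank).
std::vector<int> compressValues(const std::vector<int>& values) {
    // Sorted list of distinct values.
    std::vector<int> sorted(values);
    std::sort(sorted.begin(), sorted.end());
    sorted.erase(std::unique(sorted.begin(), sorted.end()), sorted.end());

    std::vector<int> ranks;
    ranks.reserve(values.size());
    for (int v : values) {
        // Position of v in the distinct list, shifted so ranks begin at 1.
        int pos = static_cast<int>(
            std::lower_bound(sorted.begin(), sorted.end(), v) - sorted.begin());
        ranks.push_back(pos + 1);
    }
    assert(ranks.size() == values.size());  // one rank per input value
    return ranks;
}
```

Because only the relative order and equality of symbols matter to the suffix array, compressing values up to 1,000,000 into at most N distinct ranks keeps the counting-sort buckets in getSa small without changing the answer.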

 

Reposted from: https://www.cnblogs.com/chen9510/p/5471395.html
