bzoj3940 [Usaco2015 Feb]Censoring: AC automaton + stack

License: CC 4.0 BY-SA

This post shows how to filter text with an AC automaton: build the automaton over the censored words, then match and delete their occurrences efficiently.

Description

Farmer John has purchased a subscription to Good Hooveskeeping magazine for his cows, so they have plenty
of material to read while waiting around in the barn during milking sessions. Unfortunately, the latest
issue contains a rather inappropriate article on how to cook the perfect steak, which FJ would rather his
cows not see (clearly, the magazine is in need of better editorial oversight).
FJ has taken all of the text from the magazine to create the string S of length at most 10^5 characters.
He has a list of censored words t_1 … t_N that he wishes to delete from S. To do so Farmer John finds
the earliest occurrence of a censored word in S (having the earliest start index) and removes that instance
of the word from S. He then repeats the process again, deleting the earliest occurrence of a censored word
from S, repeating until there are no more occurrences of censored words in S. Note that the deletion of one
censored word might create a new occurrence of a censored word that didn’t exist before.
Farmer John notes that the censored words have the property that no censored word appears as a substring of
another censored word. In particular this means the censored word with earliest index in S is uniquely
defined. Please help FJ determine the final contents of S after censoring is complete.
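In short: S has length at most 10^5, and the list holds N words t_1 … t_N to delete from S.
Repeatedly find the listed word that occurs earliest in S (smallest start index) and delete that occurrence, until no listed word occurs in S. Note that a deletion may create a new occurrence of a listed word.
No listed word is a substring of another, so two listed words never start at the same position in S.
Output the final S.
len(S) <= 10^5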
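
The process in the statement can be simulated literally, which gives a slow reference answer for cross-checking a faster solution on small inputs. The following is a minimal sketch under that assumption; the name `censor_naive` is hypothetical and not part of the original post, and the approach is far too slow for the full limits.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Naive reference (hypothetical helper): repeatedly delete the occurrence
// of a censored word that has the smallest start index.
std::string censor_naive(std::string s, const std::vector<std::string>& words) {
    for (;;) {
        std::size_t best = std::string::npos;  // earliest start index found so far
        std::size_t best_len = 0;              // length of the word starting there
        for (const std::string& w : words) {
            assert(!w.empty());                // an empty word would never stop matching
            std::size_t p = s.find(w);         // first occurrence of w, or npos
            if (p < best) { best = p; best_len = w.size(); }
        }
        if (best == std::string::npos) return s;  // nothing left to delete
        s.erase(best, best_len);
    }
}
```

Because no censored word is a substring of another, two words can never share the earliest start index, so the strict `<` comparison needs no tie-breaking.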

Solution

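The natural idea is to build an AC automaton over the censored words and run S through it. To handle deletions, record for every kept character which automaton node the match was at after reading it; when a word is matched, drop its characters and resume from the node recorded just before them (only after writing it did I realize this is simply a stack).
Following fail links one by one during matching gets TLE, so precompute for every node which node each character leads to, i.e. complete the trie into a full transition table.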
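
The two ideas above (a stack of automaton states, and a transition table completed during the BFS) can be sketched as one function. This is an illustrative sketch, not the author's submission: the names `censor`, `go`, `word_len` and `state` are assumptions, and the input is assumed to consist of lowercase letters only.

```cpp
#include <array>
#include <cassert>
#include <queue>
#include <string>
#include <vector>

// Hypothetical helper: returns S after all censored words are removed.
std::string censor(const std::string& s, const std::vector<std::string>& words) {
    // go[v][c]: transition from node v on letter c; node 0 is the root.
    std::vector<std::array<int, 26>> go(1, std::array<int, 26>{});
    std::vector<int> word_len(1, 0);  // length of the word ending at a node, 0 if none
    for (const std::string& w : words) {
        int now = 0;
        for (char c : w) {
            assert(c >= 'a' && c <= 'z');
            int ch = c - 'a';
            if (!go[now][ch]) {
                go[now][ch] = (int)go.size();
                go.push_back(std::array<int, 26>{});
                word_len.push_back(0);
            }
            now = go[now][ch];
        }
        word_len[now] = (int)w.size();
    }
    // BFS: compute fail links and fill every missing edge, so matching
    // never has to walk fail links at query time.
    std::vector<int> fail(go.size(), 0);
    std::queue<int> que;
    for (int c = 0; c < 26; ++c) if (go[0][c]) que.push(go[0][c]);
    while (!que.empty()) {
        int now = que.front(); que.pop();
        for (int c = 0; c < 26; ++c) {
            int& nxt = go[now][c];
            if (nxt) { fail[nxt] = go[fail[now]][c]; que.push(nxt); }
            else nxt = go[fail[now]][c];
        }
    }
    std::string out;               // kept characters (the stack)
    std::vector<int> state(1, 0);  // state[k]: node after the first k kept characters
    for (char c : s) {
        assert(c >= 'a' && c <= 'z');
        int now = go[state.back()][c - 'a'];
        out.push_back(c);
        state.push_back(now);
        if (word_len[now]) {       // a word ends here: pop it and roll the state back
            out.resize(out.size() - word_len[now]);
            state.resize(state.size() - word_len[now]);
        }
    }
    return out;
}
```

Checking only `word_len[now]`, without following fail links for shorter matches, relies on the guarantee that no censored word is a substring of another.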

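After "solving" it in my head, a long debugging session began. It took about an hour to discover that I had corrupted the test data I downloaded; with a fresh copy the output was correct ╮(╯▽╰)╭
In practice it runs very fast.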

Code

#include <stdio.h>
#include <string.h>
#include <queue>
#define rep(i,st,ed) for (int i=st;i<=ed;++i)

const int N=1000005;

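std::queue<int> que;

int rec[N][26],tot;  // rec: trie edges, completed into a full transition table by get_fail(); tot: node count
int pos[N],ans[N];   // ans[1..ans[0]]: stack of kept characters; pos[k]: automaton node after k kept characters
int fail[N],cnt[N];  // fail: failure links; cnt[v]: length of the word ending at node v, 0 if none

char str[N],ptr[N];  // str: the text S; ptr: the current censored word (both 1-indexed)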

void ins(char *str) {
    int len=strlen(str+1),now=0;
    rep(i,1,len) {
        int ch=str[i]-'a';
        if (!rec[now][ch]) rec[now][ch]=++tot;
        now=rec[now][ch];
    }
    cnt[now]=len;
}

void get_fail() {
    for (;!que.empty();) que.pop();
    // BFS from the root's children; their fail links stay 0 (the root)
    rep(i,0,25) if (rec[0][i]) que.push(rec[0][i]);
    for (;!que.empty();) {
        int now=que.front(); que.pop();
        rep(i,0,25) {
            if (rec[now][i]) {
                fail[rec[now][i]]=rec[fail[now]][i];
                que.push(rec[now][i]);
            } else rec[now][i]=rec[fail[now]][i]; // missing edge: reuse the fail node's transition
        }
    }
}

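void solve(char *str) {
    int len=strlen(str+1),now=0;
    rep(i,1,len) {
        // push the character and remember the node reached after it
        int ch=str[i]-'a'; ans[++ans[0]]=ch;
        now=rec[now][ch];
        pos[ans[0]]=now;
        if (cnt[now]) {
            // a censored word ends here: pop it and resume from the node below it
            // (pos[0] is 0, the root, when the stack becomes empty)
            ans[0]-=cnt[now];
            now=pos[ans[0]];
        }
    }
    rep(i,1,ans[0]) printf("%c", ans[i]+'a');
}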

int main(void) {
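    // Local testing only; keep these disabled when submitting to a judge that uses stdin/stdout.
    // freopen("data.in","r",stdin);
    // freopen("myp.out","w",stdout);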
    scanf("%s",str+1);
    int n; scanf("%d",&n);
    rep(i,1,n) {
        scanf("%s",ptr+1);
        ins(ptr);
    } get_fail();
    solve(str);
    return 0;
}