【CodeForces】528D Fuzzy Search

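This post describes a way to find every position at which a shorter gene sequence occurs in a longer one when a given error threshold is allowed. The problem is reduced to fuzzy string matching over a small fixed alphabet, and the fast Fourier transform is used to speed up the computation.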


Problem Description

Leonid works for a small and promising start-up that works on decoding the human genome. His duties include solving complex problems of finding certain patterns in long strings consisting of letters ‘A’, ‘T’, ‘G’ and ‘C’.
Let’s consider the following scenario. There is a fragment of a human DNA chain, recorded as a string S. To analyze the fragment, you need to find all occurrences of string T in a string S. However, the matter is complicated by the fact that the original chain fragment could contain minor mutations, which, however, complicate the task of finding a fragment. Leonid proposed the following approach to solve this problem.
Let’s write down an integer k ≥ 0, the error threshold. We will say that string T occurs in string S at position i (1 ≤ i ≤ |S| - |T| + 1) if, after aligning string T with this position, each character of string T corresponds to some character of the same value in string S at a distance of at most k. More formally, for any j (1 ≤ j ≤ |T|) there must exist a p (1 ≤ p ≤ |S|) such that |(i + j - 1) - p| ≤ k and S[p] = T[j].
For example, corresponding to the given definition, string “ACAT” occurs in string “AGCAATTCAT” in positions 2, 3 and 6.
Note that at k=0 the given definition transforms to a simple definition of the occurrence of a string in a string.
Help Leonid by calculating in how many positions the given string T occurs in the given string S with the given error threshold.

Problem Summary

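Given two strings S and T, count the positions of S at which T "matches". T matches at position i if, for every j (1 ≤ j ≤ |T|), there is some x with S[x] == T[j] and (i + j - 1) - k ≤ x ≤ (i + j - 1) + k.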
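The definition can be checked directly with a brute-force loop, which is handy as a reference when testing a faster solution on small inputs. The following is only a minimal sketch: the name `countFuzzyNaive` is hypothetical and not part of the original solution, positions are 0-based, and the running time is O(|S| · |T| · k), far too slow for the full constraints.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Hypothetical reference helper (not from the original solution): counts the
// occurrences of t in s straight from the definition.
// Positions are 0-based here; the statement uses 1-based positions.
int countFuzzyNaive(const std::string& s, const std::string& t, int k) {
    assert(!t.empty());
    const int n = static_cast<int>(s.size());
    const int m = static_cast<int>(t.size());
    int occurrences = 0;
    for (int i = 0; i + m <= n; ++i) {            // candidate start position in s
        bool allMatched = true;
        for (int j = 0; j < m && allMatched; ++j) {
            // Look for t[j] within distance k of the aligned position i + j.
            const int lo = std::max(0, i + j - k);
            const int hi = std::min(n - 1, i + j + k);
            bool found = false;
            for (int p = lo; p <= hi && !found; ++p) found = (s[p] == t[j]);
            allMatched = found;
        }
        if (allMatched) ++occurrences;
    }
    return occurrences;
}
```

Calling `countFuzzyNaive("AGCAATTCAT", "ACAT", 1)` reproduces the sample answer of 3.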

Input Format

The first line contains three integers |S|, |T|, k (1 ≤ |T| ≤ |S| ≤ 200 000, 0 ≤ k ≤ 200 000): the lengths of strings S and T and the error threshold.
The second line contains string S.
The third line contains string T.
Both strings consist only of uppercase letters ‘A’, ‘T’, ‘G’ and ‘C’.

Output Format

Print a single number — the number of occurrences of T in S with the error threshold k by the given definition.

Sample Input

10 4 1
AGCAATTCAT
ACAT

Sample Output

3


Solution

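Since the alphabet has only four letters (A, C, G and T), we can handle each letter separately. For a fixed letter, first mark every position of S that has this letter within distance k. Then, for each start position i, count how many positions j of T holding this letter land on a marked position of S. Both steps are convolutions, so FFT computes them for all positions at once. Position i is an occurrence exactly when the four per-letter counts add up to |T|.
It seems the four letters could also be handled together.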
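The per-letter idea can be illustrated without FFT. The sketch below is an assumption-laden illustration rather than the author's code: the names `coverage` and `countFuzzyPerLetter` are hypothetical, positions are 0-based, the marking step uses a difference array in place of the first convolution, and the counting step uses a plain double loop in place of the second convolution, so it runs in O(|S| · |T|).

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: for one letter, mark every position of s that has this
// letter within distance k. A difference array stands in for the convolution
// with a window of 2k + 1 ones.
std::vector<int> coverage(const std::string& s, char letter, int k) {
    const int n = static_cast<int>(s.size());
    std::vector<int> diff(n + 1, 0);
    for (int p = 0; p < n; ++p) {
        if (s[p] != letter) continue;
        ++diff[std::max(0, p - k)];        // window [p - k, p + k] starts here
        --diff[std::min(n, p + k + 1)];    // and ends just before this index
    }
    std::vector<int> covered(n, 0);
    int running = 0;
    for (int p = 0; p < n; ++p) {
        running += diff[p];
        covered[p] = running > 0 ? 1 : 0;
    }
    return covered;
}

// Sums, over the four letters, how many positions of t land on a covered
// position of s. The double loop is the correlation that an FFT-based
// solution would compute for all start positions i at once.
int countFuzzyPerLetter(const std::string& s, const std::string& t, int k) {
    assert(!t.empty());
    const int n = static_cast<int>(s.size());
    const int m = static_cast<int>(t.size());
    if (m > n) return 0;
    std::vector<int> matched(n - m + 1, 0);
    for (char letter : std::string("ATGC")) {
        const std::vector<int> covered = coverage(s, letter, k);
        for (int i = 0; i + m <= n; ++i)
            for (int j = 0; j < m; ++j)
                if (t[j] == letter) matched[i] += covered[i + j];
    }
    // A start position is an occurrence when every character of t matched.
    return static_cast<int>(std::count(matched.begin(), matched.end(), m));
}
```

On the sample, `countFuzzyPerLetter("AGCAATTCAT", "ACAT", 1)` gives 3. Replacing the double loop with a convolution against the reversed pattern brings the cost down to O(n log n) per letter.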


Code

#include <cstdio>
#include <cmath>
#include <complex>
#include <algorithm>
#include <iostream>
#define db double
#define cd complex<db>
using namespace std;
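const int Q=1048576;                // 2^20, an upper bound on the FFT length (|S|+2k <= 600000)
const double pi=3.1415926535897932;
cd w[Q],ti[Q];                      // w: twiddle factors; ti: scratch copy of the second operand
int n;                              // current FFT length, a power of two
// In-place iterative FFT of length n; flag=1 is the forward transform,
// flag=-1 is the inverse (which also divides by n).
void fly(cd a[],int flag)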
{
    int i,j,l,now;
    for(i=j=0;i<n;i++){
        if(i<j)swap(a[i],a[j]);
        for(l=(n>>1);(j^=l)<l;l>>=1);
    }
    w[0]=1;
    for(now=1;now<n;now<<=1)
    {
        cd ha=exp(cd(0,pi*(db)flag/(db)now));
        for(i=1;i<now;i++)w[i]=w[i-1]*ha;
        for(j=0;j<n;j+=(now<<1))
            for(l=0;l<now;l++)
            {
                cd p=a[j+l],q=a[now+j+l]*w[l];
                a[j+l]=p+q,a[j+l+now]=p-q;
            }
    }
    if(flag==1)return;
    cd temp=1.0/(db)n;
    for(i=0;i<n;i++)a[i]*=temp;
}
// Cyclic convolution of length n: a becomes a*b, b is left unchanged.
void FFT(cd a[],cd b[])
{
    for(int i=0;i<n;i++)ti[i]=b[i];
    fly(a,1),fly(ti,1);
    for(int i=0;i<n;i++)a[i]*=ti[i];
    fly(a,-1);
}
// Round the real part to the nearest integer to remove floating-point noise.
int gg(cd x)
{return (int)floor(x.real()+0.5);}
cd a[5][Q],b[5][Q],ano[Q];
int main()
{
    char o;
    int ans=0,i,c,d,k,j;
    scanf("%d%d%d",&c,&d,&k);
    // Pick a power of two large enough that neither convolution wraps around.
    for(n=1;n<=max(c+2*k,c+d);n<<=1);
    for(i=0;i<n;i++)
        for(j=1;j<=4;j++)
            a[j][i]=b[j][i]=0;
    for(i=1;i<=c;i++)
        while(true){
            o=getchar();
            if(o=='A'){
                a[1][i]=1;
                break;
            }
            if(o=='T'){
                a[2][i]=1;
                break;
            }
            if(o=='G'){
                a[3][i]=1;
                break;
            }
            if(o=='C'){
                a[4][i]=1;
                break;
            }
        }
    for(i=1;i<=d;i++)
        while(true){
            o=getchar();
            if(o=='A'){
                b[1][d-i+1]=1;
                break;
            }
            if(o=='T'){
                b[2][d-i+1]=1;
                break;
            }
            if(o=='G'){
                b[3][d-i+1]=1;
                break;
            }
            if(o=='C'){
                b[4][d-i+1]=1;
                break;
            }
        }
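    // ano is a window of 2k+1 ones: convolving with it sums a[j-k..j+k].
    for(i=0;i<=2*k;i++)ano[i]=1;
    for(i=2*k+1;i<n;i++)ano[i]=0;
    for(i=1;i<=4;i++){
        // Step 1: a[i][j] becomes 1 if letter i occurs in S within distance k of j.
        FFT(a[i],ano);
        for(j=1;j<=c;j++)
            if(gg(a[i][j+k])>0)a[i][j]=1;
            else a[i][j]=0;
        for(j=c+1;j<n;j++)a[i][j]=0;
        // Step 2: convolve with the reversed pattern; index p+d then holds the
        // number of positions of T with letter i that match when T starts at p.
        FFT(a[i],b[i]);
    }
    // A start position counts when the four per-letter totals add up to |T|.
    for(i=1;i+d<=c+1;i++)
        if(gg(a[1][i+d])+gg(a[2][i+d])+gg(a[3][i+d])+gg(a[4][i+d])==d)ans++;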
    printf("%d",ans);
    return 0;
}