算典 Chapter 03, Exercise 07: UVA-1368

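Article link: https://blog.youkuaiyun.com/a27038/article/details/55516787

This post describes a way to find the consensus string that is closest to all sequences in a set of DNA sequences. For each set, we count the bases at every position, take the most frequent base at each position, and compute the difference between the resulting consensus string and the original sequences.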

DNA Consensus String

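Problem

Given a set of DNA sequences (strings of equal length), find the DNA sequence whose total difference from all of the given sequences is smallest.
The difference between two sequences is the number of positions at which their characters differ.
Output this DNA sequence and the minimum total difference. If several sequences achieve the minimum, output the lexicographically smallest one.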
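
To make the "difference" concrete, here is a minimal sketch of it as a function. The name hammingDistance is my own choice for illustration and does not appear in the solution below; it assumes both strings have the same length, as the problem guarantees.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Number of positions at which two equal-length strings differ.
// hammingDistance is an illustrative name, not part of the original solution.
int hammingDistance(const std::string& a, const std::string& b) {
    assert(a.size() == b.size());  // assumption: equal lengths, as in the problem
    int diff = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (a[i] != b[i]) ++diff;
    }
    return diff;
}
```

For example, "TATGATAC" and "TAAGATAC" differ only at the third character, so their difference is 1.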

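Solution

1. Preparation

Here is a small trick. DNA has only four bases, "ACGT". To count how often each one occurs we need a mapping from base to index, for example A, C, G, T to 0, 1, 2, 3, so that a[0] can hold the number of occurrences of A.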

char N[] = {"ACGT"};  // constant table of the four bases, in alphabetical order
int Nid(char c){ return strchr(N, c) - N; }  // index of base c in N, so N[Nid(c)] == c

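As usual, I use a constant table to set up this mapping in a simple way, plus a lookup function that turns a character into its index. With the code above, Nid('A') is the index of A, so a[Nid('A')] is the count of A, which is much more convenient.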
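
As a rough sketch of how the table and lookup can be used for counting, the snippet below tallies the bases of a single string. The names BASES, baseIndex and countBases are hypothetical and chosen only for this example; the input is assumed to contain nothing but A, C, G and T.

```cpp
#include <array>
#include <cassert>
#include <cstring>
#include <string>

// Constant table of the four bases (illustrative name).
const char BASES[] = "ACGT";

// Index of base c in BASES: A -> 0, C -> 1, G -> 2, T -> 3.
int baseIndex(char c) {
    const char* p = std::strchr(BASES, c);
    // assumption: c is one of the four bases (strchr would also match '\0')
    assert(p != nullptr && c != '\0');
    return static_cast<int>(p - BASES);
}

// Count how many times each base occurs in dna; cnt[0] is the count of A.
std::array<int, 4> countBases(const std::string& dna) {
    std::array<int, 4> cnt{};  // zero-initialized
    for (char c : dna) ++cnt[baseIndex(c)];
    return cnt;
}
```

Calling countBases("TATGATAC") gives 3 for A, 1 for C, 1 for G and 3 for T.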

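2. Implementation

With the preparation done, the code is straightforward. For each position I count the bases across all DNA sequences together: the most frequent base at that position is the consensus base there, and the difference contributed by that position is the total number of sequences minus the count of that base. Every position is handled the same way. Because the table is in alphabetical order and the comparison is strict, a tie goes to the alphabetically smaller base, which yields the lexicographically smallest answer.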

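#include <iostream>
#include <cstdio>
#include <cstring>
using namespace std;
const int maxn = 1e3 + 5;  // upper bound for both the number of sequences and their length
#define met(a,b) memset(a, b, sizeof(a))

char N[] = {"ACGT"};  // the four bases in alphabetical order
int Nid(char c){ return strchr(N, c) - N; }  // index of base c in N, so N[Nid(c)] == c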

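int m, n, res;       // m: sequence length, n: number of sequences, res: total difference
char s[maxn][maxn];  // s[j] is the j-th DNA sequence
int ans[maxn][4];    // ans[i][k]: number of sequences with base N[k] at position i

// Pick the most frequent base at position id and add its mismatches to res.
char solve(int id){
    int Max = 0, ID = 0;
    for(int i = 0; i < 4; ++i){
        // strict '<' keeps the earliest base in "ACGT" on ties
        if(Max < ans[id][i]) {
            Max = ans[id][i];
            ID = i;
        }
    }
    res += n - Max;  // sequences whose base at this position differs from the chosen one
    return N[ID];
}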

int main(){
    #ifdef _LOCAL
    freopen("in.txt","r", stdin);
    #endif // _LOCAL

    int t; cin >> t;  // number of test cases
    while(t--) {
        cin >> n >> m;  // n sequences, each of length m
        for(int i = 0; i < n; ++i) {
            scanf("%s", s[i]);
        }

        // count the bases column by column
        met(ans, 0);
        for(int i = 0; i < m; ++i){
            for(int j = 0; j < n; ++j){
                ++ans[i][Nid(s[j][i])];
            }
        }

        // print the consensus base of every position, then the total difference
        res = 0;
        for(int i = 0; i < m; ++i){
            printf("%c", solve(i));
        }
        printf("\n%d\n", res);
    }
    return 0;
}
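
The column-by-column counting from step 2 can also be packaged as a standalone function, which makes it easy to check against a small input. This is only a sketch under my own assumptions: the name consensus, the use of std::string and std::vector, and the return type are not part of the submitted solution above, and all sequences are assumed to have the same length and contain only A, C, G and T.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Returns the consensus string and the total difference to all sequences.
// consensus is an illustrative name; inputs are assumed to be equal-length
// strings over the alphabet ACGT.
std::pair<std::string, int> consensus(const std::vector<std::string>& seqs) {
    assert(!seqs.empty());
    const std::string bases = "ACGT";  // alphabetical order matters for ties
    const std::size_t len = seqs[0].size();
    std::string result;
    int total = 0;
    for (std::size_t col = 0; col < len; ++col) {
        std::array<int, 4> cnt{};  // per-column counts, zero-initialized
        for (const std::string& s : seqs) {
            const std::size_t k = bases.find(s[col]);
            assert(k != std::string::npos);  // assumption: only A, C, G, T
            ++cnt[k];
        }
        std::size_t best = 0;
        for (std::size_t k = 1; k < 4; ++k) {
            // strict '>' keeps the alphabetically smaller base on ties
            if (cnt[k] > cnt[best]) best = k;
        }
        result += bases[best];
        total += static_cast<int>(seqs.size()) - cnt[best];
    }
    return {result, total};
}
```

For the five sequences TATGATAC, TAAGCTAC, AAAGATCC, TGAGATAC and TAAGATGT, the per-position mismatches are 1, 1, 1, 0, 1, 0, 2 and 1, so the function returns TAAGATAC with a total difference of 7.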