HDU 2473 (Union-Find with Deletion)

Junk-mail filtering algorithm


License: CC 4.0 BY-SA


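This article describes a method for identifying junk mail by extracting common characteristics and applying a matching filter. The method processes a large number of sample emails and supports several operations, including merging emails with the same characteristics and removing emails that were misidentified as junk.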

Junk-Mail Filter

Time Limit: 15000/8000 MS (Java/Others) Memory Limit: 32768/32768 K (Java/Others)
Total Submission(s): 5110 Accepted Submission(s): 1629

Problem Description

Recognizing junk mails is a tough task. The method used here consists of two steps:
1) Extract the common characteristics from the incoming email.
2) Use a filter matching the set of common characteristics extracted to determine whether the email is a spam.

We want to extract the set of common characteristics from the N sample junk emails available at the moment, and thus having a handy data-analyzing tool would be helpful. The tool should support the following kinds of operations:

a) “M X Y”, meaning that we think that the characteristics of spam X and Y are the same. Note that the relationship defined here is transitive, so relationships (other than the one between X and Y) need to be created if they are not present at the moment.

b) “S X”, meaning that we think spam X had been misidentified. Your tool should remove all relationships that spam X has when this command is received; after that, spam X will become an isolated node in the relationship graph.

Initially no relationships exist between any pair of the junk emails, so the number of distinct characteristics at that time is N.
Please help us keep track of any necessary information to solve our problem.
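The semantics of the two operations can be sketched with a deliberately naive reference model that stores one group label per email. This is only an illustration for checking small cases, not the intended solution: `NaiveGroups` and its member names are hypothetical, emails are assumed to be numbered 0 to N-1, and `merge` costs O(N), which would be too slow for the full limits.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Naive reference model (illustrative only): label[x] is the group of email x.
struct NaiveGroups {
    std::vector<int> label;
    int nextLabel;  // first label that has never been used

    explicit NaiveGroups(int n) : label(n), nextLabel(n) {
        for (int i = 0; i < n; ++i) label[i] = i;  // every email starts alone
    }

    // "M x y": every email in x's group joins y's group (O(N) relabeling).
    void merge(int x, int y) {
        assert(0 <= x && x < (int)label.size());
        assert(0 <= y && y < (int)label.size());
        int from = label[x], to = label[y];
        if (from == to) return;
        for (int& l : label) {
            if (l == from) l = to;
        }
    }

    // "S x": x leaves its group and receives a label nobody else has.
    void isolate(int x) {
        assert(0 <= x && x < (int)label.size());
        label[x] = nextLabel++;
    }

    // Number of distinct groups among all emails.
    int countGroups() const {
        return (int)std::set<int>(label.begin(), label.end()).size();
    }
};
```

For example, with 5 emails, the sequence `merge(0,1)`, `merge(1,2)`, `merge(1,3)`, `isolate(1)`, `merge(1,2)`, `isolate(3)` leaves the groups {0, 1, 2}, {3}, {4}, so `countGroups()` returns 3.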

Input

There are multiple test cases in the input file.
Each test case starts with two integers, N and M (1 ≤ N ≤ 10⁵, 1 ≤ M ≤ 10⁶), the number of email samples and the number of operations. M lines follow, each line is one of the two formats described above.
Two successive test cases are separated by a blank line. A case with N = 0 and M = 0 indicates the end of the input file, and should not be processed by your program.

Output

For each test case, please print a single integer, the number of distinct common characteristics, to the console. Follow the format as indicated in the sample below.

Sample Input

Sample Output

Case #1: 3
Case #2: 2

Source

2008 Asia Regional Hangzhou

Recommend

lcy

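Like the earlier UVa union-find problem, this is a union-find problem that involves deletion. Because a union-find structure is a forest of trees, a node cannot simply be cut out of its tree while keeping the information for the remaining nodes correct: other nodes may reach their root through it. The idea used here is to leave the original tree structure untouched. The deleted node stays in place as a dummy node, a fresh node is created, and the original element is remapped to this fresh node, so that every later operation on the element acts on the fresh node instead. Each "S" operation consumes one extra node, so the arrays must hold N + M entries; this costs more space, but it solves the problem.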
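The remapping idea can be sketched as a small self-contained structure. This is a minimal sketch under stated assumptions rather than the author's exact code: the names `DeletableDSU`, `unite`, `erase`, and `same` are hypothetical, the caller is assumed to know an upper bound `maxDeletes` on the number of deletions, and `find` uses iterative path halving instead of recursion.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Union-find with deletion via fresh nodes (illustrative sketch).
struct DeletableDSU {
    std::vector<int> parent;  // forest over original and fresh nodes
    std::vector<int> id;      // id[x] = node currently standing for element x
    int next;                 // index of the next unused node

    // n elements, and at most maxDeletes calls to erase().
    DeletableDSU(int n, int maxDeletes)
        : parent(n + maxDeletes), id(n), next(n) {
        std::iota(parent.begin(), parent.end(), 0);  // every node is its own root
        std::iota(id.begin(), id.end(), 0);          // element x starts at node x
    }

    // Root of node v, shortening the path on the way (path halving).
    int find(int v) {
        while (parent[v] != v) {
            parent[v] = parent[parent[v]];
            v = parent[v];
        }
        return v;
    }

    // Merge the sets containing elements x and y.
    void unite(int x, int y) {
        int a = find(id[x]), b = find(id[y]);
        if (a != b) parent[a] = b;
    }

    // Remove element x from its set: the old node stays behind as a dummy,
    // and x is remapped to a fresh singleton node.
    void erase(int x) {
        assert(next < (int)parent.size());
        id[x] = next++;
    }

    bool same(int x, int y) { return find(id[x]) == find(id[y]); }
};
```

Note that the dummy node may still be the root of its old tree. That is harmless, because roots are only ever reached through `id`, and no element maps to the dummy any more.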

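#include <cstdio>
#include <set>
using namespace std;

// Up to N = 1e5 original nodes plus one fresh node per "S" operation (M <= 1e6).
const int maxn = 2000000 + 5;

int cnt;                 // index of the next unused node
int fa[maxn], id[maxn];  // fa: union-find parent; id[x]: node currently standing for email x
set<int> S;              // distinct roots, used for the final count

// Find the root of x, compressing the path along the way.
int Find(int x){return fa[x]==x?x:fa[x]=Find(fa[x]);}

int main(){
    int n,m;
    int kase = 0;
    // Stop on EOF or malformed input as well as on the terminating "0 0" line.
    while(scanf("%d%d",&n,&m) == 2){
        if(n == 0 && m == 0) break;
        kase++;
        cnt = n;  // fresh nodes are numbered from n upward
        // Only the n original nodes plus at most m fresh nodes can ever be used.
        for(int i = 0;i < n + m;i++) fa[i] = i;
        for(int i = 0;i < n;i++) id[i] = i;
        while(m--){
            char s[5];
            int a,b;
            scanf("%4s",s);
            if(s[0] == 'M'){
                // Merge the sets of the nodes that currently stand for a and b.
                scanf("%d%d",&a,&b);
                int X = Find(id[a]);
                int Y = Find(id[b]);
                if(X != Y) fa[X] = Y;
            }
            else{
                // Delete: leave the old node behind as a dummy, remap a to a fresh node.
                scanf("%d",&a);
                id[a] = cnt++;
            }
        }
        // Count the distinct roots among the nodes that stand for real emails.
        S.clear();
        for(int i = 0;i < n;i++){
            S.insert(Find(id[i]));
        }
        printf("Case #%d: %d\n",kase,(int)S.size());
    }
    return 0;
}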
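
The final counting step uses a `set`, but since every root is a node index below N + M, a flag array would also work and avoids the tree overhead. The following is a hedged sketch of that alternative; `countDistinctRoots` is a hypothetical helper, and it assumes the caller has already collected `Find(id[i])` for each email into `roots`.

```cpp
#include <cassert>
#include <vector>

// Count the distinct values in `roots`, where every value lies in [0, nodeCount).
int countDistinctRoots(const std::vector<int>& roots, int nodeCount) {
    std::vector<char> seen(nodeCount, 0);  // seen[r] != 0 once root r was counted
    int groups = 0;
    for (int r : roots) {
        assert(0 <= r && r < nodeCount);
        if (!seen[r]) {
            seen[r] = 1;
            ++groups;
        }
    }
    return groups;
}
```

This runs in time linear in N plus the node count and needs one byte per node, which stays small next to the `fa` and `id` arrays.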