HDU Junk-Mail Filter (Union-Find) - 优快云 Blog

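This post solves HDU's Junk-Mail Filter problem. The problem's story recognizes spam in two steps: extract the common characteristics of sample emails, then use a filter to decide whether an email is spam. The actual task is to maintain the relationships between the samples. The post describes the data structure used for this, a union-find (disjoint-set) that also supports detaching a single element, explains its operations, and gives a complete implementation.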


Junk-Mail Filter

Time Limit: 15000/8000 MS (Java/Others) Memory Limit: 32768/32768 K (Java/Others)
Total Submission(s): 11052 Accepted Submission(s): 3451

Problem Description

Recognizing junk mails is a tough task. The method used here consists of two steps:
1) Extract the common characteristics from the incoming email.
2) Use a filter matching the set of common characteristics extracted to determine whether the email is spam.

We want to extract the set of common characteristics from the N sample junk emails available at the moment, and thus having a handy data-analyzing tool would be helpful. The tool should support the following kinds of operations:

a) “M X Y”, meaning that we think that the characteristics of spam X and Y are the same. Note that the relationship defined here is transitive, so relationships (other than the one between X and Y) need to be created if they are not present at the moment.

b) “S X”, meaning that we think spam X had been misidentified. Your tool should remove all relationships that spam X has when this command is received; after that, spam X will become an isolated node in the relationship graph.

Initially no relationships exist between any pair of the junk emails, so the number of distinct characteristics at that time is N.
Please help us keep track of any necessary information to solve our problem.
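To make the semantics of the two operations concrete, here is a deliberately slow reference model. It is a hypothetical helper (the names `SlowFilter`, `merge`, `separate`, `distinct` are illustrative, not from the post) that stores an explicit group label per email, so each `M` costs O(N). It is only meant as a test oracle on small inputs. The key point it encodes: `S X` gives X a brand-new label, while the rest of X's former group keeps its shared label and therefore stays related.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical reference model (not the post's solution): O(N) per "M",
// fine as a test oracle on small inputs. label[i] names email i's group.
struct SlowFilter {
    std::vector<int> label;
    int fresh;  // next label that no email has used yet

    explicit SlowFilter(int n) : label(n), fresh(n) {
        for (int i = 0; i < n; i++) label[i] = i;  // every email starts alone
    }
    // "M X Y": move every member of Y's group into X's group (transitivity).
    void merge(int x, int y) {
        assert(0 <= x && x < (int)label.size() && 0 <= y && y < (int)label.size());
        int from = label[y], to = label[x];
        if (from == to) return;
        for (int &l : label)
            if (l == from) l = to;
    }
    // "S X": X alone gets a new label; the rest of its old group keeps theirs.
    void separate(int x) {
        assert(0 <= x && x < (int)label.size());
        label[x] = fresh++;
    }
    // Number of distinct common characteristics = number of distinct labels.
    int distinct() const {
        return (int)std::set<int>(label.begin(), label.end()).size();
    }
};
```

For example, with N = 3 and the single operation `M 1 2`, `distinct()` reports 2, matching the second sample case.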

Input

There are multiple test cases in the input file.
Each test case starts with two integers, N and M (1 ≤ N ≤ 10⁵, 1 ≤ M ≤ 10⁶), the number of email samples and the number of operations. M lines follow, each in one of the two formats described above.
Two successive test cases are separated by a blank line. A case with N = 0 and M = 0 indicates the end of the input file, and should not be processed by your program.

Output

For each test case, please print a single integer, the number of distinct common characteristics, to the console. Follow the format as indicated in the sample below.

Sample Input

5 6
M 0 1
M 1 2
M 1 3
S 1
M 1 2
S 3

3 1
M 1 2

0 0

Sample Output

Case #1: 3
Case #2: 2

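Problem summary: `M X Y` is handled like an ordinary union-find merge: X and Y become related, and the relation is transitive.
`S X` removes every relationship involving X, leaving X in a group of its own while the rest of its former group stays together.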
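For reference, the "ordinary union-find" that `M X Y` alone needs can be sketched as below. This is a generic textbook version with union by size and path halving, not the post's code (which uses plain path compression); the struct and member names are illustrative. It shows transitivity directly: after merging 0 with 1 and 1 with 2, emails 0 and 2 share a root even though they were never merged directly.

```cpp
#include <cassert>
#include <numeric>
#include <utility>
#include <vector>

// Ordinary union-find, enough for "M X Y" on its own (no deletion).
// Union by size plus path halving; `components` is kept up to date.
struct DSU {
    std::vector<int> parent, size;
    int components;

    explicit DSU(int n) : parent(n), size(n, 1), components(n) {
        std::iota(parent.begin(), parent.end(), 0);  // each node is its own root
    }
    int find(int x) {
        assert(0 <= x && x < (int)parent.size());
        while (x != parent[x]) {
            parent[x] = parent[parent[x]];  // path halving
            x = parent[x];
        }
        return x;
    }
    // Returns false if x and y were already in the same group.
    bool unite(int x, int y) {
        x = find(x);
        y = find(y);
        if (x == y) return false;
        if (size[x] < size[y]) std::swap(x, y);
        parent[y] = x;
        size[x] += size[y];
        components--;
        return true;
    }
};
```

What this structure cannot do is the `S X` operation: union-find has no cheap way to pull one element back out of a set.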
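The difficulty is handling X on `S X`. If X is simply given a new parent Z (some index > n), every other node whose path to the root passes through X also ends up under Z, so those nodes are dragged out of the group together with X, which is wrong. The key is to keep the remaining nodes attached to a single node that has nothing to do with X.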
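The failure described above can be reproduced with a small sketch. `NaiveDSU` is a hypothetical variant with no virtual parents, where `del` just re-points X at a fresh node Z. With N = 3, merging 0 and 1 makes email 0 the root of {0, 1}; deleting 0 then moves 1 along with it, so the count comes out as 2 instead of the correct 3.

```cpp
#include <cassert>
#include <numeric>
#include <set>
#include <vector>

// Hypothetical naive variant, shown only to reproduce the failure: no virtual
// parents, and "S X" just re-points X at a fresh node Z.
struct NaiveDSU {
    std::vector<int> per;
    int next;  // next unused node, used as Z

    NaiveDSU(int n, int maxDeletes) : per(n + maxDeletes), next(n) {
        std::iota(per.begin(), per.end(), 0);
    }
    int find(int x) {
        while (x != per[x]) x = per[x];
        return x;
    }
    void join(int x, int y) {
        int cx = find(x), cy = find(y);
        if (cx != cy) per[cy] = cx;   // a real email can become a root here
    }
    void del(int x) {
        assert(next < (int)per.size());
        per[x] = next++;              // every node hanging below x moves too
    }
    int count(int n) {
        std::set<int> roots;
        for (int i = 0; i < n; i++) roots.insert(find(i));
        return (int)roots.size();
    }
};
```

Usage: `NaiveDSU d(3, 1); d.join(0, 1); d.del(0);` leaves `d.find(1) == d.find(0)`, which is exactly the bug.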
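The fix used here gives every email i an initial virtual parent i + n, in the range [n, 2n - 1]. Since `Join` only ever links roots, and every root is a virtual node, a real email is never the parent of anything. To handle `S X`, X's parent is reset to a fresh, unused node numbered 2n or higher, which detaches X alone and leaves its old group intact.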
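The same virtual-parent layout can also keep the answer up to date as operations arrive, instead of recounting roots at the end as the post's code does. The sketch below is a variant under that assumption, not the author's code; `DeletableDSU`, `merge`, `isolate`, `sets`, and `maxDeletes` are hypothetical names. It stores `sz[r]`, the number of real emails under root r: a successful merge removes one group, and isolating an email that shares its group with others adds one.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Sketch of the virtual-parent layout from the post, plus an online group
// count (the post recounts roots at the end instead). Names are hypothetical.
//   emails 0..n-1 -> virtual roots n..2n-1 -> spare nodes from 2n upward
struct DeletableDSU {
    int next;                  // next unused spare virtual node
    int sets;                  // current number of distinct groups
    std::vector<int> per, sz;  // parent; sz[r] = emails in the group rooted at r

    DeletableDSU(int n, int maxDeletes)
        : next(2 * n), sets(n), per(2 * n + maxDeletes), sz(2 * n + maxDeletes, 0) {
        std::iota(per.begin(), per.end(), 0);  // virtual nodes are their own roots
        for (int i = 0; i < n; i++) {
            per[i] = i + n;                    // email i hangs under virtual node i+n
            sz[i + n] = 1;
        }
    }
    int find(int x) {
        int r = x;
        while (r != per[r]) r = per[r];
        while (x != r) {                       // path compression
            int up = per[x];
            per[x] = r;
            x = up;
        }
        return r;
    }
    // "M X Y": only roots (always virtual nodes) are ever linked.
    void merge(int x, int y) {
        int cx = find(x), cy = find(y);
        if (cx == cy) return;
        per[cy] = cx;
        sz[cx] += sz[cy];
        sets--;
    }
    // "S X": no node has email x as its parent, so re-pointing x is safe.
    void isolate(int x) {
        int r = find(x);
        if (sz[r] == 1) return;                // x is already alone
        assert(next < (int)per.size());
        sz[r]--;
        per[x] = next;                         // per[next] == next, an unused root
        sz[next++] = 1;
        sets++;
    }
};
```

Setting `maxDeletes` to M is always enough, and the answer for a test case is simply `sets` after the last operation. The trade-off is one extra array in exchange for skipping the final counting pass.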

Code

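#include <cstdio>
#include <cstring>

// Index layout:
//   0 .. n-1        real emails; they are always leaves
//   n .. 2n-1       initial virtual parent of each email (email i -> i + n)
//   2n .. 2n+m-1    spare virtual nodes handed out by del()
// With N <= 1e5 and M <= 1e6 the largest index touched is 2N + M = 1,200,000 < maxn.
const int maxn = 1300000;
int vis[maxn];   // vis[r] = 1 once root r has been counted
int per[maxn];   // per[x] = parent of x in the union-find forest
int n, m, id;    // id = next unused spare virtual node
char s[5];

// Return the root of x and compress the path so every node on it points at the root.
int Find(int x){
    int cx = x;
    while(cx != per[cx]){
        cx = per[cx];
    }
    int i = x, j;
    while(i != cx){
        j = per[i];
        per[i] = cx;
        i = j;
    }
    return cx;
}

// "M X Y": link the two roots. Roots are always virtual nodes (>= n),
// so a real email never becomes the parent of another node.
void Join(int x, int y)
{
    int cx = Find(x);
    int cy = Find(y);
    if(cx != cy){
        per[cy] = cx;
    }
}

// "S X": re-point x at a fresh virtual root. Since nothing hangs below x,
// the rest of its old group stays together.
void del(int x){
    per[x] = id++;
}

int main()
{
    int flag = 1;
    // scanf returns EOF (nonzero) at end of input, so compare with 2 explicitly.
    while(scanf("%d %d", &n, &m) == 2){
        if(n == 0 && m == 0) break;
        id = n + n;
        for(int i = 0; i < n; i++)
            per[i] = i + n;              // email i starts under its own virtual root
        for(int i = n; i <= n + n + m; i++)
            per[i] = i;                  // virtual and spare nodes start as roots
        int x, y;
        while(m--){
            scanf("%s", s);
            if(s[0] == 'M'){
                scanf("%d %d", &x, &y);
                Join(x, y);
            }
            else {
                scanf("%d", &x);
                del(x);
            }
        }
        // Every root is a virtual node with index < id, so only that prefix needs clearing.
        memset(vis, 0, sizeof(vis[0]) * id);
        int sum = 0;
        for(int i = 0; i < n; i++){
            int r = Find(i);
            if(!vis[r]){                 // first email seen in this group
                vis[r] = 1;
                ++sum;
            }
        }
        printf("Case #%d: %d\n", flag++, sum);
    }
    return 0;
}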