5.1.5 Junk-Mail Filter

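This post looks at a two-stage method for recognizing junk mail: first extract the common characteristics from the junk mail received, then match against those characteristics to decide whether a message is spam. The tool must support operations that declare the characteristics of two spam samples to be the same, or that separate a sample from the others again, so that the relationships between samples can be tracked. The sample input shows how the operations are applied and how the number of distinct characteristics is counted.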

Junk-Mail Filter

Time Limit: 15000/8000 MS (Java/Others) Memory Limit: 32768/32768 K (Java/Others)
Total Submission(s): 181 Accepted Submission(s): 62

Problem Description
Recognizing junk mails is a tough task. The method used here consists of two steps:
1) Extract the common characteristics from the incoming email.
2) Use a filter matching the set of common characteristics extracted to determine whether the email is a spam.

We want to extract the set of common characteristics from the N sample junk emails available at the moment, and thus having a handy data-analyzing tool would be helpful. The tool should support the following kinds of operations:

a) “M X Y”, meaning that we think that the characteristics of spam X and Y are the same. Note that the relationship defined here is transitive, so relationships (other than the one between X and Y) need to be created if they are not present at the moment.

b) “S X”, meaning that we think spam X had been misidentified. Your tool should remove all relationships that spam X has when this command is received; after that, spam X will become an isolated node in the relationship graph.

Initially no relationships exist between any pair of the junk emails, so the number of distinct characteristics at that time is N.
Please help us keep track of any necessary information to solve our problem.
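
To make the meaning of the two operations concrete, here is a deliberately slow reference model. It is only a sketch for checking small cases by hand, not the approach used later in this post. The function name countGroupsNaive and the encoding of operations as pairs (with y = -1 standing for "S x") are assumptions made for this illustration.

```cpp
#include <cassert>
#include <set>
#include <utility>
#include <vector>

// Hypothetical helper, not from the original post: a slow reference model.
// ops holds (x, y) for "M x y" and (x, -1) for "S x".
int countGroupsNaive(int n, const std::vector<std::pair<int, int>>& ops) {
    std::vector<int> label(n);
    for (int i = 0; i < n; ++i) label[i] = i;    // every email starts alone
    int nextLabel = n;                           // fresh labels handed out by "S"
    for (const auto& op : ops) {
        int x = op.first, y = op.second;
        assert(0 <= x && x < n && y < n);
        if (y >= 0) {                            // "M x y": relabel x's whole group
            int from = label[x], to = label[y];
            if (from == to) continue;
            for (int& l : label) {
                if (l == from) l = to;
            }
        } else {                                 // "S x": x becomes an isolated node
            label[x] = nextLabel++;
        }
    }
    // The answer is the number of distinct labels still in use.
    return static_cast<int>(std::set<int>(label.begin(), label.end()).size());
}
```

Each merge costs O(N) here, so this model is far too slow for the stated limits; its only purpose is to pin down what "M" and "S" are supposed to do.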
 

Input
There are multiple test cases in the input file.
Each test case starts with two integers, N and M (1 ≤ N ≤ 10^5, 1 ≤ M ≤ 10^6), the number of email samples and the number of operations. M lines follow, each line is one of the two formats described above.
Two successive test cases are separated by a blank line. A case with N = 0 and M = 0 indicates the end of the input file, and should not be processed by your program.
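
The input format above can be read with ordinary formatted extraction, because whitespace skipping also swallows the blank line between test cases. The following is a minimal sketch under that assumption; the Op struct and the readCase function are names invented for this example.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record for one operation; y stays -1 for "S x".
struct Op { char kind; int x; int y; };

// Reads one test case into n and ops.
// Returns false on the terminating "0 0" line or when the input ends.
bool readCase(std::istream& in, int& n, std::vector<Op>& ops) {
    int m;
    if (!(in >> n >> m) || (n == 0 && m == 0)) return false;
    ops.clear();
    for (int i = 0; i < m; ++i) {
        Op op{'?', -1, -1};
        in >> op.kind >> op.x;            // operator>> skips blanks and newlines
        if (op.kind == 'M') in >> op.y;   // only "M" carries a second index
        ops.push_back(op);
    }
    return true;
}
```

Calling readCase in a loop until it returns false processes every case in order. With up to 10^6 operations per case, scanf-style reading may well be faster than stream extraction; that is a general tendency, not something measured here.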
 

Output
For each test case, please print a single integer, the number of distinct common characteristics, to the console. Follow the format as indicated in the sample below.
 

Sample Input
5 6
M 0 1
M 1 2
M 1 3
S 1
M 1 2
S 3

3 1
M 1 2

0 0
 

Sample Output
Case #1: 3
Case #2: 2

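Approach: Hmm, I got this one wrong at first too. After reading other people's code I found I had been thinking along roughly the same lines, but I am short on time and need to move on to the next chapter, so I did not study it thoroughly.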

#include <cstdio>
#include <cstdlib>
#include <cstring>
using namespace std;

const int maxn=100010,maxm=1000010;
int cases,cnt,ans,fx,fy,a,b,n,m;
// f: union-find parent array over real and virtual nodes
// r[x]: the node that currently stands for email x
int f[maxn+maxm],r[maxm+maxn];
bool flag[maxn+maxm];   // flag[t]: root t has already been counted
char s[2];

void close()
{
    fclose(stdin);
    fclose(stdout);
    exit(0);
}

// Find the root of k, compressing the path on the way back.
int find(int k)
{
    if (f[k]==k)
        return k;
    return f[k]=find(f[k]);
}

// Join the sets containing nodes x and y.
void merge(int x,int y)
{
    fx=find(x);
    fy=find(y);
    if (fx!=fy)
        f[fx]=fy;
}

void work()
{
}

void init ()
{
    // File redirection for local testing; remove these two lines
    // when the program must read from and write to the console.
    freopen("spam.in","r",stdin);
    freopen("spam.out","w",stdout);
    cases=0;
    while(scanf("%d %d",&n,&m)==2)
    {
        if (n==0 && m==0) break;   // terminating case, not processed
        memset(flag,false,sizeof(flag));
        memset(r,0,sizeof(r));
        memset(f,0,sizeof(f));
        cases++;
        for (int i=0;i<n;i++)
        {
            f[i]=i;
            r[i]=i;
        }
        cnt=n-1;   // index of the last node in use
        for (int i=1;i<=m;i++)
        {
            scanf("%s",s);
            if (s[0]=='M')
            {
                scanf("%d %d",&a,&b);
                merge(r[a],r[b]);
            }
            else
            {
                // "S a": point email a at a brand-new node; the old node
                // stays behind so the rest of its set remains connected.
                scanf("%d",&a);
                cnt++;
                f[cnt]=cnt;
                r[a]=cnt;
            }
        }
        // Count the distinct roots among the nodes that represent emails.
        ans=0;
        for (int i=0;i<n;i++)
        {
            int t=find(r[i]);
            if (not flag[t])
            {
                flag[t]=true;
                ans++;
            }
        }
        printf("Case #%d: %d\n",cases,ans);
    }
}

int main ()
{
    init();
    work();
    close();
    return 0;
}
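
The listing above works because it never really removes a node from the union-find forest. A plain union-find cannot detach a node that other nodes hang from, so each email x is reached through r[x], and "S x" simply points r[x] at a fresh node while the old node stays in place as a placeholder. Since each "S" consumes one fresh node, N + M slots are enough. The sketch below restates that idea as a small self-contained struct; the name DeletableDsu, its member names, and the iterative find with path halving are choices made for this illustration, not part of the original code.

```cpp
#include <cassert>
#include <numeric>
#include <set>
#include <vector>

// Hypothetical wrapper around the alias trick (names are not from the listing).
struct DeletableDsu {
    std::vector<int> parent;  // union-find forest over n + maxDeletes slots
    std::vector<int> alias;   // alias[x] = node currently standing for email x
    int next;                 // next unused slot

    DeletableDsu(int n, int maxDeletes)
        : parent(n + maxDeletes), alias(n), next(n) {
        std::iota(parent.begin(), parent.end(), 0);
        std::iota(alias.begin(), alias.end(), 0);
    }
    int find(int v) {
        while (parent[v] != v) {
            parent[v] = parent[parent[v]];  // path halving
            v = parent[v];
        }
        return v;
    }
    void merge(int x, int y) {              // "M x y"
        int rx = find(alias[x]);
        int ry = find(alias[y]);
        if (rx != ry) parent[rx] = ry;
    }
    void isolate(int x) {                   // "S x": move x to a fresh node
        assert(next < static_cast<int>(parent.size()));
        alias[x] = next++;
    }
    int groups() {                          // distinct roots among live aliases
        std::set<int> roots;
        for (int a : alias) roots.insert(find(a));
        return static_cast<int>(roots.size());
    }
};
```

Replaying the first sample (merge 0-1, 1-2, 1-3, isolate 1, merge 1-2, isolate 3) on DeletableDsu(5, 6) leaves the groups {0, 1, 2}, {3} and {4}, which matches the expected answer of 3.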

 

Reposted from: https://www.cnblogs.com/cssystem/archive/2013/02/14/2911293.html
