Organizing Spam Data

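This article shows how to process a mail log with an awk script. The script parses key fields from each log line, such as the processing time and the message type, counts clean messages versus spam, and records how long each message took to scan.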

The part of the text to be processed looks like this:

Jul 12 09:28:17 mx1 spamd[2808]: spamd: identified spam (29.8/7.5) for (unknown):65534 in 2.4 seconds, 6104 bytes.
Jul 12 09:28:26 mx1 spamd[2808]: spamd: clean message (5.6/7.5) for (unknown):65534 in 3.8 seconds, 6112 bytes.
Jul 12 09:28:26 mx1 spamd[3230]: spamd: clean message (5.6/7.5) for (unknown):65534 in 3.8 seconds, 6108 bytes.
Jul 12 09:28:26 mx1 spamd[2808]: spamd: clean message (-96.4/7.5) for (unknown):65534 in 0.5 seconds, 6035 bytes.
Jul 12 09:28:27 mx1 spamd[2808]: spamd: clean message (-96.4/7.5) for (unknown):65534 in 0.5 seconds
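The awk script relies on awk's default whitespace field splitting: $3 is the time of day and $13 is the scan duration in seconds. A minimal Python sketch that splits a sample line the same way makes those field positions explicit. `spamd_fields` is a hypothetical helper for illustration, not part of the original script.

```python
# Minimal sketch: mimic awk's default field splitting on a spamd log line.
# spamd_fields() is a hypothetical helper; the list index is awk's field number minus 1.

def spamd_fields(line: str) -> dict:
    f = line.split()  # split on runs of whitespace, like awk's default FS
    return {
        "time": f[2],                 # $3, e.g. "09:28:17"
        "verdict": " ".join(f[6:8]),  # $7 $8: "identified spam" or "clean message"
        "score": f[8],                # $9, e.g. "(29.8/7.5)"
        "seconds": float(f[12]),      # $13, the scan time in seconds
    }


sample = ("Jul 12 09:28:17 mx1 spamd[2808]: spamd: identified spam (29.8/7.5) "
          "for (unknown):65534 in 2.4 seconds, 6104 bytes.")
print(spamd_fields(sample))
# {'time': '09:28:17', 'verdict': 'identified spam', 'score': '(29.8/7.5)', 'seconds': 2.4}
```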


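The script that will be invoked, statistic_spam.awk, is shown below. It is written for gawk: strftime() and systime() are gawk extensions, not POSIX awk.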
BEGIN {
    # Date stamp for the report row; this is the day the script runs, not a date read from the log
    today = strftime("%Y-%m-%d", systime())
}

# 5-minute bucket key for every line, e.g. "09:28:17" -> "09:25"
{
    key = sprintf("%s:%02d", substr($3, 1, 2), int(substr($3, 4, 2) / 5) * 5)
}

# Clean messages
/clean message/ {
    count_minute_clean[key]++
}

# Spam messages
/identified spam/ {
    count_minute_spam[key]++
}

# Lines that report a scan time; $13 is the number of seconds
/in [0-9]*\.[0-9]* seconds/ {
    total[key] += $13
    count[key]++

    if ($13 <= 0.5)
        count_time[0]++    # <= 0.5 s
    else if ($13 <= 4)
        count_time[1]++    # more than 0.5 s, up to 4 s
    else
        count_time[2]++    # more than 4 s
}

END {
    for (variable in total) {
        # print variable,
        # count_minute_spam[variable], sprintf("%2.2f", count_minute_spam[variable] / count[variable] * 100),
        # count_minute_clean[variable],
        # count[variable],
        # total[variable] / count[variable]

        total_count_clean += count_minute_clean[variable]
        total_count_spam  += count_minute_spam[variable]
    }

    ###################################################################################################
    #| Date | Total scanned | Clean | Spam | Scan time <= 0.5 s | 0.5 s < time <= 4 s | Time > 4 s |#
    ###################################################################################################
    total_count = total_count_clean + total_count_spam
    printf("| %10s | %12d | %12d | %12d | %17d | %15d | %15d |\n",
           today,
           total_count, total_count_clean, total_count_spam,
           count_time[0], count_time[1], count_time[2])
}
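To cross-check the script on a handful of lines, the same logic can be sketched in Python: a 5-minute bucket key per line, clean and spam counters per bucket, the three scan-time classes, and a row formatted with the widths used by the END block's printf(). This is an illustrative re-implementation, not the author's code; `bucket_key`, `time_class` and `summarize` are hypothetical names, and the date is passed in as a parameter instead of being read from the clock so the result is reproducible.

```python
import re
from collections import Counter

# Same condition as the awk pattern /in [0-9]*\.[0-9]* seconds/
SECONDS_RE = re.compile(r"in [0-9]*\.[0-9]* seconds")


def bucket_key(hhmmss: str) -> str:
    """5-minute bucket, mirroring sprintf("%s:%02d", substr($3,1,2), int(substr($3,4,2)/5)*5)."""
    return "%s:%02d" % (hhmmss[0:2], int(hhmmss[3:5]) // 5 * 5)


def time_class(seconds: float) -> int:
    """Index into count_time: 0 for <= 0.5 s, 1 for 0.5 s < t <= 4 s, 2 for > 4 s."""
    if seconds <= 0.5:
        return 0
    if seconds <= 4:
        return 1
    return 2


def summarize(lines, today: str) -> str:
    """Return one report row in the same layout as the END block's printf()."""
    clean, spam = Counter(), Counter()  # per-bucket message counts
    count_time = Counter()              # scan-time classes 0, 1, 2
    timed = set()                       # buckets hit by the "seconds" rule (keys of total[])
    for line in lines:
        f = line.split()                # f[2] is awk's $3, f[12] is $13
        if len(f) < 3:
            continue
        key = bucket_key(f[2])
        if "clean message" in line:
            clean[key] += 1
        if "identified spam" in line:
            spam[key] += 1
        if SECONDS_RE.search(line):
            timed.add(key)
            count_time[time_class(float(f[12]))] += 1
    # Like the awk END block, only buckets present in total[] are summed
    n_clean = sum(clean[k] for k in timed)
    n_spam = sum(spam[k] for k in timed)
    return "| %10s | %12d | %12d | %12d | %17d | %15d | %15d |" % (
        today, n_clean + n_spam, n_clean, n_spam,
        count_time[0], count_time[1], count_time[2])


SAMPLE = """\
Jul 12 09:28:17 mx1 spamd[2808]: spamd: identified spam (29.8/7.5) for (unknown):65534 in 2.4 seconds, 6104 bytes.
Jul 12 09:28:26 mx1 spamd[2808]: spamd: clean message (5.6/7.5) for (unknown):65534 in 3.8 seconds, 6112 bytes.
Jul 12 09:28:26 mx1 spamd[3230]: spamd: clean message (5.6/7.5) for (unknown):65534 in 3.8 seconds, 6108 bytes.
Jul 12 09:28:26 mx1 spamd[2808]: spamd: clean message (-96.4/7.5) for (unknown):65534 in 0.5 seconds, 6035 bytes.
Jul 12 09:28:27 mx1 spamd[2808]: spamd: clean message (-96.4/7.5) for (unknown):65534 in 0.5 seconds
"""

print(summarize(SAMPLE.splitlines(), "2011-07-12"))
```

Run on the five sample lines with the date 2011-07-12, it reports 5 scanned messages (4 clean, 1 spam), with 2 scanned in at most 0.5 s, 3 in more than 0.5 s up to 4 s, and none above 4 s.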


Finally, run the script against the mail log and append its output to a per-month file:

# /usr/local/bin/gawk -f statistic_spam.awk /var/log/maillog >> /tmp/KevinShell/statistic/output/statistic_spam.log.201107


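Each run appends one row to that file, laid out by the printf() in the END block: the date, the total number of scanned messages, the clean and spam counts, and the number of messages scanned in at most 0.5 s, in more than 0.5 s up to 4 s, and in more than 4 s.
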
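Insert that output row into table Spam of database Statistic. The sed expressions strip the | separators, squeeze and trim the blanks, and turn the remaining spaces into commas. The date field must also be quoted in the VALUES list: unquoted, MySQL evaluates 2011-07-12 as the arithmetic expression 2011 - 7 - 12. The command below therefore adds one sed expression that wraps the first field in single quotes, and uses $( ) instead of backticks so the nested double quotes need no escaping. Note that the Lower1s column receives the count of messages scanned in at most 0.5 s.

# /usr/local/mysql/bin/mysql -uroot -pPASSWD -hlocalhost -DStatistic \
    -e "insert into Spam(Date,Total,Ham,Spam,Lower1s,Lower4s,Greater4s) values($(gawk -f statistic_spam.awk /var/log/maillog | sed -e 's#|##g' -e 's/[ ][ ]*/ /g' -e 's/^ //' -e 's/ $//' -e "s/^[^ ]*/'&'/" -e 's/ /,/g'))"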
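The row-to-VALUES conversion done by the sed pipeline can be sketched in Python as well, with the date wrapped in single quotes (unquoted, MySQL would evaluate 2011-07-12 as 2011 - 7 - 12 = 1992). `row_to_values` is a hypothetical name, and the sample row is the one the script would print for the five sample log lines if run on 2011-07-12.

```python
def row_to_values(row: str) -> str:
    """Turn a report row '| date | n | ... |' into "'date',n,..." for a VALUES list."""
    fields = row.replace("|", " ").split()  # drop the | separators, squeeze and trim blanks
    fields[0] = "'%s'" % fields[0]          # quote the date so it is not evaluated as arithmetic
    return ",".join(fields)                 # spaces become commas


row = "| %10s | %12d | %12d | %12d | %17d | %15d | %15d |" % ("2011-07-12", 5, 4, 1, 2, 3, 0)
print("insert into Spam(Date,Total,Ham,Spam,Lower1s,Lower4s,Greater4s) values(%s)"
      % row_to_values(row))
# insert into Spam(Date,Total,Ham,Spam,Lower1s,Lower4s,Greater4s) values('2011-07-12',5,4,1,2,3,0)
```

Quoting only the first field is enough here because the other six fields are integers printed with %d.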