shell脚本统计文件中单词的个数

最新推荐文章于 2021-03-10 23:48:47 发布

weixin_30849403

最新推荐文章于 2021-03-10 23:48:47 发布

阅读量315

点赞数

CC 4.0 BY-SA版权

文章标签： shell awk

原文链接：http://www.cnblogs.com/youxuguang/p/5917215.html

本文介绍了一种使用Shell脚本处理文本的方法，包括去除标点符号、统计单词频率等实用技巧。

摘要生成于 C知道，由 DeepSeek-R1 满血版支持，前往体验 >

一、方案

方法一：

（1）cat file|sed 's/[,.:;/!?]/ /g'|awk '{for(i=1;i<=NF;i++)array[$i]++;}END{for(i in array) print i,array[i]}' #其中file为要操作的文件，sed中/ /间有一个空格。

（2）sed 's/[,.:;/!?]/ /g' file|awk '{for(i=1;i<=NF;i++)array[$i]++;}END{for(i in array) print i,array[i]}' #（1）和（2）效果一致。

方法二：

（1）awk 'BEGIN{RS="[,.:;/!?]"}{for(i=1;i<=NF;i++)array[$i]++;}END{for(i in array) print i,array[i]}' file

二、验证

[root@hehe668 shell]# cat file
hello world,hi girl;how old are you?
where are you from?
how are you?
i am fine!thinks.
and you?
http://www.cnblogs.com/youxuguang/

[root@hehe668 shell]# cat file|sed 's/[,.:;/!?]/ /g'|awk '{for(i=1;i<=NF;i++)array[$i]++;}END{for(i in array) print i,array[i]}'
com 1
http 1
from 1
www 1
i 1
you 4
hi 1
hello 1
youxuguang 1
and 1
world 1
cnblogs 1
where 1
old 1
how 2
fine 1
am 1
are 3
girl 1
thinks 1

[root@hehe668 shell]# sed 's/[,.:;/!?]/ /g' file|awk '{for(i=1;i<=NF;i++)array[$i]++;}END{for(i in array) print i,array[i]}'
com 1
http 1
from 1
www 1
i 1
you 4
hi 1
hello 1
youxuguang 1
and 1
world 1
cnblogs 1
where 1
old 1
how 2
fine 1
am 1
are 3
girl 1
thinks 1

[root@hehe668 shell]# awk 'BEGIN{RS="[,.:;/!?]"}{for(i=1;i<=NF;i++)array[$i]++;}END{for(i in array) print i,array[i]}' file
com 1
http 1
from 1
www 1
i 1
you 4
hi 1
hello 1
youxuguang 1
and 1
world 1
cnblogs 1
where 1
old 1
how 2
fine 1
am 1
are 3
girl 1
thinks 1