A Complete English Word-Frequency Count Example Using a File (9.27)

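This post walks through a simple text-analysis method: read in a string, extract the words, count how often each word occurs, and sort by frequency. Common function words are excluded so that only meaningful words are counted, and the 20 most frequent words are printed.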

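1. Read in the string to analyze

2. Split the text and extract the words

3. Count the words with a dictionary

4. Exclude function words

5. Sort by frequency

6. Output the TOP (20)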
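
The six steps above can be sketched as one small function. This is only an illustration under assumptions: the function name `top_words`, the `STOP_WORDS` set, the sample sentence, and the use of `collections.Counter` are not from the original post, which builds the dictionary by hand.

```python
from collections import Counter

# Hypothetical stop-word set for illustration only.
STOP_WORDS = {'a', 'the', 'and', 'is', 'as', 'you', 'me', 'do'}

def top_words(text, n=20, stop_words=STOP_WORDS):
    """Return up to n (word, count) pairs, most frequent first."""
    text = text.lower()               # 1. normalize the input string
    for ch in ',.?':                  # 2. turn punctuation into spaces...
        text = text.replace(ch, ' ')
    words = text.split()              #    ...and split on whitespace
    counts = Counter(words)           # 3. count every word
    for w in stop_words:              # 4. drop function words
        counts.pop(w, None)
    return counts.most_common(n)      # 5-6. sort and keep the top n

sample = 'God is a girl, wherever you are. God is a girl, whatever you say.'
print(top_words(sample, n=3))
```

`most_common(n)` returns fewer than `n` pairs when the text has fewer distinct words, so no length check is needed before printing.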

The text to analyze is as follows:

girl='''Remembering me, Discover and see All over the world, She's known as a girl To those who a free, The mind shall be key Forgotten as the past 'Cause history will last

God is a girl, Wherever you are, Do you believe it, can you recieve it? God is a girl, Whatever you say, Do you believe it, can you recieve it? God is a girl.'''
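
Because the implementation reads its input from a file, the text has to be saved to disk first. A minimal sketch, assuming the file name `daili.txt` that the implementation opens and using a shortened stand-in for the lyrics:

```python
from pathlib import Path

# Assumed file name, matching the open('daili.txt','r') call in the implementation.
path = Path('daili.txt')

# Shortened stand-in for the lyrics; in practice the full text would go here.
lyrics = "God is a girl, Wherever you are,\nDo you believe it, can you recieve it?\n"

path.write_text(lyrics, encoding='utf-8')           # create or overwrite the file
print(path.read_text(encoding='utf-8') == lyrics)   # sanity check: round trip
```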

The implementation is as follows:

# Read the text to analyze; daili.txt is assumed to contain the lyrics shown above.
with open('daili.txt','r') as fo:
    girl=fo.read()

# Function words (and the empty string) to exclude from the count.
exc={'','a','the','and','is','as','you','me','do'}

# Lowercase the text and replace punctuation with spaces before splitting.
girl=girl.lower()
for i in ',?.':
    girl=girl.replace(i,' ')
words=girl.split()  # split() with no argument also handles newlines and repeated spaces
print('Lyrics:\n',words)

# Keep only the distinct words that are not in the exclusion set.
keys=set(words)-exc
print('Final words:\n',keys)

# Count each remaining word (named counts to avoid shadowing the built-in dict).
counts={}
for i in keys:
    counts[i]=words.count(i)
print('Word counts:\n',counts)

# Sort the (word, count) pairs by count, highest first.
dai=list(counts.items())
dai.sort(key=lambda x:x[1],reverse=True)
print('Sorted result:\n')

# Print the top 20, or fewer if the text has fewer distinct words.
for i in range(min(20,len(dai))):
    print(dai[i])
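
Replacing a fixed set of punctuation characters works for this text, but any character missing from that set, such as an exclamation mark or a quotation mark, stays attached to the neighbouring word. One alternative, sketched here as an assumption rather than the original author's approach, is to extract the words with a regular expression and count them in a single pass. `tokenize` and `count_words` are hypothetical helper names.

```python
import re

def tokenize(text):
    """Return lowercase words; apostrophes inside a word (she's) are kept."""
    return re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())

def count_words(words, exclude=()):
    """Count words in a single pass, skipping anything in exclude."""
    counts = {}
    for w in words:
        if w not in exclude:
            counts[w] = counts.get(w, 0) + 1
    return counts

words = tokenize("She's known as a girl!\n'Cause history will last.")
print(words)
print(count_words(words, exclude={'a', 'as'}))
```

The single pass with `counts.get` avoids calling `words.count` once per word, which rescans the whole list each time. Note that this pattern drops a leading apostrophe, so `'Cause` is counted as `cause`.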

The program output is as follows:

Reposted from: https://www.cnblogs.com/laidaili/p/7595295.html
