@Count word frequencies and output the high-frequency words
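The given data is a news report from the English edition of China Daily on a particular day. Write a Python program that counts the ten most frequently occurring words in it and outputs each word together with its frequency (presented as a dictionary).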
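import jieba

# Read the whole article; the with-block closes the file automatically.
with open("./dataset/englishgraph.txt", "r", encoding="utf-8") as file:
    txt = file.read()

# Split the text into tokens with jieba.
words = jieba.lcut(txt)

# Count every token of at least two characters
# (this skips single-character tokens such as spaces and most punctuation).
counts = {}
for word in words:
    if len(word) >= 2:
        counts[word] = counts.get(word, 0) + 1

# Sort the (word, count) pairs by count, most frequent first.
# The name "items" avoids shadowing the built-in list().
items = list(counts.items())
items.sort(key=lambda x: x[1], reverse=True)
print(items)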
Output
[('you', 14), ('to', 10), ('want', 5), ('have', 5), ('the', 5), ('enough', 4), ('make', 4),
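The task statement asks for only the ten most frequent words, presented as a dictionary, while the program above prints every pair as a sorted list. One possible way to produce that form is sketched below using only the standard library. The function name `top_words`, its parameters, the regular-expression tokenizer, and the lowercasing step are all assumptions made for illustration; they are not part of the original solution, which is case-sensitive and tokenizes with jieba.

```python
import re
from collections import Counter


def top_words(text, n=10, min_len=2):
    """Return the n most frequent words in text as a {word: count} dict."""
    # Assumption: lowercase first so "The" and "the" are counted together.
    # The original program does not do this.
    tokens = re.findall(r"[a-z']+", text.lower())
    # Keep only tokens with at least min_len characters, as the original does.
    counts = Counter(t for t in tokens if len(t) >= min_len)
    # most_common() orders by count, highest first; dict() preserves
    # that order on Python 3.7 and later.
    return dict(counts.most_common(n))


if __name__ == "__main__":
    sample = "You want it, you make it. You have to want it enough to make it."
    print(top_words(sample, n=3))
```

To apply it to the article, pass the `txt` string read from the file, for example `top_words(txt)`. Counts may differ slightly from the jieba-based result because the two tokenizers split text differently.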