# Setup: save the government work report as report.txt; prepare a Chinese
# font file and a Chinese stopword list (stopwords.txt).
import jieba
from wordcloud import WordCloud
from collections import Counter
import matplotlib.pyplot as plt
# Word-frequency analysis of report.txt plus word-cloud rendering.


def load_stopwords(path='stopwords.txt'):
    """Load one stopword per line from *path* and return them as a set."""
    with open(path, 'r', encoding='utf-8') as f:
        return {line.strip() for line in f}


def is_chinese_word(word):
    """Return True if every character of *word* is a CJK Unified Ideograph.

    The previous lexicographic test (``'\\u4e00' <= word <= '\\u9fff'``) only
    effectively constrained the first character, so mixed tokens such as
    '中国2' slipped through; checking each character closes that gap.
    """
    return all('\u4e00' <= ch <= '\u9fff' for ch in word)


def filter_words(words, stopwords):
    """Keep words that are multi-character, not stopwords, and fully Chinese.

    Args:
        words: iterable of segmented tokens.
        stopwords: set of words to discard.

    Returns:
        list of the surviving tokens, in original order.
    """
    return [
        w for w in words
        if len(w) > 1 and w not in stopwords and is_chinese_word(w)
    ]


def main():
    """Read report.txt, print the top-20 word frequencies, save and show a word cloud."""
    # 1. Read the source text (assumed UTF-8).
    with open('report.txt', 'r', encoding='utf-8') as f:
        text = f.read()

    # 2. Chinese word segmentation.
    words = jieba.lcut(text)

    # 3. Load the stopword list (must be prepared/downloaded in advance).
    stopwords = load_stopwords('stopwords.txt')

    # 4. Drop stopwords, single characters, and non-Chinese tokens.
    filtered_words = filter_words(words, stopwords)

    # 5. Frequency statistics; ask for exactly the 20 entries we print
    #    (the original computed the top 50 and discarded 30 of them).
    word_counts = Counter(filtered_words)
    print("高频词汇Top20:")
    for word, count in word_counts.most_common(20):
        print(f"{word}: {count}")

    # 6. Build the word cloud; font_path must point to a font with CJK glyphs,
    #    otherwise Chinese renders as empty boxes.
    wc = WordCloud(
        font_path='msyh.ttc',
        background_color='white',
        max_words=200,
        width=800,
        height=600,
    )
    wc.generate_from_frequencies(word_counts)

    # Save BEFORE plt.show(): show() blocks, and if the window/process is
    # killed the PNG would otherwise never be written.
    wc.to_file('wordcloud.png')

    # Display the cloud.
    plt.imshow(wc, interpolation='bilinear')
    plt.axis('off')
    plt.show()


if __name__ == "__main__":
    main()