1. Install WordCloud
pip install wordcloud
2. Import
import pickle
import jieba
import pandas as pd
import wordcloud
import matplotlib.pyplot as plt
from imageio import imread
#%%
# Read chapter
with open(r'C:\Users\yandi\PycharmProjects\MachineLearing\LearningTest01\SDTest\chapter.pkl', 'rb') as pickle_file:
    chapter = pickle.load(pickle_file)
#%%
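# Read the stop words; sep='aaa' is a separator not expected to occur in the file,
# so each line is read as a single column
stoplist = list(pd.read_csv(r'C:\Users\yandi\PycharmProjects\MachineLearing\LearningTest01\停用词.txt',
                            names=['w'], sep='aaa', encoding='UTF-8', engine='python').w)

def m_cut(intxt):
    # Segment with jieba, then drop stop words and single-character tokens
    return [w for w in jieba.cut(intxt) if w not in stoplist and len(w) > 1]

# Join the remaining tokens with spaces before passing them to WordCloud
ls = " ".join(m_cut(chapter.txt[1]))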
#%%
cloudobj = wordcloud.WordCloud(
    mask=imread(r'C:\Users\yandi\PycharmProjects\MachineLearing\LearningTest01\射雕背景0.jpg'),
    mode='RGBA', background_color=None
).generate(ls)
#%%
plt.imshow(cloudobj)
plt.axis('off')
plt.show()
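- ls is a string, but the words in it must be separated by spaces when it is passed to WordCloud
- mask specifies the background image
- mode and background_color specify the image's color scheme (mode='RGBA' with background_color=None gives a transparent background)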
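The first note above is easiest to see on a tiny input. The sketch below is stdlib-only and assumes the text has already been tokenized (in the post this is done by jieba.cut). The token list, the stop-word set, and the helper name filter_tokens are hypothetical stand-ins, not part of the original code.

```python
def filter_tokens(tokens, stopwords):
    """Keep tokens that are not stop words and are longer than one character."""
    return [w for w in tokens if w not in stopwords and len(w) > 1]


# Hypothetical pre-tokenized sentence; in the post, jieba.cut produces the tokens.
tokens = ["郭靖", "和", "黄蓉", "在", "桃花岛", "的", "故事"]

# Hypothetical stop words. A set gives constant-time membership tests;
# the list used in the post works too, only more slowly on long lists.
stopwords = {"故事"}

kept = filter_tokens(tokens, stopwords)

# WordCloud splits its input on whitespace, so join the tokens with spaces.
text = " ".join(kept)
print(text)
```

One related point, as far as I know: WordCloud's default font does not cover Chinese characters, so rendering Chinese words usually also needs the font_path argument pointing at a font installed on your machine that does.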
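This code shows how to process and visualize text with jieba word segmentation and the WordCloud library. It first loads the chapter content with pickle, then reads the stop-word list, segments the text with jieba, and filters out the stop words. The processed string is then passed to WordCloud, which generates a word cloud shaped by the given background image. Finally, the word cloud is displayed; words that occur more often in the text are drawn larger, so frequency stands in for each word's importance.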
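To make the link between frequency and the picture concrete, here is a small stdlib-only sketch that counts how often each word occurs in a space-separated string. The sample string is made up for illustration and is not taken from the chapter data.

```python
from collections import Counter

# Hypothetical space-separated text, in the same shape as `ls` in the post.
text = "郭靖 黄蓉 郭靖 桃花岛 黄蓉 郭靖"

# Split on whitespace and count the occurrences of each word.
freq = Counter(text.split())

# List the words from most to least frequent.
for word, count in freq.most_common():
    print(word, count)
```

A mapping like freq is also the kind of input that WordCloud.generate_from_frequencies accepts, which may be useful if you want to control the counts yourself instead of letting generate do the counting.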