Using the WordCloud Library

氧小氢

Tags: python

Article link: https://blog.youkuaiyun.com/weixin_42510210/article/details/109706299

WordCloud

Import the third-party word cloud library, wordcloud
```
import wordcloud
```
Create a word cloud object and assign it to W
```
W = wordcloud.WordCloud()
```

Set the width, height, font, and background color of the word cloud image

W = wordcloud.WordCloud(
  width = 1000,
  height = 700,
  background_color = 'white',
  font_path = 'msyh.ttc'
)

# font_path = 'msyh.ttc' sets the font to Microsoft YaHei; a font with Chinese glyphs is needed to render Chinese text
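
The font file has to exist on the machine that runs the script. `msyh.ttc` ships with Windows, so other systems need a different font that covers Chinese. Here is a minimal sketch of a lookup helper. The name `first_existing_font` and every candidate path are assumptions for illustration, and the real locations vary from machine to machine.

```python
import os

# Hypothetical candidate locations; adjust to the fonts installed on your system.
CANDIDATE_FONTS = [
    "msyh.ttc",                            # next to the script
    r"C:\Windows\Fonts\msyh.ttc",          # Microsoft YaHei on Windows
    "/System/Library/Fonts/PingFang.ttc",  # assumed macOS location
    "/usr/share/fonts/opentype/noto/NotoSansCJK-Regular.ttc",  # assumed Linux location
]


def first_existing_font(candidates=CANDIDATE_FONTS):
    """Return the first path in candidates that is an existing file, else None."""
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None


font_path = first_existing_font()
if font_path is None:
    print("No candidate font found; pick a font file that contains Chinese glyphs.")
else:
    print("Using font:", font_path)
```

The returned path can then be passed as `font_path=font_path` when building the `WordCloud` object.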

Call the word cloud object's generate method

# 1. Pass the text in directly
# W.generate('and that government of the people, by the people, for the people, shall not perish from the earth.')
W.generate('从明天起，做一个幸福的人。喂马、劈柴，周游世界。从明天起，关心粮食和蔬菜。我有一所房子，面朝大海，春暖花开')

# 2. Read the text from an external file into the variable txt
# with open('abc.txt', encoding='utf-8') as file:
#     txt = file.read()
# W.generate(txt)

Save the generated word cloud as the image file output1.png in the current folder
```
W.to_file('output1.png')
```
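
`generate()` sizes each word by how often it occurs in the text. The sketch below approximates that counting step with the standard library, using the English sample sentence from above. It is a simplification: the regex tokenizer is an assumption, and wordcloud's own processing also removes stopwords (by default common English words such as "the") and normalizes words in other ways.

```python
import re
from collections import Counter

text = ('and that government of the people, by the people, '
        'for the people, shall not perish from the earth.')

# Lowercase the text and pull out runs of letters as words.
words = re.findall(r"[a-z']+", text.lower())
frequencies = Counter(words)

# The most frequent words are the ones a word cloud would draw largest.
for word, count in frequencies.most_common(3):
    print(word, count)
```

A mapping like `frequencies` can also be handed to `W.generate_from_frequencies(frequencies)`, which skips wordcloud's own text processing and uses the counts as given.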

jieba

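jieba is a widely used third-party Python library for Chinese word segmentation. It supports three segmentation modes:
- Precise mode: tries to cut the sentence as accurately as possible, with no redundant tokens; suited to text analysis
- Full mode: extracts every possible word in the sentence, so the output contains redundant, overlapping tokens
- Search engine mode: builds on precise mode and splits long words again

Reference: https://blog.youkuaiyun.com/codejas/article/details/80356544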

Import the Chinese word segmentation library jieba

import jieba
import wordcloud

w = wordcloud.WordCloud(width=1000,
                        height=700,
                        background_color='white',
                        font_path='msyh.ttc')

Call jieba's lcut() method to segment the raw text into a list of words, then join the words with spaces to get string

# txt = '同济大学，简称“同济”，是中华人民共和国教育部直属，由教育部、国家海洋局和上海市共建的全国重点大学，历史悠久、声誉卓著，是国家“双一流”、“211工程”、“985工程”重点建设高校，也是收生标准最严格的中国大学之一'
# txtlist = jieba.lcut(txt)
# string = " ".join(txtlist)

# Segment text read from an external file to get string
with open('abc.txt', encoding='utf-8') as f:
    txt = f.read()
txtlist = jieba.lcut(txt)
string = " ".join(txtlist)
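
Joining the tokens with spaces matters because wordcloud splits its input on non-word characters, and Chinese text has no spaces between words. The sketch below shows the effect with a plain regex. `WORD_PATTERN` is an assumed approximation of wordcloud's default pattern (the exact pattern depends on the installed version), and the token list is a hand-written stand-in for `jieba.lcut()` output, which may differ.

```python
import re

# Assumed approximation of the word pattern wordcloud applies by default.
WORD_PATTERN = r"\w[\w']*"

raw = "从明天起，做一个幸福的人"
# Hand-written stand-in for jieba.lcut(raw); real jieba output may differ.
tokens = ["从", "明天", "起", "，", "做", "一个", "幸福", "的", "人"]
segmented = " ".join(tokens)

# Without segmentation, each clause between punctuation marks is one "word".
print(re.findall(WORD_PATTERN, raw))
# After segmentation, each token is matched separately.
print(re.findall(WORD_PATTERN, segmented))
```

This is also why passing unsegmented Chinese text straight to `generate()` produces a cloud of whole clauses rather than individual words.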

imageio

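Import the imageio library and use its imread function to read a local image, which serves as the shape of the word cloud
```
import imageio
# Newer imageio releases emit a deprecation warning for this call and suggest imageio.v2.imread instead
mk = imageio.imread("wujiaoxing.png")
```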

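Build and configure the word cloud object w. Note the added scale parameter, which raises the output resolution

w = wordcloud.WordCloud(width=1000,
                        height=700,
                        background_color='white',
                        font_path='msyh.ttc',
                        mask=mk,
                        scale=15,
                        stopwords={"曹操", "孔明"},
                        contour_width=1,
                        contour_color="steelblue")

# The stopwords set holds the words that should not appear in the word cloud; here "曹操" and "孔明" are excluded
# contour_width and contour_color set the width and color of the outline
# When mask is given, wordcloud uses the shape of the mask image and ignores width and height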
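
In a mask, wordcloud treats pure white (255) pixels as off-limits and draws words everywhere else, so the shape image should be a dark figure on a white background. The sketch below builds a circular mask with NumPy instead of reading a file, which can be handy for trying the idea without an image. The 300-pixel size and the radius are arbitrary assumptions.

```python
import numpy as np

size, radius = 300, 130
rows, cols = np.ogrid[:size, :size]
center = size // 2

# True outside the circle, False inside it.
outside = (rows - center) ** 2 + (cols - center) ** 2 > radius ** 2

# 255 (white) marks the area to keep empty; 0 marks the drawable circle.
circle_mask = (255 * outside).astype(np.uint8)

print(circle_mask.shape, circle_mask[center, center], circle_mask[0, 0])
```

`circle_mask` can be passed as `mask=circle_mask` in place of `mk`.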

snownlp

import jieba
import wordcloud
import imageio
# Import snownlp, a third-party natural language processing library
import snownlp
mk = imageio.imread("chinamap.png")

# Build and configure two word cloud objects, w1 and w2, for the positive and negative words
w1 = wordcloud.WordCloud(width=1000,
                        height=700,
                        background_color='white',
                        font_path='msyh.ttc',
                        mask=mk,
                        scale=15)
w2 = wordcloud.WordCloud(width=1000,
                        height=700,
                        background_color='white',
                        font_path='msyh.ttc',
                        mask=mk,
                        scale=15)
# Segment text read from an external file, then collect positive and negative words into two lists
with open('abc.txt', encoding='utf-8') as f:
    txt = f.read()
txtlist = jieba.lcut(txt)
positivelist = []
negativelist = []

# Run sentiment analysis on each word: a score above 0.96 counts as positive, a score below 0.06 as negative
for each in txtlist:
    each_word = snownlp.SnowNLP(each)
    feeling = each_word.sentiments

    if feeling > 0.96:
        positivelist.append(each)
    elif feeling < 0.06:
        negativelist.append(each)
# Join each list into a single string, with the words separated by spaces
positive_string = " ".join(positivelist)
negative_string = " ".join(negativelist)

# Pass the two strings to the generate() method of w1 and w2 to give each word cloud its text
w1.generate(positive_string)
w2.generate(negative_string)

# Export the positive and negative word cloud images to the current folder
w1.to_file('output12-positive.png')
w2.to_file('output12-negative.png')
print('Word clouds generated')
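
The loop in this script scores every token, repeats included, which can take a while on long texts. The sketch below applies the same thresholds but scores each distinct word only once. The function name `split_by_sentiment` and its injectable `score` callable are assumptions for illustration, and the toy scores in the demo are made up rather than real SnowNLP output.

```python
def split_by_sentiment(words, score, positive_above=0.96, negative_below=0.06):
    """Split words into (positive, negative) lists using a score in [0, 1].

    `score` is any callable mapping a word to a sentiment value; each
    distinct word is scored once and the result is reused for repeats.
    """
    cache = {}
    positive, negative = [], []
    for word in words:
        if word not in cache:
            cache[word] = score(word)
        value = cache[word]
        if value > positive_above:
            positive.append(word)
        elif value < negative_below:
            negative.append(word)
    return positive, negative


# Toy scores standing in for snownlp.SnowNLP(word).sentiments (made-up values).
toy_scores = {"幸福": 0.99, "房子": 0.50, "悲伤": 0.01}
pos, neg = split_by_sentiment(["幸福", "房子", "悲伤", "幸福"], toy_scores.get)
print(pos, neg)
```

With snownlp installed, the real scorer would be passed as `split_by_sentiment(txtlist, lambda w: snownlp.SnowNLP(w).sentiments)`. Repeated words stay in the output lists, so their frequency still drives their size in the word cloud.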