Error when creating a word cloud: NLTK Python error "TypeError: 'dict_keys' object is not subscriptable"

License: CC 4.0 BY-SA


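This article describes a way to preprocess movie review text with the NLTK library: removing stopwords, filtering out punctuation, and counting word frequencies, then selecting the top 1% most frequent words as features.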

The corrected code is as follows:

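%python
from nltk.corpus import movie_reviews
from nltk.corpus import stopwords
from nltk import FreqDist
import string

# The corpora must be available locally, e.g. via
# nltk.download('movie_reviews') and nltk.download('stopwords').
sw = set(stopwords.words('english'))
punctuation = set(string.punctuation)

def isStopWord(word):
    # True for English stopwords and single punctuation characters
    return word in sw or word in punctuation

review_words = movie_reviews.words()
filtered = [w.lower() for w in review_words if not isStopWord(w.lower())]
words = FreqDist(filtered)

# In Python 3, keys() returns a dict_keys view that cannot be indexed or
# sliced; wrapping it in list() avoids the TypeError.
vob = list(words.keys())
N = int(.01 * len(vob))
# In NLTK 3, keys() is not sorted by frequency, so take the N most
# frequent words with most_common() instead of slicing vob[:N].
tags = [word for word, _ in words.most_common(N)]
for tag in tags:
    print(tag, ':', words[tag])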
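Converting words.keys() to a list fixes the error: in Python 3, keys() returns a dict_keys view, which does not support indexing or slicing. Note that in NLTK 3 the keys are no longer ordered by frequency, so words.most_common(N) is the reliable way to get the N most frequent words.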
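
To see the error and the fix without downloading any corpus, here is a minimal sketch. It assumes that FreqDist behaves like collections.Counter (in NLTK 3, FreqDist is a Counter subclass), so plain Counter stands in for it here. The token list is made up for illustration.

```python
from collections import Counter

# Hypothetical tokens standing in for the filtered movie review words.
tokens = ["actor", "film", "plot", "film", "plot", "film"]
counts = Counter(tokens)

# Slicing the keys view directly reproduces the error from the title.
try:
    counts.keys()[:2]
    error_message = None
except TypeError as exc:
    error_message = str(exc)
print(error_message)

# Fix 1: convert the view to a list. The order is first-seen, not by frequency.
first_seen = list(counts.keys())[:2]
print(first_seen)

# Fix 2: ask for the most frequent entries explicitly.
most_frequent = [word for word, _ in counts.most_common(2)]
print(most_frequent)
```

The two fixes give different results on this input: the list conversion returns the words in the order they were first seen, while most_common() ranks them by count. Which one is appropriate depends on whether the features really need to be the most frequent words.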