Word Frequency Counting with NLTK

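This article shows how to use the NLTK library to count word frequencies in a text, and how to find words similar to a given word, the contexts that two words share, and the distribution of words across a text. Concrete examples demonstrate building a vocabulary list and plotting a frequency distribution.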


Version info

Python 2.4 or 2.5 (tested with 2.7)

NLTK 2.0 (backward compatible; tested with 3.2.3)

Anaconda2 4.3

Code

from nltk.book import *
text1.concordance("monstrous")
Displaying 11 of 11 matches:
ong the former , one was of a most monstrous size . ... This came towards us , 
ON OF THE PSALMS . " Touching that monstrous bulk of the whale or ork we have r
ll over with a heathenish array of monstrous clubs and spears . Some were thick
d as you gazed , and wondered what monstrous cannibal and savage could ever hav
that has survived the flood ; most monstrous and most mountainous ! That Himmal
they might scout at Moby Dick as a monstrous fable , or still worse and more de
th of Radney .'" CHAPTER 55 Of the Monstrous Pictures of Whales . I shall ere l
ing Scenes . In connexion with the monstrous pictures of whales , I am strongly
ere to enter upon those still more monstrous stories of them which are to be fo
ght have been rummaged out of this monstrous cabinet there is no telling . But 
of Whale - Bones ; for Whales of a monstrous size are oftentimes cast up dead u
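To make the concordance output above easier to read, here is a minimal sketch of the idea in plain Python. It is not NLTK's implementation: the function name `concordance_lines`, the `width` parameter, and the toy token list are assumptions made for illustration, and the window is counted in tokens, whereas NLTK trims its lines to a character width. Matching is case-insensitive, as in the output above, where "Monstrous" also matches.

```python
def concordance_lines(tokens, word, width=3):
    """Return each occurrence of `word` with up to `width` tokens of context per side."""
    lines = []
    target = word.lower()
    for i, token in enumerate(tokens):
        if token.lower() == target:
            left = tokens[max(0, i - width):i]
            right = tokens[i + 1:i + 1 + width]
            lines.append(" ".join(left + [token] + right))
    return lines

# Toy token list assembled for illustration only.
toy_tokens = "one was of a most monstrous size and the Monstrous Pictures of Whales".split()
for line in concordance_lines(toy_tokens, "monstrous"):
    print(line)
```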
text1.similar("monstrous")
imperial subtly impalpable pitiable curious abundant perilous
trustworthy untoward singular lamentable few determined maddens
horrible tyrannical lazy mystifying christian exasperate
text2.similar("monstrous")
very exceedingly so heartily a great good amazingly as sweet
remarkably extremely vast
text2.common_contexts(["monstrous", "very"])
a_pretty is_pretty a_lucky am_glad be_glad
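The `left_right` pairs printed by `common_contexts` can be read as "the word before, the word after". The sketch below reproduces that idea in plain Python under simplifying assumptions: the helper name `shared_contexts` and the toy sentence are hypothetical, tokens are lowercased, and no ranking by frequency is done, so it should not be taken as NLTK's actual algorithm.

```python
from collections import defaultdict

def shared_contexts(tokens, words):
    """Return the sorted (left, right) neighbor pairs that every word in `words` occurs in."""
    tokens = [t.lower() for t in tokens]
    targets = set(w.lower() for w in words)
    contexts = defaultdict(set)
    # Skip the first and last token so that every hit has both neighbors.
    for i in range(1, len(tokens) - 1):
        if tokens[i] in targets:
            contexts[tokens[i]].add((tokens[i - 1], tokens[i + 1]))
    if not targets:
        return []
    return sorted(set.intersection(*[contexts[w] for w in targets]))

# Toy token list made up for illustration (not taken from text2).
toy_tokens = "a monstrous pretty girl and a very pretty house is very lucky".split()
pairs = shared_contexts(toy_tokens, ["monstrous", "very"])
print(" ".join(left + "_" + right for left, right in pairs))
```

On the toy sentence only the context `a_pretty` is shared, because "is ... lucky" occurs around "very" but never around "monstrous".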
text4.dispersion_plot(["citizens", "democracy", "freedom", "duties", "America"])

[Figure: lexical dispersion plot of the five words in text4]
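A dispersion plot marks the position of each occurrence of a word along the text. As a rough sketch of the data behind such a plot, the hypothetical helper `word_offsets` below collects those positions from a token list; the toy tokens are invented, matching is exact (case-sensitive), and no plotting is done.

```python
def word_offsets(tokens, targets):
    """Map each target word to the list of token positions where it occurs."""
    offsets = dict((word, []) for word in targets)
    for position, token in enumerate(tokens):
        if token in offsets:
            offsets[token].append(position)
    return offsets

# Toy token list made up for illustration (not taken from text4).
toy_tokens = "freedom for citizens means duties for citizens and freedom".split()
offsets = word_offsets(toy_tokens, ["citizens", "freedom", "duties"])
for word in ["citizens", "freedom", "duties"]:
    print("%s %s" % (word, offsets[word]))
```

Each list of positions corresponds to one row of tick marks in the plot.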

text3.generate()
---------------------------------------------------------------------------

TypeError                                 Traceback (most recent call last)

<ipython-input-17-e0816ba18b61> in <module>()
----> 1 text3.generate()


TypeError: generate() takes exactly 2 arguments (1 given)
len(text3)
44764
sorted(set(text3))
[u'!',
 ...
 u'A',
 u'Abel',
 u'Abelmizraim',
 ...
 u'coffin',
 u'cold',
 ...]
len(set(text3))
2789
# average number of times each distinct word is used
from __future__ import division
len(text3) / len(set(text3))
16.050197203298673
text3.count("smote")
5
# usage percentage of a word
100 * text4.count('a') / len(text4)
1.4643016433938312
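The two calculations above can be wrapped in small helpers so they are easy to reuse on other texts. This is a sketch only: the names `average_usage` and `percentage` and the toy token list are assumptions, and the float literals are used so that the division behaves the same on Python 2 and Python 3 without the `__future__` import.

```python
def average_usage(tokens):
    """Average number of times each distinct token is used."""
    return float(len(tokens)) / len(set(tokens))

def percentage(count, total):
    """Share of `count` in `total`, expressed in percent."""
    return 100.0 * count / total

# Toy token list made up for illustration.
toy_tokens = ["the", "whale", "and", "the", "sea", "and", "the", "ship"]
print(average_usage(toy_tokens))                             # 8 tokens over 5 distinct words
print(percentage(toy_tokens.count("the"), len(toy_tokens)))  # "the" occurs 3 times out of 8
```

With an NLTK text, the same helpers could be called as `average_usage(text3)` and `percentage(text4.count('a'), len(text4))`.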
fdist1 = FreqDist(text1)
print(fdist1)
<FreqDist with 19317 samples and 260819 outcomes>
vocabulary1 = list(fdist1.keys())  # list() keeps the slice below working on Python 3 as well
print(vocabulary1[:50])
[u'funereal', u'unscientific', u'divinely', u'foul', u'four', u'gag', u'prefix', u'woods', u'clotted', u'Duck', u'hanging', u'plaudits', u'woody', u'Until', u'marching', u'disobeying', u'canes', u'granting', u'advantage', u'Westers', u'insertion', u'DRYDEN', u'formless', u'Untried', u'superficially', u'vesper', u'Western', u'portentous', u'meadows', u'sinking', u'Ding', u'Spurn', u'treasuries', u'churned', u'oceans', u'powders', u'tinkerings', u'tantalizing', u'yellow', u'bolting', u'uncertain', u'stabbed', u'bringing', u'elevations', u'ferreting', u'wooded', u'songster', u'uttering', u'scholar', u'Less']
fdist1['whale']
906
fdist1.plot(50, cumulative=True)

[Figure: cumulative frequency plot for 50 samples of text1]
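For readers without the NLTK corpora installed, the counting behind `FreqDist` can be approximated with `collections.Counter` from the standard library (in NLTK 3, `FreqDist` is itself built on `Counter`). The sketch below uses an invented token list and computes the running totals that a cumulative plot would display; it does not draw anything.

```python
from collections import Counter

# Toy token list made up for illustration (not taken from text1).
toy_tokens = "the whale and the sea and the ship".split()
fdist = Counter(toy_tokens)

print(fdist["the"])          # count for a single word
print(fdist.most_common(2))  # the two most frequent words with their counts

# Running total over words ranked by frequency: the values a cumulative plot shows.
cumulative = []
running = 0
for word, count in fdist.most_common():
    running += count
    cumulative.append((word, running))
print(cumulative)
```

The last running total always equals the number of tokens, which is why a cumulative curve flattens out once the frequent words have been counted.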
