Word Frequency Counting in Python

License: CC 4.0 BY-SA

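This article shows how to count word frequencies in Python: remove the punctuation, split the text into a list of words, then use a dictionary to count how many times each word occurs. The code example walks through the process step by step.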

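Word frequency counting means taking a sentence or an article as input and counting how many times each word appears in it.

This is easy to do in Python. Let's look at how it is implemented and which language features it relies on.

Given a passage, count how many times each word appears

First, the idea:

Take the following passage as an example: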

Love is more than a word
it says so much.
When I see these four letters,
I almost feel your touch.
This is only happened since
I fell in love with you.
Why this word does this,
I haven’t got a clue.

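To count how many times each word appears, the idea is simple. Start with an empty dictionary, count_dict, then walk through the words of the text one at a time. For each word, check whether it is already a key in count_dict. If it is not, add it as a key with the value 1. If it is, add 1 to the value stored under that key.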
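The check-then-increment rule can be tried on a tiny list first. This is a minimal sketch; the function name `count_words` and the sample list are illustrative and do not come from the original article.

```python
def count_words(words):
    """Count occurrences of each word in a list using a plain dict."""
    counts = {}
    for word in words:
        if word not in counts:
            counts[word] = 1      # first occurrence: create the key with value 1
        else:
            counts[word] += 1     # seen before: add 1 to the stored value
    return counts

sample = ["this", "word", "does", "this"]
print(count_words(sample))  # {'this': 2, 'word': 1, 'does': 1}
```

Since Python 3.7 a dict keeps insertion order, so the words print in the order they were first seen.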

Here is the code:

# Define the text. Triple quotes let a long string span several lines.
# (Keep comments outside the quotes, otherwise they become part of the string.)
sentences = """
Love is more than a word
it says so much.
When I see these four letters,
I almost feel your touch.
This is only happened since
I fell in love with you.
Why this word does this,
I haven't got a clue.
"""
# Implementation
# Remove commas and periods. To remove many kinds of symbols, use a loop instead.
sentences = sentences.replace(',', '')
sentences = sentences.replace('.', '')
words = sentences.split()             # split on whitespace into a list of words
# print(words)
count_dict = {}
for word in words:
    if word not in count_dict:        # first occurrence: add the word with a count of 1
        count_dict[word] = 1
    else:                             # already in the dictionary: increment its count
        count_dict[word] += 1
for key, value in count_dict.items():
    print(f"{key} appears {value} time(s)")
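The code comment above suggests a loop when several kinds of symbols must be removed. Note also that the code is case-sensitive, so "Love" and "love" are counted as two different words. The sketch below is one possible variant, not the article's method: it loops over a set of symbols, optionally lowercases the text, and swaps the hand-written dictionary for `collections.Counter` from the standard library. The names `PUNCTUATION` and `word_frequencies`, and the choice of symbols, are assumptions made for illustration.

```python
from collections import Counter

# Assumed set of symbols to remove; extend it to suit your text
PUNCTUATION = ",.!?;:"

def word_frequencies(text, ignore_case=True):
    """Return a Counter mapping each word in text to its number of occurrences."""
    for symbol in PUNCTUATION:
        text = text.replace(symbol, "")   # remove each symbol in turn
    if ignore_case:
        text = text.lower()               # treat "Love" and "love" as the same word
    return Counter(text.split())

text = "I fell in love with you. Love is more than a word, it says so much."
freq = word_frequencies(text)
print(freq["love"])  # 2
```

`Counter` is a dict subclass, so `freq.items()` works as before, and `freq.most_common(n)` returns the `n` most frequent words with their counts.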