Computing word frequency in Python

1. Code

import re

text = """
Got this panda plush toy for my daughter's birthday,
who loves it and takes it everywhere. It's soft and
super cute, and its face has a friendly look. It's
a bit small for what I paid though. I think there
might be other options that are bigger for the
same price. It arrived a day earlier than expected,
so I got to play with it myself before I gave it
to her.
"""

def remove_punctuation(text):
    # Drop every character that is neither a word character nor whitespace, then lowercase
    cleaned_text = re.sub(r'[^\w\s]', '', text).lower()
    # Collapse runs of whitespace (including newlines) into single spaces
    words = ' '.join(cleaned_text.split())
    return words

def wordcount(text):
    clean_text = remove_punctuation(text)
    words = clean_text.split()
    # Map each word to the number of times it has been seen
    word_count = {}
    for word in words:
        if word in word_count:
            word_count[word] += 1
        else:
            word_count[word] = 1
    return word_count

print(wordcount(text))

Output:

{'got': 2, 'this': 1, 'panda': 1, 'plush': 1, 'toy': 1, 'for': 3, 'my': 1, 'daughters': 1, 'birthday': 1, 'who': 1, 'loves': 1, 'it': 5, 'and': 3, 'takes': 1, 'everywhere': 1, 'its': 3, 'soft': 1, 'super': 1, 'cute': 1, 'face': 1, 'has': 1, 'a': 3, 'friendly': 1, 'look': 1, 'bit': 1, 'small': 1, 'what': 1, 'i': 4, 'paid': 1, 'though': 1, 'think': 1, 'there': 1, 'might': 1, 'be': 1, 'other': 1, 'options': 1, 'that': 1, 'are': 1, 'bigger': 1, 'the': 1, 'same': 1, 'price': 1, 'arrived': 1, 'day': 1, 'earlier': 1, 'than': 1, 'expected': 1, 'so': 1, 'to': 2, 'play': 1, 'with': 1, 'myself': 1, 'before': 1, 'gave': 1, 'her': 1}
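In the output above, daughter's has become daughters, and both It's and its are counted under its, because the pattern [^\w\s] also strips apostrophes. If that distinction matters, one possible variant is sketched below. It is not part of the original code: the function name wordcount_keep_apostrophes and the sample sentence are made up for illustration, and the pattern assumes ASCII letters and digits with a straight apostrophe.

```python
import re
from collections import Counter

def wordcount_keep_apostrophes(text):
    # Hypothetical variant: extract words directly instead of deleting punctuation.
    # A word is a run of ASCII letters/digits, optionally joined by internal apostrophes.
    words = re.findall(r"[a-z0-9]+(?:'[a-z0-9]+)*", text.lower())
    return dict(Counter(words))

sample = "It's soft, and its face has a friendly look. It's small."
counts = wordcount_keep_apostrophes(sample)
print(counts["it's"], counts["its"])
```

With this approach it's and its stay separate keys, at the cost of a stricter definition of what counts as a word.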

2. Debug

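(1) Remove the punctuation from text and convert it to lowercase

(2) Collapse the extra whitespace in cleaned_text

(3) Split the text into words

(4) Iterate over words

(5) Check whether word is already in the dictionary

If it is not in the dictionary, set its count to 1

If it is in the dictionary, add 1 to its count

Once the whole text has been traversed, the program ends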
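The steps above can be traced on a short input. The sketch below prints each intermediate value and then checks the manual loop against collections.Counter from the standard library, which performs the same counting. The sample string and the variable name normalized are made up for illustration.

```python
import re
from collections import Counter

sample = "It's soft,  and it's   super cute!"

# (1) remove punctuation and convert to lowercase
cleaned_text = re.sub(r'[^\w\s]', '', sample).lower()
# (2) collapse extra whitespace
normalized = ' '.join(cleaned_text.split())
# (3) split into words
words = normalized.split()

# (4)-(5) iterate over the words and count each one
word_count = {}
for word in words:
    if word in word_count:
        word_count[word] += 1
    else:
        word_count[word] = 1

print(normalized)
print(word_count)

# Counter does the same bookkeeping in one call
counter = Counter(words)
print(word_count == dict(counter))
print(counter.most_common(1))
```

Counter also provides most_common(n), which is handy when only the most frequent words are of interest.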
