Python: counting the words in a file, their average length, and the five most frequent words

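This article presents a Python approach to text processing and statistical analysis: it counts the unique words in a text, computes the average word length, and finds the five most frequently used words.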


Task: for a given file, find the total number of unique words, the average length of all words in the text, and the top five most commonly used words in the text.
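To make the three quantities concrete, here is a minimal sketch on a tiny hand-made word list. The sample words are hypothetical and not taken from Rental.txt. It assumes that the "average length of all words" is taken over every occurrence of a word, not over the unique words only.

```python
# Hypothetical sample data, not taken from Rental.txt.
words = ["the", "rent", "is", "due", "the", "rent", "the"]

# Unique words: each distinct word counted once.
unique_words = set(words)

# Average length over every occurrence: total characters / total words.
average_length = sum(len(w) for w in words) / len(words)

# Count occurrences by hand, then sort by count, highest first.
counts = {}
for w in words:
    counts[w] = counts.get(w, 0) + 1
ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)

print(len(unique_words))   # 4
print(average_length)      # 22 / 7, about 3.14
print(ranked[:2])          # [('the', 3), ('rent', 2)]
```

Averaging over the unique words only would give (3 + 4 + 2 + 3) / 4 = 3.0 instead, so the two readings of the task produce different numbers.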

#!/usr/bin/python
# -*- coding: UTF-8 -*-
def getText():
    # Open in text mode: binary mode ('rb') does not accept an encoding argument.
    with open('Rental.txt', 'r', encoding='UTF-8') as f:
        txt = f.read()
    txt = txt.lower()
    # Replace punctuation and digits with spaces so split() yields bare words.
    for ch in '!"#$%&()*+,-./:;<=>?@[\\]^_‘{|}~1234567890':
        txt = txt.replace(ch, " ")
    return txt

hamletTxt = getText()
words = hamletTxt.split()

# Count how many times each word occurs.
counts = {}
for word in words:
    counts[word] = counts.get(word, 0) + 1

# Sort the (word, count) pairs by count, most frequent first.
items = list(counts.items())
items.sort(key=lambda x: x[1], reverse=True)

# Each key of counts is one unique word.
print("the total number of unique words in the text is {}\n".format(len(counts)))

# Average length = total characters over all occurrences / total occurrences.
# (total_chars replaces the original name sum, which shadowed the built-in.)
total_chars = 0
num = 0
for word, count in items:
    total_chars += len(word) * count
    num += count
print("The average length of all words in the text is {}\n".format(total_chars / num))

print("the top five most commonly used words in the text ")
# Slicing avoids an IndexError when the text has fewer than five unique words.
for word, count in items[:5]:
    print("{0:<10}{1:>5}".format(word, count))
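The same statistics can also be computed with the standard library's `collections.Counter` and `str.translate`. The following is an alternative sketch, not the author's code. The function name `text_stats` and the parameter `top_n` are hypothetical. It takes the text as a string so it can be tried without Rental.txt. It also assumes that stripping everything in `string.punctuation` is acceptable, which differs slightly from the character list in the script: apostrophes and backticks are removed as well.

```python
import string
from collections import Counter


def text_stats(text, top_n=5):
    """Return (unique word count, average word length, top_n most common words)."""
    # Map every ASCII punctuation character and digit to a space in one pass.
    table = str.maketrans({ch: " " for ch in string.punctuation + string.digits})
    words = text.lower().translate(table).split()
    if not words:
        # Empty input: avoid dividing by zero.
        return 0, 0.0, []
    counts = Counter(words)
    average = sum(len(w) for w in words) / len(words)
    return len(counts), average, counts.most_common(top_n)


# Hypothetical sample sentence, not taken from Rental.txt.
unique, average, top = text_stats("The rent is due. The rent is $950; pay the rent!")
print(unique, average)
for word, count in top:
    print("{0:<10}{1:>5}".format(word, count))
```

To run it on the file, read the file contents in text mode and pass them in, for example `text_stats(open('Rental.txt', encoding='UTF-8').read())`. `Counter.most_common` returns the pairs already sorted by count, so no separate sort step is needed.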

 
