Python Study Notes (4): Counting the Frequency (Number of Occurrences) of Elements in a Sequence

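This article shows how to count element and word frequencies in Python, including finding the most frequent elements in a random sequence and the most frequent words in an English text.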


Case 1: In a random sequence, find the 3 most frequent elements and how many times each one occurs.

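from random import randint
# Build a random sequence of 30 integers with a list comprehension
data = [randint(0, 20) for _ in range(30)]
# Create a dict keyed by the elements of data, with every count starting at 0
my_dict = dict.fromkeys(data, 0)
# Iterate over the sequence
for x in data:
    my_dict[x] += 1  # add 1 to the count stored under this element's key
# Sort the (element, count) pairs by count, in descending order
result = sorted(my_dict.items(), key=lambda x: x[1], reverse=True)
# Print the three most frequent elements and their counts
print(result[:3])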
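
The counting loop above can also be wrapped in a small reusable function. The following is only a sketch: the function name `top_n` and the fixed sample list are hypothetical and were chosen so that the result is reproducible, unlike the random data. It uses `dict.get` with a default of 0 instead of pre-filling the dict with `dict.fromkeys`.

```python
def top_n(seq, n):
    """Return the n most frequent elements of seq as (element, count) pairs."""
    counts = {}
    for item in seq:
        # dict.get returns 0 for an element that has not been seen yet
        counts[item] = counts.get(item, 0) + 1
    # Sort by count, highest first, and keep the first n pairs
    return sorted(counts.items(), key=lambda pair: pair[1], reverse=True)[:n]


# Hypothetical fixed sample (not random), so the output is predictable
sample = [3, 7, 3, 1, 7, 3, 9, 7, 3, 1]
print(top_n(sample, 3))  # [(3, 4), (7, 3), (1, 2)]
```

Sorting every (element, count) pair is fine for small inputs like this one; it simply does more work than needed when only a few top elements are wanted.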

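Python also offers a simpler and more effective way: the collections.Counter class. Pass the sequence to the Counter constructor, and the resulting Counter object is a dictionary that maps each element to its frequency. The Counter.most_common(n) method returns a list of the n most frequent elements together with their counts.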

from collections import Counter
...
# Pass the sequence data to the Counter constructor
counter = Counter(data)
# Pass 3 to Counter.most_common()
result = counter.most_common(3)
# Print the result: the three most frequent elements and their counts
print(result)
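
To make the behavior of Counter easier to see than with random data, here is a short sketch on a fixed list. The sample values are hypothetical; the calls used (`Counter(...)`, indexing, `update`, `most_common`) are all part of the standard `collections.Counter` API.

```python
from collections import Counter

# Hypothetical fixed sample, so the counts are known in advance
sample = ['a', 'b', 'a', 'c', 'b', 'a']
counter = Counter(sample)

print(isinstance(counter, dict))  # True: Counter is a dict subclass
print(counter['a'])               # 3
print(counter['z'])               # 0: a missing key returns 0 instead of raising KeyError
print(counter.most_common(2))     # [('a', 3), ('b', 2)]

# update() adds the new occurrences to the existing counts
counter.update(['c', 'c', 'c'])
print(counter.most_common(1))     # [('c', 4)]
```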

 

Case 2: Count the word frequencies in an English text and find the 10 most frequent words and how many times each one occurs.

import re
from collections import Counter

# The text to process
txt = '''
Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!
'''
# Split the text into words with a regular expression (raw string, so \W is
# not treated as a string escape) and drop the empty strings that re.split
# produces at the start and end of the text.
# Note: counting is case-sensitive, so 'Complex' and 'complex' are separate words.
words = [w for w in re.split(r'\W+', txt) if w]
# Pass the list to the Counter constructor
counter = Counter(words)
# Find the 10 most frequent words
result = counter.most_common(10)
# Print the 10 most frequent words and their counts
print(result)
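
Splitting on `\W+` as above counts "Complex" and "complex" as different words and breaks "aren't" into "aren" and "t". One possible variation, shown here only as a sketch, lowercases the text first and matches runs of letters and apostrophes with `re.findall`. The function name `word_frequencies` and the short sample string are hypothetical, and the pattern is an assumption about what should count as a word (it would also match a stray apostrophe on its own).

```python
import re
from collections import Counter


def word_frequencies(text, n):
    """Return the n most frequent words in text, ignoring case."""
    # Lowercase first so "Complex" and "complex" count as the same word;
    # [a-z']+ keeps contractions such as "aren't" in one piece.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words).most_common(n)


# Hypothetical short sample taken from two lines of the text above
sample = "Simple is better than complex. Complex is better than complicated."
print(word_frequencies(sample, 4))
```

With this sample, "complex" is counted twice because the capitalized and lowercase forms are merged.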


Reposted from: https://www.cnblogs.com/walo/p/11253506.html
