collections.Counter() introduction: quickly counting how often elements occur

Originally published 2025-03-26 21:35:07


License: CC 4.0 BY-SA

Tags: #python #programming-language


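collections.Counter() is a powerful Python tool for quickly counting how many times each element of an iterable occurs. It lives in the collections module and is a subclass of dict: the keys are the elements and the values are their counts.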

1. Using collections.Counter()

Basic usage

Import the class and create a Counter object:

from collections import Counter

# Initialize from an iterable such as a list or a string
c = Counter(['a', 'b', 'a', 'c', 'b', 'a'])
print(c)  # Output: Counter({'a': 3, 'b': 2, 'c': 1})

# Count the characters in a string
word_counter = Counter("abracadabra")
print(word_counter)  # Output: Counter({'a': 5, 'b': 2, 'r': 2, 'c': 1, 'd': 1})

You can also pass keyword arguments or a dict directly:

c = Counter(a=3, b=2)  # equivalent to Counter({'a': 3, 'b': 2})
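As a sketch, the keyword, dict, and iterable forms should all produce equal Counters (the variable names are illustrative only):

```python
from collections import Counter

# Three ways to build the same counts
from_kwargs = Counter(a=3, b=2)                      # keyword arguments
from_dict = Counter({'a': 3, 'b': 2})                # mapping of element -> count
from_iterable = Counter(['a', 'a', 'a', 'b', 'b'])   # iterable of elements

print(from_kwargs == from_dict == from_iterable)  # True
```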

Common methods

items()
The .items() method returns a view of the elements and their counts, just like dict.items(). Each entry is an (element, count) tuple.

Example:

from collections import Counter

# Create a Counter instance
data = ["apple", "banana", "apple", "orange", "banana", "apple"]
counter = Counter(data)

# Use .items() to get the (element, count) pairs
items_view = counter.items()
print(items_view)
# Output: dict_items([('apple', 3), ('banana', 2), ('orange', 1)])

# Iterate over the pairs
for item, count in counter.items():
    print(f"{item}: {count}")
# Output:
# apple: 3
# banana: 2
# orange: 1

# Convert to a list or a dict
items_list = list(counter.items())  # [('apple', 3), ('banana', 2), ('orange', 1)]
items_dict = dict(counter.items())  # {'apple': 3, 'banana': 2, 'orange': 1}

Post-processing word counts:

text = "hello world hello python world"
word_counter = Counter(text.split())

# Keep only the words that occur more than once
filtered = {word: count for word, count in word_counter.items() if count > 1}
print(filtered)  # {'hello': 2, 'world': 2}

Sorting by count:

# Sort by count in descending order with sorted()
sorted_items = sorted(counter.items(), key=lambda x: -x[1])
print(sorted_items)  # [('apple',3), ('banana',2), ('orange',1)]
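Calling most_common() with no argument returns the same descending order, so it can stand in for the sorted() call. A sketch that rebuilds the fruit Counter so it runs on its own:

```python
from collections import Counter

data = ["apple", "banana", "apple", "orange", "banana", "apple"]
counter = Counter(data)

# Sort by count, descending, with sorted()
by_sorted = sorted(counter.items(), key=lambda x: x[1], reverse=True)
# most_common() with no argument returns every element, most common first
by_most_common = counter.most_common()

print(by_sorted == by_most_common)  # True
```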

elements():
Returns an iterator that repeats each element as many times as its count.

c = Counter(['a', 'b', 'a', 'c', 'b', 'a'])
for element in c.elements():
    print(element, end=' ')  # Output: a a a b b c
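Because elements() expands the counts back into individual items, counting its output again should reproduce the original Counter. A small sketch of that round trip (the name expanded is only for illustration):

```python
from collections import Counter

c = Counter(['a', 'b', 'a', 'c', 'b', 'a'])
expanded = list(c.elements())   # each element repeated count times, in first-seen order
print(expanded)                 # ['a', 'a', 'a', 'b', 'b', 'c']

# Counting the expanded list again gives back the original counts
print(Counter(expanded) == c)   # True
```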

most_common(n):
Returns a list of the n most common elements and their counts, ordered from most to least common.

print(c.most_common(2))  # Output: [('a', 3), ('b', 2)]
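Without an argument, most_common() returns every element. The Python documentation states that elements with equal counts keep the order in which they were first encountered. A sketch using an illustrative string:

```python
from collections import Counter

c = Counter("mississippi")   # m: 1, i: 4, s: 4, p: 2
# 'i' is seen before 's', so it comes first among the tied counts
print(c.most_common())       # [('i', 4), ('s', 4), ('p', 2), ('m', 1)]
print(c.most_common(1))      # [('i', 4)]
```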

update():
Adds the counts from another Counter, a mapping, or an iterable to the existing counts.

c.update(['a', 'd'])  # c['a'] becomes 4, c['d'] becomes 1
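Note that Counter.update() adds counts, whereas a plain dict.update() overwrites values. A minimal comparison, with illustrative variable names:

```python
from collections import Counter

c = Counter(a=3, b=2)
c.update({'a': 1})   # Counter.update adds: 'a' goes from 3 to 4
print(c['a'])        # 4

d = {'a': 3, 'b': 2}
d.update({'a': 1})   # dict.update replaces: 'a' becomes 1
print(d['a'])        # 1
```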

subtract():
Subtracts counts in place; the results are allowed to be zero or negative.

c.subtract({'a': 2})  # c['a'] becomes 2
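subtract() keeps negative results, while the binary - operator discards them. A small sketch contrasting the two:

```python
from collections import Counter

c = Counter(a=1, b=2)
c.subtract(Counter(a=3))   # in place; the negative count for 'a' is kept
print(c)                   # Counter({'b': 2, 'a': -2})

c1 = Counter(a=1, b=2)
diff = c1 - Counter(a=3)   # the '-' operator drops counts <= 0
print(diff)                # Counter({'b': 2})
```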

Arithmetic operations
Addition and subtraction:

c1 = Counter(a=3, b=1)
c2 = Counter(a=1, b=2)

# Addition
print(c1 + c2)  # Counter({'a': 4, 'b': 3})

# Subtraction (only positive counts are kept)
print(c1 - c2)  # Counter({'a': 2})
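Besides + and -, Counter also supports & (intersection, the minimum of each count) and | (union, the maximum of each count). A sketch using the same c1 and c2:

```python
from collections import Counter

c1 = Counter(a=3, b=1)
c2 = Counter(a=1, b=2)

both = c1 & c2     # min of each count
either = c1 | c2   # max of each count
print(both)        # Counter({'a': 1, 'b': 1})
print(either)      # Counter({'a': 3, 'b': 2})
```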

Other features
Looking up a missing element returns 0 instead of raising KeyError:

print(c['z'])  # Output: 0
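Reading a missing key returns 0 without inserting it, which is why c[key] += 1 works for elements that have not been seen yet. A short sketch:

```python
from collections import Counter

c = Counter(['a', 'b', 'a'])
print(c['z'])      # 0, no KeyError
print('z' in c)    # False: reading a missing key does not add it
c['z'] += 1        # no setup needed before counting a new element
print(c['z'])      # 1
```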

Converting to a dict:

dict(c)  # plain dict; for Counter(['a', 'b', 'a', 'c', 'b', 'a']) this is {'a': 3, 'b': 2, 'c': 1}

Use cases
Counting word frequencies:

text = "apple banana orange apple apple banana"
words = text.split()
word_counts = Counter(words)
print(word_counts.most_common(1))  # Output: [('apple', 3)]

Data analysis:

data = [1, 2, 2, 3, 3, 3]
num_counts = Counter(data)
print(num_counts)  # Output: Counter({3: 3, 2: 2, 1: 1})

A single Counter call replaces a hand-written counting loop.
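For comparison, here is a sketch of the manual loop that Counter replaces, using a plain dict with dict.get() (variable names are illustrative):

```python
from collections import Counter

data = [1, 2, 2, 3, 3, 3]

# Manual counting with a plain dict
manual = {}
for x in data:
    manual[x] = manual.get(x, 0) + 1

# The same result in one call
print(dict(Counter(data)) == manual)  # True
```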

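Notes

Keys must be hashable (e.g. strings, numbers, tuples).
Counts may be zero or negative (for example after subtract()). most_common() still returns such elements, listed last, but elements() skips any element whose count is below 1, and the + and - operators drop non-positive results.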
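A sketch of how zero and negative counts behave; it also uses unary +, which returns a new Counter keeping only the positive counts:

```python
from collections import Counter

c = Counter(a=2, b=1)
c.subtract({'b': 1, 'c': 1})   # 'b' -> 0, 'c' -> -1

print(c.most_common())         # [('a', 2), ('b', 0), ('c', -1)]: still listed
print(list(c.elements()))      # ['a', 'a']: counts below 1 are skipped
print(+c)                      # Counter({'a': 2}): only positive counts kept
```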

Counter greatly simplifies counting code, and a single Counter(...) call counts an entire iterable in one pass.

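2. Worked example

Take the following problem as an example:

You are given an integer array nums.
A pair (i, j) is a good pair if nums[i] == nums[j] and i < j.
Return the number of good pairs.

Example 1:
Input: nums = [1,2,3,1,1,3]
Output: 4
Explanation: there are 4 good pairs, (0,3), (0,4), (3,4) and (2,5), using 0-based indices.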

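The problem can be solved by counting how many times each number occurs: a number that occurs n times forms $C_n^2 = \frac{n(n-1)}{2}$ good pairs (n * (n - 1) // 2 in code), and the answer is the sum over all distinct numbers.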
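Applying the formula to Example 1, where 1 occurs three times, 2 once and 3 twice:

```latex
\begin{aligned}
n_1 &= 3,\quad n_2 = 1,\quad n_3 = 2 \\
\sum_{x} C_{n_x}^2 &= \frac{3 \cdot 2}{2} + \frac{1 \cdot 0}{2} + \frac{2 \cdot 1}{2} = 3 + 0 + 1 = 4
\end{aligned}
```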

Solution 1

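A natural first idea is to put the list's values into a set, since set elements are unique, then for each distinct number count its occurrences in the list and add the number of pairs it produces. Because nums.count() scans the whole list for every distinct value, this takes O(n·k) time for k distinct values: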

def numIdenticalPairs(nums):
    nums_set = set(nums)
    c = 0
    for each in nums_set:
        n = nums.count(each)
        c += n * (n - 1) // 2
    return c
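To check either solution on small inputs, a brute-force version that follows the definition directly may help. This is a sketch added for illustration (count_pairs_brute_force is a hypothetical name), and it runs in O(n²) time:

```python
from itertools import combinations

def count_pairs_brute_force(nums):
    # combinations(range(n), 2) yields every index pair (i, j) with i < j
    return sum(1 for i, j in combinations(range(len(nums)), 2) if nums[i] == nums[j])

print(count_pairs_brute_force([1, 2, 3, 1, 1, 3]))  # 4
```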

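Solution 2

Building on the material above, collections.Counter() makes this more efficient: it counts every value in a single O(n) pass instead of rescanning the list once per distinct value.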

import collections

def numIdenticalPairs(nums):
    # Count every value once, then add C(v, 2) = v * (v - 1) // 2 pairs per value
    m = collections.Counter(nums)
    return sum(v * (v - 1) // 2 for v in m.values())
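A variant that is not part of the original solutions counts pairs in a single pass: when a value appears for the k-th time it forms k − 1 new good pairs with its earlier occurrences, and 0 + 1 + … + (n − 1) = n(n − 1)/2, which matches the formula above. The function name below is hypothetical:

```python
from collections import Counter

def num_identical_pairs_one_pass(nums):
    seen = Counter()       # occurrences of each value so far
    pairs = 0
    for x in nums:
        pairs += seen[x]   # x pairs with every earlier occurrence of x
        seen[x] += 1
    return pairs

print(num_identical_pairs_one_pass([1, 2, 3, 1, 1, 3]))  # 4
```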