Python itertools groupby usage example (repost)

License: CC 4.0 BY-SA

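This article introduces the groupby function in the itertools module of the Python standard library. Through an example, it shows how to use groupby to group the data coming out of an iterator. This suits splitting up the results of large database queries or the contents of large data files, and keeps the code concise and efficient.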

Reposted from: http://freshfoo.com/blog/itertools_groupby

A relatively unknown part of the Python standard library that I find myself using fairly regularly at work these days is the groupby function in the itertools module. In a nutshell, groupby takes an iterator and breaks it up into sub-iterators based on changes in the "key" of the main iterator. This is of course done without reading the entire source iterator into memory.

The "key" is almost always based on some part of the items returned by the iterator. It is defined by a "key function", much like the key argument of the sorted builtin function. groupby starts a new group every time the key changes, so it works best when the data is already sorted or grouped by the key; otherwise the same key can turn up in several separate groups. Whether that matters depends on the use case.
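As a minimal sketch of what happens when the input is not grouped by the key (the "letters" list is made-up sample data, and no key function is passed, so each item is its own key):

```python
from itertools import groupby

# Hypothetical sample data: the key "a" appears in two separate runs.
letters = ["a", "a", "b", "a", "c", "c"]

# groupby starts a new group whenever the key changes,
# so the unsorted input yields a group for "a" twice.
unsorted_groups = [(key, list(group)) for key, group in groupby(letters)]
print(unsorted_groups)
# [('a', ['a', 'a']), ('b', ['b']), ('a', ['a']), ('c', ['c', 'c'])]

# Sorting first puts equal keys next to each other.
sorted_groups = [(key, list(group)) for key, group in groupby(sorted(letters))]
print(sorted_groups)
# [('a', ['a', 'a', 'a']), ('b', ['b']), ('c', ['c', 'c'])]
```

Note that sorted reads the whole input into memory, so for large sources it is usually better to have the source deliver the data in key order (for example with ORDER BY in a database query).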

I've successfully used groupby for splitting up the results of large database queries or the contents of large data files. The resulting code ends up being clean and small.

Here's an example:

from itertools import groupby
from operator import itemgetter

things = [('2009-09-02', 11),
          ('2009-09-02', 3),
          ('2009-09-03', 10),
          ('2009-09-03', 4),
          ('2009-09-03', 22),
          ('2009-09-06', 33)]

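# Group consecutive tuples that share the same first element (the date).
for key, items in groupby(things, itemgetter(0)):
    print(key)
    # items is a sub-iterator over the tuples in this group.
    for subitem in items:
        print(subitem)
    print('-' * 20)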

Here the dummy data in the "things" list is grouped by the first item of each tuple (that is, the key is the date string). For each group, the key is printed, followed by the items returned by that group's sub-iterator.

The output looks like:

2009-09-02
('2009-09-02', 11)
('2009-09-02', 3)
--------------------
2009-09-03
('2009-09-03', 10)
('2009-09-03', 4)
('2009-09-03', 22)
--------------------
2009-09-06
('2009-09-06', 33)
--------------------
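The loop above only prints each group, but a sub-iterator can just as well be fed into an aggregate. The following sketch sums the second element of each tuple per date; the "totals" name is introduced here purely for illustration, and the dict comprehension assumes the data is already ordered by date (a repeated key would overwrite the earlier total).

```python
from itertools import groupby
from operator import itemgetter

things = [('2009-09-02', 11),
          ('2009-09-02', 3),
          ('2009-09-03', 10),
          ('2009-09-03', 4),
          ('2009-09-03', 22),
          ('2009-09-06', 33)]

# Sum the values of each group as its sub-iterator is consumed.
totals = {key: sum(value for _, value in items)
          for key, items in groupby(things, itemgetter(0))}
print(totals)
# {'2009-09-02': 14, '2009-09-03': 36, '2009-09-06': 33}
```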

The "things" list is a contrived example. In a real world situation this could be a database cursor object or a CSV reader object. Any iterable object can be used.
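As a rough sketch of the CSV case, the snippet below groups the rows of a csv.reader by their first column. The in-memory text stands in for a real file, and the two-column layout (date, value) is an assumption made for the example.

```python
import csv
import io
from itertools import groupby
from operator import itemgetter

# Stand-in for an open file; the (date, value) column layout is assumed.
csv_text = "2009-09-02,11\n2009-09-02,3\n2009-09-03,10\n"
reader = csv.reader(io.StringIO(csv_text))

# Rows are pulled from the reader one at a time while grouping.
counts = []
for date, rows in groupby(reader, key=itemgetter(0)):
    counts.append((date, sum(1 for _ in rows)))

print(counts)
# [('2009-09-02', 2), ('2009-09-03', 1)]
```

With a real file, the same loop works on csv.reader(open_file) without loading the file's rows into a list first.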

Here's a closer look at what groupby is doing using the Python interactive shell:

>>> iterator = groupby(things, itemgetter(0))
>>> iterator
<itertools.groupby object at 0x95d3acc>
>>> next(iterator)
('2009-09-02', <itertools._grouper object at 0x95e0d0c>)
>>> next(iterator)
('2009-09-03', <itertools._grouper object at 0x95e0aec>)

You can see how a key and sub-iterator are returned for each pass through the groupby iterator.
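One consequence worth knowing: the sub-iterators share the underlying source with the groupby iterator, so advancing to the next group makes the previous group unreadable. The sketch below shows this in Python 3 with a shortened copy of the sample data, along with the usual workaround of copying a group into a list before moving on.

```python
from itertools import groupby
from operator import itemgetter

things = [('2009-09-02', 11), ('2009-09-02', 3), ('2009-09-03', 10)]

iterator = groupby(things, itemgetter(0))
first_key, first_group = next(iterator)
# Advancing the groupby iterator skips whatever is left of the first group.
second_key, second_group = next(iterator)

stale = list(first_group)    # the first group is no longer visible
fresh = list(second_group)   # the current group is still readable
print(stale)   # []
print(fresh)   # [('2009-09-03', 10)]

# To keep a group around, copy it into a list before advancing.
saved = [(key, list(group)) for key, group in groupby(things, itemgetter(0))]
print(saved)
# [('2009-09-02', [('2009-09-02', 11), ('2009-09-02', 3)]),
#  ('2009-09-03', [('2009-09-03', 10)])]
```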

groupby is a handy tool to have under your belt. Think of it whenever you need to split up a dataset by some criteria.
