Python itertools.groupby usage example (repost)

Reposted from: http://freshfoo.com/blog/itertools_groupby

 

A relatively unknown part of the Python standard library that I find myself using fairly regularly at work these days is the groupby function in the itertools module. In a nutshell, groupby takes an iterator and breaks it up into sub-iterators based on changes in the "key" of the main iterator. This is of course done without reading the entire source iterator into memory.

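The "key" is almost always based on some part of the items returned by the iterator. It is defined by a "key function", much like the key argument of the sorted builtin function. groupby starts a new group every time the key value changes, so it works best when the data is already sorted or grouped by the key. This isn't strictly necessary: unsorted input simply produces a separate group for each run of consecutive items that share a key. Whether that is acceptable depends on the use case.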
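
To make the effect of input order concrete, here is a minimal sketch. The `words` list and the choice of `len` as the key function are made up for illustration and are not from the original post. A key can appear more than once when the input is not grouped, and sorting with the same key function first merges those runs.

```python
from itertools import groupby

# Hypothetical sample data, chosen only to illustrate the behaviour.
words = ['a', 'bb', 'c', 'dd', 'e']

# Unsorted input: a new group starts whenever len() changes, so keys repeat.
unsorted_groups = [(key, list(group)) for key, group in groupby(words, key=len)]
print(unsorted_groups)

# Sorted by the same key function: each key appears exactly once.
sorted_groups = [(key, list(group))
                 for key, group in groupby(sorted(words, key=len), key=len)]
print(sorted_groups)
```

Note that sorted reads the whole input into memory, so for large sources it is usually preferable to have the source deliver the data already ordered.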

I've successfully used groupby for splitting up the results of large database queries or the contents of large data files. The resulting code ends up being clean and small.

Here's an example:

from itertools import groupby
from operator import itemgetter

things = [('2009-09-02', 11),
          ('2009-09-02', 3),
          ('2009-09-03', 10),
          ('2009-09-03', 4),
          ('2009-09-03', 22),
          ('2009-09-06', 33)]

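# Group consecutive items that share the same first element (the date).
for key, items in groupby(things, itemgetter(0)):
    print(key)
    # items is a sub-iterator over the elements belonging to this key.
    for subitem in items:
        print(subitem)
    print('-' * 20)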

Here the dummy data in the "things" list is grouped by the first item of each element (that is, the key is the first element). For each key, the key is printed followed by the items returned by each sub-iterator.

The output looks like:

2009-09-02
('2009-09-02', 11)
('2009-09-02', 3)
--------------------
2009-09-03
('2009-09-03', 10)
('2009-09-03', 4)
('2009-09-03', 22)
--------------------
2009-09-06
('2009-09-06', 33)
--------------------
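
Because each sub-iterator yields only the items for one key, it can be handed straight to an aggregate such as sum. The following is a small sketch along those lines. It repeats the `things` list so that it runs on its own, and the `totals` name is only illustrative.

```python
from itertools import groupby
from operator import itemgetter

things = [('2009-09-02', 11),
          ('2009-09-02', 3),
          ('2009-09-03', 10),
          ('2009-09-03', 4),
          ('2009-09-03', 22),
          ('2009-09-06', 33)]

# Sum the second field of every item within each date group.
totals = {key: sum(value for _, value in items)
          for key, items in groupby(things, key=itemgetter(0))}
print(totals)
```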

The "things" list is a contrived example. In a real world situation this could be a database cursor object or a CSV reader object. Any iterable object can be used.
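
As a rough sketch of the CSV case, the snippet below feeds a csv.reader directly to groupby. The CSV content is invented for illustration, and io.StringIO stands in for a real open file. It assumes the rows already arrive ordered by the first column.

```python
import csv
import io
from itertools import groupby
from operator import itemgetter

# Hypothetical CSV content; io.StringIO stands in for an open file.
data = io.StringIO(
    "2009-09-02,11\n"
    "2009-09-02,3\n"
    "2009-09-03,10\n"
    "2009-09-06,33\n"
)

reader = csv.reader(data)

# Rows are pulled from the reader one at a time; each row is a list of strings.
counts = []
for day, rows in groupby(reader, key=itemgetter(0)):
    # Count the rows in this group without storing them.
    counts.append((day, sum(1 for _ in rows)))

print(counts)
```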

Here's a closer look at what groupby is doing using the Python interactive shell:

>>> iterator = groupby(things, itemgetter(0))
>>> iterator
<itertools.groupby object at 0x95d3acc>
>>> next(iterator)
('2009-09-02', <itertools._grouper object at 0x95e0d0c>)
>>> next(iterator)
('2009-09-03', <itertools._grouper object at 0x95e0aec>)

You can see how a key and sub-iterator are returned for each pass through the groupby iterator.
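
One consequence worth knowing is that each sub-iterator shares the underlying source with the groupby object, so advancing to the next group leaves the previous sub-iterator with nothing to yield. Here is a minimal sketch using a shortened copy of the sample data. If a group is needed later, copy it with list() while it is current.

```python
from itertools import groupby
from operator import itemgetter

things = [('2009-09-02', 11),
          ('2009-09-02', 3),
          ('2009-09-03', 10)]

iterator = groupby(things, key=itemgetter(0))

first_key, first_group = next(iterator)
second_key, second_group = next(iterator)  # advancing invalidates first_group

stale = list(first_group)    # the earlier group can no longer be read
fresh = list(second_group)   # the current group is still readable
print(stale, fresh)

# Copying each group as it is produced keeps the data available afterwards.
saved = [(key, list(group)) for key, group in groupby(things, key=itemgetter(0))]
print(saved)
```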

groupby is a handy tool to have under your belt. Think of it whenever you need to split up a dataset by some criteria.

 

 
