Python Cookbook Chapter 1, Data Structures and Algorithms (3): Keeping the Last N Items

恋梦轩

Last modified 2023-02-13 16:57:56

License: CC 4.0 BY-SA

Category: Python Cookbook study notes. Tags: keeping the last elements, queue, deque, collections

First published 2018-11-19 21:43:54

Article link: https://blog.youkuaiyun.com/cybeyond_xuan/article/details/84203956

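This article shows how to use Python's collections.deque to keep the last N items seen during iteration, which is useful in scenarios such as text matching, and how a generator function decouples the search code from the code that uses the results.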

1. Problem

We want to keep a limited history of the last few items seen during iteration or some other kind of processing.

2. Solution

Keeping a limited history is a perfect use case for collections.deque.

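For example, suppose we want to run a simple text match over a sequence of lines and, whenever a match is found, print the matching line together with the last N lines checked before it: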

from collections import deque


def find_value(lines, pattern, length=5):
    # Bounded history: holds at most `length` of the lines seen so far.
    previous_lines = deque(maxlen=length)
    for line in lines:
        if pattern in line:
            # Yield the match together with the lines that preceded it.
            yield line, previous_lines
        previous_lines.append(line)


if __name__ == '__main__':
    with open('test.txt') as f:
        for line, prevlines in find_value(f, 'python', 5):
            # '11' marks a history line, '22' marks the matching line.
            for i in prevlines:
                print('11  ' + i, end='')
            print('  22  ' + line, end='')
            print('-'*20)
            print()

The contents of the text file test.txt are as follows:

I love python.
You love java.
python and java.
php
c and c++
ruby
deeplearning
python
python and c
c++ and python
I use python.

Running the script produces the following output:

  22  I love python.
--------------------

11  I love python.
11  You love java.
  22  python and java.
--------------------

11  python and java.
11  php
11  c and c++
11  ruby
11  deeplearning
  22  python
--------------------

11  php
11  c and c++
11  ruby
11  deeplearning
11  python
  22  python and c
--------------------

11  c and c++
11  ruby
11  deeplearning
11  python
11  python and c
  22  c++ and python
--------------------

11  ruby
11  deeplearning
11  python
11  python and c
11  c++ and python
  22  I use python.
--------------------
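
One detail of find_value is easy to miss: it yields the deque object itself, not a copy. That is fine when each result is printed immediately, as in the script, but results stored for later would all refer to the same, still-changing deque. The following is a minimal sketch of one way around this, assuming an in-memory list of lines instead of test.txt; `collect_matches` is a hypothetical helper name introduced only for illustration.

```python
from collections import deque


def find_value(lines, pattern, length=5):
    # Same generator as in the article, repeated so this block runs on its own.
    previous_lines = deque(maxlen=length)
    for line in lines:
        if pattern in line:
            yield line, previous_lines
        previous_lines.append(line)


def collect_matches(lines, pattern, length=5):
    # Hypothetical helper: snapshot the history with list() at each match,
    # because the generator keeps mutating the deque it yielded.
    return [(line, list(prev)) for line, prev in find_value(lines, pattern, length)]


sample = ["I love python.", "You love java.", "python and java.", "php"]

# Storing the raw results keeps references to one shared deque.
naive = list(find_value(sample, "python", length=2))
print(naive[0][1] is naive[1][1])  # both entries point at the same deque object

# Snapshots preserve the history as it was at the time of each match.
matches = collect_matches(sample, "python", length=2)
for line, history in matches:
    print(history, "->", line)
```

Copying costs at most N items per match, which is usually negligible for a small history window.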

3. Discussion

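When writing code that searches for items, it is common to use a generator function containing the yield keyword, because this cleanly decouples the code that performs the search from the code that uses the results.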
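
To illustrate the decoupling, the same generator can feed two different consumers without any change to the search code. This is only a rough sketch using in-memory sample lines; `count_matches` and `first_match_context` are hypothetical names, not part of the original recipe.

```python
from collections import deque


def find_value(lines, pattern, length=5):
    # Search code: knows nothing about how results will be used.
    previous_lines = deque(maxlen=length)
    for line in lines:
        if pattern in line:
            yield line, previous_lines
        previous_lines.append(line)


def count_matches(lines, pattern):
    # Consumer 1: only cares how many lines matched.
    return sum(1 for _ in find_value(lines, pattern))


def first_match_context(lines, pattern, length=5):
    # Consumer 2: stops at the first match and returns it with its history.
    for line, prev in find_value(lines, pattern, length):
        return line, list(prev)
    return None


sample = ["php", "ruby", "I love python.", "python and c"]
print(count_matches(sample, "python"))
print(first_match_context(sample, "python", length=1))
```

Because the generator is lazy, the second consumer stops reading lines as soon as it has its first match.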

deque(maxlen=N) creates a fixed-length queue. When a new item is added and the queue is already full, the oldest item is removed automatically.

>>> from collections import deque
>>> d = deque(maxlen=4)
>>> d.append(1)
>>> d.append(2)
>>> d.append(3)
>>> d.append(4)
>>> d
deque([1, 2, 3, 4], maxlen=4)
>>> d.append(5)
>>> d
deque([2, 3, 4, 5], maxlen=4)

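Although the same thing can be done by hand on a list (append, del), the deque solution is far more elegant and also runs faster, since deleting from the front of a list has to shift every remaining element.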
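
For comparison, here is a sketch of what the manual list version might look like next to the deque version. The function names `keep_last_list` and `keep_last_deque` are hypothetical; both are meant to return the last n items of an iterable.

```python
from collections import deque


def keep_last_list(items, n):
    # Manual approach: append, then delete the oldest entry when over the limit.
    history = []
    for item in items:
        history.append(item)
        if len(history) > n:
            del history[0]  # shifts every remaining element one slot left
    return history


def keep_last_deque(items, n):
    # deque drops the oldest entry automatically once maxlen is reached.
    return list(deque(items, maxlen=n))


print(keep_last_list(range(1, 6), 4))
print(keep_last_deque(range(1, 6), 4))
```

Both calls keep 2, 3, 4, 5, matching the interactive session above, but the deque version needs no bookkeeping code.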

If no maximum size is given, the result is an unbounded queue, and items can be added or popped at either end:

>>> from collections import deque
>>> d = deque()
>>> d.append(1)
>>> d.append(2)
>>> d.append(3)
>>> d
deque([1, 2, 3])
>>> d.appendleft(4)
>>> d
deque([4, 1, 2, 3])
>>> d.popleft()
4
>>> d
deque([1, 2, 3])
>>> d.append(5)
>>> d
deque([1, 2, 3, 5])
>>> d.appendleft(6)
>>> d
deque([6, 1, 2, 3, 5])
>>> d.popleft()
6

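[Note] Adding or popping an item at either end of a deque is O(1). A list is different: inserting or removing an item at the head of a list is O(N). That is why the deque is faster here.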
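
A rough way to observe this difference is to drain the same data from the front of a list and from the front of a deque, timing both with timeit. This is only a sketch: the size and repeat count are arbitrary assumptions, `drain_list` and `drain_deque` are hypothetical names, and the actual timings depend on the machine, so none are shown here.

```python
import timeit
from collections import deque


def drain_list(n):
    data = list(range(n))
    popped = 0
    while data:
        data.pop(0)  # O(N): every remaining element is shifted left
        popped += 1
    return popped


def drain_deque(n):
    data = deque(range(n))
    popped = 0
    while data:
        data.popleft()  # O(1): no shifting needed
        popped += 1
    return popped


n = 10_000  # arbitrary size chosen for illustration
list_time = timeit.timeit(lambda: drain_list(n), number=3)
deque_time = timeit.timeit(lambda: drain_deque(n), number=3)
print(f"list.pop(0):     {list_time:.4f} s")
print(f"deque.popleft(): {deque_time:.4f} s")
```

The gap should widen as n grows, since the list version does work proportional to the remaining length on every pop.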