Hand-Rolling an LRU Cache: From the Internals of @lru_cache to a High-Performance Doubly Linked List + Hash Table Implementation

Originally published 2025-12-30 06:00:31

License: CC 4.0 BY-SA

Tags: #python #programming-language

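I. Before We Start: Why Should Every Python Developer Understand the LRU Cache?

Since its first release in 1991, Python has become a go-to language for web development, data science, artificial intelligence, and operations automation, thanks to its concise syntax, rich ecosystem, and role as a "glue language". As Python sees more use in high-performance computing, distributed systems, and data-intensive applications, **caching** matters more and more.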

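In real projects you have almost certainly run into situations like these:

- A function is expensive to compute, and you want to cache its results to avoid repeated work
- An endpoint is called frequently, and you want to reduce load on the database
- A data structure must evict old entries quickly to stay within a fixed capacity
- A service needs a local cache to improve response time

One of the core solutions to all of these is the LRU Cache (Least Recently Used Cache).

Python's built-in functools.lru_cache is a powerful tool, but do you really understand how it works underneath? Do you know which data structures it uses internally? Could you write a high-performance LRU Cache by hand?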

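This article takes you from the basics to the advanced material and covers:

- The core idea behind an LRU Cache
- How @lru_cache is implemented under the hood
- How to hand-write an O(1) LRU Cache with a doubly linked list + hash table
- How to use LRU in real projects to improve performance
- How to avoid common LRU Cache pitfalls

Whether you are a beginner or a seasoned developer, this article should help you build a solid understanding of how caching works.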

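II. The Basics: What Is an LRU Cache?

1. Definition of LRU

LRU (Least Recently Used) is a cache eviction policy:

When the cache is full, evict the entry that was used least recently.

It rests on two assumptions:

- Data that was used recently is more likely to be used again soon
- Data that has not been used for a long time is less likely to be used again

When these assumptions hold (that is, the workload has temporal locality), LRU is a sensible eviction policy for a cache with limited capacity.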

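2. Core operations of an LRU Cache

An LRU Cache must support two operations, both in O(1) time (average case, since a hash table lookup is O(1) on average):

(1) get(key)

- If the key exists, return its value and mark the key as "most recently used"
- If the key does not exist, return -1 or None

(2) put(key, value)

- If the key already exists, update its value and mark it as "most recently used"
- If the key does not exist:
  - If the cache is not full, insert it directly
  - If the cache is full, evict the "least recently used" node, then insert the new one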
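
Before building the data structure by hand, it can help to see the get/put contract in action. The following is a minimal sketch that borrows collections.OrderedDict for the ordering; the class name `OrderedDictLRU` and its `data` attribute are illustrative assumptions, not the design this article builds later.

```python
from collections import OrderedDict


class OrderedDictLRU:
    """Illustrative LRU cache backed by OrderedDict (not the article's final design)."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.data = OrderedDict()  # oldest entry first, newest entry last

    def get(self, key):
        if key not in self.data:
            return -1  # miss, following the convention described above
        self.data.move_to_end(key)  # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)  # existing key becomes most recent
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict the least recently used entry


cache = OrderedDictLRU(2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")           # "a" is now the most recently used key
cache.put("c", 3)        # the cache is over capacity, so "b" is evicted
print(list(cache.data))  # ['a', 'c']
```

Reading "a" before inserting "c" is what saves it from eviction: recency of use, not insertion order, decides which entry goes.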

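3. Why "doubly linked list + hash table"?

To make both operations O(1):

- The hash table (dict) gives O(1) lookup by key
- The doubly linked list gives O(1) node moves (most recently used at the head, least recently used at the tail), because a node that knows both of its neighbors can be unlinked without traversing the list

The structure looks like this:

Hash table: key -> doubly linked list node

Doubly linked list:
head <-> node1 <-> node2 <-> ... <-> tail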

III. Going Deeper: How Python's Built-in @lru_cache Works

In CPython, functools.lru_cache is a high-performance cache implemented in C (with a pure-Python fallback in functools.py).

Example:

from functools import lru_cache

@lru_cache(maxsize=128)
def fib(n):
    if n < 2:
        return n
    return fib(n-1) + fib(n-2)

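1. Key features of @lru_cache

- Implemented with a hash table + doubly linked list
- Arguments must be hashable, because they form the cache key
- Automatically evicts the least recently used entry once maxsize is reached
- Tracks cache statistics (hits, misses)
- Provides cache_clear() to empty the cache
- Provides cache_info() to inspect the cache state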
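
The features listed above can be observed directly from the interpreter. This short sketch reuses the fib example and only calls documented functools.lru_cache methods; the argument 30 is an arbitrary choice.

```python
from functools import lru_cache


@lru_cache(maxsize=128)
def fib(n):
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)


print(fib(30))           # 832040
print(fib.cache_info())  # CacheInfo(hits=28, misses=31, maxsize=128, currsize=31)

fib.cache_clear()        # drop all entries and reset the statistics
print(fib.cache_info())  # CacheInfo(hits=0, misses=0, maxsize=128, currsize=0)

try:
    fib([1, 2])          # a list is unhashable, so it cannot be part of a cache key
except TypeError as exc:
    print("TypeError:", exc)
```

Each value from fib(0) to fib(30) is computed once (31 misses), and every second recursive call from fib(3) upward is answered from the cache (28 hits).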

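2. The underlying structure of @lru_cache (simplified)

Internally it looks roughly like this (a simplified sketch, not the literal CPython definition):

struct lru_cache {
    PyObject *cache_dict;   // hash table
    PyObject *root;         // sentinel node of the doubly linked list
    int maxsize;
    int hits;
    int misses;
};

The linked list node: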

struct node {
    PyObject *key;
    PyObject *value;
    struct node *prev;
    struct node *next;
};

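3. Why is @lru_cache so efficient?

- It is implemented in C, so per-call overhead is low
- It uses PyDict (a hash table) for fast lookup
- It uses a doubly linked list to move nodes quickly
- It uses a sentinel node (root) to avoid boundary checks
- It uses the hash of the key to index the cache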
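
To make the sentinel idea concrete, here is a rough Python sketch of a circular doubly linked list with a single root, loosely modeled on the list-based links used by the pure-Python fallback in functools.py. The class name `CircularLRU` and its method names are illustrative assumptions; this is not the CPython code itself.

```python
PREV, NEXT, KEY, VALUE = 0, 1, 2, 3  # field positions inside each list-based link


class CircularLRU:
    """Sketch of an LRU cache using one sentinel in a circular doubly linked list."""

    def __init__(self, maxsize: int):
        self.maxsize = maxsize
        self.cache = {}  # key -> link
        self.root = []   # sentinel link; it never stores a real entry
        self.root[:] = [self.root, self.root, None, None]  # points to itself when empty

    def _unlink(self, link):
        link_prev, link_next = link[PREV], link[NEXT]
        link_prev[NEXT] = link_next
        link_next[PREV] = link_prev

    def _append(self, link):
        # Insert just before root, which is the most recently used position.
        last = self.root[PREV]
        last[NEXT] = self.root[PREV] = link
        link[PREV] = last
        link[NEXT] = self.root

    def get(self, key, default=None):
        link = self.cache.get(key)
        if link is None:
            return default
        self._unlink(link)
        self._append(link)
        return link[VALUE]

    def put(self, key, value):
        link = self.cache.get(key)
        if link is not None:
            link[VALUE] = value
            self._unlink(link)
            self._append(link)
            return
        link = [None, None, key, value]
        self.cache[key] = link
        self._append(link)
        if len(self.cache) > self.maxsize:
            oldest = self.root[NEXT]  # root's successor is the least recently used link
            self._unlink(oldest)
            del self.cache[oldest[KEY]]

    def keys_oldest_first(self):
        keys, link = [], self.root[NEXT]
        while link is not self.root:
            keys.append(link[KEY])
            link = link[NEXT]
        return keys


lru = CircularLRU(2)
lru.put("x", 1)
lru.put("y", 2)
lru.get("x")
lru.put("z", 3)
print(lru.keys_oldest_first())  # ['x', 'z']
```

Because the root always has a predecessor and a successor (itself, when the list is empty), neither `_unlink` nor `_append` needs a special case for an empty list or for the first and last entries.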

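IV. Hand-Rolling an LRU Cache: Doubly Linked List + Hash Table (Python Implementation)

Next we hand-write an O(1) LRU Cache built on the same idea as @lru_cache: a hash table for lookup plus a doubly linked list for recency order.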

1. Define the doubly linked list node

class Node:
    def __init__(self, key=None, value=None):
        self.key = key
        self.value = value
        self.prev = None
        self.next = None

2. Define the LRU Cache body

class LRUCache:

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.cache = {}  # key -> Node

        # Create dummy head and tail nodes (sentinels)
        self.head = Node()
        self.tail = Node()

        self.head.next = self.tail
        self.tail.prev = self.head

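3. Helper method: add a node at the head

def _add_node(self, node):
    # Link the new node between the head sentinel and the current first node
    node.prev = self.head
    node.next = self.head.next

    # Repoint both neighbors at the new node
    self.head.next.prev = node
    self.head.next = node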

4. Helper method: remove a node

def _remove_node(self, node):
    prev = node.prev
    nxt = node.next

    # Bypass the node; the sentinels guarantee prev and nxt are never None
    prev.next = nxt
    nxt.prev = prev

5. Helper method: move a node to the head (mark as most recently used)

def _move_to_head(self, node):
    self._remove_node(node)
    self._add_node(node)

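6. Helper method: pop the tail node (least recently used)

def _pop_tail(self):
    # The node just before the tail sentinel is the least recently used one.
    # Only call this when the list holds at least one real node.
    node = self.tail.prev
    self._remove_node(node)
    return node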

7. Implement get()

def get(self, key):
    node = self.cache.get(key)
    if node is None:
        return -1  # cache miss

    # Cache hit: refresh recency, then return the value
    self._move_to_head(node)
    return node.value

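8. Implement put()

def put(self, key, value):
    node = self.cache.get(key)

    if node is not None:
        # Existing key: update the value and refresh recency
        node.value = value
        self._move_to_head(node)
    else:
        # New key: insert at the head
        new_node = Node(key, value)
        self.cache[key] = new_node
        self._add_node(new_node)

        # Over capacity: evict the least recently used node
        if len(self.cache) > self.capacity:
            tail = self._pop_tail()
            del self.cache[tail.key]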

9. Complete code (runs as is)

class Node:
    """Doubly linked list node holding one cache entry."""

    def __init__(self, key=None, value=None):
        self.key = key
        self.value = value
        self.prev = None
        self.next = None


class LRUCache:

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.cache = {}  # key -> Node

        # Sentinel head and tail nodes remove the need for boundary checks
        self.head = Node()
        self.tail = Node()

        self.head.next = self.tail
        self.tail.prev = self.head

    def _add_node(self, node):
        # Insert right after the head sentinel (most recently used position)
        node.prev = self.head
        node.next = self.head.next

        self.head.next.prev = node
        self.head.next = node

    def _remove_node(self, node):
        # Unlink the node from its neighbors
        prev = node.prev
        nxt = node.next

        prev.next = nxt
        nxt.prev = prev

    def _move_to_head(self, node):
        # Mark the node as most recently used
        self._remove_node(node)
        self._add_node(node)

    def _pop_tail(self):
        # Remove and return the least recently used node
        node = self.tail.prev
        self._remove_node(node)
        return node

    def get(self, key):
        node = self.cache.get(key)
        if node is None:
            return -1  # cache miss
        self._move_to_head(node)
        return node.value

    def put(self, key, value):
        node = self.cache.get(key)

        if node is not None:
            # Existing key: update the value and refresh recency
            node.value = value
            self._move_to_head(node)
        else:
            new_node = Node(key, value)
            self.cache[key] = new_node
            self._add_node(new_node)

            # Over capacity: evict the least recently used node
            if len(self.cache) > self.capacity:
                tail = self._pop_tail()
                del self.cache[tail.key]
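
To check the eviction order, here is the classic capacity-2 trace. The class is repeated in condensed form only so that the snippet runs on its own; its behavior is intended to match the listing above, and the condensed helper layout is an assumption of this sketch.

```python
class Node:
    def __init__(self, key=None, value=None):
        self.key, self.value = key, value
        self.prev = self.next = None


class LRUCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.cache = {}
        self.head, self.tail = Node(), Node()  # sentinels
        self.head.next, self.tail.prev = self.tail, self.head

    def _add_node(self, node):
        node.prev, node.next = self.head, self.head.next
        self.head.next.prev = node
        self.head.next = node

    def _remove_node(self, node):
        node.prev.next, node.next.prev = node.next, node.prev

    def get(self, key):
        node = self.cache.get(key)
        if node is None:
            return -1
        self._remove_node(node)
        self._add_node(node)
        return node.value

    def put(self, key, value):
        node = self.cache.get(key)
        if node is not None:
            node.value = value
            self._remove_node(node)
            self._add_node(node)
            return
        node = Node(key, value)
        self.cache[key] = node
        self._add_node(node)
        if len(self.cache) > self.capacity:
            oldest = self.tail.prev
            self._remove_node(oldest)
            del self.cache[oldest.key]


cache = LRUCache(2)
cache.put(1, 1)
cache.put(2, 2)
print(cache.get(1))  # 1   (key 1 becomes the most recently used)
cache.put(3, 3)      # evicts key 2
print(cache.get(2))  # -1
cache.put(4, 4)      # evicts key 1
print(cache.get(1))  # -1
print(cache.get(3))  # 3
print(cache.get(4))  # 4
```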

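V. In Practice: Using an LRU Cache in Real Projects

1. Scenario: database query cache

cache = LRUCache(1000)

def get_user(uid):
    # `db` is assumed to be an existing database handle with a query() method
    result = cache.get(uid)
    if result != -1:  # cache hit (assumes -1 is never a valid query result)
        return result

    # Cache miss: query the database, then store the result
    result = db.query("SELECT * FROM users WHERE id=?", uid)
    cache.put(uid, result)
    return result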

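2. Scenario: local cache for a web API

import requests

cache = LRUCache(500)

def get_weather(city):
    if (data := cache.get(city)) != -1:
        return data

    # Placeholder URL; the timeout keeps a slow upstream from blocking the caller
    resp = requests.get(f"https://api.weather.com/{city}", timeout=5)
    resp.raise_for_status()  # do not cache error responses
    data = resp.json()
    cache.put(city, data)
    return data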

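3. Scenario: caching an expensive computation

cache = LRUCache(2000)

def heavy_compute(x):
    if (res := cache.get(x)) != -1:
        return res

    # slow_function stands for any expensive, deterministic computation
    res = slow_function(x)
    cache.put(x, res)
    return res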