"Advanced RAG" 09: Prompt Compression (Part 2)

License: CC 4.0 BY-SA

Continued from the previous part: "Advanced RAG" 09: Prompt Compression (Part 1)

LongLLMLingua

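The problem with LLMLingua is that it does not take the user's question into account during compression, so it may retain irrelevant information.

LongLLMLingua addresses this by incorporating the user's question into the compression process.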

[Figure 9]

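As shown in Figure 9, LongLLMLingua proposes four new components to enhance the LLM's perception of key information:

Question-aware coarse-grained and fine-grained compression
Document reordering mechanism
Dynamic compression ratio
Subsequence recovery algorithm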

Question-aware coarse-grained compression

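LongLLMLingua proposes using the perplexity of the question x^que conditioned on each context x^doc_k to represent the association between them. A restrictive statement, x^restrict = "We can get the answer to this question in the given documents", can be appended after x^que. This statement strengthens the connection between x^que and x^doc_k and acts as a regularization term that reduces hallucination effects. This can be formulated as: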

[Equation image]

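Why not compute the document-level perplexity conditioned on the question x^que instead?

Because documents often contain a large amount of irrelevant information. Even when conditioned on x^que, the perplexity computed over the entire document may not be distinctive enough, so it is not an adequate metric for document-level compression.

The related code can be found in the function get_distance_longllmlingua.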
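The idea can be sketched in a few lines. This is a minimal illustration, not the library's implementation: it assumes the per-token log-probabilities of the question (plus the restrictive statement) given each document have already been obtained from a language model, and the helper names and numbers are hypothetical. It also assumes that a lower perplexity of the question given a document indicates a stronger association.

```python
import math


def question_perplexity(token_logprobs):
    """Perplexity of the question tokens, given their natural-log
    probabilities conditioned on one document."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)


def rank_documents(logprobs_per_doc):
    """Return document indices ordered from most to least related.

    Assumption: the question is easier to predict (lower perplexity)
    when the document it is conditioned on is relevant to it.
    """
    ppl = [question_perplexity(lp) for lp in logprobs_per_doc]
    return sorted(range(len(ppl)), key=lambda k: ppl[k])


# Hypothetical log-probabilities of three question tokens under three documents.
logprobs_per_doc = [
    [-2.3, -1.9, -2.7],  # document 0: question is hard to predict
    [-0.4, -0.6, -0.5],  # document 1: question is easy to predict
    [-1.2, -1.0, -1.4],  # document 2: in between
]
print(rank_documents(logprobs_per_doc))  # [1, 2, 0]
```

In a real pipeline the log-probabilities would come from a small causal language model scoring the question tokens after each document.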

Question-aware fine-grained compression

LongLLMLingua introduces the concept of contrastive perplexity.

$$s_i = \mathrm{perplexity}(x_i \mid x_{<i}) - \mathrm{perplexity}(x_i \mid x^{que}, x_{<i})$$

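First, we compute the perplexity of a token without considering the question, denoted perplexity(x_i | x_<i). Then we measure the perplexity again, this time including the question, denoted perplexity(x_i | x^que, x_<i). The second value measures how surprising it is to see token x_i given the question x^que and all the tokens before it.

The goal is to determine how much each token's surprise changes with the question. If a token becomes much less surprising once the question is included, it is probably highly relevant to the question.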
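A minimal sketch of this contrast, assuming the per-token log-probabilities with and without the question are already available. The function names, the example tokens, and the numbers are hypothetical, and the per-token perplexity is taken to be exp(-log p), which is an assumption of this sketch.

```python
import math


def contrastive_scores(logp_without_q, logp_with_q):
    """Per-token drop in perplexity once the question is added to the context."""
    if len(logp_without_q) != len(logp_with_q):
        raise ValueError("both lists must cover the same tokens")
    return [math.exp(-a) - math.exp(-b)
            for a, b in zip(logp_without_q, logp_with_q)]


def keep_top_tokens(tokens, scores, keep_ratio):
    """Keep the highest-scoring tokens, preserving their original order."""
    n_keep = max(1, round(len(tokens) * keep_ratio))
    top = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)[:n_keep]
    return [tokens[i] for i in sorted(top)]


tokens = ["The", "prize", "went", "to", "Rontgen", "in", "1901"]
# Hypothetical log-probabilities of each token without and with the question.
logp_without_q = [-1.0, -3.0, -2.0, -0.5, -6.0, -0.7, -4.0]
logp_with_q = [-1.0, -1.5, -2.0, -0.5, -2.0, -0.7, -1.0]

scores = contrastive_scores(logp_without_q, logp_with_q)
print(keep_top_tokens(tokens, scores, keep_ratio=0.4))  # ['prize', 'Rontgen', '1901']
```

Tokens whose probability does not change with the question score zero and are the first to be dropped.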

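Document reordering mechanism

As shown in Figure 10, during inference LLMs tend to use the content at the beginning and end of the prompt while ignoring the content in the middle. This is known as the "lost in the middle" problem.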

[Figure 10]

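Figure 10 also shows that LLMs perform best when the relevant information is placed at the beginning. LongLLMLingua therefore arranges the passages according to the coarse-grained compression results, ordering them from front to back by score from highest to lowest.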
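The reordering step itself is a sort. The sketch below assumes each document already has an importance score where higher means more relevant; the function name and the sample data are hypothetical.

```python
def reorder_documents(documents, scores):
    """Place the highest-scoring documents first in the prompt."""
    if len(documents) != len(scores):
        raise ValueError("one score per document is required")
    order = sorted(range(len(documents)), key=lambda k: scores[k], reverse=True)
    return [documents[k] for k in order]


documents = ["doc_a", "doc_b", "doc_c"]
scores = [0.2, 0.9, 0.5]  # hypothetical importance scores
print(reorder_documents(documents, scores))  # ['doc_b', 'doc_c', 'doc_a']
```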

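Dynamic compression ratio

Because the density of key information differs from document to document, documents that are more relevant to the question should receive a larger budget (that is, a lower compression ratio, so fewer of their tokens are dropped).

LongLLMLingua uses the importance scores from coarse-grained compression to guide the budget allocation in fine-grained compression.

Specifically, it first uses LLMLingua's budget controller to set an initial budget for the retained documents. Then, during the fine-grained compression stage, it dynamically allocates a compression budget to each document. The allocation is based on the rank index of the document's importance score, which is determined in the coarse-grained compression stage.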

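LongLLMLingua uses a linear scheduler for the adaptive allocation. The budget of each token x_i can be expressed as:

$$\tau_i = \tau_k^{doc} \quad \forall\, x_i \in x_k^{doc}, \qquad \tau_k^{doc} = \max\left(\min\left(\left(1 - \frac{2 I(r_k)}{N_d}\right)\delta\tau + \tau^{doc},\ 1\right),\ 0\right)$$

where τ^doc is the initial budget set by the budget controller, I(r_k) is the rank index of the importance score of document k, N_d is the number of documents, and δτ is a hyperparameter that controls the overall budget for dynamic allocation.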

The corresponding code can be found in the function get_dynamic_compression_ratio.
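A linear scheduler of this kind can be sketched as follows. This is an illustration under stated assumptions rather than the library's code: the budget is taken to be the fraction of tokens retained, rank 0 is the most relevant document, the linear term is (1 - 2 * rank / N_d) * delta, and the result is clipped to [0, 1]. The function and parameter names are hypothetical.

```python
def dynamic_budgets(base_budget, delta, num_docs):
    """Per-document budgets, indexed by rank (0 = most relevant).

    Assumed linear schedule: the top-ranked document gets base_budget + delta,
    and the budget decreases linearly with the rank index.
    """
    budgets = []
    for rank in range(num_docs):
        raw = base_budget + (1 - 2 * rank / num_docs) * delta
        budgets.append(max(min(raw, 1.0), 0.0))  # clip to [0, 1]
    return budgets


print(dynamic_budgets(base_budget=0.5, delta=0.25, num_docs=4))
# [0.75, 0.625, 0.5, 0.375]
```

A larger delta spreads the budgets further apart, so highly ranked documents are compressed less and low-ranked ones more.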

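Subsequence recovery algorithm

As shown in Figure 11, fine-grained token-level compression may discard some tokens of key entities. For example, "2009" in the original prompt may be compressed to "209", and "Wilhelm Conrad Rontgen" may be compressed to "Wilhelmgen".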

[Figure 11]

LongLLMLingua proposes a subsequence recovery algorithm that restores the original content in the LLM's response, as shown in Figure 12.

[Figure 12]

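The main procedure consists of the following steps:

Iterate over the tokens y_l in the LLM's response and select the longest substring ỹ_key,l that appears in the compressed prompt x̃.
Find the maximum common shortest subsequence x_i,j in the original prompt x that corresponds to ỹ_key,l.
Replace the matched tokens ỹ_key,l in the LLM's response with x_i,j.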

The corresponding code can be found in the function recover.
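The three steps above can be sketched on token lists. This is a simplified illustration, not the library's recover function: it assumes all three texts are already tokenized, it only expands matches of two or more tokens, and it picks the shortest span of the original prompt that contains the matched tokens in order. All helper names are hypothetical.

```python
def contains_sublist(haystack, needle):
    """True if needle occurs as a contiguous run inside haystack."""
    n = len(needle)
    return any(haystack[i:i + n] == needle for i in range(len(haystack) - n + 1))


def shortest_covering_span(original, key):
    """Shortest contiguous span of original that contains key as a subsequence."""
    best = None
    for i in range(len(original)):
        if original[i] != key[0]:
            continue
        k = 0
        for j in range(i, len(original)):
            if original[j] == key[k]:
                k += 1
                if k == len(key):
                    span = original[i:j + 1]
                    if best is None or len(span) < len(best):
                        best = span
                    break
    return best


def recover_response(original, compressed, response):
    """Expand response spans that were shortened by prompt compression."""
    out, l = [], 0
    while l < len(response):
        # Step 1: longest response span starting at l that occurs in the compressed prompt.
        r = l
        while r < len(response) and contains_sublist(compressed, response[l:r + 1]):
            r += 1
        key = response[l:r]
        # Step 2: matching span in the original prompt.
        span = shortest_covering_span(original, key) if len(key) >= 2 else None
        if span is not None:
            out.extend(span)  # Step 3: substitute the original tokens.
            l = r
        else:
            out.append(response[l])
            l += 1
    return out


original = ["discovered", "by", "Wilhelm", "Conrad", "Ront", "gen", "in", "1895"]
compressed = ["discovered", "Wilhelm", "gen", "1895"]
response = ["It", "was", "Wilhelm", "gen"]
print(recover_response(original, compressed, response))
# ['It', 'was', 'Wilhelm', 'Conrad', 'Ront', 'gen']
```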

Code demonstration

The environment is set up in the same way as for LLMLingua. Here is the test code:

from llmlingua import PromptCompressor

GSM8K_PROMPT = "Question: Angelo and Melanie want to plan how many hours over the next week they should study together for their test next week. They have 2 chapters of their textbook to study and 4 worksheets to memorize. They figure out that they should dedicate 3 hours to each chapter of their textbook and 1.5 hours for each worksheet. If they plan to study no more than 4 hours each day, how many days should they plan to study total over the next week if they take a 10-minute break every hour, include 3 10-minute snack breaks each day, and 30 minutes for lunch each day?\nLet's think step by step\nAngelo and Melanie think they should dedicate 3 hours to each of the 2 chapters, 3 hours x 2 chapters = 6 hours total.\nFor the worksheets they plan to dedicate 1.5 hours for each worksheet, 1.5 hours x 4 worksheets = 6 hours total.\nAngelo and Melanie need to start with planning 12 hours to study, at 4 hours a day, 12 / 4 = 3 days.\nHowever, they need to include time for breaks and lunch. Every hour they want to include a 10-minute break, so 12 total hours x 10 minutes = 120 extra minutes for breaks.\nThey also want to include 3 10-minute snack breaks, 3 x 10 minutes = 30 minutes.\nAnd they want to include 30 minutes for lunch each day, so 120 minutes for breaks + 30 minutes for snack breaks + 30 minutes for lunch = 180 minutes, or 180 / 60 minutes per hour = 3 extra hours.\nSo Angelo and Melanie want to plan 12 hours to study + 3 hours of breaks = 15 hours total.\nThey want to study no more than 4 hours each day, 15 hours / 4 hours each day = 3.75\nThey will need to plan to study 4 days to allow for all the time