RNNS ARE NOT TRANSFORMERS (YET): The Key Bottleneck on In-context Retrieval (Translation)

Original paper: https://arxiv.org/pdf/2402.18510

RNNS ARE NOT TRANSFORMERS (YET): The Key Bottleneck on In-context Retrieval

Kaiyue Wen${}^{1*}$  Xingyu Dang${}^{1*}$  Kaifeng Lyu${}^{2\dagger}$

${}^{1}$ Institute for Interdisciplinary Information Sciences, Tsinghua University

${}^{2}$ Department of Computer Science & Princeton Language and Intelligence, Princeton University

{wenky20,dangxy20}@mails.tsinghua.edu.cn

klyu@cs.princeton.edu

ABSTRACT

This paper investigates the gap in representation powers of Recurrent Neural Networks (RNNs) and Transformers in the context of solving algorithmic problems. We focus on understanding whether RNNs, known for their memory efficiency in handling long sequences, can match the performance of Transformers, particularly when enhanced with Chain-of-Thought (CoT) prompting. Our theoretical analysis reveals that CoT improves RNNs but is insufficient to close the gap with Transformers. A key bottleneck lies in the inability of RNNs to perfectly retrieve information from the context, even with CoT: for several tasks that explicitly or implicitly require this capability, such as associative recall and determining if a graph is a tree, we prove that RNNs are not expressive enough to solve the tasks while Transformers can solve them with ease. Conversely, we prove that adopting techniques to enhance the in-context retrieval capability of RNNs, including Retrieval-Augmented Generation (RAG) and adding a single Transformer layer, can close this representation gap with Transformers.
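To make the two separating tasks named above concrete, here is a minimal sketch (not code from the paper; the instance format, function names, and parameters are assumptions) of how associative-recall prompts and is-tree instances can be generated and checked offline. In-context, a model only sees such instances as token sequences, and the claim above is that a fixed-size recurrent state cannot reliably retrieve the relevant key-value pair or edge as the context grows, while attention can.

```python
import random

def associative_recall_instance(num_pairs=8, vocab=tuple("abcdefghijklmnop")):
    """Build one associative-recall prompt: key-value pairs followed by a
    query key; the correct output is the value paired with that key.
    (Illustrative format only; the paper's exact tokenization may differ.)"""
    keys = random.sample(vocab, num_pairs)          # distinct keys
    values = [random.choice(vocab) for _ in keys]   # values may repeat
    query = random.choice(keys)
    prompt = " ".join(f"{k} {v}" for k, v in zip(keys, values)) + f" | {query}"
    answer = values[keys.index(query)]
    return prompt, answer

def is_tree(num_nodes, edges):
    """Decide whether an undirected graph is a tree: it must have exactly
    n - 1 edges and contain no cycle (checked with union-find)."""
    if len(edges) != num_nodes - 1:
        return False
    parent = list(range(num_nodes))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru == rv:
            return False  # adding this edge would close a cycle
        parent[ru] = rv
    return True

if __name__ == "__main__":
    prompt, answer = associative_recall_instance()
    print(prompt, "->", answer)
    print(is_tree(4, [(0, 1), (1, 2), (2, 3)]))  # True: a path is a tree
    print(is_tree(4, [(0, 1), (1, 2), (2, 0)]))  # False: contains a cycle
```

The union-find check makes explicit the kind of global bookkeeping over the whole edge list that the is-tree task requires; the paper's question is whether a model can perform the equivalent retrieval purely from its internal state and CoT tokens.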
