Same Task, More Tokens: the Impact of Input Length on the Reasoning Performance of Large Language Models

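This article studies how input length affects the reasoning performance of large language models (LLMs). It finds that performance drops significantly even at input lengths far below the technical maximum. Using the FLenQA dataset, it shows that length has a marked effect on reasoning performance and that the traditional perplexity metric is not suited to long-input reasoning tasks. The study identifies failure modes that point to directions for improving LLMs.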

This article is part of an LLM series and is a translation of "Same Task, More Tokens: the Impact of Input Length on the Reasoning Performance of Large Language Models".

Abstract

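This paper explores the impact of extending input lengths on the capabilities of large language models (LLMs). Despite recent advances in LLMs, how consistently they perform across different input lengths is not well understood. We investigate this by introducing a novel QA reasoning framework designed specifically to assess the impact of input length. We isolate the effect of input length by using multiple versions of the same sample, each extended with padding of a different length, type, and location. Our findings show a notable degradation in LLMs' reasoning performance at input lengths much shorter than their technical maximum. We show that the degradation trend appears in every version of our dataset, although at different intensities. In addition, our study shows that the traditional perplexity metric does not correlate with the performance of LLMs on long-input reasoning tasks. We analyze our results and identify failure modes that can serve as useful guides for future research, potentially informing strategies to address the limitations observed in LLMs.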
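The idea of extending one sample with padding of different lengths and locations can be illustrated with a minimal sketch. This is not the paper's construction: the function names `pad_sample` and `make_versions`, the three position labels, the filler sentences, and the use of whitespace-separated words as a stand-in for tokens are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence, Tuple


def pad_sample(key_text: str, filler_sentences: Sequence[str],
               target_words: int, position: str = "middle") -> str:
    """Extend key_text with filler so the result has at most target_words words.

    position (hypothetical labels): "start" puts the key text first,
    "end" puts it last, "middle" places it between two halves of the filler.
    Length is counted in whitespace-separated words, an assumed proxy for tokens.
    """
    if position not in ("start", "middle", "end"):
        raise ValueError(f"unknown position: {position}")
    # Drop empty filler so the loop below always makes progress.
    sentences = [s for s in filler_sentences if s.split()]
    if not sentences:
        raise ValueError("filler_sentences must contain non-empty text")

    budget = target_words - len(key_text.split())
    padding: List[str] = []
    used = 0
    i = 0
    # Cycle through the filler sentences until the word budget is spent.
    while True:
        sentence = sentences[i % len(sentences)]
        n = len(sentence.split())
        if used + n > budget:
            break
        padding.append(sentence)
        used += n
        i += 1

    if position == "start":
        parts = [key_text] + padding
    elif position == "end":
        parts = padding + [key_text]
    else:
        half = len(padding) // 2
        parts = padding[:half] + [key_text] + padding[half:]
    return " ".join(parts)


def make_versions(key_text: str, filler_sentences: Sequence[str],
                  lengths: Sequence[int],
                  positions: Sequence[str] = ("start", "middle", "end"),
                  ) -> Dict[Tuple[int, str], str]:
    """Build one padded version of the same sample per (length, position) pair."""
    return {
        (n, pos): pad_sample(key_text, filler_sentences, n, pos)
        for n in lengths
        for pos in positions
    }


# Example usage with made-up text
key = "Alice is taller than Bob. Bob is taller than Carol."
filler = ["The weather was mild that day.", "Nothing else happened."]
for (n, pos), text in make_versions(key, filler, [20, 40]).items():
    print(n, pos, len(text.split()))
```

Because every version contains the same key text, any change in a model's answer across versions can be attributed to the padding rather than to the question itself, which is the isolation the abstract describes.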

1 Introduction

2 Desired Data Properties

3 FLenQA

4 Main Experiments

5 With the next …

### Chain-of-Thought Prompting Mechanism in Large Language Models

In large language models, chain-of-thought prompting is a method for improving reasoning by guiding the model through a structured thought process. It breaks a complex problem into simpler components and provides step-by-step guidance that mirrors how a person would work through it.

Creating these prompts typically involves selecting examples from training datasets, where each example represents part of an overall problem-solving process[^2]. Decomposing a task into multiple steps encourages deeper understanding and more accurate predictions than prompting for the answer alone.

For instance, on multi-hop question answering or logical deduction, such chains let a model not only produce the correct answer but also articulate the intermediate thoughts that lead to it. This transparency makes the output easier to interpret and can improve performance on various NLP benchmarks.

```python
def create_chain_of_thought_prompt(task_description, examples):
    """
    Create a chain-of-thought prompt from a task description and worked examples.

    Args:
        task_description (str): Description of the task at hand.
        examples (list): Tuples of (question, reasoning, answer) used as demonstrations.

    Returns:
        str: The formatted prompt, with instructions followed by the sample cases.
    """
    # Each demonstration shows the intermediate reasoning, not only the answer.
    formatted_examples = "\n\n".join(
        f"Input: {question}\nReasoning: {reasoning}\nOutput: {answer}"
        for question, reasoning, answer in examples
    )
    return (
        f"Task: {task_description}\n\n"
        f"Examples:\n{formatted_examples}\n\n"
        "Now solve similar questions, reasoning step by step as in the examples above."
    )


# Example usage
examples = [
    (
        "What color do you get mixing red and blue?",
        "Red and blue are primary colors, and mixing them gives a secondary color.",
        "Purple",
    ),
    (
        "The picnic is cancelled whenever it rains. If it rains tomorrow, will we have our picnic?",
        "Rain cancels the picnic, and it rains tomorrow, so the picnic is cancelled.",
        "No",
    ),
]
print(create_chain_of_thought_prompt("Solve logic puzzles", examples))
```