揭秘大语言模型：深入解析’幻觉’产生的本质！

最新推荐文章于 2025-11-23 19:13:18 发布

原创最新推荐文章于 2025-11-23 19:13:18 发布 · 938 阅读

13 ·

CC 4.0 BY-SA版权

文章标签：

#语言模型 #人工智能 #知识图谱 #智能体 #程序员 #大模型 #大模型教程

人生天地之间，无终始者非君子也。

我问了chatgpt这一句话。

gpt-5 chat的回答，是不是看起来很厉害？🤣

然而，这里面有不对劲的地方：“白驹过隙”我们学过，似乎没有“无终始者，凡物也；无始终者，非君子也”这句话啊。

就算记不得“白驹过隙”，逻辑上也不对劲啊，不应该是“无始终者，君子也”？

gpt-5强词夺理，继续编！

gpt-4.5的价格是gpt-5的50倍，海量参数，计算成本高这么多，会不会更好？

看起来仍然不明觉厉。但一样的幻觉，一样的瞎编。庄子的原文里根本没这个！

真相其实很简单，这句话也很好理解，没有gpt说的这么玄乎：

你最喜欢三国演义中的哪句话？ - 知乎

这是llm“幻觉”的一个真实案例。

有了具体案例（what），下面我们可以开始讨论 why、llm幻觉的本质了。

一本正经胡说八道

信心满满张嘴就来、一本正经胡说八道，人类如此嘲讽llm的“幻觉” (Hallucination) 。

（其实，人类自己在这方面也相当不差🤣）

作为llm使用者（以及有大量学习经验的人类学习者），我们可以根据实际经验，把 llm 幻觉大致分为两种：

一种是预训练阶段，llm学习并记住了错误的知识，然后把错误信息当作正确知识，用于回答中（人也一样）；
第二种是guessing/“战略性猜测”，而不是直接说“我不知道”，或者给出多种可能性的不确定回答（人也一样）。

openai 分享了对llm幻觉的研究。llm 幻觉的根本原因，不在模型本身，而在两个地方：

一个是llm的预训练阶段，数据局限性导致的幻觉；llm通过学习海量文本来“预测下一个词”，对于有规律的模式（语法、编程、通用知识），llm可以学得很好；但是，对于低频的事实（例如，路人甲xxx的宠物狗的生日），这些信息在训练数据中无规律可循，导致了llm的第一种幻觉；
另一个是以准确率得分为主的模型评价机制（accuracy-based evals），错误地奖励 llm的“战略性猜测”/瞎猜行为，错误地惩罚llm 放弃回答/“我不知道”的行为，这导致了第二种幻觉；

gpt-5 的一大提升，就是在消除llm幻觉上。尤其是gpt-5 thinking，学会了“谦卑”，某种意义上体现了某种程度的“元认知”：减少了 “战略性猜测”行为，而选择“放弃”，直接说自己不知道。

openai的这篇文章很有趣，虽然讨论的是“老生常谈”的问题，但是 如果能从根本原因上理解一个常见现象，那么价值巨大。所以我全文翻译，做成中英对照版本，还加入了我的阅读评论，欢迎阅读～

为什么大语言模型会产生“幻觉”？

title: Why language models hallucinate

by: openai

date: September 5, 2025

At OpenAI, we’re working hard to make AI systems more useful and reliable. Even as language models become more capable, one challenge remains stubbornly hard to fully solve: hallucinations. By this we mean instances where a model confidently generates an answer that isn’t true. Our new research paper argues that language models hallucinate because standard training and evaluation procedures reward guessing over acknowledging uncertainty.

在 OpenAI，我们正努力让 AI 系统变得更加有用和可靠。即使语言模型变得越来越强大，有一个难题依然顽固难以彻底解决：幻觉现象。所谓幻觉，指的是模型自信地生成了一个并不真实的答案。我们的最新研究论文提出，语言模型之所以会产生幻觉，是因为标准的训练和评估流程更鼓励模型去猜测，而不是承认自己不确定。

ChatGPT also hallucinates. GPT‑5 has significantly fewer hallucinations especially when reasoning, but they still occur. Hallucinations remain a fundamental challenge for all large language models, but we are working hard to further reduce them.

ChatGPT 也会出现幻觉。GPT‑5 出现幻觉的频率明显降低，尤其是在推理时更是如此，但幻觉仍然会发生。对于所有大型语言模型来说，幻觉仍然是一个重大挑战，不过我们正努力进一步减少这种情况。

howie：一个小贴士，使用gpt-5 thinking，基本可以解决幻觉问题。因为模型几乎一定会先搜索一下，验证后再回答。

什么是幻觉？

What are hallucinations?

Hallucinations are plausible but false statements generated by language models. They can show up in surprising ways, even for seemingly straightforward questions. For example, when we asked a widely used chatbot for the title of the PhD dissertation by Adam Tauman Kalai (an author of this paper), it confidently produced three different answers—none of them correct. When we asked for his birthday, it gave three different dates, likewise all wrong.

幻觉是语言模型生成的看似可信但实际上错误的陈述。它们可能以令人吃惊的方式出现，甚至在看似非常简单的问题上也会发生。举个例子：我们曾询问一个广泛使用的聊天机器人，Adam Tauman Kalai（本论文的一位作者）的博士论文题目是什么。结果机器人非常自信地给出了三个不同的答案——没有一个是正确的。当我们问他的生日时，它又给出了三个不同的日期，同样全都答错了。

应试倾向

Teaching to the test

Hallucinations persist partly because current evaluation methods set the wrong incentives. While evaluations themselves do not directly cause hallucinations, most evaluations measure model performance in a way that encourages guessing rather than honesty about uncertainty.

幻觉之所以挥之不去，部分原因在于当前的评估方法设置了错误的激励机制。评估本身并不会直接导致幻觉，但大多数评估衡量模型表现的方式会鼓励模型去猜测，而不是如实地表明不确定性。

Think about it like a multiple-choice test. If you do not know the answer but take a wild guess, you might get lucky and be right. Leaving it blank guarantees a zero. In the same way, when models are graded only on accuracy—the percentage of questions they get exactly right—they are encouraged to guess rather than say “I don’t know.”

可以用选择题考试来打比方。如果你不知道答案却随便蒙一个，运气好的话可能正好答对；而空着不答则肯定是 0 分。类似地，当模型只根据准确率（即答对问题的比例）来评分时，它就更倾向于猜测，而不是老实地回答“我不知道”。

howie: 应试教育中有很多得分技巧。如果一个人凭借自己精通得分技巧，在考试中一直拿高分，最终上了北大，出任总经理，登上人生巅峰。那他在现实世界的幻觉率应该不低。

很多ai的benchmark得分很高，实际表现一般，就是这个原因。

As another example, suppose a language model is asked for someone’s birthday but doesn’t know. If it guesses “September 10,” it has a 1-in-365 chance of being right. Saying “I don’t know” guarantees zero points. Over thousands of test questions, the guessing model ends up looking better on scoreboards than a careful model that admits uncertainty.

再举个例子，假设一个语言模型被问到某人的生日，但它并不知道正确答案。如果它猜“9 月 10 日”，那么猜对的概率是 1/365；而回答“我不知道”则肯定得 0 分。在成千上万道测试题中，爱猜的模型最终在排行榜上的成绩会比谨慎承认不确定性的模型更好看。

For questions where there is a single “right answer,” one can consider three categories of responses: accurate responses, errors, and abstentions where the model does not hazard a guess. Abstaining is part of humility, one of OpenAI’s core values. Most scoreboards prioritize and rank models based on accuracy, but errors are worse than abstentions. Our Model Spec states that it is better to indicate uncertainty or ask for clarification than provide confident information that may be incorrect.

对于只有唯一正确答案的问题，可以将模型的回答分为三类：正确、错误，以及放弃作答（模型不贸然猜测）。选择不回答体现了一种谦逊态度，而谦逊是 OpenAI 的核心价值观之一。多数排行榜根据准确率对模型进行排名，但实际上错误回答比放弃作答更糟糕。我们的 Model Spec（模型规范）指出，与其自信地给出可能错误的信息，最好表明不确定性或请求澄清。

For a concrete example, consider the SimpleQA eval as an example from the GPT-5 System Card.

举例来说，可以参考 GPT-5 System Card 文档中的一个示例：SimpleQA 评测。

指标	gpt-5-thinking-mini	OpenAI o4-mini
弃答率（未给出具体答案）	52%	1%
准确率（回答正确，越高越好）	22%	24%
错误率（回答错误，越低越好）	26%	75%
总计	100%	100%

howie: gpt-5 thinking mini的正确率略微低于o4 mini，但是错误率显著低于后者。这是一个进步，体现在模型“知道自己不知道”。

In terms of accuracy, the older OpenAI o4-mini model performs slightly better. However, its error rate (i.e., rate of hallucination) is significantly higher. Strategically guessing when uncertain improves accuracy but increases errors and hallucinations.

从准确率来看，较旧的 OpenAI o4-mini 模型表现略好。然而，它的错误率（也就是幻觉发生率）高得多。在不确定时进行策略性猜测可以提高准确率，但也会增加错误和幻觉。

When averaging results across dozens of evaluations, most benchmarks pluck out the accuracy metric, but this entails a false dichotomy between right and wrong. On simplistic evals like SimpleQA, some models achieve near 100% accuracy and thereby eliminate hallucinations. However, on more challenging evaluations and in real use, accuracy is capped below 100% because there are some questions whose answer cannot be determined for a variety of reasons such as unavailable information, limited thinking abilities of small models, or ambiguities that need to be clarified.

在对数十项评测结果取平均时，大多数基准测试只看准确率这一指标，但这实际上造成了对与错之间的虚假二分。在像 SimpleQA 这样简单的评测中，一些模型可以达到接近 100% 的准确率，从而几乎不出现幻觉。然而，在更具挑战性的评测以及实际使用中，准确率不可能达到 100%，因为有些问题由于各种原因无法确定答案，例如信息不可获、较小模型的思维能力有限，或问题本身存在需要澄清的模糊之处。

Nonetheless, accuracy-only scoreboards dominate leaderboards and model cards, motivating developers to build models that guess rather than hold back. That is one reason why, even as models get more advanced, they can still hallucinate, confidently giving wrong answers instead of acknowledging uncertainty.

尽管如此，只以准确率为标准的评分机制依然主导着模型排行榜和模型卡片，这驱使开发者倾向于打造那些在不确定情况下宁可猜也不愿沉默的模型。这也是为什么即便模型越来越先进，它们仍然会产生幻觉：在不确定时，模型宁可自信地给出错误答案，也不愿承认自己不知道。

更优的评测评分方式

A better way to grade evaluations

There is a straightforward fix. Penalize confident errors more than you penalize uncertainty, and give partial credit for appropriate expressions of uncertainty. This idea is not new. Some standardized tests have long used versions of negative marking for wrong answers or partial credit for leaving questions blank to discourage blind guessing. Several research groups have also explored evaluations that account for uncertainty and calibration.

其实有一个直接的解决方法。对于过于自信却答错的情况，比对于不确定的回答扣更多分，并且对恰当表达不确定性的情况给予部分分数。这并不是什么新想法。一些标准化考试早就采取类似措施：对错误答案进行扣分，或者对空白未答给予部分分，以此来阻止盲目猜测。也有一些研究团队探索过在评估中纳入对不确定性和校准的考量。

人类考试是不是也应该如此？

Our point is different. It is not enough to add a few new uncertainty-aware tests on the side. The widely used, accuracy-based evals need to be updated so that their scoring discourages guessing. If the main scoreboards keep rewarding lucky guesses, models will keep learning to guess. Fixing scoreboards can broaden adoption of hallucination-reduction techniques, both newly developed and those from prior research.

但我们的侧重点不同。仅仅额外增加几项考虑不确定性的测试还不够。那些广泛使用的、基于准确率的评测需要更新其计分方式，以便不再助长模型盲目猜测。如果主流的排行榜继续奖励侥幸猜对的情况，模型就会继续学着去猜。修正这些评分机制可以扩大减少幻觉技术的采用范围——无论是新开发的方法还是以往研究中的成果。

幻觉是如何从下一词预测中产生的

How hallucinations originate from next-word prediction

We’ve talked about why hallucinations are so hard to get rid of, but where do these highly-specific factual inaccuracies come from in the first place? After all, large pretrained models rarely exhibit other kinds of errors such as spelling mistakes and mismatched parentheses. The difference has to do with what kinds of patterns there are in the data.

我们已经谈过为什么幻觉如此难以消除，但这些非常具体的事实错误最初是从何而来的呢？毕竟，大型预训练模型很少犯拼写错误或括号不匹配之类的错误。这种差异归根结底在于数据中存在什么样的模式。

Language models first learn through pretraining, a process of predicting the next word in huge amounts of text. Unlike traditional machine learning problems, there are no “true/false” labels attached to each statement. The model sees only positive examples of fluent language and must approximate the overall distribution.

语言模型首先通过预训练阶段来学习——也就是在海量文本中预测下一个词。不像传统的机器学习任务，这里的每句话都没有贴上“真/假”的标签。模型只能看到语言流畅的正面示例，并且必须据此近似整个语言分布。

It’s doubly hard to distinguish valid statements from invalid ones when you don’t have any examples labeled as invalid. But even with labels, some errors are inevitable. To see why, consider a simpler analogy. In image recognition, if millions of cat and dog photos are labeled as “cat” or “dog,” algorithms can learn to classify them reliably. But imagine instead labeling each pet photo by the pet’s birthday. Since birthdays are essentially random, this task would always produce errors, no matter how advanced the algorithm.

在完全没有被标记为“无效”的示例时，要区分有效陈述和无效陈述就更是难上加难。不过，即使有了标签，某些错误仍然无法避免。为什么会这样呢？我们可以考虑一个更简单的类比：在图像识别中，如果数百万张猫和狗的照片被标注为“猫”或“狗”，算法就能可靠地学会将它们分类。但试想如果改为给每张宠物照片标注宠物的生日。由于生日基本上是随机的，无论算法多么先进，这个任务总会产生错误。

The same principle applies in pretraining. Spelling and parentheses follow consistent patterns, so errors there disappear with scale. But arbitrary low-frequency facts, like a pet’s birthday, cannot be predicted from patterns alone and hence lead to hallucinations. Our analysis explains which kinds of hallucinations should arise from next-word prediction. Ideally, further stages after pretraining should remove them, but this is not fully successful for reasons described in the previous section.

同样的原理也适用于预训练阶段。像拼写和括号配对这样遵循一致模式的内容，随着规模增大错误就消失了。但一些任意的低频事实（比如宠物的生日）无法仅凭模式来预测，因此就会导致幻觉的产生。我们的分析解释了哪些类型的幻觉会源自下一词预测。理想情况下，预训练后的进一步阶段应该能够消除它们，但由于上一节所述的原因，这一目标尚未完全实现。

结论

Conclusions

We hope that the statistical lens in our paper clarifies the nature of hallucinations and pushes back on common misconceptions:

我们希望我们论文中的统计视角能够阐明幻觉的本质，并澄清一些常见的误解：

Claim: Hallucinations will be eliminated by improving accuracy because a 100% accurate model never hallucinates.
Finding: Accuracy will never reach 100% because, regardless of model size, search and reasoning capabilities, some real-world questions are inherently unanswerable.
断言：通过提高准确率可以消除幻觉，因为一个 100% 准确的模型永远不会产生幻觉。❌
发现：**准确率永远不可能达到 100%**。无论模型规模多大、搜索和推理能力多强，总有一些现实问题的答案从根本上来说是无解的。✅
Claim: Hallucinations are inevitable.
Finding: They are not, because language models can abstain when uncertain.
断言：幻觉是不可避免的。❌
发现：并非如此，因为语言模型在不确定时可以选择不作答。✅
Claim: Avoiding hallucinations requires a degree of intelligence which is exclusively achievable with larger models.
Finding: It can be easier for a small model to know its limits. For example, when asked to answer a Māori question, a small model which knows no Māori can simply say “I don’t know” whereas a model that knows some Māori has to determine its confidence. As discussed in the paper, being “calibrated” requires much less computation than being accurate.
断言：避免幻觉需要一定程度的智能，而这种智能只有更大的模型才能实现。 ❌
发现：对于小模型而言，认识到自己的局限反而更容易。举例来说，当被要求回答一个毛利语（Māori）问题时，一个完全不懂毛利语的小模型可以直接回答“我不知道”，而懂一些毛利语的模型则必须先判断自己有多大把握。正如论文中讨论的，“校准”（指模型评估自身确定性的能力）所需的计算量远远小于追求绝对准确所需的计算量。✅
Claim: Hallucinations are a mysterious glitch in modern language models.
Finding: We understand the statistical mechanisms through which hallucinations arise and are rewarded in evaluations.
断言：幻觉是现代语言模型中一种神秘的故障。 ❌
发现：我们已经理解了幻觉产生的统计机制，以及为什么它们会在评估中被“奖励”。✅
Claim: To measure hallucinations, we just need a good hallucination eval.
Finding: Hallucination evals have been published. However, a good hallucination eval has little effect against hundreds of traditional accuracy-based evals that penalize humility and reward guessing. Instead, all of the primary eval metrics need to be reworked to reward expressions of uncertainty.
断言：要评估幻觉，我们只需要一个好的幻觉评测。❌
发现：针对幻觉的评测方案已经发布。然而，在面对上百种传统的基于准确率的评测时，再好的幻觉评测也几乎起不了作用——这些传统评测会惩罚谦逊、奖励乱猜。因此，所有主要的评测指标都需要重新设计，以奖励表达不确定性的行为。 ✅

Our latest models have lower hallucination rates, and we continue to work hard to further decrease the rates of confident errors output by our language models.

我们最新的模型已经将幻觉发生率降得更低，我们也将继续努力，进一步减少语言模型自信输出错误答案的情况。

启示：与我何干？

我前面费这么大劲，整理分享了ai幻觉的实例、思考，和openai双语对照文章，你费这么大劲，读了这么多字……现在，我们想一想，ai幻觉，对我们到底有什么启示？有什么有趣的东西是你值得和朋友、和家里小学生聊的？

首先，不要把幻觉污名化，不要闻之色变。 hallucination 是智能体的必然。大脑不是复印机，不是静态的知识仓库，所有的记忆和知识，都是主动构建的产物。不存在绝对的正确和准确，也没有必要追求绝对没幻觉。幻觉不是多大问题，通过搜索、使用工具，可以解决。

关键是要思考。 开头的案例中，你脑子里已有的“白驹过隙”等知识储备、你的逻辑推理、google和互联网检索，这些都可以帮你轻松识别ai 幻觉。对于主动思考的人，幻觉不是问题；对于放弃思考的人，幻觉才是问题。

google 一下。在chatgpt之前，习惯主动google，主动检索、获取并理解信息的人，大部分的幻觉都不是问题。有搜索习惯的人，在chatgpt之后，也不会被幻觉坑死。

最可怕的情况：是不思考、不验证的人，用ai疯狂生产垃圾信息，然后💩淹互联网，而通过互联网被动接受信息的人，也放弃主动判断，放弃搜索，放弃理解和验证，这样，💩一样的垃圾信息就会淹没互联网。长期以往，验证信息真伪就会愈发困难。

归根结底，这些都指向一个朴素的常识：

在ai时代，记忆、理解、知识、阅读、思考、写作……这些不是不重要了，反而是更重要了。

在ai可以生成几乎一切内容的时候，内在于人的、人自身的判断力、品味愈发重要。

但是，你无法利用外部信息（不论是纯粹在书本上，还是网盘上，还是llm里）来进行思考和判断。

归根结底，学习，越来越重要了。🤣

AI大模型从0到精通全套学习大礼包

我在一线互联网企业工作十余年里，指导过不少同行后辈。帮助很多人得到了学习和成长。

只要你是真心想学AI大模型，我这份资料就可以无偿共享给你学习。大模型行业确实也需要更多的有志之士加入进来，我也真心希望帮助大家学好这门技术，如果日后有什么学习上的问题，欢迎找我交流，有技术上面的问题，我是很愿意去帮助大家的！

如果你也想通过学大模型技术去帮助就业和转行，可以点扫描下方👇👇
大模型重磅福利：入门进阶全套104G学习资源包免费分享！
在这里插入图片描述

01.从入门到精通的全套视频教程

包含提示词工程、RAG、Agent等技术点
在这里插入图片描述

02.AI大模型学习路线图（还有视频解说）

全过程AI大模型学习路线

在这里插入图片描述

03.学习电子书籍和技术文档

市面上的大模型书籍确实太多了，这些是我精选出来的

在这里插入图片描述

04.大模型面试题目详解

在这里插入图片描述

05.这些资料真的有用吗?

这份资料由我和鲁为民博士共同整理，鲁为民博士先后获得了北京清华大学学士和美国加州理工学院博士学位，在包括IEEE Transactions等学术期刊和诸多国际会议上发表了超过50篇学术论文、取得了多项美国和中国发明专利，同时还斩获了吴文俊人工智能科学技术奖。目前我正在和鲁博士共同进行人工智能的研究。

所有的视频由智泊AI老师录制，且资料与智泊AI共享，相互补充。这份学习大礼包应该算是现在最全面的大模型学习资料了。

资料内容涵盖了从入门到进阶的各类视频教程和实战项目，无论你是小白还是有些技术基础的，这份资料都绝对能帮助你提升薪资待遇，转行大模型岗位。

在这里插入图片描述