LangChain Tutorial: Output Parsers

Original article · Latest recommended article published 2025-07-10 22:53:45 · 560 views

License: CC 4.0 BY-SA

Tags: #AI #embedding #NLP #windows #langchain #RAG

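The previous article covered prompt templates in LangChain. This article covers output parsers.

Output Parsers

An output parser converts the raw output of a language model into a more structured format that is easier for a program to process, such as an object, JSON, or a list.

LangChain provides many types of output parsers, for example:

CommaSeparatedListOutputParser: CSV parser; parses the model output into a list;
DatetimeOutputParser: datetime output parser; parses the model output into a datetime value;
JsonOutputParser: JSON output parser; parses the model output into a JSON object;
EnumOutputParser: enum output parser; has the model choose its answer from a given list of options.

Their main purposes are:

Structured output: converts an unstructured text response into a structured data format;
Data validation: checks that the output matches the expected format and content;
Standardization: provides a consistent output format that is easy to process downstream.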
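
As a rough illustration of these three roles, the sketch below uses only the Python standard library. The class `YesNoParser` and everything inside it are hypothetical and are not LangChain's implementation; the sketch only mirrors the two-method shape (`get_format_instructions` and `parse`) that the parsers in this article expose.

```python
class YesNoParser:
    """Hypothetical parser: turns a free-text yes/no answer into a bool."""

    def get_format_instructions(self) -> str:
        # Text appended to the prompt so the model knows the expected format.
        return "Answer with exactly one word: yes or no."

    def parse(self, text: str) -> bool:
        # Standardization: ignore surrounding whitespace, case, and a trailing period.
        cleaned = text.strip().rstrip(".").lower()
        # Validation: reject anything outside the expected vocabulary.
        if cleaned not in ("yes", "no"):
            raise ValueError(f"Unexpected answer: {text!r}")
        # Structured output: a bool instead of raw text.
        return cleaned == "yes"


parser = YesNoParser()
print(parser.get_format_instructions())
print(parser.parse(" Yes.\n"))
```

The format instructions go into the prompt, and `parse` is applied to whatever text the model sends back.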

CSV Output Parser

The CSV output parser splits the model output on commas and returns it as a list. Sample code:

from langchain.output_parsers import CommaSeparatedListOutputParser
from langchain.prompts import PromptTemplate
from langchain_openai import ChatOpenAI

# Load the model
model = ChatOpenAI(openai_api_key="google/gemma-3-12b", openai_api_base="http://127.0.0.1:1234/v1")
# Define the list output parser
output_parser = CommaSeparatedListOutputParser()
# Get the parser's format instructions
format_instructions = output_parser.get_format_instructions()
print(format_instructions)

Here we create a CommaSeparatedListOutputParser and call its get_format_instructions method to obtain the prompt text that describes the expected output format.

Output:

Your response should be a list of comma separated values, eg: `foo, bar, baz` or `foo,bar,baz`

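Next, pass the format instructions to the prompt template. Sample code:

# Define the prompt template
# Prompt text: "List 5 popular {subject} brands; no descriptions needed."
prompt = PromptTemplate(
    template="请列出5个流行的{subject}品牌，不需要介绍。\n{format_instructions}",
    input_variables=["subject"],
    partial_variables={"format_instructions": format_instructions},  # add the format instructions to the prompt template
)

# Build the chain
chain = prompt | model | output_parser

content = chain.invoke({"subject": "手机"})  # subject: "mobile phone"
print(content)
print(type(content))

Output: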

['Apple', 'Samsung', 'Xiaomi', 'Oppo', 'Huawei']
<class 'list'>
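
To see what the final parsing step amounts to without calling a model, here is a minimal standard-library sketch. The function `parse_comma_list` is a hypothetical helper, not the code inside CommaSeparatedListOutputParser, and the reply string is made up to match the result above.

```python
def parse_comma_list(text: str) -> list[str]:
    """Split a comma-separated reply into a list of trimmed, non-empty items."""
    return [item.strip() for item in text.split(",") if item.strip()]


# Hypothetical model reply that follows the format instructions.
reply = "Apple, Samsung, Xiaomi, Oppo, Huawei"
brands = parse_comma_list(reply)
print(brands)
print(type(brands))
```

The format instructions matter because this kind of split only works when the model replies with bare comma-separated values and nothing else.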

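Datetime Output Parser

The datetime output parser (DatetimeOutputParser) turns a date and time that the model writes in a fixed pattern into a standard datetime object. Because the model does the interpretation, the question itself can be phrased in natural language.

For example, the input may contain:

a human-readable description of a date and time;
a relative time expression, such as tomorrow or in two weeks;
fuzzy or incomplete date and time information.

Sample code: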

from langchain.output_parsers import DatetimeOutputParser
from langchain_core.prompts import PromptTemplate
from langchain_openai import ChatOpenAI

# Load the model
model = ChatOpenAI(openai_api_key="google/gemma-3-12b", openai_api_base="http://127.0.0.1:1234/v1")
# Define the datetime output parser
output_parser = DatetimeOutputParser()
# Get the parser's format instructions
format_instructions = output_parser.get_format_instructions()
print(format_instructions)

As with the CSV output parser, we first create the datetime output parser and then call its get_format_instructions method to obtain the format instructions. Output:

Write a datetime string that matches the following pattern: '%Y-%m-%dT%H:%M:%S.%fZ'.

Examples: 0924-03-28T03:14:41.802616Z, 0772-03-12T20:06:35.900230Z, 1011-02-04T11:29:25.282845Z

Return ONLY this string, no other words!

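Then pass the format instructions to the prompt template and build the chain in the same way as in the CSV example. Printing the format instructions, followed by the type and value of the parsed result, gives: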

Write a datetime string that matches the following pattern: '%Y-%m-%dT%H:%M:%S.%fZ'.

Examples: 0924-03-28T03:14:41.802616Z, 0772-03-12T20:06:35.900230Z, 1011-02-04T11:29:25.282845Z

Return ONLY this string, no other words!
<class 'datetime.datetime'>
1914-07-28 00:00:00
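
The conversion from the model's string to a datetime object can be sketched with the standard library alone. This is an illustration under assumptions, not the parser's actual code: `parse_datetime` is a hypothetical helper, and the reply string is invented so that it matches the result shown above.

```python
from datetime import datetime

# Same pattern as in the format instructions printed above.
PATTERN = "%Y-%m-%dT%H:%M:%S.%fZ"


def parse_datetime(text: str) -> datetime:
    """Convert a model reply written in the fixed pattern into a datetime object."""
    return datetime.strptime(text.strip(), PATTERN)


# Hypothetical model reply, chosen to match the result shown above.
reply = "1914-07-28T00:00:00.000000Z"
parsed = parse_datetime(reply)
print(type(parsed))
print(parsed)
```

A reply such as "tomorrow" would fail here with a ValueError, which is why the prompt has to carry the format instructions.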

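JSON Output Parser

The JSON output parser (JsonOutputParser) converts the model's unstructured text output into structured JSON data.

Basic JSON Output

The basic usage of the JSON output parser is as follows: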

from langchain_core.output_parsers import JsonOutputParser
from langchain.prompts import PromptTemplate
from langchain_openai import ChatOpenAI

# Load the model
model = ChatOpenAI(openai_api_key="google/gemma-3-12b", openai_api_base="http://127.0.0.1:1234/v1")

# Define the JSON output parser
output_parser = JsonOutputParser()
# Get the parser's format instructions
format_instructions = output_parser.get_format_instructions()
print(format_instructions)

Output:

Return a JSON object.

Pass the format instructions to the prompt template. Sample code:

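# Define the prompt template
# Prompt text: "Introduce 5 best-selling books from {subject}; no descriptions needed."
prompt = PromptTemplate(
    template="请介绍5本{subject}热销的书籍，不需要介绍。\n{format_instructions}",
    input_variables=["subject"],
    partial_variables={"format_instructions": format_instructions},  # add the format instructions to the prompt template
)
# Build the chain
chain = prompt | model | output_parser

content = chain.invoke({"subject": "中国"})  # subject: "China"
print(content)
print(type(content))

Output: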

{'books': [{'title': '活着', 'author': '余华'}, {'title': '平凡的世界', 'author': '路遥'}, {'title': '鬼吹灯', 'author': '天下霸唱'}, {'title': '三体', 'author': '刘慈欣'}, {'title': '红楼梦', 'author': '曹雪芹'}]}
<class 'dict'>
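
Models often wrap JSON in a Markdown code fence, so the parsing step usually involves a little more than a bare `json.loads`. The sketch below shows one way to handle that with the standard library. It is a simplified illustration, not JsonOutputParser's own code; `parse_json_reply` and the sample reply are hypothetical.

```python
import json
import re

# A Markdown code fence marker, built from single backticks.
FENCE = "`" * 3


def parse_json_reply(text: str) -> dict:
    """Extract and decode a JSON object from a reply that may be fenced."""
    # If the reply is wrapped in a Markdown code fence, keep only its body.
    match = re.search(FENCE + r"(?:json)?\s*(.*?)" + FENCE, text, re.DOTALL)
    body = match.group(1) if match else text
    return json.loads(body.strip())


# Hypothetical fenced reply from the model.
reply = FENCE + 'json\n{"books": [{"title": "三体", "author": "刘慈欣"}]}\n' + FENCE
data = parse_json_reply(reply)
print(data["books"][0]["title"])
print(type(data))
```

Unfenced JSON goes through the same function, because the regular expression simply finds no match and the whole reply is decoded.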

Declaring a Data Model with Pydantic

We can declare a data model with Pydantic to decide which fields the output JSON contains. Sample code:

from typing import List
from langchain_core.output_parsers import JsonOutputParser
from langchain.prompts import PromptTemplate
from pydantic.v1 import BaseModel,Field
from langchain_openai import ChatOpenAI

# Load the model
model = ChatOpenAI(openai_api_key="google/gemma-3-12b", openai_api_base="http://127.0.0.1:1234/v1")

# Define the book information class
class BookInfo(BaseModel):
    title: str = Field(description="书名")  # "book title"
    author: str = Field(description="作者")  # "author"
    Type: str = Field(description="书的类型")  # "book genre"

# Wrapper class that nests a list of BookInfo
class Book(BaseModel):
    books: List[BookInfo]

# Define the JSON output parser
output_parser = JsonOutputParser(pydantic_object=Book)
# Get the parser's format instructions
format_instructions = output_parser.get_format_instructions()
print(format_instructions)

Here we define the classes BookInfo and Book and pass Book to the JSON output parser. Output:

The output should be formatted as a JSON instance that conforms to the JSON schema below.

As an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}
the object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.

Here is the output schema:
```
{"properties": {"books": {"title": "Books", "type": "array", "items": {"$ref": "#/definitions/BookInfo"}}}, "required": ["books"], "definitions": {"BookInfo": {"title": "BookInfo", "type": "object", "properties": {"title": {"title": "Title", "description": "书名", "type": "string"}, "author": {"title": "Author", "description": "作者", "type": "string"}, "Type": {"title": "Type", "description": "书的类型", "type": "string"}}, "required": ["title", "author", "Type"]}}}
```

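Then pass the format instructions to the prompt template. Sample code:

# Define the prompt template
# Prompt text: "Introduce 5 best-selling books from {subject}; no descriptions needed."
prompt = PromptTemplate(
    template="请介绍5本{subject}热销的书籍，不需要介绍。\n{format_instructions}",
    input_variables=["subject"],
    partial_variables={"format_instructions": format_instructions},  # add the expected result format to the prompt template
)
# Build the chain
chain = prompt | model | output_parser

content = chain.invoke({"subject": "中国"})  # subject: "China"
print(content)
print(type(content))

Output: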

{'books': [{'title': '活着', 'author': '余华', 'Type': '小说'}, {'title': '平凡的世界', 'author': '路遥', 'Type': '小说'}, {'title': '百年孤独', 'author': '加西亚·马尔克斯', 'Type': '小说'}, {'title': '鬼吹灯', 'author': '天下霸唱', 'Type': '网络文学'}, {'title': '三体', 'author': '刘慈欣', 'Type': '科幻'}]}
<class 'dict'>
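
The result above is a plain `dict`, so the type alone does not guarantee that every declared field is present. A lightweight check like the one below can confirm the shape before downstream code relies on it. This is a minimal standard-library sketch: `check_books` is a hypothetical helper, and the field names are copied from the `BookInfo` class above.

```python
REQUIRED_FIELDS = ("title", "author", "Type")  # field names taken from BookInfo


def check_books(data: dict) -> list[str]:
    """Return a list of problems found in the parsed result; empty means OK."""
    books = data.get("books")
    if not isinstance(books, list):
        return ["'books' is missing or is not a list"]
    problems = []
    for index, book in enumerate(books):
        if not isinstance(book, dict):
            problems.append(f"books[{index}] is not an object")
            continue
        for field in REQUIRED_FIELDS:
            if not isinstance(book.get(field), str):
                problems.append(f"books[{index}] lacks a string field {field!r}")
    return problems


good = {"books": [{"title": "三体", "author": "刘慈欣", "Type": "科幻"}]}
bad = {"books": [{"title": "活着", "author": "余华"}]}
print(check_books(good))
print(check_books(bad))
```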

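Enum Output Parser

The enum output parser (EnumOutputParser) has the model choose its answer from a given list of options, much like a multiple-choice question.

Sample code: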

from langchain.output_parsers.enum import EnumOutputParser
from enum import Enum
from langchain.prompts import PromptTemplate
from langchain_openai import ChatOpenAI

# Load the model
model = ChatOpenAI(openai_api_key="google/gemma-3-12b", openai_api_base="http://127.0.0.1:1234/v1")

# Define the enum
class Colors(Enum):
    RED = "红色"  # "red"
    GREEN = "绿色"  # "green"
    BLUE = "蓝色"  # "blue"
    YELLOW = "黄色"  # "yellow"
# Define the enum output parser
output_parser = EnumOutputParser(enum=Colors)
# Get the parser's format instructions
format_instructions = output_parser.get_format_instructions()
print(format_instructions)

Here we define the enum class Colors and pass it to the enum output parser. Output:

Select one of the following options: 红色, 绿色, 蓝色, 黄色

Then pass the enum format instructions to the prompt template. Sample code:

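# Define the prompt template
# Prompt text: "Choose the most suitable answer from the given options. ... Question: {subject} Answer:"
prompt = PromptTemplate(
    template="从给定选项中选择最合适的答案。\n{format_instructions}\n问题: {subject}\n回答:",
    input_variables=["subject"],
    partial_variables={"format_instructions": format_instructions},  # add the expected result format to the prompt template
)
# Build the chain
chain = prompt | model | output_parser

content = chain.invoke({"subject": "请问天空通常是什么颜色"})  # "What color is the sky usually?"
print(content)
print(type(content))

Output: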

Colors.BLUE
<enum 'Colors'>

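Note: the enum should cover every answer the model is likely to give. If the model's answer is not one of the enum values, parsing fails with an error such as: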

langchain_core.exceptions.OutputParserException: Response '×× (虽然选项里没有，但这是正确的答案)
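
If an exception is not the desired behavior, the lookup can be wrapped so that an unknown answer yields a fallback value. The sketch below uses only the standard library and assumes the reply is matched against the enum values, as the format instructions suggest. `parse_color` is a hypothetical helper, not part of LangChain.

```python
from enum import Enum
from typing import Optional


class Colors(Enum):
    RED = "红色"
    GREEN = "绿色"
    BLUE = "蓝色"
    YELLOW = "黄色"


def parse_color(text: str) -> Optional[Colors]:
    """Look up the enum member by value; return None instead of raising."""
    try:
        return Colors(text.strip())
    except ValueError:
        # The reply is not one of the allowed options.
        return None


print(parse_color(" 蓝色 "))
print(parse_color("紫色"))  # "purple" is not in the enum
```

Returning `None` lets the caller decide whether to retry the question, widen the enum, or report the problem.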

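Custom Output Parser

A custom output parser converts the raw model output into whatever structured form the application needs. Put simply, the model output is passed through a function we write ourselves, and that function's return value becomes the result of the chain.

Sample code: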

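from langchain.prompts import PromptTemplate
from langchain_core.messages import AIMessage
from langchain_openai import ChatOpenAI

# Load the model
model = ChatOpenAI(openai_api_key="google/gemma-3-12b", openai_api_base="http://127.0.0.1:1234/v1")

# Define a function that returns the length of the model's reply text
def get_content_length(message: AIMessage) -> int:
    # The chat model returns a message object; its text is in .content
    return len(message.content)

# Define the prompt template
# Prompt text: "List 5 popular {subject} brands; no descriptions needed."
prompt = PromptTemplate(
    template="请列出5个流行的{subject}品牌，不需要介绍。",
    input_variables=["subject"],
)

# Build the chain
chain = prompt | model | get_content_length
content = chain.invoke({"subject": "手机"})  # subject: "mobile phone"
print(content)

We first define a function named get_content_length and then place it at the end of the chain. The chain returns the length of the model's reply as an integer.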
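
Any function that accepts the model's reply can play this role. As a slightly more useful illustration than a length count, the sketch below turns a numbered list into a Python list. It runs without a model: `FakeMessage` is a hypothetical stand-in for the reply object (only its `.content` attribute is assumed), and `parse_numbered_list` is a made-up helper.

```python
import re
from dataclasses import dataclass


@dataclass
class FakeMessage:
    """Hypothetical stand-in for a chat model reply with a .content string."""
    content: str


def parse_numbered_list(message) -> list[str]:
    """Turn lines such as '1. Apple' into ['Apple', ...]."""
    items = []
    for line in message.content.splitlines():
        # Accept "1.", "1)" or "1、" as the item marker.
        match = re.match(r"\s*\d+[.)、]\s*(.+)", line)
        if match:
            items.append(match.group(1).strip())
    return items


reply = FakeMessage(content="1. Apple\n2. Samsung\n3. Xiaomi")
print(parse_numbered_list(reply))
```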

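Output-Fixing Parser

The output-fixing parser (OutputFixingParser) supplies repair logic for cases where the initial parse fails. When the output does not match the expected format, it provides an automatic fixing mechanism.

Sample code: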

from typing import List
from pydantic import BaseModel, Field
from langchain.output_parsers import PydanticOutputParser,OutputFixingParser
from langchain_openai import ChatOpenAI

model = ChatOpenAI(openai_api_key="google/gemma-3-12b", openai_api_base='http://127.0.0.1:1234/v1')

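class Actor(BaseModel):
    name: str = Field(description="演员的名字")  # "name of the actor"
    film_names: List[str] = Field(description="他们主演的电影名称列表")  # "list of films they starred in"

actor_query = "生成随机演员的电影作品表。"  # "Generate the filmography of a random actor."

parser = PydanticOutputParser(pydantic_object=Actor)

# A hypothetical malformed output: single quotes and a missing closing brace
misformatted = "{'name':'Tom Hanks','film_names':['Forrest Gump']"
# Try to parse the malformed output
print(parser.parse(misformatted))

Running this raises a parsing error, because the string is not valid JSON.

In this case we can use OutputFixingParser to repair the error:

# Build the output-fixing parser from the original parser and the model
new_parser = OutputFixingParser.from_llm(parser=parser, llm=model)
# Repair the parsing error
content = new_parser.parse(misformatted)
print(type(content))
print(content)

Output: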

<class '__main__.Actor'>
name='Tom Hanks' film_names=['Forrest Gump']
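
The idea behind this parser is a parse, fix, and parse-again loop. The sketch below shows that control flow with the standard library only. It is an illustration under assumptions, not OutputFixingParser's implementation: `parse_with_fix` is a hypothetical helper, and `toy_fix` is a hard-coded stand-in for the model call that would normally produce the corrected text.

```python
import json
from typing import Callable


def parse_with_fix(text: str, fix: Callable[[str], str]) -> dict:
    """Try to parse; on failure, ask `fix` for a corrected string and retry once."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return json.loads(fix(text))


def toy_fix(text: str) -> str:
    # Hypothetical stand-in for the model call: it only repairs the two
    # problems in this specific string (single quotes, missing closing brace).
    return text.replace("'", '"') + "}"


misformatted = "{'name':'Tom Hanks','film_names':['Forrest Gump']"
print(parse_with_fix(misformatted, toy_fix))
```

In the real parser the repair step is delegated to the language model, which is why `OutputFixingParser.from_llm` takes both the original parser and a model.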

That concludes this LangChain tutorial on output parsers. The next article covers chains.

