LangChain Tutorial 3: OutputParser (Output Parsing)

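Preface

Code for this tutorial series: https://github.com/shar-pen/Langchain-MiniTutorial

I mainly follow the official LangChain tutorials and record selected parts of what I learned.

Here is the tutorial list.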

PydanticOutputParser

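PydanticOutputParser is a class that converts a language model's output into structured information. Instead of a plain text response, it yields clearly organized, formatted data.

With this class you can convert the model's output into a specific data model, which makes the result easier to process and use.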


Main methods

PydanticOutputParser relies on two core methods:

1. get_format_instructions()

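  • Provides instructions that define the data format the language model should output.
  • For example, it can return a string that describes the data fields and their format requirements.
  • These instructions are essential for getting the model to produce structured output that conforms to a specific data model.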

2. parse()

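  • Takes the language model's output (usually a string) and parses it into a specific data structure.
  • Uses Pydantic for validation: the input string is matched against a predefined schema and converted into a data structure that conforms to it.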
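The two-method contract described above can be sketched with the standard library alone. This is a toy illustration, not LangChain's implementation: `MiniParser` and `Person` are hypothetical names, and a dataclass stands in for the Pydantic model so that the example runs without any third-party package.

```python
import json
from dataclasses import dataclass, field, fields


@dataclass
class Person:
    # Hypothetical data model used only for this illustration.
    name: str = field(metadata={"description": "The person's full name"})
    age: int = field(metadata={"description": "The person's age in years"})


class MiniParser:
    """Toy stand-in that mimics the two-method contract; not LangChain code."""

    def __init__(self, model):
        self.model = model

    def get_format_instructions(self) -> str:
        # Describe every field so the model knows which JSON keys to emit.
        lines = [
            f'- "{f.name}" ({f.type.__name__}): {f.metadata["description"]}'
            for f in fields(self.model)
        ]
        return "Reply with a single JSON object with these keys:\n" + "\n".join(lines)

    def parse(self, text: str):
        # Turn the raw string into a dict, then check it against the model.
        data = json.loads(text)
        for f in fields(self.model):
            if f.name not in data:
                raise ValueError(f"missing field: {f.name}")
            if not isinstance(data[f.name], f.type):
                raise ValueError(f"field {f.name} should be {f.type.__name__}")
        return self.model(**{f.name: data[f.name] for f in fields(self.model)})


parser = MiniParser(Person)
print(parser.get_format_instructions())
print(parser.parse('{"name": "John", "age": 30}'))  # Person(name='John', age=30)
```

The instructions string goes into the prompt, and `parse()` is applied to whatever the model sends back. The real PydanticOutputParser follows the same division of labor, with Pydantic doing the validation.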
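
The examples below use a locally served model behind an OpenAI-compatible endpoint:

from langchain_openai import ChatOpenAI

# Point the client at a local OpenAI-compatible server; the API key is a placeholder.
llm = ChatOpenAI(
    base_url='http://localhost:5551/v1',
    api_key='EMPTY',
    model_name='Qwen2.5-7B-Instruct',
    temperature=0.2,
)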

The following example shows how a parser simplifies the workflow. It starts from this email:

email_conversation = """
From: John (John@bikecorporation.me)
To: Kim (Kim@teddyinternational.me)
Subject: “ZENESIS” bike distribution cooperation and meeting schedule proposal
Dear Mr. Kim,

I am John, Senior Executive Director at Bike Corporation. I recently learned about your new bicycle model, "ZENESIS," through your press release. Bike Corporation is a company that leads innovation and quality in the field of bicycle manufacturing and distribution, with long-time experience and expertise in this field.

We would like to request a detailed brochure for the ZENESIS model. In particular, we need information on technical specifications, battery performance, and design aspects. This information will help us further refine our proposed distribution strategy and marketing plan.

Additionally, to discuss the possibilities for collaboration in more detail, I propose a meeting next Tuesday, January 15th, at 10:00 AM. Would it be possible to meet at your office to have this discussion?

Thank you.

Best regards,
John
Senior Executive Director
Bike Corporation
"""

First, a baseline chain that ends in StrOutputParser and therefore returns plain text:

from langchain_core.prompts import PromptTemplate
from langchain_core.messages import AIMessageChunk
from langchain_core.output_parsers import StrOutputParser

prompt = PromptTemplate.from_template(
    "Please extract the important parts of the following email.\n\n{email_conversation}"
)

# StrOutputParser turns the chat message into a plain string.
chain = prompt | llm | StrOutputParser()

# stream() returns an iterator of string chunks.
answer = chain.stream({"email_conversation": email_conversation})


# A function for real-time output (streaming)
def stream_response(response, return_output=False):
    """
    Streams the response from the AI model, processing and printing each chunk.

    This function iterates over each item in the 'response' iterable. If an item is an instance of AIMessageChunk, it extracts and prints the content.
    If the item is a string, it prints the string directly.
    Optionally, the function can return the concatenated string of all response chunks.

    Args:
    - response (iterable): An iterable of response chunks, which can be AIMessageChunk objects or strings.
    - return_output (bool, optional): If True, the function returns the concatenated response string. The default is False.

    Returns:
    - str: If `return_output` is True, the concatenated response string. Otherwise, nothing is returned.
    """
    answer = ""
    for token in response:
        if isinstance(token, AIMessageChunk):
            answer += token.content
            print(token.content, end="", flush=True)
        elif isinstance(token, str):
            answer += token
            print(token, end="", flush=True)
    if return_output:
        return answer


output = stream_response(answer, return_output=True)
### Important Parts of the Email:

- **From:** John (John@bikecorporation.me)
- **To:** Kim (Kim@teddyinternational.me)
- **Subject:** "ZENESIS" bike distribution cooperation and meeting schedule proposal

- **Key Points:**
  - John is the Senior Executive Director at Bike Corporation.
  - He learned about the "ZENESIS" bicycle model through a press release.
  - Bike Corporation is a leading company in bicycle manufacturing and distribution.
  - They are requesting a detailed brochure for the ZENESIS model, specifically needing information on technical specifications, battery performance, and design aspects.
  - A meeting is proposed for Tuesday, January 15th, at 10:00 AM at Kim's office to discuss collaboration possibilities in more detail.

- **Proposed Meeting:**
  - Date: Tuesday, January 15th
  - Time: 10:00 AM
  - Location: Kim's office

- **Purpose:**
  - To discuss the possibilities for collaboration and further refine the distribution strategy and marketing plan for the ZENESIS model.

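Without a structured output parser such as PydanticOutputParser, the result is free-form text, so you have to decide on the data types and write the field access logic yourself.

answer = chain.invoke({"email_conversation": email_conversation})
print(answer)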
### Important Parts of the Email:

- **From:** John (John@bikecorporation.me)
- **To:** Kim (Kim@teddyinternational.me)
- **Subject:** "ZENESIS" bike distribution cooperation and meeting schedule proposal

- **Key Points:**
  - John is the Senior Executive Director at Bike Corporation.
  - He learned about the "ZENESIS" bicycle model through a press release.
  - Bike Corporation is a leading company in bicycle manufacturing and distribution.
  - They are requesting a detailed brochure for the ZENESIS model, specifically needing information on technical specifications, battery performance, and design aspects.
  - A meeting is proposed for Tuesday, January 15th, at 10:00 AM at Kim's office to discuss collaboration possibilities in more detail.

- **Proposed Meeting:**
  - Date: Tuesday, January 15th
  - Time: 10:00 AM
  - Location: Kim's office

- **Follow-Up:**
  - John requests a detailed brochure for the ZENESIS model.
  - He is interested in discussing potential distribution and marketing strategies.
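Because the answer above is just a string, pulling a single value out of it means writing your own extraction code. The sketch below does this with a regular expression over an excerpt of that answer; `extract_field` is a hypothetical helper written for this illustration, and it only works as long as the model keeps the same bullet layout.

```python
import re

# Excerpt of the free-form answer printed above; the model's wording can change between runs.
answer = """- **Proposed Meeting:**
  - Date: Tuesday, January 15th
  - Time: 10:00 AM
  - Location: Kim's office
"""


def extract_field(text: str, label: str):
    """Return the value after '- <label>:' on a bullet line, or None if absent."""
    match = re.search(
        rf"^\s*-\s*{re.escape(label)}:\s*(.+)$", text, flags=re.MULTILINE
    )
    return match.group(1).strip() if match else None


meeting = {
    "date": extract_field(answer, "Date"),
    "time": extract_field(answer, "Time"),
    "location": extract_field(answer, "Location"),
}
print(meeting)
```

The pattern breaks as soon as the model renames a label or wraps it in bold markers, which is the kind of fragility a structured output parser is meant to remove.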

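Using PydanticOutputParser

Given email content like the above, we parse the message with the following class, defined in Pydantic style.

Note that the description inside each Field guides the extraction of key information from the text. The LLM relies on this description to find the required values, so it is important that the description is accurate and clear.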

from pydantic import BaseModel, Field
from langchain_core.output_parsers import PydanticOutputParser

class EmailSummary(BaseModel):
    person: str = Field(description="The sender of the email")
    email: str = Field(description="The email address of the sender")
    subject: str = Field(description="The subject of the email")
    summary: str = Field(description="A summary of the email content")
    date: str = Field(
        description="The meeting date and time mentioned in the email content"
    )


# Create PydanticOutputParser
parser = PydanticOutputParser(pydantic_object=EmailSummary)
print(parser.get_format_instructions())
The output should be formatted as a JSON instance that conforms to the JSON schema below.

As an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": 
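With format instructions like these in the prompt, the model is asked to reply with a JSON object that matches the EmailSummary schema, and models often wrap that JSON in a Markdown code fence. The sketch below shows, using only the standard library, the kind of cleanup and field checking a parse step has to perform. It is an illustration rather than LangChain's implementation: the `reply` text is made up, and `extract_json` is a hypothetical helper.

```python
import json
import re

FENCE = "`" * 3  # built programmatically so the marker never appears literally in this block

# Made-up model reply for illustration; a real reply depends on the model and the prompt.
reply = (
    "Here is the extracted information:\n"
    f"{FENCE}json\n"
    '{"person": "John", "email": "John@bikecorporation.me", '
    '"subject": "ZENESIS bike distribution cooperation and meeting schedule proposal", '
    '"summary": "John requests a brochure and proposes a meeting.", '
    '"date": "Tuesday, January 15th, 10:00 AM"}\n'
    f"{FENCE}\n"
)

REQUIRED = ("person", "email", "subject", "summary", "date")  # fields of EmailSummary


def extract_json(text: str) -> dict:
    """Pull the first JSON object out of a reply, with or without a Markdown fence."""
    fenced = re.search(rf"{FENCE}(?:json)?\s*(.*?){FENCE}", text, flags=re.DOTALL)
    candidate = fenced.group(1) if fenced else text
    start, end = candidate.find("{"), candidate.rfind("}")
    if start == -1 or end == -1:
        raise ValueError("no JSON object found in the reply")
    return json.loads(candidate[start : end + 1])


data = extract_json(reply)
missing = [key for key in REQUIRED if key not in data]
if missing:
    raise ValueError(f"missing fields: {missing}")
print(data["date"])  # Tuesday, January 15th, 10:00 AM
```

Once the reply is a validated dictionary (or, with PydanticOutputParser, an EmailSummary instance), each value can be read by name instead of being searched for in free text.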