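Preface
Code for this tutorial series: https://github.com/shar-pen/Langchain-MiniTutorial
I mainly followed the official LangChain tutorials and selectively recorded what I learned.
Tutorial list:
- 1. First steps with LangChain
- 2. Prompt
- 3. OutputParser / output parsing
- 4. Model / vLLM model deployment and calling it from LangChain
- 5. DocumentLoader / various document loaders
- 6. TextSplitter / document splitting
- 7. Embedding / text vectorization
- 8. VectorStore / vector database storage and retrieval
- 9. Retriever
- 10. Reranker / document reranking
- 11. RAG pipeline / multi-turn conversational RAG
- 12. Agent / tool definition / agent tool calling / Agentic RAG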
PydanticOutputParser
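PydanticOutputParser is a class that converts a language model's output into structured information. Instead of a plain text response, it yields clearly organized, formatted data.
With this class you can convert the model's output into a specific data model, which makes it easier to process and use.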
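Main methods
PydanticOutputParser relies on two core methods:
1. get_format_instructions()
- Provides instructions that define the data format the language model should output.
- For example, it can return a string describing the data fields and their format requirements.
- These instructions are essential for getting the language model to structure its output so that it matches a specific data model.
2. parse()
- Takes the language model's output (usually a string) and parses it into a specific data structure.
- Uses Pydantic for validation: the input string is matched against the predefined schema and converted into a data structure that conforms to it.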
from langchain_openai import ChatOpenAI
# Connect to a locally served, OpenAI-compatible endpoint
llm = ChatOpenAI(
    base_url='http://localhost:5551/v1',
    api_key='EMPTY',
    model_name='Qwen2.5-7B-Instruct',
    temperature=0.2,
)
The following example shows how a parser simplifies the workflow.
email_conversation = """
From: John (John@bikecorporation.me)
To: Kim (Kim@teddyinternational.me)
Subject: “ZENESIS” bike distribution cooperation and meeting schedule proposal
Dear Mr. Kim,
I am John, Senior Executive Director at Bike Corporation. I recently learned about your new bicycle model, "ZENESIS," through your press release. Bike Corporation is a company that leads innovation and quality in the field of bicycle manufacturing and distribution, with long-time experience and expertise in this field.
We would like to request a detailed brochure for the ZENESIS model. In particular, we need information on technical specifications, battery performance, and design aspects. This information will help us further refine our proposed distribution strategy and marketing plan.
Additionally, to discuss the possibilities for collaboration in more detail, I propose a meeting next Tuesday, January 15th, at 10:00 AM. Would it be possible to meet at your office to have this discussion?
Thank you.
Best regards,
John
Senior Executive Director
Bike Corporation
"""
from langchain_core.prompts import PromptTemplate
from langchain_core.messages import AIMessageChunk
from langchain_core.output_parsers import StrOutputParser
prompt = PromptTemplate.from_template(
    "Please extract the important parts of the following email.\n\n{email_conversation}"
)
# Compose prompt -> model -> plain-string parser
chain = prompt | llm | StrOutputParser()
# stream() yields the answer chunk by chunk
answer = chain.stream({"email_conversation": email_conversation})
# A function for real-time output (streaming)
def stream_response(response, return_output=False):
    """
    Streams the response from the AI model, processing and printing each chunk.
    This function iterates over each item in the 'response' iterable. If an item is an instance of AIMessageChunk, it extracts and prints the content.
    If the item is a string, it prints the string directly.
    Optionally, the function can return the concatenated string of all response chunks.
    Args:
    - response (iterable): An iterable of response chunks, which can be AIMessageChunk objects or strings.
    - return_output (bool, optional): If True, the function returns the concatenated response string. The default is False.
    Returns:
    - str: If `return_output` is True, the concatenated response string. Otherwise, nothing is returned.
    """
    answer = ""
    for token in response:
        if isinstance(token, AIMessageChunk):
            answer += token.content
            print(token.content, end="", flush=True)
        elif isinstance(token, str):
            answer += token
            print(token, end="", flush=True)
    if return_output:
        return answer
output = stream_response(answer, return_output=True)
### Important Parts of the Email:
- **From:** John (John@bikecorporation.me)
- **To:** Kim (Kim@teddyinternational.me)
- **Subject:** "ZENESIS" bike distribution cooperation and meeting schedule proposal
- **Key Points:**
- John is the Senior Executive Director at Bike Corporation.
- He learned about the "ZENESIS" bicycle model through a press release.
- Bike Corporation is a leading company in bicycle manufacturing and distribution.
- They are requesting a detailed brochure for the ZENESIS model, specifically needing information on technical specifications, battery performance, and design aspects.
- A meeting is proposed for Tuesday, January 15th, at 10:00 AM at Kim's office to discuss collaboration possibilities in more detail.
- **Proposed Meeting:**
- Date: Tuesday, January 15th
- Time: 10:00 AM
- Location: Kim's office
- **Purpose:**
- To discuss the possibilities for collaboration and further refine the distribution strategy and marketing plan for the ZENESIS model.
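Without an output parser such as PydanticOutputParser, the result is plain text, so you have to define the data types and the way each value is accessed yourself.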
answer = chain.invoke({
"email_conversation": email_conversation})
print(answer)
### Important Parts of the Email:
- **From:** John (John@bikecorporation.me)
- **To:** Kim (Kim@teddyinternational.me)
- **Subject:** "ZENESIS" bike distribution cooperation and meeting schedule proposal
- **Key Points:**
- John is the Senior Executive Director at Bike Corporation.
- He learned about the "ZENESIS" bicycle model through a press release.
- Bike Corporation is a leading company in bicycle manufacturing and distribution.
- They are requesting a detailed brochure for the ZENESIS model, specifically needing information on technical specifications, battery performance, and design aspects.
- A meeting is proposed for Tuesday, January 15th, at 10:00 AM at Kim's office to discuss collaboration possibilities in more detail.
- **Proposed Meeting:**
- Date: Tuesday, January 15th
- Time: 10:00 AM
- Location: Kim's office
- **Follow-Up:**
- John requests a detailed brochure for the ZENESIS model.
- He is interested in discussing potential distribution and marketing strategies.
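To make that cost concrete, here is a rough sketch of the hand-written extraction a plain string answer requires. It uses only the standard library; `answer_text` is a shortened copy of the output above, and the `extract_field` helper is hypothetical, not part of LangChain.

```python
import re
from typing import Optional

# Shortened copy of the free-text answer shown above: a plain str with no structure.
answer_text = """### Important Parts of the Email:
- **From:** John (John@bikecorporation.me)
- **To:** Kim (Kim@teddyinternational.me)
- **Subject:** "ZENESIS" bike distribution cooperation and meeting schedule proposal
"""


def extract_field(text: str, label: str) -> Optional[str]:
    """Return the text after '- **<label>:**' on its line, or None if the label is absent."""
    pattern = rf"^- \*\*{re.escape(label)}:\*\*\s*(.+)$"
    match = re.search(pattern, text, flags=re.MULTILINE)
    return match.group(1).strip() if match else None


sender = extract_field(answer_text, "From")
subject = extract_field(answer_text, "Subject")
missing = extract_field(answer_text, "Date")  # this label never appears at the top level
print(sender)
print(subject)
print(missing)
```

This kind of pattern matching depends on the exact wording and Markdown layout the model happens to produce. The two runs above already differ in their final section ("Purpose" versus "Follow-Up"), which is the fragility a structured parser is meant to remove.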
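Using PydanticOutputParser
Given email content like the above, we parse the email information with the following class, defined in Pydantic style.
Note that the description inside each Field guides the extraction of key information from the text response. The LLM relies on this description to pick out the required information, so it must be accurate and clear.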
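One way to see where a description ends up is to inspect the JSON schema that Pydantic generates, since the parser's format instructions embed that schema. The sketch below assumes Pydantic v2 (`model_json_schema()`); the one-field `Meeting` model is hypothetical and exists only for this illustration.

```python
from pydantic import BaseModel, Field


# Hypothetical one-field model, used only to inspect the generated schema.
class Meeting(BaseModel):
    date: str = Field(
        description="The meeting date and time mentioned in the email content"
    )


# Pydantic v2 API; the description text is carried into the JSON schema.
schema = Meeting.model_json_schema()
date_description = schema["properties"]["date"]["description"]
print(date_description)
```

A vague or misleading description is passed to the model just as faithfully as a precise one, which is why its wording deserves care.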
from pydantic import BaseModel, Field
from langchain_core.output_parsers import PydanticOutputParser
class EmailSummary(BaseModel):
    person: str = Field(description="The sender of the email")
    email: str = Field(description="The email address of the sender")
    subject: str = Field(description="The subject of the email")
    summary: str = Field(description="A summary of the email content")
    date: str = Field(
        description="The meeting date and time mentioned in the email content"
    )
# Create PydanticOutputParser
parser = PydanticOutputParser(pydantic_object=EmailSummary)
print(parser.get_format_instructions())
The output should be formatted as a JSON instance that conforms to the JSON schema below.
As an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type":