Tips for Data Extraction with Reference Examples

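When extracting information from text and other unstructured or semi-structured formats into a structured representation, providing reference examples to the large language model (LLM) often improves extraction quality. Tool-calling features of LLMs are commonly used in this setting. This guide shows how to build few-shot examples of tool calls to help steer the behavior of extraction and similar applications.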

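Technical Background

Reference examples help the model understand and extract the requested information more accurately, especially with tool-calling models. Although this guide focuses on tool-calling models, the technique also applies to JSON mode and other prompt-based approaches.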

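LangChain exposes tool calls through the tool_calls attribute on messages returned by an LLM when those messages include tool calls. See the LangChain tool calling guide for more details.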

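Core Principles

Prepare reference examples for data extraction by building a chat history that contains the following sequence of messages:

HumanMessage: contains the example input to extract from.
AIMessage: contains the example tool calls.
ToolMessage: contains the example tool outputs.

LangChain adopts this convention so that tool calls are structured into a conversation the same way across different LLM providers.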
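To make the shape of one reference example concrete, here is a minimal sketch that uses plain dictionaries instead of LangChain message classes. The function name example_to_dicts, the role labels, and the tool output string are all illustrative assumptions, not LangChain's API or wire format. The point is only that the tool message carries the same id as the tool call it answers.

```python
import json
import uuid


def example_to_dicts(text, tool_name, tool_args):
    """Build the human -> ai -> tool sequence for one reference example."""
    call_id = str(uuid.uuid4())  # links the tool output back to the tool call
    return [
        # 1. The example input to extract from.
        {"role": "human", "content": text},
        # 2. The example tool call the model is expected to make.
        {
            "role": "ai",
            "content": "",
            "tool_calls": [{"id": call_id, "name": tool_name, "args": tool_args}],
        },
        # 3. The example tool output (placeholder text, an assumption).
        {
            "role": "tool",
            "tool_call_id": call_id,
            "content": "You have correctly called this tool.",
        },
    ]


sequence = example_to_dicts(
    "Fiona traveled far from France to Spain.",
    "Data",
    {"people": [{"name": "Fiona", "hair_color": None, "height_in_meters": None}]},
)
print(json.dumps(sequence, indent=2))
```

Several such sequences placed one after another form the few-shot chat history that precedes the real input.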

First, we build a prompt template that includes a placeholder for these messages:

from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", "You are an expert extraction algorithm. Only extract relevant information from the text. If you do not know the value of an attribute asked to extract, return null for the attribute's value."),
        MessagesPlaceholder("examples"),  # <-- placeholder for the reference examples
        ("human", "{text}"),
    ]
)

Code Walkthrough (Key Section)

Taking Person as an example, we define the schema for the data to extract:

from typing import List, Optional
from langchain_core.pydantic_v1 import BaseModel, Field

class Person(BaseModel):
    """Information about a person."""
    name: Optional[str] = Field(..., description="The name of the person")
    hair_color: Optional[str] = Field(..., description="The color of the person's hair if known")
    height_in_meters: Optional[str] = Field(..., description="Height in METERs")

class Data(BaseModel):
    """Extracted data about people."""
    people: List[Person]

Define the reference examples:

examples = [
    (
        "The ocean is vast and blue. It's more than 20,000 feet deep. There are many fish in it.",
        Data(people=[]),
    ),
    (
        "Fiona traveled far from France to Spain.",
        Data(people=[Person(name="Fiona", height_in_meters=None, hair_color=None)]),
    ),
]

Convert the examples into message format:

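import uuid
from typing import List
from langchain_core.messages import AIMessage, BaseMessage, HumanMessage, ToolMessage

# Helper required by the loop below but not shown in the original text. This is
# a minimal sketch: it assumes the tool name equals the pydantic class name and
# uses a placeholder string as the tool output.
def tool_example_to_messages(example: dict) -> List[BaseMessage]:
    tool_calls = [
        {"id": str(uuid.uuid4()), "name": type(call).__name__, "args": call.dict()}
        for call in example["tool_calls"]
    ]
    messages: List[BaseMessage] = [
        HumanMessage(content=example["input"]),
        AIMessage(content="", tool_calls=tool_calls),
    ]
    for call in tool_calls:
        messages.append(
            ToolMessage(content="You have correctly called this tool.", tool_call_id=call["id"])
        )
    return messages

messages = []
for text, tool_call in examples:
    messages.extend(tool_example_to_messages({"input": text, "tool_calls": [tool_call]}))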