How to Return Structured Data from a Model: A Practical Guide with Code Examples


Copyright: CC 4.0 BY-SA


# Introduction

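Modern data-driven applications often need model output to conform to a specific structured schema. For example, you may want to extract data from text and insert it into a database, or pass it to some other downstream system. In this post I walk through several strategies for getting structured output from a model.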

# Main Content

## 1. Using the `with_structured_output()` Method

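`with_structured_output()` is the simplest and most reliable approach. It works with models that natively support tool/function calling or a JSON output mode. You pass it a schema that specifies the name, type, and description of each attribute you want in the output.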

### Supported Models

- OpenAI
- Anthropic
- Azure
- Google
- Cohere
- NVIDIA

And more.

## 2. Using a Pydantic Class

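If you want the model to return a Pydantic object, simply pass the desired Pydantic class. A key advantage of Pydantic is that the output is validated: if a required field is missing or has the wrong type, you get a validation error instead of a malformed object.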

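```python
from typing import Optional

from langchain_openai import ChatOpenAI  # assumption: any chat model with structured-output support works
from pydantic import BaseModel, Field  # recent LangChain versions accept Pydantic v2 models directly

# Assumed setup: the original does not show how `llm` is created; the model name is only an example.
llm = ChatOpenAI(model="gpt-4o-mini")


class Joke(BaseModel):
    setup: str = Field(description="The setup of the joke")
    punchline: str = Field(description="The punchline to the joke")
    rating: Optional[int] = Field(default=None, description="How funny the joke is, from 1 to 10")


structured_llm = llm.with_structured_output(Joke)
# Using an API proxy service can improve access stability
structured_llm.invoke("Tell me a joke about cats")
```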

## 3. TypedDict and JSON Schema

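If you don't need the output to be validated, you can use a TypedDict or a JSON Schema instead. In that case the model returns a plain dict.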
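The JSON Schema option is named here but not illustrated. As a minimal sketch, the same Joke fields from section 2 can be written as a plain dict following the usual JSON Schema layout (`title`, `description`, `type`, `properties`, `required`). The `llm` object is assumed, so the actual call is left as a comment. The helper `missing_required` is hypothetical and is included only to show that a dict schema gives you no Pydantic-style validation, so any checks are your own responsibility.

```python
# The Joke fields from section 2, expressed as a plain JSON Schema dict.
joke_schema = {
    "title": "joke",
    "description": "Joke to tell user.",
    "type": "object",
    "properties": {
        "setup": {"type": "string", "description": "The setup of the joke"},
        "punchline": {"type": "string", "description": "The punchline to the joke"},
        "rating": {
            "type": "integer",
            "description": "How funny the joke is, from 1 to 10",
            "default": None,
        },
    },
    "required": ["setup", "punchline"],
}

# With a chat model `llm` (assumed here, not defined), usage would look like:
#   structured_llm = llm.with_structured_output(joke_schema)
#   result = structured_llm.invoke("Tell me a joke about cats")  # a plain dict


def missing_required(output: dict, schema: dict) -> list:
    """Hypothetical helper: return the required keys that are absent from `output`."""
    return [key for key in schema.get("required", []) if key not in output]


print(missing_required({"setup": "Why did the cat sit on the computer?"}, joke_schema))  # ['punchline']
```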

```python
from typing import Optional

from typing_extensions import Annotated, TypedDict


class Joke(TypedDict):
    # Annotated[type, default, description]; `...` marks a required field
    setup: Annotated[str, ..., "The setup of the joke"]
    punchline: Annotated[str, ..., "The punchline of the joke"]
    rating: Annotated[Optional[int], None, "How funny the joke is, from 1 to 10"]


structured_llm = llm.with_structured_output(Joke)
# Using an API proxy service can improve access stability
structured_llm.invoke("Tell me a joke about cats")  # returns a plain dict
```
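To see why a TypedDict gives no validation, here is a stdlib-only sketch (assuming Python 3.9+, and importing from `typing` rather than `typing_extensions` so it runs without extra packages). It redefines the Joke TypedDict above, builds an instance with deliberately wrong types to show that nothing complains at runtime, and reads back the `Annotated` metadata, which is where the default and description live under the `Annotated[type, default, description]` convention. This metadata is presumably what a schema converter reads when turning the class into a schema for the model.

```python
from typing import Annotated, Optional, TypedDict, get_type_hints


class Joke(TypedDict):
    # Annotated[type, default, description]; `...` marks a required field
    setup: Annotated[str, ..., "The setup of the joke"]
    punchline: Annotated[str, ..., "The punchline of the joke"]
    rating: Annotated[Optional[int], None, "How funny the joke is, from 1 to 10"]


# Calling a TypedDict class just builds a plain dict; no types are checked.
unchecked = Joke(setup=123, punchline=None)

# include_extras=True keeps the Annotated wrapper, so the metadata is visible.
hints = get_type_hints(Joke, include_extras=True)
for name, hint in hints.items():
    default, description = hint.__metadata__
    print(name, default, description)
```

This is the trade-off behind this section: the schema is lighter, but type checking is left to you.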

# Code Example

Here is an example of using a Pydantic class with the `with_structured_output()` method:

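```python
from typing import Optional

from langchain_openai import ChatOpenAI  # assumption: any chat model with structured-output support works
from pydantic import BaseModel, Field  # recent LangChain versions accept Pydantic v2 models directly

# Assumed setup; the model name is only an example.
llm = ChatOpenAI(model="gpt-4o-mini")


class Joke(BaseModel):
    setup: str = Field(description="The setup of the joke")
    punchline: str = Field(description="The punchline to the joke")
    rating: Optional[int] = Field(default=None, description="How funny the joke is, from 1 to 10")


structured_llm = llm.with_structured_output(Joke)
# Using an API proxy service can improve access stability
joke = structured_llm.invoke("Tell me a joke about cats")
print(joke)  # a validated Joke instance
```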
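The introduction mentions inserting extracted data into a database. A minimal sketch of that downstream step with the standard-library `sqlite3` module follows. It assumes the structured result is available as a dict with the Joke fields (for a Pydantic object, convert it first, e.g. with `.dict()` in Pydantic v1 or `.model_dump()` in v2). The table name `jokes`, the helper `save_joke`, and the joke text are hypothetical placeholders, not real model output.

```python
import sqlite3


def save_joke(conn: sqlite3.Connection, joke: dict) -> int:
    """Hypothetical helper: insert one structured joke and return its row id."""
    cur = conn.execute(
        "INSERT INTO jokes (setup, punchline, rating) VALUES (?, ?, ?)",
        # rating is optional in the schema, so store NULL when it is absent.
        (joke["setup"], joke["punchline"], joke.get("rating")),
    )
    conn.commit()
    return cur.lastrowid


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE jokes (id INTEGER PRIMARY KEY, setup TEXT NOT NULL, "
    "punchline TEXT NOT NULL, rating INTEGER)"
)

# Placeholder standing in for a structured model result.
example = {
    "setup": "Why did the cat sit on the computer?",
    "punchline": "To keep an eye on the mouse.",
}
row_id = save_joke(conn, example)
print(conn.execute("SELECT setup, rating FROM jokes WHERE id = ?", (row_id,)).fetchone())
```

Parameterized `?` placeholders keep model-generated text from being interpreted as SQL.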