使用Guardrails AI生成结构化数据的技术指南-优快云博客

本文链接：https://blog.youkuaiyun.com/gitblog_00062/article/details/148507984

使用Guardrails AI生成结构化数据的技术指南

guardrails 项目地址: https://gitcode.com/gh_mirrors/gua/guardrails

前言

在现代AI应用开发中，从大型语言模型(LLM)获取结构化输出是一个常见需求。Guardrails AI项目提供了一套优雅的解决方案，能够帮助开发者从各种LLM中可靠地生成结构化数据。本文将深入探讨如何使用Guardrails AI来实现这一目标。

结构化数据生成的基本原理

Guardrails AI提供了两种主要方式来定义和生成结构化数据：

Pydantic模型：利用Python类型系统定义数据结构
RAIL标记语言：使用XML风格的标记语言定义数据结构

这两种方式各有优势，开发者可以根据项目需求和个人偏好进行选择。

使用Pydantic模型生成结构化数据

Pydantic是Python生态中广受欢迎的数据验证库，Guardrails AI与其深度集成，使得结构化数据生成变得异常简单。

基础使用示例

from pydantic import BaseModel

# 定义数据模型
class Person(BaseModel):
    name: str
    age: int
    is_employed: bool

# 创建Guard对象
from guardrails import Guard
guard = Guard.from_pydantic(Person)

# 调用LLM生成数据
import openai
res = guard(
    openai.chat.completion.create,
    model="gpt-3.5-turbo",
)

技术要点解析

模型定义：使用Pydantic的BaseModel定义数据结构，字段类型明确
Guard对象：作为中间层，确保LLM输出符合模型定义
LLM调用：通过guard对象调用LLM，自动处理结构化输出

使用RAIL标记语言生成结构化数据

RAIL(Reliable AI Language)是Guardrails AI特有的标记语言，特别适合在配置文件中定义数据结构。

基础使用示例

<rail version="0.1">
  <output>
    <string name="name" />
    <integer name="age" />
    <boolean name="is_employed" />
  </output>
</rail>

对应的Python代码：

from guardrails import Guard

guard = Guard.from_s("""
  <rail version="0.1">
    <output>
      <string name="name" />
      <integer name="age" />
      <boolean name="is_employed" />
    </output>
  </rail>
""")

RAIL语法特点

强类型系统：明确指定字段类型(string, integer, boolean等)
嵌套结构：支持对象嵌套定义
列表支持：可以定义数组类型输出

复杂数据结构生成实战

嵌套对象生成

实际应用中，我们经常需要处理嵌套的复杂数据结构。以下是一个地址嵌套在人员信息中的示例：

JSON示例输出:

{
  "name": "John Doe",
  "age": 30,
  "is_employed": true,
  "address": {
    "street": "123 Main St",
    "city": "Anytown",
    "zip": "12345"
  }
}

Pydantic实现:

class Address(BaseModel):
    street: str
    city: str
    zip: str

class Person(BaseModel):
    name: str
    age: int
    is_employed: bool
    address: Address

RAIL实现:

<rail version="0.1">
  <output>
    <string name="name" />
    <integer name="age" />
    <boolean name="is_employed" />
    <object name="address">
      <string name="street" />
      <string name="city" />
      <string name="zip" />
    </object>
  </output>
</rail>

列表数据生成

处理多个同类数据项是另一个常见场景：

JSON示例输出:

[
  {
    "name": "John Doe",
    "age": 30,
    "is_employed": true
  },
  {
    "name": "Jane Smith",
    "age": 25,
    "is_employed": false
  }
]

Pydantic实现:

class Person(BaseModel):
    name: str
    age: int
    is_employed: bool

people = list[Person]

RAIL实现:

<rail version="0.1">
  <output type="list">
    <object>
      <string name="name" />
      <integer name="age" />
      <boolean name="is_employed" />
    </object>
  </output>
</rail>