Demystifying LLM Structured Output: Code Examples and How It Works

Latest recommended article published on 2025-05-04 19:14:31

GZM888888

Article link: https://blog.youkuaiyun.com/GZM888888/article/details/144950197

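I. Overview of LLM Structured Output

1. Definition and advantages of structured output

Structured output is data that a large language model (LLM) generates in a specific format, such as JSON or XML, so that it is easy to parse and process. Compared with unstructured text, it has clear advantages in automated systems: it can be parsed with standard tools, processed efficiently, and is less error-prone.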
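To make the parsing advantage concrete, here is a minimal sketch that extracts the same two fields from a free-text reply and from a JSON reply. Both reply strings are made up for illustration and do not come from a real model.

```python
import json
import re

# Hypothetical model replies describing the same person.
free_text_reply = "Sure! The user is called Alice and she is 30 years old."
json_reply = '{"name": "Alice", "age": 30}'

# Free text needs a hand-written pattern that breaks as soon as the wording changes.
match = re.search(r"called (\w+) and \w+ is (\d+) years old", free_text_reply)
from_text = {"name": match.group(1), "age": int(match.group(2))}

# Structured output goes through a standard parser, with types preserved.
from_json = json.loads(json_reply)

print(from_text)
print(from_json)
```

The regular expression is tied to one particular sentence shape, while `json.loads` works for any reply that is valid JSON.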

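II. How LLM Structured Output Is Implemented

1. Constrained Decoding

Constrained decoding is one of the key techniques for producing structured output from an LLM. At each generation step, manually defined rules determine the set of tokens that may be sampled at that step. A bias is then added to the logits to suppress every token outside that set, so the model can only produce the specified structure.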
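A single decoding step of this idea can be sketched in plain Python. The vocabulary, the logit values, and the helper names `apply_constraint` and `softmax` are assumptions made for illustration. A real implementation would apply the same bias to the model's logit tensor.

```python
import math

def apply_constraint(logits, allowed, bias=-1e9):
    """Add a large negative bias to every token outside `allowed`."""
    return {tok: (score if tok in allowed else score + bias)
            for tok, score in logits.items()}

def softmax(logits):
    """Turn logits into probabilities, subtracting the max for stability."""
    peak = max(logits.values())
    exps = {tok: math.exp(score - peak) for tok, score in logits.items()}
    total = sum(exps.values())
    return {tok: value / total for tok, value in exps.items()}

# Made-up logits for one step: unconstrained, the model prefers chatty text.
step_logits = {"{": 1.2, "Sure": 3.5, "[": 0.3, "Here": 2.8}

# Rule for this step: a JSON value must start with "{" or "[".
allowed_tokens = {"{", "["}

probs = softmax(apply_constraint(step_logits, allowed_tokens))
best = max(probs, key=probs.get)
print(best, probs)
```

After the bias is applied, the disallowed tokens receive effectively zero probability, and the remaining probability mass is shared among the allowed tokens in proportion to the model's original preferences.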

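2. Format Restricting Instructions

Format-restricting instructions add a preprocessing step and an output retry mechanism on top of the LLM interface to ensure that the output follows a specific format. For example, the Instructor library monkey-patches the regular openai interface, adding preprocessing for a response_model argument and a retry mechanism for the output.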
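The validate-and-retry part of this approach can be sketched without any external library. This is not how Instructor is implemented. `call_with_retry`, `fake_llm`, and the canned replies are hypothetical stand-ins, and validation here is a simple key check rather than a schema model.

```python
import json

def call_with_retry(llm, prompt, required_keys, max_retries=2):
    """Call `llm`, validate the reply, and re-prompt with the error on failure."""
    message = prompt
    for _ in range(max_retries + 1):
        reply = llm(message)
        try:
            data = json.loads(reply)
            if not isinstance(data, dict):
                raise ValueError("top-level value must be a JSON object")
            missing = [key for key in required_keys if key not in data]
            if missing:
                raise ValueError(f"missing keys: {missing}")
            return data
        except ValueError as err:  # json.JSONDecodeError is a ValueError
            # Feed the validation error back so the next attempt can correct it.
            message = (f"{prompt}\nYour previous reply was invalid ({err}). "
                       "Return only valid JSON.")
    raise RuntimeError(f"no valid output after {max_retries + 1} attempts")

# Stand-in model: the first reply is malformed, the second is valid.
replies = iter(["Sure, here you go: name=Alice", '{"name": "Alice", "age": 30}'])

def fake_llm(message):
    return next(replies)

result = call_with_retry(fake_llm, "Return a JSON object with name and age.",
                         ["name", "age"])
print(result)
```

Unlike constrained decoding, this approach never touches the sampling step. It only checks the finished reply, so it works with any hosted API at the cost of extra calls when validation fails.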

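3. The principle of structured generation

In summary, structured generation works step by step: at each generation step, manually defined rules yield the set of tokens allowed at that step, and a bias suppresses every other token, which forces the output into the specified structure.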
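The per-step rule described above can be sketched as a complete decoding loop. Everything here is a toy assumption: the seven-token vocabulary, the hand-written `RULES` state machine, and `toy_logits`, which stands in for a model forward pass and always returns the same scores.

```python
import json

VOCAB = ["Hello", "{", "}", '"name"', ":", '"Alice"', '"Bob"']

# Hand-written rules: state -> {allowed token: next state}
RULES = {
    "start": {"{": "key"},
    "key": {'"name"': "colon"},
    "colon": {":": "value"},
    "value": {'"Alice"': "close", '"Bob"': "close"},
    "close": {"}": "done"},
}

def toy_logits(prefix):
    """Stand-in for a model: fixed scores that favor chatty text over JSON."""
    scores = {"Hello": 5.0, "{": 1.0, "}": 0.5, '"name"': 0.8,
              ":": 0.2, '"Alice"': 1.5, '"Bob"': 2.0}
    return [scores[tok] for tok in VOCAB]

def constrained_generate(bias=-1e9):
    state, output = "start", []
    while state != "done":
        allowed = RULES[state]          # tokens the rules permit at this step
        logits = toy_logits(output)
        # Suppress every disallowed token by adding a large negative bias.
        biased = [score if tok in allowed else score + bias
                  for tok, score in zip(VOCAB, logits)]
        choice = VOCAB[biased.index(max(biased))]  # greedy pick
        output.append(choice)
        state = allowed[choice]         # advance the state machine
    return "".join(output)

result = constrained_generate()
parsed = json.loads(result)
print(result, parsed)
```

Without the bias, greedy decoding would pick `Hello` at every step. With it, the model's preferences only matter where the rules leave a choice, here between the two allowed names.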

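III. Code Examples for LLM Structured Output

1. Structured output with LangChain

LangChain is a library that provides chain interfaces, integrations with other tools, and ready-made chains for applications. The following example uses one of LangChain's output parsers to parse model output into JSON: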

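# Requires the langchain-core and langchain-openai packages
from langchain_core.output_parsers import JsonOutputParser
from langchain_openai import OpenAI

# Initialize the LLM (reads OPENAI_API_KEY from the environment)
llm = OpenAI()

# Initialize the output parser
parser = JsonOutputParser()

# Raw model output: invoke() on a completion-style LLM returns a plain string
model_output = llm.invoke(
    "Generate a JSON object containing a name and an age. Return only the JSON."
)

# Parse the string into a Python dict
structured_output = parser.parse(model_output)

print(structured_output)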

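2. Structured output with guidance

The guidance library uses a template language to define the structure of the LLM's output, which ensures that the output format is correct. The following example uses the handlebars-style template syntax of guidance's early releases: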

import guidance  # legacy template API (guidance < 0.1); later releases dropped guidance.llms

# load a model locally (we use LLaMA here)
guidance.llm = guidance.llms.Transformers("your_local_path/llama-7b", device=0)

# we can pre-define valid option sets
valid_weapons = ["sword", "axe", "mace", "spear", "bow", "crossbow"]

# define the prompt
program = guidance("""The following is a character profile for an RPG game in JSON format.
{
    "description": "{{description}}",
    "name": "{{gen 'name'}}",
    "age": {{gen 'age' pattern='[0-9]+' stop=','}},
    "armor": "{{#select 'armor'}}leather{{or}}chainmail{{or}}plate{{/select}}",
    "weapon": "{{select 'weapon' options=valid_weapons}}",
    "class": "{{gen 'class'}}",
    "mantra": "{{gen 'mantra'}}",
    "strength": {{gen 'strength' pattern='[0-9]+' stop=','}},
    "items": [{{#geneach 'items' num_iterations=3}}
        "{{gen 'this'}}",{{/geneach}}
    ]
}""")

# execute the prompt
program(description="A quick and nimble fighter.", valid_weapons=valid_weapons)