Article link: https://blog.youkuaiyun.com/php55/article/details/152241041

A Complete Guide to Agent Development: From Fundamentals to Advanced Applications

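1. Agent Inputs and Outputs

An agent's inputs are key-value pairs. The only required key is intermediate_steps, which corresponds to the intermediate steps discussed earlier: the (action, observation) pairs produced so far, which give the agent the context of what has already been done. The PromptTemplate converts these key-value pairs into a format the language model can readily consume.

An agent's output is either the next action(s) to take or the final response to return to the user; technically, it is represented by AgentAction or AgentFinish objects. There are three output types:
- AgentAction: a single action for the agent to take next.
- List[AgentAction]: a list of actions for the agent to take next.
- AgentFinish: the final response to return to the user.

The output parser converts the raw output of the language model into one of these three types.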
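
To make the parser's job concrete, here is a minimal sketch in plain Python. It is not LangChain's parser: the Action and Finish classes are hypothetical stand-ins for AgentAction and AgentFinish, and the "Action:" / "Action Input:" / "Final Answer:" markers are an assumed ReAct-style text format.

```python
import re
from dataclasses import dataclass
from typing import Union


@dataclass
class Action:
    """Hypothetical stand-in for AgentAction."""
    tool: str
    tool_input: str
    log: str


@dataclass
class Finish:
    """Hypothetical stand-in for AgentFinish."""
    output: str
    log: str


def parse_llm_output(text: str) -> Union[Action, Finish]:
    """Turn raw ReAct-style model text into an Action or a Finish."""
    if "Final Answer:" in text:
        answer = text.split("Final Answer:", 1)[1].strip()
        return Finish(output=answer, log=text)
    match = re.search(r"Action:\s*(.+?)\s*\nAction Input:\s*(.*)", text, re.DOTALL)
    if match is None:
        # An executor would typically feed this failure back to the model
        raise ValueError(f"Could not parse model output: {text!r}")
    return Action(tool=match.group(1).strip(), tool_input=match.group(2).strip(), log=text)


step = parse_llm_output("Thought: I should look this up.\nAction: Search\nAction Input: capital of France")
done = parse_llm_output("Thought: I know the answer.\nFinal Answer: Paris")
print(step.tool, "|", step.tool_input)
print(done.output)
```

Keeping the raw text in the log field mirrors the idea that the agent's reasoning stays available for later steps and for debugging.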

The following code example shows agent inputs and outputs in use:

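from langchain.agents import AgentExecutor, create_react_agent
from langchain_core.agents import AgentAction, AgentFinish
# Assumes llm, tools, and a ReAct-style prompt are already defined
# Create the agent
agent = create_react_agent(llm=llm, tools=tools, prompt=prompt)
# Define the agent's input: the steps taken so far, plus the user's question
# (assuming the prompt uses an {input} variable, as the standard ReAct prompt does)
agent_input = {
    "input": "What is the capital of France?",
    "intermediate_steps": [
        (AgentAction(tool="Search", tool_input="What is the capital of France?", log="Searching for the capital of France"), "Paris is the capital of France.")
    ]
}
# Create the agent executor (agent_executor.invoke would run the full loop)
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)
# Call the agent once with the input to get its next step
agent_output = agent.invoke(agent_input)
# Check the type of the agent's output
if isinstance(agent_output, AgentAction):
    print("Agent wants to take the following action:", agent_output)
elif isinstance(agent_output, list):
    print("Agent wants to take the following actions:", agent_output)
elif isinstance(agent_output, AgentFinish):
    print("Agent has finished with the following response:", agent_output)

In this example, we define the agent's input by supplying intermediate_steps (alongside the user's question), call the agent with invoke, and store the result in agent_output. Finally, isinstance() checks the output type so that we can either carry out the requested action(s) or return the final response to the user.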

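2. The Agent Executor

The agent executor (AgentExecutor) is the core engine behind the scenes: the runtime in which the agent operates. It calls the agent, executes the actions the agent chooses, passes the action outputs back to the agent, and repeats until the agent reaches a conclusion.

The following simplified pseudocode shows how the agent executor works: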

next_action = agent.get_action(...)
while next_action != AgentFinish:
    observation = run(next_action)
    next_action = agent.get_action(..., next_action, observation)
return next_action

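The agent executor also handles several complications:
1. When the agent selects a tool that does not exist, the executor handles it gracefully and steers the agent back on track.
2. If a tool raises an error during execution, the executor catches the exception and handles it so that the agent can keep working.
3. When the agent produces output that cannot be parsed into a valid tool call, the executor handles the failure (for example, via the handle_parsing_errors option of AgentExecutor) and guides the agent back to a valid path.
4. The executor provides logging and observability at every level, writing to stdout or sending traces to LangSmith for further analysis and visualization.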
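
The loop and the first two failure cases above can be sketched in plain Python. This is an illustration only, not the AgentExecutor implementation: scripted_agent, run_executor, and the tuple-based step format are all hypothetical names chosen for this example.

```python
from typing import Callable, Dict, List, Tuple

# A step is ("action", tool_name, tool_input) or ("finish", answer, "")
Step = Tuple[str, str, str]
History = List[Tuple[Step, str]]


def scripted_agent(history: History) -> Step:
    """Hypothetical agent: asks for a missing tool, then a real one, then finishes."""
    if len(history) == 0:
        return ("action", "Calculator", "2 + 2")
    if len(history) == 1:
        return ("action", "Search", "capital of France")
    return ("finish", f"Answer based on: {history[-1][1]}", "")


def run_executor(
    agent: Callable[[History], Step],
    tools: Dict[str, Callable[[str], str]],
    max_iterations: int = 5,
) -> str:
    history: History = []
    for _ in range(max_iterations):
        kind, name, payload = agent(history)
        if kind == "finish":
            return name
        if name not in tools:
            # Unknown tool: report it as an observation instead of crashing
            observation = f"{name} is not a valid tool, try one of {sorted(tools)}."
        else:
            try:
                observation = tools[name](payload)
            except Exception as exc:
                # Tool failure: hand the error text back to the agent
                observation = f"Tool {name} failed: {exc}"
        history.append(((kind, name, payload), observation))
    return "Agent stopped after reaching the iteration limit."


tools = {"Search": lambda query: "Paris is the capital of France."}
final_answer = run_executor(scripted_agent, tools)
print(final_answer)
```

The scripted agent first requests "Calculator", which is not registered, so the loop turns the mistake into an observation and the agent recovers on the next step. The iteration cap guards against an agent that never finishes.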

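3. Tools and Toolkits

3.1 Tools

Tools are the interfaces through which an agent, chain, or large language model (LLM) interacts with the outside world. A tool consists of the following key elements:
1. Name: a short label that identifies what the tool does.
2. Description: a brief statement of the tool's purpose and functionality.
3. JSON schema: a structured definition of the inputs the tool requires.
4. Function to call: the actual code that performs the tool's operation.
5. Return-direct flag: whether the tool's output should be returned directly to the user or passed back to the agent for further processing.

In LangChain, the tool abstraction consists of two key components:
1. Input schema: tells the language model which parameters are needed to call the tool.
2. Run function: the Python function executed when the tool is called.

Tools with a single string input are recommended because they are easier for agents to use, and many agent types only work with such tools.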
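
As a rough illustration of how the five elements fit together, the sketch below models a tool as a small dataclass. SimpleTool and word_counter are hypothetical names for this example and are not LangChain classes.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class SimpleTool:
    """Hypothetical container mirroring the five elements of a tool."""
    name: str
    description: str
    args_schema: Dict[str, Any]    # JSON-schema-like description of the inputs
    func: Callable[[str], str]     # the code that does the work
    return_direct: bool = False    # True would hand the result straight to the user

    def run(self, tool_input: str) -> str:
        return self.func(tool_input)


word_counter = SimpleTool(
    name="word-counter",
    description="Counts the words in a piece of text. Input should be the text itself.",
    args_schema={"text": {"title": "Text", "type": "string"}},
    func=lambda text: str(len(text.split())),
)
print(word_counter.name, word_counter.args_schema)
print(word_counter.run("agents call tools"))  # prints 3
```

The tool takes a single string, in line with the recommendation above: the model only has to produce one value to call it.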

The following example uses the WikipediaQueryRun tool:

!pip install langchain==0.2.5 langchain_community==0.2.5 langchain_openai==0.1.8 wikipedia
from langchain_community.tools import WikipediaQueryRun
from langchain_community.utilities import WikipediaAPIWrapper
# Initialize the tool
api_wrapper = WikipediaAPIWrapper(top_k_results=1, doc_content_chars_max=100)
tool = WikipediaQueryRun(api_wrapper=api_wrapper)
# Inspect the tool's attributes (exact values may vary by library version)
print(tool.name)  # Output: 'Wikipedia'
print(tool.description)  # Output: 'A wrapper around Wikipedia. Useful for when you need to answer general questions about people, places, companies, facts, historical events, or other subjects. Input should be a search query.'
print(tool.args)  # Output: {'query': {'title': 'Query', 'type': 'string'}}
print(tool.return_direct)  # Output: False
# Search for information about LangChain, first with a dict input, then with a plain string
print(tool.run({"query": "langchain"}))
print(tool.run("langchain"))

We can also customize the tool's name, description, and JSON schema:

from langchain_core.pydantic_v1 import BaseModel, Field
class WikiInputs(BaseModel):
    """Inputs to the wikipedia tool."""
    query: str = Field(description="query to look up in Wikipedia, should be 3 or less words")
# Override the default name, description, and input schema
tool = WikipediaQueryRun(
    name="wiki-tool",
    description="look up things in wikipedia",
    args_schema=WikiInputs,
    api_wrapper=api_wrapper,
    return_direct=True,
)
print(tool.name)  # Output: 'wiki-tool'
print(tool.description)  # Output: 'look up things in wikipedia'
print(tool.args)  # Output: {'query': {'title': 'Query', 'description': 'query to look up in Wikipedia, should be 3 or less words', 'type': 'string'}}
print(tool.return_direct)  # Output: True
print(tool.run("langchain"))

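3.2 Toolkits

Toolkits are curated collections of tools for a specific task, with convenient loading methods so that the tools you need are quickly available. For example, the GitHub toolkit includes tools for searching GitHub issues, reading files, commenting on issues, and more.

To use a toolkit:
1. Initialize the toolkit:

toolkit = ExampleToolkit(...)

2. Get the tools in the toolkit:

tools = toolkit.get_tools()

3. Create the agent:

agent = create_react_agent(llm=llm, tools=tools, prompt=prompt)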
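
To show what a toolkit might look like on the inside, here is a minimal sketch in plain Python. NotesToolkit and ToolSpec are hypothetical names for this example; the only thing borrowed from the steps above is the get_tools() convention.

```python
from typing import Callable, Dict, List, NamedTuple


class ToolSpec(NamedTuple):
    name: str
    description: str
    func: Callable[[str], str]


class NotesToolkit:
    """Hypothetical toolkit: bundles tools that share one resource (an in-memory notes dict)."""

    def __init__(self) -> None:
        self._notes: Dict[str, str] = {}

    def _save(self, text: str) -> str:
        title, _, body = text.partition(":")
        self._notes[title.strip()] = body.strip()
        return f"Saved note '{title.strip()}'."

    def _read(self, title: str) -> str:
        return self._notes.get(title.strip(), "No such note.")

    def get_tools(self) -> List[ToolSpec]:
        return [
            ToolSpec("save-note", "Save a note. Input: 'title: body'.", self._save),
            ToolSpec("read-note", "Read a note by its title.", self._read),
        ]


toolkit = NotesToolkit()
tools = {tool.name: tool for tool in toolkit.get_tools()}
print(tools["save-note"].func("trip: book flights to Hawaii"))
print(tools["read-note"].func("trip"))
```

Grouping the tools in one object lets them share state or credentials, which is the practical reason toolkits exist.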

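4. Design Considerations

Two design considerations matter when working with tools:
1. Give the agent the right tools: make sure the agent has the tools it needs to accomplish its goal; otherwise its capabilities will be limited.
2. Describe the tools in the way that is most helpful to the agent: state clearly what each tool is for so that the agent can decide when to use it.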
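
The second point matters because tool names and descriptions usually end up in the text the model reads. The sketch below assumes a simple "name: description" layout; render_tool_descriptions is a hypothetical helper for this example, not a LangChain function.

```python
from typing import List, Tuple


def render_tool_descriptions(tools: List[Tuple[str, str]]) -> str:
    """Format (name, description) pairs as the tool list an agent prompt might embed."""
    return "\n".join(f"{name}: {description}" for name, description in tools)


vague = [("wiki", "Wikipedia.")]
helpful = [
    ("wiki", "Look up people, places, and events on Wikipedia. Input should be a short search query."),
]
print(render_tool_descriptions(vague))
print(render_tool_descriptions(helpful))
```

The second description tells the model both when to use the tool and what the input should look like, and that is all the model has to go on when choosing.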

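5. Building Agents with LangGraph

5.1 What Is LangGraph

LangGraph lets you organize an agent's work as a graph: nodes represent data or tasks, and edges represent the relationships between them, such as which step follows which. This structure makes it easier for agents to handle complex workflows, keep track of context, and execute multi-step tasks efficiently.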
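
To make the node-and-edge idea concrete, here is a minimal sketch in plain Python. It is not the LangGraph API: the TinyGraph class and its methods are hypothetical, and it only supports one outgoing edge per node.

```python
from typing import Callable, Dict, Optional

State = Dict[str, object]


class TinyGraph:
    """Hypothetical graph runner: nodes are functions, edges say which node runs next."""

    def __init__(self) -> None:
        self.nodes: Dict[str, Callable[[State], State]] = {}
        self.edges: Dict[str, str] = {}

    def add_node(self, name: str, func: Callable[[State], State]) -> None:
        self.nodes[name] = func

    def add_edge(self, source: str, target: str) -> None:
        self.edges[source] = target

    def run(self, start: str, state: State) -> State:
        current: Optional[str] = start
        while current is not None:
            # Each node returns the keys it wants to update in the shared state
            state = {**state, **self.nodes[current](state)}
            current = self.edges.get(current)
        return state


graph = TinyGraph()
graph.add_node("double", lambda s: {"value": s["value"] * 2})
graph.add_node("describe", lambda s: {"text": f"value is {s['value']}"})
graph.add_edge("double", "describe")
result = graph.run("double", {"value": 21})
print(result)
```

The shared state dictionary is what carries context from one step to the next; a node with no outgoing edge ends the run.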

5.2 Setting Up LangGraph

Install LangChain and LangGraph:

!pip install langchain==0.2.5 langchain_openai==0.1.8 langgraph==0.1.8

Import the necessary modules:

from typing import List, TypedDict
from langchain_openai import OpenAI
from langgraph.graph import StateGraph, END

Initialize the language model:

# Not used by the fixed-response nodes below, which make no model calls
llm = OpenAI(api_key="your_openai_api_key")

5.3 Creating a Simple LangGraph

Suppose we want to build a travel agent that helps users plan trips:

# Shared state passed from node to node
class TravelState(TypedDict):
    user_input: str
    messages: List[str]
# Define the nodes: each takes the state and returns the keys to update
def greet_user(state: TravelState) -> dict:
    return {"messages": state["messages"] + ["Hello! How can I assist you with your travel plans today?"]}
def get_destination(state: TravelState) -> dict:
    return {"messages": state["messages"] + [f"Great choice! {state['user_input']} sounds like a fantastic destination."]}
def suggest_activities(state: TravelState) -> dict:
    return {"messages": state["messages"] + ["Here are some activities you might enjoy: visiting museums, trying local cuisine, and exploring nature trails."]}
# Create the graph and register the nodes
travel_graph = StateGraph(TravelState)
travel_graph.add_node("GreetUser", greet_user)
travel_graph.add_node("GetDestination", get_destination)
travel_graph.add_node("SuggestActivities", suggest_activities)
# Define the edges, including where the graph starts and ends
travel_graph.set_entry_point("GreetUser")
travel_graph.add_edge("GreetUser", "GetDestination")
travel_graph.add_edge("GetDestination", "SuggestActivities")
travel_graph.add_edge("SuggestActivities", END)
# Compile the graph into a runnable agent
agent = travel_graph.compile()
# Test the agent
result = agent.invoke({"user_input": "Hawaii", "messages": []})
for message in result["messages"]:
    print(message)

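6. Agent Types

6.1 Criteria for Choosing an Agent Type

Consider the following factors when choosing an agent type:
| Factor | Description |
| ---- | ---- |
| Intended model type | Either chat models or language models (LLMs); an agent type that matches your model type tends to give better results. |
| Chat history support | Agent types that support chat history suit applications such as chatbots; those that do not are better for single tasks or one-off interactions. |
| Multi-input tool support | Choose an agent type that supports multi-input tools when working with complex tools that take several inputs. |
| Parallel function calling | Parallel function calling can speed up some tasks, but not every agent type supports it. |
| Model parameters | Some agent types require specific model parameters. |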

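6.2 LangChain Agent Types

| Agent type | Intended model type | Chat history | Multi-input tools | Parallel function calling | Required model params | When to use |
| ---- | ---- | ---- | ---- | ---- | ---- | ---- |
| Tool Calling | Chat | ✅ | ✅ | ✅ | tools | When using a tool-calling model |
| OpenAI Tools | Chat | ✅ | ✅ | ✅ | tools | When using a recent OpenAI model (1106 onwards); the generic Tool Calling agent is recommended instead |
| OpenAI Functions | Chat | ✅ | ✅ | ❌ | functions | When using an OpenAI model, or an open-source model fine-tuned for function calling; the generic Tool Calling agent is recommended instead |
| XML | LLM | ✅ | ❌ | ❌ | None | When using Anthropic models or other models that handle XML well |
| Structured Chat | Chat | ✅ | ✅ | ❌ | None | When tools with multiple inputs must be supported |
| JSON Chat | Chat | ✅ | ❌ | ❌ | None | When using a model that handles JSON well |
| ReAct | LLM | ❌ | ❌ | ❌ | None | When using a simple model |
| Self-Ask with Search | LLM | ❌ | ❌ | ❌ | None | When using a simple model with a single search tool |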

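In addition, two other agents are commonly encountered:
- Zero-Shot ReAct Agent: handles a variety of tasks without prior training or fine-tuning, generating responses dynamically from the tools and language model it is given.
- Conversational Agent: intended for conversational AI applications such as chatbots or virtual assistants; it can hold a back-and-forth conversation, maintain context, and give relevant responses based on the conversation history.

So far we have covered the main aspects of agent development, from basic inputs and outputs to advanced graph-based applications, along with how to choose and use the different agent types. We hope this helps you build more powerful and more intelligent applications in practice.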

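7. Summary of the Agent Development Workflow

To show the overall agent development workflow more clearly, the following flowchart presents it in mermaid format: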

graph LR
    classDef startend fill:#F5EBFF,stroke:#BE8FED,stroke-width:2px
    classDef process fill:#E5F6FF,stroke:#73A6FF,stroke-width:2px
    classDef decision fill:#FFF6CC,stroke:#FFBC52,stroke-width:2px

    A([Start]):::startend --> B(Determine requirements):::process
    B --> C{Choose agent type}:::decision
    C -->|Based on model type, feature needs, etc.| D(Select suitable tools and toolkits):::process
    D --> E(Set up agent input):::process
    E --> F(Create agent executor):::process
    F --> G(Run agent):::process
    G --> H{Conclusion reached?}:::decision
    H -->|No| I(Handle exceptions):::process
    I --> F
    H -->|Yes| J(Output final response):::process
    J --> K([End]):::startend

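The flowchart shows the main steps of agent development: determining requirements, choosing an agent type, selecting tools and toolkits, setting up the input, creating the executor, running the agent, and handling any exceptions that arise, until the agent reaches a conclusion and outputs its response.

8. Summary of Operational Steps

For easy reference during development, the key steps covered above are collected here:

8.1 Agent Input and Output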

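from langchain.agents import AgentExecutor, create_react_agent
from langchain_core.agents import AgentAction, AgentFinish
# Assumes llm, tools, and a ReAct-style prompt are already defined
# Create the agent
agent = create_react_agent(llm=llm, tools=tools, prompt=prompt)
# Define the agent's input: the steps taken so far, plus the user's question
# (assuming the prompt uses an {input} variable, as the standard ReAct prompt does)
agent_input = {
    "input": "What is the capital of France?",
    "intermediate_steps": [
        (AgentAction(tool="Search", tool_input="What is the capital of France?", log="Searching for the capital of France"), "Paris is the capital of France.")
    ]
}
# Create the agent executor (agent_executor.invoke would run the full loop)
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)
# Call the agent once with the input to get its next step
agent_output = agent.invoke(agent_input)
# Check the type of the agent's output
if isinstance(agent_output, AgentAction):
    print("Agent wants to take the following action:", agent_output)
elif isinstance(agent_output, list):
    print("Agent wants to take the following actions:", agent_output)
elif isinstance(agent_output, AgentFinish):
    print("Agent has finished with the following response:", agent_output)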

8.2 Using Tools

!pip install langchain==0.2.5 langchain_community==0.2.5 langchain_openai==0.1.8 wikipedia
from langchain_community.tools import WikipediaQueryRun
from langchain_community.utilities import WikipediaAPIWrapper
# Initialize the tool
api_wrapper = WikipediaAPIWrapper(top_k_results=1, doc_content_chars_max=100)
tool = WikipediaQueryRun(api_wrapper=api_wrapper)
# Inspect the tool's attributes
print(tool.name)
print(tool.description)
print(tool.args)
print(tool.return_direct)
# Search for information, first with a dict input, then with a plain string
print(tool.run({"query": "langchain"}))
print(tool.run("langchain"))
# Customize the tool
from langchain_core.pydantic_v1 import BaseModel, Field
class WikiInputs(BaseModel):
    """Inputs to the wikipedia tool."""
    query: str = Field(description="query to look up in Wikipedia, should be 3 or less words")
tool = WikipediaQueryRun(
    name="wiki-tool",
    description="look up things in wikipedia",
    args_schema=WikiInputs,
    api_wrapper=api_wrapper,
    return_direct=True,
)
print(tool.name)
print(tool.description)
print(tool.args)
print(tool.return_direct)
print(tool.run("langchain"))

8.3 Using Toolkits

# Initialize the toolkit
toolkit = ExampleToolkit(...)
# Get the tools in the toolkit
tools = toolkit.get_tools()
# Create the agent
agent = create_react_agent(llm=llm, tools=tools, prompt=prompt)

8.4 Using LangGraph

# Install dependencies
!pip install langchain==0.2.5 langchain_openai==0.1.8 langgraph==0.1.8
# Import modules
from typing import List, TypedDict
from langchain_openai import OpenAI
from langgraph.graph import StateGraph, END
# Initialize the language model (not used by the fixed-response nodes below)
llm = OpenAI(api_key="your_openai_api_key")
# Shared state passed from node to node
class TravelState(TypedDict):
    user_input: str
    messages: List[str]
# Define the nodes: each takes the state and returns the keys to update
def greet_user(state: TravelState) -> dict:
    return {"messages": state["messages"] + ["Hello! How can I assist you with your travel plans today?"]}
def get_destination(state: TravelState) -> dict:
    return {"messages": state["messages"] + [f"Great choice! {state['user_input']} sounds like a fantastic destination."]}
def suggest_activities(state: TravelState) -> dict:
    return {"messages": state["messages"] + ["Here are some activities you might enjoy: visiting museums, trying local cuisine, and exploring nature trails."]}
# Create the graph and register the nodes
travel_graph = StateGraph(TravelState)
travel_graph.add_node("GreetUser", greet_user)
travel_graph.add_node("GetDestination", get_destination)
travel_graph.add_node("SuggestActivities", suggest_activities)
# Define the edges, including where the graph starts and ends
travel_graph.set_entry_point("GreetUser")
travel_graph.add_edge("GreetUser", "GetDestination")
travel_graph.add_edge("GetDestination", "SuggestActivities")
travel_graph.add_edge("SuggestActivities", END)
# Compile the graph into a runnable agent
agent = travel_graph.compile()
# Test the agent
result = agent.invoke({"user_input": "Hawaii", "messages": []})
for message in result["messages"]:
    print(message)

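9. Notes on Agent Development

A few additional points are worth keeping in mind during agent development:
1. Simplicity of tool inputs: prefer tools with a single string input, since agents find them easier to use. If you need tools with complex inputs, make sure to choose an agent type that supports multi-input tools.
2. Exception handling: various failures can occur while an agent runs, such as selecting a nonexistent tool or a tool raising an error. The agent executor handles these cases, but you should still have a clear picture of the failures that can occur.
3. Model parameters: different agent types have different requirements for model parameters. When choosing an agent type, make sure to supply the parameters it requires; otherwise the agent may not run correctly.
4. Logging and observability: use the logging and observability features of the agent executor to write information to stdout or send it to LangSmith for further analysis and visualization, which helps with debugging and optimizing the agent.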
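
For the last point, LangSmith tracing is typically switched on through environment variables set before the agent runs. The snippet below is a minimal sketch: the key and project name are placeholder values chosen for this example, and stdout logging is separately controlled by the verbose=True argument of AgentExecutor.

```python
import os

# Turn on LangSmith tracing for LangChain runs started from this process
os.environ["LANGCHAIN_TRACING_V2"] = "true"
# Placeholder values: replace with your own API key and project name
os.environ.setdefault("LANGCHAIN_API_KEY", "your_langsmith_api_key")
os.environ.setdefault("LANGCHAIN_PROJECT", "agent-debugging")
```

Using setdefault for the key and project leaves any values already exported in the shell untouched.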

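10. Summary and Outlook

This article has walked through agent development step by step: starting from basic agent inputs and outputs, moving on to tools and toolkits and to graph-based structures, and ending with how to choose and use the different agent types. In practice, you can pick the agent type, tools, and toolkits that fit your requirements and scenario, and use more advanced techniques such as LangGraph to build more complex and more capable agents.

As AI technology continues to advance, agents will find wider application and gain richer capabilities. They are set to play an increasingly important role in areas such as intelligent customer service, intelligent assistants, and process automation. Integration with other technologies, such as the Internet of Things and big data, is another important direction that should lead to further innovative applications. We hope this material helps you keep exploring and innovating in agent development.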