Tutorial-Codebase-Knowledge Project Analysis: The System Prompt Mechanism in Browser Automation - 优快云 Blog

Article link: https://blog.youkuaiyun.com/gitblog_01022/article/details/148419126


Tutorial-Codebase-Knowledge: Turns Codebase into Easy Tutorial with AI. Project repository: https://gitcode.com/gh_mirrors/tu/Tutorial-Codebase-Knowledge

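Introduction: An "Operating Manual" for the AI Assistant

In modern browser automation systems, a large language model (LLM) acts as an "intelligent planner." For that planner to understand and carry out our intent, however, a task description alone is not nearly enough. A new employee likewise needs clear working rules and requirements along with an assignment. In the browser automation codebase analyzed in this Tutorial-Codebase-Knowledge tutorial (the browser_use package that appears in the code below), the System Prompt is the core mechanism for governing the LLM's behavior.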

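The Nature and Role of the System Prompt

At its core, the system prompt is a carefully designed set of instructions. It defines how the LLM should behave during browser automation tasks and covers three things:

Role definition: tells the LLM explicitly which role it plays (a browser automation assistant)
Task boundaries: specifies what it may and may not do
Interaction rules: strictly defines the data formats of its input and output

In the project, the system prompt is maintained in a Markdown file (system_prompt.md). This keeps the rules easy to read and easy to put under version control.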

Core Components of the System Prompt

1. Role and Goal Definition

The system prompt begins by stating the LLM's role explicitly:

# Role Definition
You are an AI agent specialized in browser automation tasks.
Your primary goal is to accomplish the given task through a series of precise browser interactions.

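2. Input Specification

The system prompt describes in detail the format of the input the LLM will receive, in particular how the web page's DOM structure is represented: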

# Input Specification
You will receive:
- Current URL of the webpage
- Structured DOM representation
- Previous action results
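The sketch below shows how these three inputs might be combined into the per-step message the LLM reads. BrowserState, build_state_message and the indexed element notation ([0]<input .../>) are hypothetical choices for illustration. They are not the project's actual API or DOM format.

```python
from dataclasses import dataclass, field


@dataclass
class BrowserState:
    """Hypothetical snapshot of what the agent observes at one step."""

    url: str
    dom_text: str  # simplified DOM: one indexed interactive element per line
    previous_results: list[str] = field(default_factory=list)


def build_state_message(state: BrowserState) -> str:
    """Render the three inputs from the Input Specification as one text block."""
    lines = [f"Current URL: {state.url}", "Interactive elements:", state.dom_text]
    if state.previous_results:
        lines.append("Previous action results:")
        lines.extend(f"- {result}" for result in state.previous_results)
    return "\n".join(lines)


# Usage: build the message for a login page after one navigation step
example = build_state_message(
    BrowserState(
        url="https://example.com/login",
        dom_text='[0]<input placeholder="Email"/>\n[1]<button>Sign in</button>',
        previous_results=["navigate: loaded https://example.com/login"],
    )
)
```

Numbering the elements lets the LLM refer to them by index in its actions instead of repeating selectors.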

3. Action Instruction Set

The system prompt lists every browser operation available to the LLM, for example:

# Available Actions
- click_element: Click on a specified DOM element
- input_text: Enter text into an input field
- navigate: Load a new URL
- scroll: Scroll the page
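This action list is the text that SystemPrompt later receives as action_desc. A registry is one way to keep that text in sync with the code that performs the actions. The sketch below assumes a decorator-based registry with hypothetical names (action, ACTIONS, describe_actions, execute). Its stub handlers return strings instead of driving a real browser, and navigate and scroll would be registered the same way.

```python
from typing import Callable

# Hypothetical registry: action name -> (one-line description, handler)
ACTIONS: dict[str, tuple[str, Callable[..., str]]] = {}


def action(name: str, description: str):
    """Decorator that registers a handler under `name` with its description."""
    def register(fn: Callable[..., str]) -> Callable[..., str]:
        ACTIONS[name] = (description, fn)
        return fn
    return register


@action("click_element", "Click on a specified DOM element")
def click_element(index: int) -> str:
    return f"clicked element {index}"  # stub: a real handler would drive the browser


@action("input_text", "Enter text into an input field")
def input_text(index: int, text: str) -> str:
    return f"typed {text!r} into element {index}"  # stub


def describe_actions() -> str:
    """Build the action list text that could be injected into the system prompt."""
    return "\n".join(f"- {name}: {desc}" for name, (desc, _) in ACTIONS.items())


def execute(name: str, **params) -> str:
    """Dispatch an action chosen by the LLM to its registered handler."""
    if name not in ACTIONS:
        raise ValueError(f"unknown action: {name}")
    _, handler = ACTIONS[name]
    return handler(**params)
```

The prompt text and the dispatcher both read from the same dictionary. The LLM is therefore only told about actions that actually exist.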

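4. Response Format Specification

This is the most critical part of the system prompt. It requires the LLM to respond in a specific JSON format: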

{
  "current_state": {
    "evaluation": "...",
    "memory": "...",
    "next_goal": "..."
  },
  "action": [
    {
      "action_type": {
        "parameters": "..."
      }
    }
  ]
}
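On the receiving side, the agent has to turn the reply back into structured data. The following sketch uses only Python's json module to parse a reply shaped like the JSON above and check the current_state keys. The function name parse_agent_output is hypothetical. Truncating the action list to max_actions, rather than rejecting the reply, is a design choice of this sketch.

```python
import json

REQUIRED_STATE_KEYS = ("evaluation", "memory", "next_goal")


def parse_agent_output(raw: str, max_actions: int = 10):
    """Parse a reply into (current_state, [(action_name, params), ...])."""
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on malformed JSON
    state = data["current_state"]
    missing = [key for key in REQUIRED_STATE_KEYS if key not in state]
    if missing:
        raise ValueError(f"current_state is missing keys: {missing}")
    actions = []
    # Keep at most max_actions entries, mirroring the per-step limit in the prompt
    for item in data["action"][:max_actions]:
        if not isinstance(item, dict) or len(item) != 1:
            raise ValueError(f"each action must be an object with exactly one key: {item!r}")
        name, params = next(iter(item.items()))  # {"action_type": {...parameters...}}
        actions.append((name, params))
    return state, actions
```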

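Technical Implementation Details

In the code, the system prompt is managed by the SystemPrompt class, which is mainly responsible for:

Template loading: reading the raw prompt text from the Markdown file
Dynamic variable substitution: filling placeholders such as {max_actions} in the template. Because this is done with Python's str.format, any literal braces in the template (for example in the JSON response example) must be escaped as {{ and }}.
System message generation: wrapping the processed prompt in a SystemMessage object that can be sent to the LLM

The core logic is as follows: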

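from importlib import resources

from langchain_core.messages import SystemMessage


class SystemPrompt:
    def __init__(self, action_desc, max_actions=10):
        self.action_desc = action_desc  # text describing the available actions
        self.max_actions = max_actions  # max number of actions the LLM may return per step
        self._load_template()

    def _load_template(self):
        # Load the Markdown file shipped as package data in browser_use.agent
        template_file = resources.files('browser_use.agent').joinpath('system_prompt.md')
        self.template = template_file.read_text(encoding='utf-8')

    def get_system_message(self):
        # Fill the {max_actions} and {actions} placeholders via str.format;
        # literal braces in the template must be written as {{ and }}
        prompt = self.template.format(
            max_actions=self.max_actions,
            actions=self.action_desc,
        )
        return SystemMessage(content=prompt)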
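SystemPrompt loads its template from the browser_use.agent package, so it cannot run unless that package is installed. The stand-alone sketch below isolates the substitution step using an inline template with hypothetical, abbreviated content. It shows how str.format treats single versus doubled braces.

```python
# Inline stand-in for system_prompt.md (hypothetical, abbreviated content)
TEMPLATE = """You are an AI agent specialized in browser automation tasks.
Use at most {max_actions} actions per step.
Available actions:
{actions}
Respond with JSON such as {{"current_state": {{...}}, "action": [...]}}
"""


def render(template: str, max_actions: int, actions: str) -> str:
    # {name} fields are replaced; {{ and }} become literal { and }
    return template.format(max_actions=max_actions, actions=actions)


prompt = render(TEMPLATE, 3, "- click_element: Click on a specified DOM element")
```

Suppose the JSON example were written with single braces. str.format would then treat "current_state" as a field name and raise an error (typically KeyError), which is why literal braces in the template must be doubled.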

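System Prompt Workflow

Initialization phase:
- The Agent instantiates SystemPrompt when it is created
- The prompt template is loaded and processed
- The system message is generated
Runtime phase:
- MessageManager always keeps the system message as the first entry in the conversation history
- Every LLM call therefore includes the system prompt
- The LLM's response is expected to follow the format defined in the prompt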

graph TD
    A[Agent initialization] --> B[Create SystemPrompt instance]
    B --> C[Load system_prompt.md]
    C --> D[Substitute template variables]
    D --> E[Generate SystemMessage]
    E --> F[Store in MessageManager]
    F --> G[Basis of every LLM conversation]
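The runtime rule that the system message always comes first in every LLM call can be illustrated with a minimal sketch. The Message and MessageManager classes here are hypothetical simplifications, not browser_use's actual classes. The history trimming via max_history is also an assumption. It shows that the system message survives trimming because it is stored outside the trimmable history.

```python
from dataclasses import dataclass


@dataclass
class Message:
    role: str  # "system", "user" or "assistant"
    content: str


class MessageManager:
    """Hypothetical simplified manager: the system message is pinned at index 0."""

    def __init__(self, system_message: Message, max_history: int = 20):
        self.system_message = system_message
        self.history: list[Message] = []
        self.max_history = max_history

    def add(self, message: Message) -> None:
        self.history.append(message)
        # Drop the oldest turns; the system message is not in `history`, so it is never dropped
        if len(self.history) > self.max_history:
            self.history = self.history[-self.max_history:]

    def get_messages(self) -> list[Message]:
        # Every LLM call receives the system prompt first, then the conversation
        return [self.system_message, *self.history]
```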