LLMCompiler Extension Development: Implementing a Custom Planner and Executor - CSDN Blog


[Free download] LLMCompiler: An LLM Compiler for Parallel Function Calling. Project URL: https://gitcode.com/gh_mirrors/ll/LLMCompiler

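In the LLMCompiler framework, the Planner and the Executor are the core components behind parallel task invocation. This article shows how to customize LLMCompiler's behavior for specific business needs by extending these two modules. It starts with the basic architecture, walks through the key steps of a custom implementation, and provides code examples and integration instructions.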

Core Architecture Overview

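LLMCompiler's core workflow has three stages: planning, execution, and reflection. The Planner generates a plan of tasks that can run in parallel, and the Executor schedules and runs those tasks. The relationships between the core modules are shown below: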

[Figure: core module relationships (Mermaid diagram not preserved)]

Code paths of the key modules:

Main engine: src/llm_compiler/llm_compiler.py
Planner interface: src/llm_compiler/planner.py
Executor implementation: src/executors/agent_executor.py
Task management (Task Fetching Unit): src/llm_compiler/task_fetching_unit.py
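The following toy sketch uses plain Python stand-ins to make the plan, execute, reflect cycle concrete. The names `run_pipeline`, `plan_fn` and `join_fn` are hypothetical and are not LLMCompiler's API. In the real engine each stage is LLM-backed and tasks carry dependencies. This sketch ignores dependencies and only shows the control flow, including looping back to replan.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, Optional

TaskFn = Callable[[], Awaitable[str]]

async def run_pipeline(
    query: str,
    plan_fn: Callable[[str, List[str]], Awaitable[Dict[int, TaskFn]]],
    join_fn: Callable[[str, List[str]], Optional[str]],
    max_replans: int = 2,
) -> Optional[str]:
    """Plan, run the planned tasks concurrently, then let join_fn answer or request a replan."""
    history: List[str] = []
    for _ in range(max_replans + 1):
        tasks = await plan_fn(query, history)                            # planning stage
        results = await asyncio.gather(*(t() for t in tasks.values()))  # execution stage
        history.extend(results)
        answer = join_fn(query, history)                                 # reflection/join stage
        if answer is not None:
            return answer
        # None means "not enough information yet": loop back and replan
    return None

async def demo_plan(query: str, history: List[str]) -> Dict[int, TaskFn]:
    async def search() -> str:
        return "population=100"

    async def lookup() -> str:
        return "area=20"

    return {1: search, 2: lookup}

def demo_join(query: str, history: List[str]) -> Optional[str]:
    facts = dict(item.split("=") for item in history)
    if "population" in facts and "area" in facts:
        return f"density={int(facts['population']) // int(facts['area'])}"
    return None  # missing facts: ask for a replan

print(asyncio.run(run_pipeline("population density?", demo_plan, demo_join)))  # density=5
```

If a planner only fetches one fact per round, the same loop replans until the join step can answer or `max_replans` is exhausted.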

Custom Planner Implementation

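The Planner module converts a user query into a plan of tasks that can run in parallel. In the default implementation, an LLM generates a task dependency graph. Custom planning logic can be implemented by subclassing Planner and overriding its key methods.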
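The dependency graph is what makes parallelism possible: all tasks whose dependencies have finished can run together. The sketch below groups task ids into such "waves" using Kahn's topological sort. `parallel_waves` is a hypothetical helper, not part of LLMCompiler. The real Task Fetching Unit may dispatch each task as soon as its own dependencies complete, rather than in lock-step waves.

```python
from typing import Dict, List, Set

def parallel_waves(deps: Dict[int, Set[int]]) -> List[List[int]]:
    """Group task ids into waves; every task in a wave depends only on earlier waves.

    deps maps each task id to the ids it depends on (the $id references in its arguments).
    """
    remaining = {t: len(d) for t, d in deps.items()}  # unmet dependency counts
    children: Dict[int, List[int]] = {t: [] for t in deps}
    for t, d in deps.items():
        for parent in d:
            children[parent].append(t)

    waves: List[List[int]] = []
    ready = sorted(t for t, n in remaining.items() if n == 0)
    while ready:
        waves.append(ready)
        next_ready = []
        for t in ready:
            for child in children[t]:
                remaining[child] -= 1
                if remaining[child] == 0:
                    next_ready.append(child)
        ready = sorted(next_ready)

    # Tasks never reaching zero unmet dependencies are part of a cycle
    if sum(len(w) for w in waves) != len(deps):
        raise ValueError("dependency cycle detected")
    return waves

# 1: search(A)  2: search(B)  3: compare($1, $2)  4: join()
print(parallel_waves({1: set(), 2: set(), 3: {1, 2}, 4: {3}}))  # [[1, 2], [3], [4]]
```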

Base Interface

The core methods of the Planner base class are defined in src/llm_compiler/planner.py. They mainly include:

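class Planner:
    # Outline only; see src/llm_compiler/planner.py for the full implementation
    def __init__(self, llm, example_prompt, example_prompt_replan, tools, stop):
        # Store the LLM, few-shot prompts (initial and replan), tools, and stop sequences
        ...

    async def plan(self, inputs, is_replan, callbacks=None):
        # Main entry point: generate and return the complete task plan
        ...

    async def aplan(self, inputs, task_queue, is_replan, callbacks=None):
        # Streaming variant: push tasks into task_queue as they are generated
        ...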

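plan returns the complete plan as a mapping from task index to Task. aplan instead streams tasks through an asynchronous queue as they are generated, which suits cases where tasks should start before the whole plan has been produced.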
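The difference between plan and aplan is essentially batch versus streaming. Below is a self-contained sketch of the streaming pattern using `asyncio.Queue`. The `None` end-of-plan sentinel and the `run_streaming` name are assumptions made for illustration. Check aplan in planner.py for how the real code signals that the plan is complete.

```python
import asyncio
from typing import List, Tuple

END_OF_PLAN = None  # hypothetical sentinel marking the end of the streamed plan

async def run_streaming(plan_lines: List[str]) -> List[Tuple[str, bool]]:
    """Consume tasks as a planner streams them; record whether planning was still running."""
    queue: asyncio.Queue = asyncio.Queue()
    planner_done = asyncio.Event()

    async def planner() -> None:
        for line in plan_lines:
            await asyncio.sleep(0.01)  # simulated LLM latency per generated task
            await queue.put(line)
        await queue.put(END_OF_PLAN)
        planner_done.set()

    producer = asyncio.create_task(planner())
    dispatched = []
    while (task := await queue.get()) is not END_OF_PLAN:
        # A real executor would start the tool call here, e.g. with asyncio.create_task(...)
        dispatched.append((task, planner_done.is_set()))
    await producer
    return dispatched

print(asyncio.run(run_streaming(["1. search(A)", "2. search(B)", "3. join()"])))
```

The first two tasks are dispatched while the planner is still generating. With a batch plan() call, nothing could start until the whole plan had been parsed.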

Implementing Custom Planning Logic

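The following custom Planner gives each task an execution priority based on its tool type. Note that priority is an attribute introduced by this example. LLMCompiler's scheduler does not read it, so any effect on execution comes only from the order of the returned plan: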

from src.llm_compiler.planner import Planner

class PriorityPlanner(Planner):
    def __init__(self, *args, priority_map=None, **kwargs):
        super().__init__(*args, **kwargs)
        # Default priority map: search tools rank above calculator tools
        self.priority_map = priority_map or {
            "SearchTool": 10,
            "CalculatorTool": 5,
            "Default": 3
        }

    async def plan(self, inputs, is_replan=False, callbacks=None):
        # Let the base class generate the raw plan: {task_idx: Task}
        tasks = await super().plan(inputs, is_replan, callbacks)
        # Fall back to 0 if a custom map has no "Default" entry
        default_priority = self.priority_map.get("Default", 0)

        # Assign priorities by tool name (assumes Task stores it in `name`); join gets the default
        for task in tasks.values():
            if task.is_join:
                task.priority = default_priority
            else:
                task.priority = self.priority_map.get(task.name, default_priority)

        # Sort by priority (join always last) but keep each task's original idx:
        # other tasks refer to these ids through `dependencies` and `$id` arguments,
        # so renumbering them would silently break the dependency graph
        sorted_tasks = sorted(tasks.values(), key=lambda t: (t.is_join, -t.priority, t.idx))
        return {task.idx: task for task in sorted_tasks}
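LLMCompiler tasks refer to each other by index, through their dependencies and `$id` arguments. A priority reorder should therefore change only the order of the plan, never the ids. The standalone sketch below replaces the real Task with a hypothetical `FakeTask` dataclass so the ordering rule can be tested without an LLM. Its field names mirror the example above.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class FakeTask:
    """Hypothetical stand-in for LLMCompiler's Task with only the fields used here."""
    idx: int
    name: str
    dependencies: List[int] = field(default_factory=list)
    is_join: bool = False

def order_by_priority(tasks: Dict[int, FakeTask], priority_map: Dict[str, int]) -> Dict[int, FakeTask]:
    """Return the same tasks reordered by priority; ids and dependencies are untouched."""
    default = priority_map.get("Default", 0)

    def sort_key(task: FakeTask):
        # Join last, then higher priority first, ties broken by original id
        return (task.is_join, -priority_map.get(task.name, default), task.idx)

    return {t.idx: t for t in sorted(tasks.values(), key=sort_key)}

plan = {
    1: FakeTask(1, "CalculatorTool"),
    2: FakeTask(2, "SearchTool"),
    3: FakeTask(3, "CalculatorTool", dependencies=[1, 2]),
    4: FakeTask(4, "join", dependencies=[3], is_join=True),
}
ordered = order_by_priority(plan, {"SearchTool": 10, "CalculatorTool": 5, "Default": 3})
print(list(ordered))  # [2, 1, 3, 4]
```

Only the dict order changes, so a task may appear before one of its dependencies without harm. A dependency-aware scheduler still waits on `dependencies` before running it.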

Integrating the Custom Planner

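Pass the custom Planner when initializing LLMCompiler. This assumes your version of the constructor accepts a planner argument. If it instead builds its Planner internally from the planner_* arguments, construct the compiler as usual and then replace its planner attribute with a PriorityPlanner instance (check llm_compiler.py for the attribute name):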

# Pass the custom Planner when initializing LLMCompiler
compiler = LLMCompiler(
    # ...other arguments
    planner=PriorityPlanner(
        llm=planner_llm,
        example_prompt=planner_example_prompt,
        example_prompt_replan=planner_example_prompt_replan,
        tools=tools,
        stop=planner_stop,
        priority_map={"DatabaseTool": 15, "APITool": 10}  # custom priorities
    ),
    # ...other arguments
)

Custom Executor Implementation

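The Executor schedules tasks, runs them, and collects their results. By default, LLMCompiler ships an agent-based executor. You can implement custom execution logic by extending the AgentExecutor class.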

Core Executor Interface

AgentExecutor is implemented in src/executors/agent_executor.py. Its key methods include:

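class AgentExecutor(Chain):
    # Outline only; see src/executors/agent_executor.py for the full implementation
    def _take_next_step(self, name_to_tool_map, color_mapping, inputs, intermediate_steps, run_manager=None):
        # Execute one step synchronously
        ...

    async def _atake_next_step(self, name_to_tool_map, color_mapping, inputs, intermediate_steps, run_manager=None):
        # Execute one step asynchronously; multiple tool calls can run in parallel
        ...

    def _call(self, inputs, run_manager=None):
        # Synchronous main loop
        ...

    async def _acall(self, inputs, run_manager=None):
        # Asynchronous main loop
        ...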

_atake_next_step is the key to parallel execution. By default, it uses asyncio.gather to run multiple tool calls concurrently.
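A self-contained sketch shows the effect of `asyncio.gather`. The simulated tools sleep instead of calling real APIs, and `fake_tool` is a hypothetical name:

```python
import asyncio
import time
from typing import List, Tuple

async def fake_tool(name: str, delay: float) -> str:
    """Stand-in for an I/O-bound tool call (e.g. a web search) that takes `delay` seconds."""
    await asyncio.sleep(delay)
    return f"{name} done"

async def run_concurrently() -> Tuple[List[str], float]:
    start = time.perf_counter()
    # gather runs the coroutines concurrently and returns results in argument order
    results = await asyncio.gather(
        fake_tool("search", 0.2),
        fake_tool("lookup", 0.2),
        fake_tool("calc", 0.2),
    )
    return results, time.perf_counter() - start

results, elapsed = asyncio.run(run_concurrently())
print(results, f"{elapsed:.2f}s")
```

The elapsed time is close to the longest single delay (about 0.2 s), not the 0.6 s sum. By default, gather propagates the first exception raised by any awaitable. Passing `return_exceptions=True` returns exceptions as entries in the result list instead, so one failing tool does not hide the others' results.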

An Executor with Timeout Control

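The following custom Executor adds per-tool timeouts. It assumes that _atake_next_step calls an overridable _aperform_agent_action(agent_action) method for each action, and that this method returns an (action, observation) tuple. In some LangChain-derived versions of agent_executor.py, this helper is a nested function inside _atake_next_step or has a different signature. In that case the override is never called, and _atake_next_step itself has to be overridden: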

import asyncio
from typing import Dict

from src.executors.agent_executor import AgentExecutor

class TimeoutAgentExecutor(AgentExecutor):
    # Settings are declared as class fields: AgentExecutor is a LangChain Chain, i.e. a
    # pydantic model, which typically rejects undeclared attributes assigned in __init__
    tool_timeouts: Dict[str, float] = {}  # per-tool timeouts: {tool name: seconds}
    default_timeout: float = 30.0  # default timeout: 30 seconds

    async def _aperform_agent_action(self, agent_action):
        """Override single-action execution to add a timeout."""
        tool_name = agent_action.tool
        timeout = self.tool_timeouts.get(tool_name, self.default_timeout)

        try:
            # wait_for cancels the tool call if it exceeds the timeout
            return await asyncio.wait_for(
                super()._aperform_agent_action(agent_action),
                timeout=timeout
            )
        except asyncio.TimeoutError:
            # On timeout, return the timeout message as the observation
            return (
                agent_action,
                f"[TimeoutError] Tool {tool_name} did not respond within {timeout} seconds"
            )
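The timeout pattern does not depend on LangChain. The standalone sketch below can be unit-tested in isolation. `call_with_timeout` is a hypothetical helper that mirrors the executor above: it looks up a per-tool timeout and turns an expired timeout into an observation string instead of raising.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict

async def call_with_timeout(
    tool_name: str,
    call: Callable[[], Awaitable[Any]],
    tool_timeouts: Dict[str, float],
    default_timeout: float = 30.0,
) -> Any:
    """Run one tool call; a timeout becomes an observation string rather than an exception."""
    timeout = tool_timeouts.get(tool_name, default_timeout)
    try:
        # wait_for cancels the underlying coroutine when the timeout expires
        return await asyncio.wait_for(call(), timeout=timeout)
    except asyncio.TimeoutError:
        return f"[TimeoutError] Tool {tool_name} did not respond within {timeout} seconds"

async def slow_api() -> str:
    await asyncio.sleep(1.0)  # simulated slow endpoint
    return "ok"

print(asyncio.run(call_with_timeout("APITool", slow_api, {"APITool": 0.05})))
```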

Adding a Retry Mechanism

Building on the implementation above, the next subclass retries failed calls:

class RetryTimeoutAgentExecutor(TimeoutAgentExecutor):
    max_retries: int = 2  # maximum number of retries (declared as a pydantic field)

    async def _aperform_agent_action(self, agent_action):
        """Retry timed-out or failed calls up to max_retries times."""
        for attempt in range(self.max_retries + 1):
            is_last = attempt == self.max_retries
            try:
                result = await super()._aperform_agent_action(agent_action)
            except Exception as e:
                if not is_last:
                    continue  # retry after an exception
                return (agent_action, f"[Error] {e} after {self.max_retries + 1} attempts")
            # The parent reports timeouts as an observation string, not as an exception
            observation = result[1]
            timed_out = isinstance(observation, str) and observation.startswith("[TimeoutError]")
            if timed_out and not is_last:
                continue  # retry after a timeout
            return result
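Here is a standalone variant of the retry idea with exponential backoff between attempts. It is a sketch: `retry_async` and its parameters are hypothetical. Failure is signaled by exceptions rather than by inspecting the observation string, which keeps the retry policy independent of message formats.

```python
import asyncio
from typing import Any, Awaitable, Callable, Tuple, Type

async def retry_async(
    call: Callable[[], Awaitable[Any]],
    max_retries: int = 2,
    base_delay: float = 0.1,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
) -> Any:
    """Await call() up to max_retries + 1 times, doubling the wait between attempts."""
    for attempt in range(max_retries + 1):
        try:
            return await call()
        except retry_on:
            if attempt == max_retries:
                raise  # out of retries: surface the last error to the caller
            await asyncio.sleep(base_delay * 2 ** attempt)  # 0.1 s, 0.2 s, 0.4 s, ...

calls = []

async def flaky() -> str:
    """Fails twice with a transient error, then succeeds."""
    calls.append(1)
    if len(calls) < 3:
        raise ConnectionError("transient failure")
    return "ok"

print(asyncio.run(retry_async(flaky, max_retries=2, base_delay=0.01)), len(calls))  # ok 3
```

Retrying is only safe for idempotent tools. A tool with side effects, such as one that sends a message or writes a record, may run more than once.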

Integrating the Custom Executor

Use the custom Executor in the initialization code:

# Create an executor with timeout and retry support
executor = RetryTimeoutAgentExecutor(
    agent=agent,
    tools=tools,
    verbose=True,
    tool_timeouts={"DatabaseTool": 60, "APITool": 45},
    max_retries=2
)

Advanced Extension: Custom Task Dependency Parsing

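LLMCompiler uses the $id syntax to express task dependencies; for example, $1 refers to the result of the task with ID 1. You can extend the parsing logic to support more complex dependency expressions.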
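The basic `$id` mechanism has two steps: collect the referenced ids, then substitute the finished tasks' results. The sketch below is a regex-based version of both. It is our own sketch, not LLMCompiler's parser. The `${1}` form is an assumption, included to show that a greedy `\d+` match keeps `$10` from being read as `$1`.

```python
import re
from typing import Dict, List

# Matches $1 or ${1}; the ${1} form is an assumption of this sketch
DEP_PATTERN = re.compile(r"\$(?:\{(\d+)\}|(\d+))")

def _ref_id(match: re.Match) -> int:
    """Return the task id from whichever alternative matched."""
    return int(match.group(1) or match.group(2))

def extract_dependencies(args: str) -> List[int]:
    """Return the sorted, de-duplicated task ids referenced in an argument string."""
    return sorted({_ref_id(m) for m in DEP_PATTERN.finditer(args)})

def substitute_results(args: str, results: Dict[int, str]) -> str:
    """Replace each $id reference with the (stringified) result of task id."""
    return DEP_PATTERN.sub(lambda m: str(results[_ref_id(m)]), args)

args = 'compare("$1", "${2}")'
print(extract_dependencies(args))                          # [1, 2]
print(substitute_results(args, {1: "Paris", 2: "London"}))  # compare("Paris", "London")
```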

Extending Task to Support Conditional Dependencies

Extend the Task class from src/llm_compiler/task_fetching_unit.py:

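from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple

from src.llm_compiler.task_fetching_unit import Task

# Subclass instead of editing the library file; assumes Task is a @dataclass
# (if it is not, add these members to Task directly)
@dataclass
class ConditionalTask(Task):
    # Conditional dependencies: (task_id, predicate) pairs
    conditions: List[Tuple[int, Callable[[Any], bool]]] = field(default_factory=list)

    def add_condition(self, task_id, condition):
        """Add a conditional dependency: run only if task task_id's result satisfies condition."""
        self.conditions.append((task_id, condition))

    def check_conditions(self, task_results: Dict[int, Any]) -> bool:
        """Return True only if every referenced task has a result that satisfies its condition."""
        for task_id, condition in self.conditions:
            if task_id not in task_results:
                return False
            if not condition(task_results[task_id]):
                return False
        return True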
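check_conditions only has an effect if the scheduler calls it before launching a task. The stock Task Fetching Unit does not know about conditions, so that hook would also have to be added. The self-contained sketch below shows the idea. `CondTask`, `schedule` and `SKIPPED` are hypothetical names. Treating condition sources as implicit dependencies is a design choice of this sketch: it ensures a condition is never evaluated before the task it inspects has finished.

```python
import asyncio
import json
from dataclasses import dataclass, field
from typing import Any, Awaitable, Callable, Dict, List, Tuple

SKIPPED = "[Skipped] condition not met"

@dataclass
class CondTask:
    """Hypothetical minimal task: a coroutine function plus dependencies and conditions."""
    idx: int
    run: Callable[[Dict[int, Any]], Awaitable[Any]]
    dependencies: List[int] = field(default_factory=list)
    conditions: List[Tuple[int, Callable[[Any], bool]]] = field(default_factory=list)

    def check_conditions(self, results: Dict[int, Any]) -> bool:
        return all(tid in results and cond(results[tid]) for tid, cond in self.conditions)

async def schedule(tasks: Dict[int, CondTask]) -> Dict[int, Any]:
    """Run tasks wave by wave; a task whose conditions fail is recorded as SKIPPED."""
    results: Dict[int, Any] = {}
    pending = dict(tasks)
    while pending:
        # Condition sources count as dependencies, so conditions see finished results
        ready = [
            t for t in pending.values()
            if all(d in results for d in set(t.dependencies) | {c for c, _ in t.conditions})
        ]
        if not ready:
            raise ValueError("cycle or missing dependency in plan")
        runnable = [t for t in ready if t.check_conditions(results)]
        outputs = await asyncio.gather(*(t.run(results) for t in runnable))
        for t, out in zip(runnable, outputs):
            results[t.idx] = out
        for t in ready:
            results.setdefault(t.idx, SKIPPED)  # only affects tasks that did not run
            del pending[t.idx]
    return results

def succeeded(result: str) -> bool:
    return json.loads(result).get("success") is True

def failed(result: str) -> bool:
    return json.loads(result).get("success") is False

async def check_db(results):
    return json.dumps({"success": True})

async def write_report(results):
    return "report written"

async def rollback(results):
    return "rolled back"

plan = {
    1: CondTask(1, check_db),
    2: CondTask(2, write_report, conditions=[(1, succeeded)]),
    3: CondTask(3, rollback, conditions=[(1, failed)]),
}
print(asyncio.run(schedule(plan)))
# {1: '{"success": true}', 2: 'report written', 3: '[Skipped] condition not met'}
```

Task 3's condition fails, so it is recorded as skipped instead of being run. Tasks that depend on a skipped task still run in this sketch and receive the SKIPPED marker as input. A stricter scheduler could propagate the skip instead.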

Extending the Planner to Parse Conditional Dependencies

Extend the Planner's task-parsing logic to support a new dependency syntax such as $1.success==true:

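import json
import re

# Add conditional-dependency parsing to the Planner's task-parsing code.
# Matches expressions of the form $<id>.<field>==<value>, e.g. $1.success==true
CONDITION_PATTERN = re.compile(r'\$(\d+)\.(\w+)\s*==\s*(\w+)')

def _parse_literal(text):
    """Read true/false/null/numbers as JSON values so "true" matches a JSON true; else keep the string."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return text

def parse_conditional_dependencies(args_str):
    """Parse conditional-dependency expressions into (task_id, predicate) pairs."""
    conditions = []
    for task_id, field, value in CONDITION_PATTERN.findall(args_str):
        expected = _parse_literal(value)

        # Bind field/expected as default arguments so each closure keeps its own values
        def condition_func(result, field=field, expected=expected):
            try:
                result_dict = json.loads(result)
            except (TypeError, json.JSONDecodeError):
                return False  # non-JSON result: condition not met
            return isinstance(result_dict, dict) and result_dict.get(field) == expected

        conditions.append((int(task_id), condition_func))

    return conditions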

Debugging and Verification

Key Logs and Metrics

LLMCompiler's logging helpers live in src/utils/logger_utils.py, where detailed logging can be configured:

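from src.utils.logger_utils import log

# In PriorityPlanner.__init__: record the active priority map
log("Custom Planner initialized with priority map:", self.priority_map, block=True)
# In PriorityPlanner.plan(), after sorting: record the resulting task order
log("Task execution order:", [task.idx for task in sorted_tasks], block=True)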

For performance analysis, enable benchmark mode to collect metrics:

# Enable benchmarking at initialization
compiler = LLMCompiler(
    # ...other arguments
    benchmark=True
)

# Retrieve the metrics after execution
stats = compiler.get_all_stats()
log("Planner stats:", stats["planner"], block=True)
log("Executor stats:", stats["executor"], block=True)
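You may also want timings around your own overrides, such as a custom plan() or a retry loop. A small context manager is enough for that. `StageTimer` is a hypothetical helper, independent of LLMCompiler's benchmark mode and get_all_stats(). The sleeps below stand in for real calls.

```python
import time
from collections import defaultdict
from contextlib import contextmanager
from typing import Dict, Iterator

class StageTimer:
    """Hypothetical helper: accumulates wall-clock time and call counts per named stage."""

    def __init__(self) -> None:
        self.totals: Dict[str, float] = defaultdict(float)
        self.counts: Dict[str, int] = defaultdict(int)

    @contextmanager
    def measure(self, stage: str) -> Iterator[None]:
        start = time.perf_counter()
        try:
            yield
        finally:
            # Recorded even if the wrapped code raises
            self.totals[stage] += time.perf_counter() - start
            self.counts[stage] += 1

    def summary(self) -> Dict[str, Dict[str, float]]:
        return {
            stage: {"calls": self.counts[stage], "total_s": round(self.totals[stage], 3)}
            for stage in self.totals
        }

timer = StageTimer()
with timer.measure("planner"):
    time.sleep(0.05)  # stands in for a plan() call
for _ in range(2):
    with timer.measure("executor"):
        time.sleep(0.02)  # stands in for a tool call
print(timer.summary())
```

The context manager only reads the clock on entry and exit, so it can also wrap awaited calls inside async code.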

Verifying Parallel Execution

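Use the evaluate_results.py script, together with the logs above, to check the custom implementation. Focus on:

Whether tasks run in the order implied by the priority settings
Whether the timeout and retry mechanisms behave as expected
Whether conditional dependencies are parsed and evaluated correctly
Whether overall execution time improves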
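Recorded timestamps can verify dependency order and parallelism directly. The sketch below assumes you have logged a (start, end) pair per task id, for example with log statements in your executor. `check_schedule` is a hypothetical helper. Its average-parallelism figure (total task time divided by wall-clock span) is one simple measure of overlap, not a metric LLMCompiler reports.

```python
from typing import Dict, List, Tuple

def check_schedule(
    spans: Dict[int, Tuple[float, float]],
    deps: Dict[int, List[int]],
) -> Tuple[List[Tuple[int, int]], float]:
    """Validate a recorded run.

    spans: task id -> (start, end) timestamps; deps: task id -> ids it depends on.
    Returns (violations, avg_parallelism). violations lists (task, dependency) pairs where
    the task started before its dependency finished; avg_parallelism is total task time
    divided by wall-clock span, and values above 1.0 mean tasks overlapped.
    """
    violations = [
        (task, dep)
        for task, task_deps in deps.items()
        for dep in task_deps
        if spans[task][0] < spans[dep][1]
    ]
    busy = sum(end - start for start, end in spans.values())
    wall = max(end for _, end in spans.values()) - min(start for start, _ in spans.values())
    return violations, busy / wall

# Tasks 1 and 2 overlap; task 3 starts after both finish
spans = {1: (0.0, 1.0), 2: (0.0, 1.0), 3: (1.0, 1.5)}
print(check_schedule(spans, {3: [1, 2]}))
```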

Summary and Best Practices

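Follow these best practices when customizing the Planner and Executor:

Interface compatibility: keep method signatures consistent with the base classes and avoid changing core data structures
Testability: write unit tests for custom logic, with particular attention to edge cases
Performance: complex planning logic can add latency; consider caching or precomputation
Incremental extension: implement a minimal feature set first, then refine it iteratively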

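With the methods described here, you can extend LLMCompiler to handle more complex business scenarios. For further reference and extension ideas, see the configuration examples in the project's configs/ directory and the core implementation in src/llm_compiler/.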

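The extension architecture diagram shows where the custom Planner and Executor sit in the overall system and how they interact with the other components. Thanks to this layered design, LLMCompiler can be extended flexibly while its core architecture stays stable.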


Disclosure: parts of this article were generated with AI assistance (AIGC) and are for reference only.