[Beginner Tutorial] Dify from Scratch: A Deep Dive into How Dify Implements Loops and Iteration

I. Overview

Dify is a powerful AI application development platform whose workflow engine supports complex loop and iteration operations. The sections below take a deep look at how loops and iteration are implemented in Dify.

What are loops and iterations?

  • Loop: repeatedly executes a group of operations based on a condition, until an exit condition is met
  • Iteration: performs the same operation on every element of a collection (see the sketch below)
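
In plain Python terms the difference looks like this (a conceptual sketch only, not Dify code):

# Loop: repeat until a break condition is met; the number of rounds is not known in advance
attempt, quality_score = 0, 0.0
while quality_score <= 0.8:                    # break condition
    attempt += 1
    quality_score = min(1.0, attempt * 0.3)    # stand-in for calling a model or API

# Iteration: apply the same operation to each element of a known collection
documents = ["a.txt", "b.txt", "c.txt"]
results = [doc.upper() for doc in documents]   # stand-in for analyzing each document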

Why do we need loops and iteration?

In AI applications, loop and iteration mechanisms make it possible to:

  • Process data in batches
  • Implement complex business logic
  • Make workflows more flexible and reusable
  • Support dynamic data processing

II. Core Concepts

Node types

Dify's workflow engine defines the following node types related to loops and iteration:

class NodeType(StrEnum):
    LOOP = "loop"                        # loop node
    LOOP_START = "loop-start"            # loop start node
    LOOP_END = "loop-end"                # loop end node
    ITERATION = "iteration"              # iteration node
    ITERATION_START = "iteration-start"  # iteration start node

Core component architecture

(architecture diagram omitted in this text version)

III. Execution Flow

Overall execution architecture

(architecture diagram omitted in this text version)

Detailed node execution flow

(flow diagram omitted in this text version)

IV. The Loop Mechanism in Detail

Loop node structure

The loop node (LoopNode) is the core component implementing loop logic in Dify. It carries the following key attributes:

class LoopNodeData(BaseLoopNodeData):
    loop_count: int                         # maximum number of loop rounds
    loop_variables: list[LoopVariable]      # loop variables
    break_conditions: list[Condition]       # break conditions
    logical_operator: Literal["and", "or"]  # how break conditions combine
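
Semantically, loop_count is an upper bound on the number of rounds, and break_conditions can end the loop early. A minimal dependency-free sketch of that contract (plain Python, not Dify's classes; should_break stands in for the combined break conditions):

def run_loop(loop_count: int, should_break) -> int:
    """Run at most loop_count rounds; stop early once should_break(state) is True."""
    state = 0
    for i in range(loop_count):      # loop_count caps the number of rounds
        state += 1                   # stand-in for running the loop-body sub-graph
        if should_break(state):      # break_conditions, combined by logical_operator
            break
    return state

print(run_loop(10, lambda s: s >= 3))   # 3: the break condition fired first
print(run_loop(10, lambda s: False))    # 10: the upper bound was reached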

Loop execution flow

(flow diagram omitted in this text version)

Core code analysis of the loop implementation

1. Main loop execution method
api/core/workflow/nodes/loop/loop_node.py
def _run(self) -> Generator[NodeEvent | InNodeEvent, None, None]:
    loop_count = self.node_data.loop_count
    break_conditions = self.node_data.break_conditions
    logical_operator = self.node_data.logical_operator

    inputs = {"loop_count": loop_count}

    if not self.node_data.start_node_id:
        raise ValueError(f"field start_node_id in loop {self.node_id} not found")

    # Initialize graph
    loop_graph = Graph.init(graph_config=self.graph_config, root_node_id=self.node_data.start_node_id)
    if not loop_graph:
        raise ValueError("loop graph not found")

    # Initialize variable pool
    variable_pool = self.graph_runtime_state.variable_pool
    variable_pool.add([self.node_id, "index"], 0)

    # Initialize loop variables
    loop_variable_selectors = {}
    if self.node_data.loop_variables:
        for loop_variable in self.node_data.loop_variables:
            value_processor = {
                "constant": lambda var=loop_variable: self._get_segment_for_constant(var.var_type, var.value),
                "variable": lambda var=loop_variable: variable_pool.get(var.value),
            }

            if loop_variable.value_type not in value_processor:
                raise ValueError(
                    f"Invalid value type '{loop_variable.value_type}' for loop variable {loop_variable.label}"
                )

            processed_segment = value_processor[loop_variable.value_type]()
            if not processed_segment:
                raise ValueError(f"Invalid value for loop variable {loop_variable.label}")
            variable_selector = [self.node_id, loop_variable.label]
            variable_pool.add(variable_selector, processed_segment.value)
            loop_variable_selectors[loop_variable.label] = variable_selector
            inputs[loop_variable.label] = processed_segment.value

    from core.workflow.graph_engine.graph_engine import GraphEngine

    graph_engine = GraphEngine(
        tenant_id=self.tenant_id,
        app_id=self.app_id,
        workflow_type=self.workflow_type,
        workflow_id=self.workflow_id,
        user_id=self.user_id,
        user_from=self.user_from,
        invoke_from=self.invoke_from,
        call_depth=self.workflow_call_depth,
        graph=loop_graph,
        graph_config=self.graph_config,
        variable_pool=variable_pool,
        max_execution_steps=dify_config.WORKFLOW_MAX_EXECUTION_STEPS,
        max_execution_time=dify_config.WORKFLOW_MAX_EXECUTION_TIME,
        thread_pool_id=self.thread_pool_id,
    )

    start_at = datetime.now(UTC).replace(tzinfo=None)
    condition_processor = ConditionProcessor()

    # Start loop event
    yield LoopRunStartedEvent(
        loop_id=self.id,
        loop_node_id=self.node_id,
        loop_node_type=self.node_type,
        loop_node_data=self.node_data,
        start_at=start_at,
        inputs=inputs,
        metadata={"loop_length": loop_count},
        predecessor_node_id=self.previous_node_id,
    )

    loop_duration_map = {}
    single_loop_variable_map = {}  # per-round snapshot of the loop variables
    try:
        check_break_result = False
        for i in range(loop_count):
            loop_start_time = datetime.now(UTC).replace(tzinfo=None)
            # run a single loop round
            loop_result = yield from self._run_single_loop(
                graph_engine=graph_engine,
                loop_graph=loop_graph,
                variable_pool=variable_pool,
                loop_variable_selectors=loop_variable_selectors,
                break_conditions=break_conditions,
                logical_operator=logical_operator,
                condition_processor=condition_processor,
                current_index=i,
                start_at=start_at,
                inputs=inputs,
            )
            loop_end_time = datetime.now(UTC).replace(tzinfo=None)

            single_loop_variable = {}
            for key, selector in loop_variable_selectors.items():
                item = variable_pool.get(selector)
                if item:
                    single_loop_variable[key] = item.value
                else:
                    single_loop_variable[key] = None

            loop_duration_map[str(i)] = (loop_end_time - loop_start_time).total_seconds()
            single_loop_variable_map[str(i)] = single_loop_variable

            check_break_result = loop_result.get("check_break_result", False)

            if check_break_result:
                break

        # Loop completed successfully
        yield LoopRunSucceededEvent(
            loop_id=self.id,
            loop_node_id=self.node_id,
            loop_node_type=self.node_type,
            loop_node_data=self.node_data,
            start_at=start_at,
            inputs=inputs,
            outputs=self.node_data.outputs,
            steps=loop_count,
            metadata={
                WorkflowNodeExecutionMetadataKey.TOTAL_TOKENS: graph_engine.graph_runtime_state.total_tokens,
                "completed_reason": "loop_break" if check_break_result else "loop_completed",
                WorkflowNodeExecutionMetadataKey.LOOP_DURATION_MAP: loop_duration_map,
                WorkflowNodeExecutionMetadataKey.LOOP_VARIABLE_MAP: single_loop_variable_map,
            },
        )

        yield RunCompletedEvent(
            run_result=NodeRunResult(
                status=WorkflowNodeExecutionStatus.SUCCEEDED,
                metadata={
                    WorkflowNodeExecutionMetadataKey.TOTAL_TOKENS: graph_engine.graph_runtime_state.total_tokens,
                    WorkflowNodeExecutionMetadataKey.LOOP_DURATION_MAP: loop_duration_map,
                    WorkflowNodeExecutionMetadataKey.LOOP_VARIABLE_MAP: single_loop_variable_map,
                },
                outputs=self.node_data.outputs,
                inputs=inputs,
            )
        )

    except Exception as e:
        # Loop failed
        logger.exception("Loop run failed")
        yield LoopRunFailedEvent(
            loop_id=self.id,
            loop_node_id=self.node_id,
            loop_node_type=self.node_type,
            loop_node_data=self.node_data,
            start_at=start_at,
            inputs=inputs,
            steps=loop_count,
            metadata={
                "total_tokens": graph_engine.graph_runtime_state.total_tokens,
                "completed_reason": "error",
                WorkflowNodeExecutionMetadataKey.LOOP_DURATION_MAP: loop_duration_map,
                WorkflowNodeExecutionMetadataKey.LOOP_VARIABLE_MAP: single_loop_variable_map,
            },
            error=str(e),
        )

        yield RunCompletedEvent(
            run_result=NodeRunResult(
                status=WorkflowNodeExecutionStatus.FAILED,
                error=str(e),
                metadata={
                    WorkflowNodeExecutionMetadataKey.TOTAL_TOKENS: graph_engine.graph_runtime_state.total_tokens,
                    WorkflowNodeExecutionMetadataKey.LOOP_DURATION_MAP: loop_duration_map,
                    WorkflowNodeExecutionMetadataKey.LOOP_VARIABLE_MAP: single_loop_variable_map,
                },
            )
        )

    finally:
        # Clean up
        variable_pool.remove([self.node_id, "index"])

2. Single loop-round execution
def _run_single_loop(
    self,
    *,
    graph_engine: "GraphEngine",
    loop_graph: Graph,
    variable_pool: "VariablePool",
    loop_variable_selectors: dict,
    break_conditions: list,
    logical_operator: Literal["and", "or"],
    condition_processor: ConditionProcessor,
    current_index: int,
    start_at: datetime,
    inputs: dict,
) -> Generator[NodeEvent | InNodeEvent, None, dict]:
    """Run a single loop round.

    Returns:
        dict: {'check_break_result': bool}
    """
    # Run the loop-body sub-graph
    rst = graph_engine.run()
    current_index_variable = variable_pool.get([self.node_id, "index"])
    if not isinstance(current_index_variable, IntegerSegment):
        raise ValueError(f"loop {self.node_id} current index not found")
    current_index = current_index_variable.value

    check_break_result = False

    for event in rst:
        if isinstance(event, (BaseNodeEvent | BaseParallelBranchEvent)) and not event.in_loop_id:
            event.in_loop_id = self.node_id

        if (
            isinstance(event, BaseNodeEvent)
            and event.node_type == NodeType.LOOP_START
            and not isinstance(event, NodeRunStreamChunkEvent)
        ):
            continue

        if (
            isinstance(event, NodeRunSucceededEvent)
            and event.node_type == NodeType.LOOP_END
            and not isinstance(event, NodeRunStreamChunkEvent)
        ):
            check_break_result = True
            yield self._handle_event_metadata(event=event, iter_run_index=current_index)
            break

        if isinstance(event, NodeRunSucceededEvent):
            yield self._handle_event_metadata(event=event, iter_run_index=current_index)

            # Check that every variable referenced by the break conditions exists
            exists_variable = False
            for condition in break_conditions:
                if not self.graph_runtime_state.variable_pool.get(condition.variable_selector):
                    exists_variable = False
                    break
                else:
                    exists_variable = True
            if exists_variable:
                input_conditions, group_result, check_break_result = condition_processor.process_conditions(
                    variable_pool=self.graph_runtime_state.variable_pool,
                    conditions=break_conditions,
                    operator=logical_operator,
                )
                if check_break_result:
                    break

        elif isinstance(event, BaseGraphEvent):
            if isinstance(event, GraphRunFailedEvent):
                # Loop run failed
                yield LoopRunFailedEvent(
                    loop_id=self.id,
                    loop_node_id=self.node_id,
                    loop_node_type=self.node_type,
                    loop_node_data=self.node_data,
                    start_at=start_at,
                    inputs=inputs,
                    steps=current_index,
                    metadata={
                        WorkflowNodeExecutionMetadataKey.TOTAL_TOKENS: (
                            graph_engine.graph_runtime_state.total_tokens
                        ),
                        "completed_reason": "error",
                    },
                    error=event.error,
                )
                yield RunCompletedEvent(
                    run_result=NodeRunResult(
                        status=WorkflowNodeExecutionStatus.FAILED,
                        error=event.error,
                        metadata={
                            WorkflowNodeExecutionMetadataKey.TOTAL_TOKENS: (
                                graph_engine.graph_runtime_state.total_tokens
                            )
                        },
                    )
                )
                return {"check_break_result": True}
        elif isinstance(event, NodeRunFailedEvent):
            # A node inside the loop failed
            yield self._handle_event_metadata(event=event, iter_run_index=current_index)
            yield LoopRunFailedEvent(
                loop_id=self.id,
                loop_node_id=self.node_id,
                loop_node_type=self.node_type,
                loop_node_data=self.node_data,
                start_at=start_at,
                inputs=inputs,
                steps=current_index,
                metadata={
                    WorkflowNodeExecutionMetadataKey.TOTAL_TOKENS: graph_engine.graph_runtime_state.total_tokens,
                    "completed_reason": "error",
                },
                error=event.error,
            )
            yield RunCompletedEvent(
                run_result=NodeRunResult(
                    status=WorkflowNodeExecutionStatus.FAILED,
                    error=event.error,
                    metadata={
                        WorkflowNodeExecutionMetadataKey.TOTAL_TOKENS: graph_engine.graph_runtime_state.total_tokens
                    },
                )
            )
            return {"check_break_result": True}
        else:
            yield self._handle_event_metadata(event=cast(InNodeEvent, event), iter_run_index=current_index)

    # Remove all node outputs from the variable pool
    for node_id in loop_graph.node_ids:
        variable_pool.remove([node_id])

    _outputs = {}
    for loop_variable_key, loop_variable_selector in loop_variable_selectors.items():
        _loop_variable_segment = variable_pool.get(loop_variable_selector)
        if _loop_variable_segment:
            _outputs[loop_variable_key] = _loop_variable_segment.value
        else:
            _outputs[loop_variable_key] = None

    _outputs["loop_round"] = current_index + 1
    self.node_data.outputs = _outputs

    if check_break_result:
        return {"check_break_result": True}

    # Move on to the next loop round
    next_index = current_index + 1
    variable_pool.add([self.node_id, "index"], next_index)

    yield LoopRunNextEvent(
        loop_id=self.id,
        loop_node_id=self.node_id,
        loop_node_type=self.node_type,
        loop_node_data=self.node_data,
        index=next_index,
        pre_loop_output=self.node_data.outputs,
    )

    return {"check_break_result": False}

Loop condition handling

A loop's break conditions are evaluated by the ConditionProcessor class:

class ConditionProcessor:
    """Condition processor, used to evaluate loop break conditions."""

    def process_conditions(
        self,
        *,
        variable_pool: VariablePool,
        conditions: Sequence[Condition],
        operator: Literal["and", "or"],
    ):
        input_conditions = []
        group_results = []

        for condition in conditions:
            variable = variable_pool.get(condition.variable_selector)
            if variable is None:
                raise ValueError(f"Variable {condition.variable_selector} not found")

            if isinstance(variable, ArrayFileSegment) and condition.comparison_operator in {
                "contains",
                "not contains",
                "all of",
            }:
                # check sub-conditions
                if not condition.sub_variable_condition:
                    raise ValueError("Sub variable is required")
                result = _process_sub_conditions(
                    variable=variable,
                    sub_conditions=condition.sub_variable_condition.conditions,
                    operator=condition.sub_variable_condition.logical_operator,
                )
            elif condition.comparison_operator in {
                "exists",
                "not exists",
            }:
                result = _evaluate_condition(
                    value=variable.value,
                    operator=condition.comparison_operator,
                    expected=None,
                )
            else:
                actual_value = variable.value if variable else None
                expected_value = condition.value
                if isinstance(expected_value, str):
                    expected_value = variable_pool.convert_template(expected_value).text
                input_conditions.append(
                    {
                        "actual_value": actual_value,
                        "expected_value": expected_value,
                        "comparison_operator": condition.comparison_operator,
                    }
                )
                result = _evaluate_condition(
                    value=actual_value,
                    operator=condition.comparison_operator,
                    expected=expected_value,
                )
            group_results.append(result)
            # Short-circuit evaluation for logical conditions
            if (operator == "and" and not result) or (operator == "or" and result):
                final_result = result
                return input_conditions, group_results, final_result

        final_result = all(group_results) if operator == "and" else any(group_results)
        return input_conditions, group_results, final_result
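
The short-circuit rule at the end of process_conditions is easy to verify in isolation. Here is a minimal dependency-free sketch of just that rule ("and" stops at the first False, "or" stops at the first True):

from typing import Literal

def combine(results: list[bool], operator: Literal["and", "or"]) -> bool:
    """Mirror of ConditionProcessor's short-circuit rule."""
    for result in results:
        if operator == "and" and not result:
            return False
        if operator == "or" and result:
            return True
    return all(results) if operator == "and" else any(results)

assert combine([True, False, True], "and") is False   # stops at the first False
assert combine([False, True, False], "or") is True    # stops at the first True
assert combine([], "and") is True                     # vacuous truth, like all([])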

V. The Iteration Mechanism in Detail

Iteration node structure

The iteration node (IterationNode) performs the same operation on every element of a collection:

class IterationNodeData(BaseIterationNodeData):
    iterator_selector: list[str]      # selector of the list to iterate over
    output_selector: list[str]        # selector of each iteration's output
    is_parallel: bool = False         # whether to run iterations in parallel
    parallel_nums: int = 10           # maximum number of parallel workers
    error_handle_mode: ErrorHandleMode = ErrorHandleMode.TERMINATED
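
Conceptually, is_parallel and parallel_nums behave like a bounded thread pool fanned out over the list (the real implementation uses GraphEngineThreadPool, shown later). A minimal standard-library sketch of that behaviour; process_item is a made-up stand-in for running the iteration sub-graph on one element:

from concurrent.futures import ThreadPoolExecutor

def process_item(item: str) -> str:
    # hypothetical stand-in for executing the iteration sub-graph on one item
    return item.upper()

items = ["a", "b", "c", "d"]
parallel_nums = 2  # mirrors IterationNodeData.parallel_nums

with ThreadPoolExecutor(max_workers=parallel_nums) as pool:
    outputs = list(pool.map(process_item, items))  # order-preserving, like outputs[index] = ...

print(outputs)  # ['A', 'B', 'C', 'D']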

Iteration execution flow

(flow diagram omitted in this text version)

Core code analysis of the iteration implementation

1. Main iteration execution method
api/core/workflow/nodes/iteration/iteration_node.py
def _run(self) -> Generator[NodeEvent | InNodeEvent, None, None]:
    """Run the node."""
    variable = self.graph_runtime_state.variable_pool.get(self.node_data.iterator_selector)

    if not variable:
        raise IteratorVariableNotFoundError(f"iterator variable {self.node_data.iterator_selector} not found")

    if not isinstance(variable, ArrayVariable) and not isinstance(variable, NoneVariable):
        raise InvalidIteratorValueError(f"invalid iterator value: {variable}, please provide a list.")

    if isinstance(variable, NoneVariable) or len(variable.value) == 0:
        yield RunCompletedEvent(
            run_result=NodeRunResult(
                status=WorkflowNodeExecutionStatus.SUCCEEDED,
                outputs={"output": []},
            )
        )
        return

    iterator_list_value = variable.to_object()

    if not isinstance(iterator_list_value, list):
        raise InvalidIteratorValueError(f"Invalid iterator value: {iterator_list_value}, please provide a list.")

    inputs = {"iterator_selector": iterator_list_value}

    graph_config = self.graph_config

    if not self.node_data.start_node_id:
        raise StartNodeIdNotFoundError(f"field start_node_id in iteration {self.node_id} not found")

    root_node_id = self.node_data.start_node_id

    # init graph
    iteration_graph = Graph.init(graph_config=graph_config, root_node_id=root_node_id)

    if not iteration_graph:
        raise IterationGraphNotFoundError("iteration graph not found")

    variable_pool = self.graph_runtime_state.variable_pool

    # append the iteration variables (item, index) to the variable pool
    variable_pool.add([self.node_id, "index"], 0)
    variable_pool.add([self.node_id, "item"], iterator_list_value[0])

    # init graph engine
    from core.workflow.graph_engine.graph_engine import GraphEngine, GraphEngineThreadPool

    graph_engine = GraphEngine(
        tenant_id=self.tenant_id,
        app_id=self.app_id,
        workflow_type=self.workflow_type,
        workflow_id=self.workflow_id,
        user_id=self.user_id,
        user_from=self.user_from,
        invoke_from=self.invoke_from,
        call_depth=self.workflow_call_depth,
        graph=iteration_graph,
        graph_config=graph_config,
        variable_pool=variable_pool,
        max_execution_steps=dify_config.WORKFLOW_MAX_EXECUTION_STEPS,
        max_execution_time=dify_config.WORKFLOW_MAX_EXECUTION_TIME,
        thread_pool_id=self.thread_pool_id,
    )

    start_at = datetime.now(UTC).replace(tzinfo=None)

    yield IterationRunStartedEvent(
        iteration_id=self.id,
        iteration_node_id=self.node_id,
        iteration_node_type=self.node_type,
        iteration_node_data=self.node_data,
        start_at=start_at,
        inputs=inputs,
        metadata={"iterator_length": len(iterator_list_value)},
        predecessor_node_id=self.previous_node_id,
    )

    yield IterationRunNextEvent(
        iteration_id=self.id,
        iteration_node_id=self.node_id,
        iteration_node_type=self.node_type,
        iteration_node_data=self.node_data,
        index=0,
        pre_iteration_output=None,
        duration=None,
    )
    iter_run_map: dict[str, float] = {}
    outputs: list[Any] = [None] * len(iterator_list_value)
    try:
        if self.node_data.is_parallel:
            futures: list[Future] = []
            q: Queue = Queue()
            thread_pool = GraphEngineThreadPool(
                max_workers=self.node_data.parallel_nums, max_submit_count=dify_config.MAX_SUBMIT_COUNT
            )
            for index, item in enumerate(iterator_list_value):
                future: Future = thread_pool.submit(
                    self._run_single_iter_parallel,
                    flask_app=current_app._get_current_object(),  # type: ignore
                    q=q,
                    context=contextvars.copy_context(),
                    iterator_list_value=iterator_list_value,
                    inputs=inputs,
                    outputs=outputs,
                    start_at=start_at,
                    graph_engine=graph_engine,
                    iteration_graph=iteration_graph,
                    index=index,
                    item=item,
                    iter_run_map=iter_run_map,
                )
                future.add_done_callback(thread_pool.task_done_callback)
                futures.append(future)
            succeeded_count = 0
            while True:
                try:
                    event = q.get(timeout=1)
                    if event is None:
                        break
                    if isinstance(event, IterationRunNextEvent):
                        succeeded_count += 1
                        if succeeded_count == len(futures):
                            q.put(None)
                    yield event
                    if isinstance(event, RunCompletedEvent):
                        q.put(None)
                        for f in futures:
                            if not f.done():
                                f.cancel()
                        yield event
                    if isinstance(event, IterationRunFailedEvent):
                        q.put(None)
                        yield event
                except Empty:
                    continue

            # wait for all threads
            wait(futures)
        else:
            for _ in range(len(iterator_list_value)):
                yield from self._run_single_iter(
                    iterator_list_value=iterator_list_value,
                    variable_pool=variable_pool,
                    inputs=inputs,
                    outputs=outputs,
                    start_at=start_at,
                    graph_engine=graph_engine,
                    iteration_graph=iteration_graph,
                    iter_run_map=iter_run_map,
                )
        if self.node_data.error_handle_mode == ErrorHandleMode.REMOVE_ABNORMAL_OUTPUT:
            outputs = [output for output in outputs if output is not None]

        # Flatten the list of lists
        if isinstance(outputs, list) and all(isinstance(output, list) for output in outputs):
            outputs = [item for sublist in outputs for item in sublist]

        yield IterationRunSucceededEvent(
            iteration_id=self.id,
            iteration_node_id=self.node_id,
            iteration_node_type=self.node_type,
            iteration_node_data=self.node_data,
            start_at=start_at,
            inputs=inputs,
            outputs={"output": outputs},
            steps=len(iterator_list_value),
            metadata={"total_tokens": graph_engine.graph_runtime_state.total_tokens},
        )

        yield RunCompletedEvent(
            run_result=NodeRunResult(
                status=WorkflowNodeExecutionStatus.SUCCEEDED,
                outputs={"output": outputs},
                metadata={
                    WorkflowNodeExecutionMetadataKey.ITERATION_DURATION_MAP: iter_run_map,
                    WorkflowNodeExecutionMetadataKey.TOTAL_TOKENS: graph_engine.graph_runtime_state.total_tokens,
                },
            )
        )
    except IterationNodeError as e:
        # iteration run failed
        logger.warning("Iteration run failed")
        yield IterationRunFailedEvent(
            iteration_id=self.id,
            iteration_node_id=self.node_id,
            iteration_node_type=self.node_type,
            iteration_node_data=self.node_data,
            start_at=start_at,
            inputs=inputs,
            outputs={"output": outputs},
            steps=len(iterator_list_value),
            metadata={"total_tokens": graph_engine.graph_runtime_state.total_tokens},
            error=str(e),
        )

        yield RunCompletedEvent(
            run_result=NodeRunResult(
                status=WorkflowNodeExecutionStatus.FAILED,
                error=str(e),
            )
        )
    finally:
        # remove the iteration variables (item, index) after the iteration run completes
        variable_pool.remove([self.node_id, "index"])
        variable_pool.remove([self.node_id, "item"])

2. Sequential iteration execution
def _run_single_iter(
    self,
    *,
    iterator_list_value: Sequence[str],
    variable_pool: VariablePool,
    inputs: Mapping[str, list],
    outputs: list,
    start_at: datetime,
    graph_engine: "GraphEngine",
    iteration_graph: Graph,
    iter_run_map: dict[str, float],
    parallel_mode_run_id: Optional[str] = None,
) -> Generator[NodeEvent | InNodeEvent, None, None]:
    """Run a single iteration."""
    iter_start_at = datetime.now(UTC).replace(tzinfo=None)

    try:
        rst = graph_engine.run()
        # get the current iteration index
        index_variable = variable_pool.get([self.node_id, "index"])
        if not isinstance(index_variable, IntegerVariable):
            raise IterationIndexNotFoundError(f"iteration {self.node_id} current index not found")
        current_index = index_variable.value
        iteration_run_id = parallel_mode_run_id if parallel_mode_run_id is not None else f"{current_index}"
        next_index = int(current_index) + 1
        for event in rst:
            if isinstance(event, (BaseNodeEvent | BaseParallelBranchEvent)) and not event.in_iteration_id:
                event.in_iteration_id = self.node_id

            if (
                isinstance(event, BaseNodeEvent)
                and event.node_type == NodeType.ITERATION_START
                and not isinstance(event, NodeRunStreamChunkEvent)
            ):
                continue

            if isinstance(event, NodeRunSucceededEvent):
                yield self._handle_event_metadata(
                    event=event, iter_run_index=current_index, parallel_mode_run_id=parallel_mode_run_id
                )
            elif isinstance(event, BaseGraphEvent):
                if isinstance(event, GraphRunFailedEvent):
                    # iteration run failed
                    if self.node_data.is_parallel:
                        yield IterationRunFailedEvent(
                            iteration_id=self.id,
                            iteration_node_id=self.node_id,
                            iteration_node_type=self.node_type,
                            iteration_node_data=self.node_data,
                            parallel_mode_run_id=parallel_mode_run_id,
                            start_at=start_at,
                            inputs=inputs,
                            outputs={"output": outputs},
                            steps=len(iterator_list_value),
                            metadata={"total_tokens": graph_engine.graph_runtime_state.total_tokens},
                            error=event.error,
                        )
                    else:
                        yield IterationRunFailedEvent(
                            iteration_id=self.id,
                            iteration_node_id=self.node_id,
                            iteration_node_type=self.node_type,
                            iteration_node_data=self.node_data,
                            start_at=start_at,
                            inputs=inputs,
                            outputs={"output": outputs},
                            steps=len(iterator_list_value),
                            metadata={"total_tokens": graph_engine.graph_runtime_state.total_tokens},
                            error=event.error,
                        )
                    yield RunCompletedEvent(
                        run_result=NodeRunResult(
                            status=WorkflowNodeExecutionStatus.FAILED,
                            error=event.error,
                        )
                    )
                    return
            elif isinstance(event, InNodeEvent):
                metadata_event = self._handle_event_metadata(
                    event=event, iter_run_index=current_index, parallel_mode_run_id=parallel_mode_run_id
                )
                if isinstance(event, NodeRunFailedEvent):
                    if self.node_data.error_handle_mode == ErrorHandleMode.CONTINUE_ON_ERROR:
                        yield NodeInIterationFailedEvent(
                            **metadata_event.model_dump(),
                        )
                        outputs[current_index] = None
                        variable_pool.add([self.node_id, "index"], next_index)
                        if next_index < len(iterator_list_value):
                            variable_pool.add([self.node_id, "item"], iterator_list_value[next_index])
                        duration = (datetime.now(UTC).replace(tzinfo=None) - iter_start_at).total_seconds()
                        iter_run_map[iteration_run_id] = duration
                        yield IterationRunNextEvent(
                            iteration_id=self.id,
                            iteration_node_id=self.node_id,
                            iteration_node_type=self.node_type,
                            iteration_node_data=self.node_data,
                            index=next_index,
                            parallel_mode_run_id=parallel_mode_run_id,
                            pre_iteration_output=None,
                            duration=duration,
                        )
                        return
                    elif self.node_data.error_handle_mode == ErrorHandleMode.REMOVE_ABNORMAL_OUTPUT:
                        yield NodeInIterationFailedEvent(
                            **metadata_event.model_dump(),
                        )
                        variable_pool.add([self.node_id, "index"], next_index)

                        if next_index < len(iterator_list_value):
                            variable_pool.add([self.node_id, "item"], iterator_list_value[next_index])
                        duration = (datetime.now(UTC).replace(tzinfo=None) - iter_start_at).total_seconds()
                        iter_run_map[iteration_run_id] = duration
                        yield IterationRunNextEvent(
                            iteration_id=self.id,
                            iteration_node_id=self.node_id,
                            iteration_node_type=self.node_type,
                            iteration_node_data=self.node_data,
                            index=next_index,
                            parallel_mode_run_id=parallel_mode_run_id,
                            pre_iteration_output=None,
                            duration=duration,
                        )
                        return
                    elif self.node_data.error_handle_mode == ErrorHandleMode.TERMINATED:
                        yield IterationRunFailedEvent(
                            iteration_id=self.id,
                            iteration_node_id=self.node_id,
                            iteration_node_type=self.node_type,
                            iteration_node_data=self.node_data,
                            start_at=start_at,
                            inputs=inputs,
                            outputs={"output": None},
                            steps=len(iterator_list_value),
                            metadata={"total_tokens": graph_engine.graph_runtime_state.total_tokens},
                            error=event.error,
                        )
                yield metadata_event

        current_output_segment = variable_pool.get(self.node_data.output_selector)
        if current_output_segment is None:
            raise IterationNodeError("iteration output selector not found")
        current_iteration_output = current_output_segment.value
        outputs[current_index] = current_iteration_output
        # remove all node outputs from the variable pool
        for node_id in iteration_graph.node_ids:
            variable_pool.remove([node_id])

        # move on to the next iteration
        variable_pool.add([self.node_id, "index"], next_index)

        if next_index < len(iterator_list_value):
            variable_pool.add([self.node_id, "item"], iterator_list_value[next_index])
        duration = (datetime.now(UTC).replace(tzinfo=None) - iter_start_at).total_seconds()
        iter_run_map[iteration_run_id] = duration
        yield IterationRunNextEvent(
            iteration_id=self.id,
            iteration_node_id=self.node_id,
            iteration_node_type=self.node_type,
            iteration_node_data=self.node_data,
            index=next_index,
            parallel_mode_run_id=parallel_mode_run_id,
            pre_iteration_output=current_iteration_output or None,
            duration=duration,
        )

    except IterationNodeError as e:
        logger.warning(f"Iteration run failed: {str(e)}")
        yield IterationRunFailedEvent(
            iteration_id=self.id,
            iteration_node_id=self.node_id,
            iteration_node_type=self.node_type,
            iteration_node_data=self.node_data,
            start_at=start_at,
            inputs=inputs,
            outputs={"output": None},
            steps=len(iterator_list_value),
            metadata={"total_tokens": graph_engine.graph_runtime_state.total_tokens},
            error=str(e),
        )
        yield RunCompletedEvent(
            run_result=NodeRunResult(
                status=WorkflowNodeExecutionStatus.FAILED,
                error=str(e),
            )
        )

3. Parallel iteration execution
def _run_single_iter_parallel(
    self,
    *,
    flask_app: Flask,
    context: contextvars.Context,
    q: Queue,
    iterator_list_value: Sequence[str],
    inputs: Mapping[str, list],
    outputs: list,
    start_at: datetime,
    graph_engine: "GraphEngine",
    iteration_graph: Graph,
    index: int,
    item: Any,
    iter_run_map: dict[str, float],
):
    """Run a single iteration in parallel mode."""
    for var, val in context.items():
        var.set(val)

    # FIXME(-LAN-): Save current user before entering new app context
    from flask import g

    saved_user = None
    if has_request_context() and hasattr(g, "_login_user"):
        saved_user = g._login_user

    with flask_app.app_context():
        # Restore user in new app context
        if saved_user is not None:
            from flask import g

            g._login_user = saved_user

        parallel_mode_run_id = uuid.uuid4().hex
        graph_engine_copy = graph_engine.create_copy()
        variable_pool_copy = graph_engine_copy.graph_runtime_state.variable_pool
        variable_pool_copy.add([self.node_id, "index"], index)
        variable_pool_copy.add([self.node_id, "item"], item)
        for event in self._run_single_iter(
            iterator_list_value=iterator_list_value,
            variable_pool=variable_pool_copy,
            inputs=inputs,
            outputs=outputs,
            start_at=start_at,
            graph_engine=graph_engine_copy,
            iteration_graph=iteration_graph,
            iter_run_map=iter_run_map,
            parallel_mode_run_id=parallel_mode_run_id,
        ):
            q.put(event)
        graph_engine.graph_runtime_state.total_tokens += graph_engine_copy.graph_runtime_state.total_tokens

Error handling modes

The iteration node supports three error handling modes:

class ErrorHandleMode(StrEnum):
    TERMINATED = "terminated"                          # stop immediately on error
    CONTINUE_ON_ERROR = "continue-on-error"            # keep going, leaving a None placeholder
    REMOVE_ABNORMAL_OUTPUT = "remove-abnormal-output"  # drop the failed outputs afterwards
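
To make the three modes concrete, here is a small self-contained simulation, independent of Dify; the failing condition and the doubling operation are made up for the demo. It mirrors the behaviour visible in the source: CONTINUE_ON_ERROR leaves a None placeholder in the output slot, while REMOVE_ABNORMAL_OUTPUT filters the None entries out afterwards.

from enum import Enum

class ErrorHandleMode(str, Enum):
    TERMINATED = "terminated"
    CONTINUE_ON_ERROR = "continue-on-error"
    REMOVE_ABNORMAL_OUTPUT = "remove-abnormal-output"

def run_iteration(items: list[int], mode: ErrorHandleMode) -> list:
    outputs: list = [None] * len(items)
    for i, item in enumerate(items):
        try:
            if item < 0:  # made-up failure condition for the demo
                raise ValueError(f"item {item} failed")
            outputs[i] = item * 2
        except ValueError:
            if mode == ErrorHandleMode.TERMINATED:
                raise                 # the whole iteration fails immediately
            outputs[i] = None         # CONTINUE_ON_ERROR keeps a None placeholder
    if mode == ErrorHandleMode.REMOVE_ABNORMAL_OUTPUT:
        outputs = [o for o in outputs if o is not None]  # drop the failed slots afterwards
    return outputs

print(run_iteration([1, -2, 3], ErrorHandleMode.CONTINUE_ON_ERROR))       # [2, None, 6]
print(run_iteration([1, -2, 3], ErrorHandleMode.REMOVE_ABNORMAL_OUTPUT))  # [2, 6]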

VI. The Event System

Event type hierarchy

(class-hierarchy diagram omitted in this text version)

When events are emitted

(sequence diagram omitted in this text version)
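
Both node types are generators that stream events to their caller. The toy below mirrors that pattern end to end; the three event classes are simplified stand-ins for Dify's real ones, which carry many more fields as the code above shows.

from dataclasses import dataclass
from typing import Any, Generator

@dataclass
class LoopRunStartedEvent:
    inputs: dict[str, Any]

@dataclass
class LoopRunNextEvent:
    index: int

@dataclass
class LoopRunSucceededEvent:
    outputs: dict[str, Any]

def run_loop(loop_count: int) -> Generator[object, None, None]:
    # Toy generator mirroring how LoopNode._run streams events to its caller
    yield LoopRunStartedEvent(inputs={"loop_count": loop_count})
    for i in range(loop_count):
        yield LoopRunNextEvent(index=i + 1)
    yield LoopRunSucceededEvent(outputs={"loop_round": loop_count})

for event in run_loop(3):
    print(type(event).__name__, vars(event))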

VII. Variable Pool Management

Variable pool structure

The variable pool is the workflow's data hub; it stores and manages all variables:

class VariablePool(BaseModel):
    # Variable dictionary: the first-level key is the node ID,
    # the second-level key is the hash of the rest of the variable selector
    variable_dictionary: dict[str, dict[int, Segment]] = Field(
        description="Variables mapping",
        default=defaultdict(dict),
    )

    # User input variables
    user_inputs: Mapping[str, Any] = Field(
        description="User inputs",
    )

    # System variables
    system_variables: Mapping[SystemVariableKey, Any] = Field(
        description="System variables",
    )

    # Environment variables
    environment_variables: Sequence[Variable] = Field(
        description="Environment variables.",
        default_factory=list,
    )

    # Conversation variables
    conversation_variables: Sequence[Variable] = Field(
        description="Conversation variables.",
        default_factory=list,
    )

Variable scope management

(scope diagram omitted in this text version)

Variable operation methods

def add(self, selector: Sequence[str], value: Any, /) -> None:
    """
    Add a variable to the variable pool.

    Args:
        selector (Sequence[str]): The selector identifying the variable.
        value (VariableValue): The value of the variable.

    Raises:
        ValueError: If the selector is invalid.

    Returns:
        None
    """
    if len(selector) < 2:
        raise ValueError("Invalid selector")

    if isinstance(value, Variable):
        variable = value
    elif isinstance(value, Segment):
        variable = variable_factory.segment_to_variable(segment=value, selector=selector)
    else:
        segment = variable_factory.build_segment(value)
        variable = variable_factory.segment_to_variable(segment=segment, selector=selector)

    hash_key = hash(tuple(selector[1:]))
    self.variable_dictionary[selector[0]][hash_key] = variable

def get(self, selector: Sequence[str], /) -> Segment | None:
    """
    Retrieve a value from the variable pool by selector.

    Args:
        selector (Sequence[str]): The selector identifying the variable.

    Returns:
        Any: The value associated with the given selector.

    Raises:
        ValueError: If the selector is invalid.
    """
    if len(selector) < 2:
        return None

    hash_key = hash(tuple(selector[1:]))
    value = self.variable_dictionary[selector[0]].get(hash_key)

    if value is None:
        selector, attr = selector[:-1], selector[-1]
        # Python supports `attr in FileAttribute` only from 3.12 onwards
        if attr not in {item.value for item in FileAttribute}:
            return None
        value = self.get(selector)
        if not isinstance(value, FileSegment | NoneSegment):
            return None
        if isinstance(value, FileSegment):
            attr = FileAttribute(attr)
            attr_value = file_manager.get_attr(file=value.value, attr=attr)
            return variable_factory.build_segment(attr_value)
        return value

    return value

def remove(self, selector: Sequence[str], /):
    """
    Remove a variable from the variable pool by selector.

    Args:
        selector (Sequence[str]): A sequence of strings representing the selector.

    Returns:
        None
    """
    if not selector:
        return
    if len(selector) == 1:
        self.variable_dictionary[selector[0]] = {}
        return
    hash_key = hash(tuple(selector[1:]))
    self.variable_dictionary[selector[0]].pop(hash_key, None)
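
The selector convention used above (first element: node ID; the remaining elements hashed into a key) is easy to exercise in isolation. Below is a stripped-down, dependency-free sketch of the same add/get/remove shape; unlike the real class it stores raw values with no Segment/Variable wrapping:

from collections import defaultdict
from typing import Any, Sequence

class MiniVariablePool:
    """Dependency-free sketch of VariablePool's selector-keyed storage."""

    def __init__(self) -> None:
        # first-level key: node ID; second-level key: hash of the rest of the selector
        self._store: dict[str, dict[int, Any]] = defaultdict(dict)

    def add(self, selector: Sequence[str], value: Any) -> None:
        if len(selector) < 2:
            raise ValueError("Invalid selector")
        self._store[selector[0]][hash(tuple(selector[1:]))] = value

    def get(self, selector: Sequence[str]) -> Any | None:
        if len(selector) < 2:
            return None
        return self._store[selector[0]].get(hash(tuple(selector[1:])))

    def remove(self, selector: Sequence[str]) -> None:
        if not selector:
            return
        if len(selector) == 1:
            self._store[selector[0]] = {}  # drop every variable under the node
            return
        self._store[selector[0]].pop(hash(tuple(selector[1:])), None)

pool = MiniVariablePool()
pool.add(["loop_node", "index"], 0)      # what LoopNode does before round 0
pool.add(["loop_node", "index"], 1)      # re-adding overwrites: the index advances
print(pool.get(["loop_node", "index"]))  # 1
pool.remove(["loop_node"])               # scope cleanup, as in the finally blocks
print(pool.get(["loop_node", "index"]))  # None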

VIII. Practical Application Scenarios

Scenario 1: Batch data processing

Requirement: run AI analysis on multiple documents uploaded by a user

(workflow diagram omitted in this text version)

Configuration example

{
  "type": "iteration",
  "config": {
    "iterator_selector": ["start", "documents"],
    "is_parallel": true,
    "parallel_nums": 5,
    "error_handle_mode": "continue-on-error"
  }
}

Scenario 2: Conditional loop processing

Requirement: keep calling an API until a satisfactory result is obtained

(workflow diagram omitted in this text version)

Configuration example

{
  "type": "loop",
  "config": {
    "loop_variable_selectors": {
      "loop.attempt": ["start", "initial_attempt"]
    },
    "break_conditions": [
      {
        "variable_selector": ["api_result", "quality_score"],
        "comparison_operator": ">",
        "value": 0.8
      }
    ],
    "logical_operator": "or"
  }
}

Scenario 3: Nested loop processing

Requirement: answer multiple questions from multiple users in one batch

(workflow diagram omitted in this text version)
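
Conceptually, the outer iteration's sub-graph contains a second iteration node. A hypothetical sketch in the same config style as above (the node IDs iterate_users and iterate_questions, and the id field itself, are made up for illustration; the inner iterator_selector reads the outer node's current item, which the variable pool exposes under [node_id, "item"] as shown in the code earlier):

[
  {
    "id": "iterate_users",
    "type": "iteration",
    "config": {
      "iterator_selector": ["start", "users"]
    }
  },
  {
    "id": "iterate_questions",
    "type": "iteration",
    "config": {
      "iterator_selector": ["iterate_users", "item"],
      "is_parallel": true,
      "parallel_nums": 3,
      "error_handle_mode": "continue-on-error"
    }
  }
]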

IX. Summary

The loop and iteration mechanisms are among the core capabilities of Dify's workflow engine. Their data-processing power rests on the following key characteristics:

Core strengths

  1. Flexible control structures

    • Conditional loops and collection iteration
    • Rich exit conditions and error handling modes
    • Support for nested loops and iterations
  2. High-performance execution

    • Parallel iteration support
    • Careful variable pool management
    • Event-driven asynchronous execution
  3. Comprehensive observability

    • A detailed event system
    • Real-time status tracking
    • Rich debugging information
  4. Ease of use

    • Intuitive configuration
    • Clear error messages
    • Thorough documentation

Technical highlights

  • Event-driven architecture: loosely coupled components communicate through the event system
  • Variable scope management: precise control over variable lifetime and visibility
  • Condition-processing engine: supports complex logical condition evaluation
  • Concurrency optimization: sensible thread pool management and resource scheduling

A solid understanding of Dify's loop and iteration mechanisms helps in designing and optimizing AI application workflows for better throughput and user experience; the design is well worth studying and borrowing from.

