Ray Source Code Analysis Series (15): Ray DAG

Article link: https://blog.youkuaiyun.com/weixin_43956669/article/details/144976730

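Preface

The official documentation has very little material on DAGs beyond usage examples. The official Ray blog also mentions that dag.experimental_compile can improve training throughput by 20% and lower the development cost of implementing different parallelism strategies, so I was curious to dig into what can be learned from DAG and Compiled Graph.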


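Key DAG Features

Lazy Computation Graphs: lazy evaluation, meaning execution can wait until all tasks/actors have been defined, which makes graph-level optimization easier
Custom Input Node: the input data can change while the computation graph stays the same, which avoids rebuilding the graph
Multiple Output Node: the computation graph stays the same but can return multiple outputs (it is unclear to me whether two graphs run in parallel internally or a batch mode is used)
Reuse Ray Actors in DAGs: create the actor with .remote() rather than .bind(), so the actor is not destroyed once the graph finishes executing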
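
To make the first three features concrete, here is a minimal pure-Python sketch of a lazy graph. It is a toy model and not Ray's implementation: the classes ToyInputNode, ToyFunctionNode and ToyMultiOutputNode are hypothetical stand-ins that I made up for illustration. Building the graph only records what to call, and the same graph is then executed with different inputs.

```python
# Toy model of a lazy computation graph. NOT Ray's implementation;
# all class names here are hypothetical stand-ins.

class ToyNode:
    def resolve(self, value, cache):
        # Compute each node at most once per execution.
        if id(self) not in cache:
            cache[id(self)] = self._compute(value, cache)
        return cache[id(self)]


class ToyInputNode(ToyNode):
    """Placeholder for the value supplied at execute() time."""

    def _compute(self, value, cache):
        return value


class ToyFunctionNode(ToyNode):
    """Records a function and its upstream nodes without calling it."""

    def __init__(self, fn, *upstream):
        self.fn = fn
        self.upstream = upstream

    def _compute(self, value, cache):
        args = [node.resolve(value, cache) for node in self.upstream]
        return self.fn(*args)


class ToyMultiOutputNode:
    """Collects several nodes so one execution returns a list of results."""

    def __init__(self, outputs):
        self.outputs = outputs

    def execute(self, value):
        cache = {}  # fresh per execution; the graph itself is reused
        return [node.resolve(value, cache) for node in self.outputs]


calls = []  # records every real invocation of add_one


def add_one(x):
    calls.append(x)
    return x + 1


def double(x):
    return x * 2


inp = ToyInputNode()
plus = ToyFunctionNode(add_one, inp)
dag = ToyMultiOutputNode([plus, ToyFunctionNode(double, plus)])

built_without_running = (calls == [])  # graph construction ran nothing
first = dag.execute(1)
second = dag.execute(5)
```

In this toy the outputs are simply evaluated one after another inside a single call; it says nothing about how Ray schedules a MultiOutputNode internally.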

DAG Usage Example

import ray
from ray.dag.input_node import InputNode
from ray.dag.output_node import MultiOutputNode

@ray.remote
class Worker:
    def __init__(self):
        self.forwarded = 0

    def forward(self, input_data: int):
        self.forwarded += 1
        return input_data + 1

    def num_forwarded(self):
        return self.forwarded

# Create an actor via ``remote`` API not ``bind`` API to avoid
# killing actors when a DAG is finished.
worker = Worker.remote()

with InputNode() as input_data:
    dag = MultiOutputNode([worker.forward.bind(input_data)])

# Actors are reused. The DAG definition doesn't include
# actor creation.
assert ray.get(dag.execute(1)) == [2]
assert ray.get(dag.execute(2)) == [3]
assert ray.get(dag.execute(3)) == [4]

# You can still use other actor methods via `remote` API.
assert ray.get(worker.num_forwarded.remote()) == 3

Ray Compiled Graph

Ray Compiled Graph is currently at a developer preview stage. The APIs are subject to change and expected to evolve. The API is available from Ray 2.32.

Why Compiled Graph

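As the REP (Ray Enhancement Proposal) also notes, the design goals are:

Reduce task overhead (the cost of RPCs and dynamic memory allocation) to the tens-of-microseconds range; it is currently around 1 ms
Support GPU communication primitives; currently only CPU is supported, which makes it hard to take advantage of technologies such as RDMA and NCCL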

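The real key idea behind Compiled Graph is to cut control-plane overhead. Doing that requires knowing that a given DAG pattern will be used repeatedly. Once those patterns are known, communication on the local node can go through shared memory.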
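
The shared-memory idea can be sketched with Python's standard multiprocessing.shared_memory module. This is only an illustration under my own assumptions, not how Ray implements its channels: the buffer layout (a single 64-bit integer) and the send/receive helpers are hypothetical, and both handles live in one process so no synchronization is shown. The point is that the buffer is allocated once up front and then reused on every iteration, instead of allocating and sending a new message per call.

```python
import struct
from multiprocessing import shared_memory

# Assumed message layout: one signed 64-bit integer.
SLOT = struct.Struct("<q")

# "Compile" step: allocate the channel once, before any execution.
channel = shared_memory.SharedMemory(create=True, size=SLOT.size)
# A second handle attached by name, standing in for the reader side.
reader_view = shared_memory.SharedMemory(name=channel.name)


def send(value: int) -> None:
    # Write in place into the preallocated buffer.
    SLOT.pack_into(channel.buf, 0, value)


def receive() -> int:
    # Read from the same memory through the reader's handle.
    return SLOT.unpack_from(reader_view.buf, 0)[0]


results = []
try:
    for x in (1, 2, 3):
        send(x + 1)  # same computation as Worker.forward
        results.append(receive())
finally:
    reader_view.close()
    channel.close()
    channel.unlink()  # release the shared block
```

With a writer and a reader in separate processes, some form of synchronization would be needed so the reader never sees a half-written or stale value; that part is deliberately left out here.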


Compiled Graph Example

[Figure: Compiled Graph example]

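Key Features

Ray Compiled Graph uses a static execution model, which gives it certain advantages over the classic Ray APIs and enables a series of optimizations built on that static nature.

Execution model differences
- Ray Compiled Graph: uses a static execution model, meaning the structure of the whole computation graph and its execution plan are fixed before execution. Under this model the graph can be analyzed and optimized at compile time, so everything is prepared before execution starts.
- Classic Ray APIs: eager mode, meaning a task is scheduled for execution immediately; every call to .remote() launches a task right away. This is simple and direct, but leaves relatively little room for resource management and optimization.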
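
The "plan once, execute many times" side of the static model can be sketched in plain Python. This is a toy under my own assumptions, not Ray's compiler: the three-step graph and the compile_graph/run helpers are hypothetical. The execution order is resolved a single time with the standard library's graphlib.TopologicalSorter, and each later run just replays that fixed schedule.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: step name -> (function, names of upstream steps).
graph = {
    "load": (lambda x: x + 1, ["input"]),
    "scale": (lambda x: x * 10, ["load"]),
    "report": (lambda a, b: (a, b), ["load", "scale"]),
}


def compile_graph(graph):
    """Resolve the execution order once, ahead of any execution."""
    deps = {name: set(upstream) for name, (_, upstream) in graph.items()}
    # static_order() also yields "input", which is not a step; drop it.
    return [n for n in TopologicalSorter(deps).static_order() if n in graph]


def run(order, graph, value):
    """Replay the precomputed schedule; no planning happens per call."""
    results = {"input": value}
    for name in order:
        fn, upstream = graph[name]
        results[name] = fn(*(results[u] for u in upstream))
    return results["report"]


schedule = compile_graph(graph)  # done once
outputs = [run(schedule, graph, v) for v in (1, 2)]  # reused per input
```

An eager model would instead decide what to run next at each call; here every decision about ordering is made before the first input arrives.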
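Optimizations based on the static model
1. Resource pre-allocation: by pre-allocating resources, Ray Compiled Graph can plan and prepare the required compute resources ahead of time, such as CP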