Apache DolphinScheduler-1.3.9源码分析（一）

最新推荐文章于 2025-07-09 19:11:58 发布

原创

最新推荐文章于 2025-07-09 19:11:58 发布 · 1.3k 阅读

30 ·

CC 4.0 BY-SA版权

文章标签：

#大数据

引言

随着大数据的发展，任务调度系统成为了数据处理和管理中至关重要的部分。Apache DolphinScheduler 是一款优秀的开源分布式工作流调度平台，在大数据场景中得到广泛应用。

在本文中，我们将对 Apache DolphinScheduler 1.3.9 版本的源码进行深入分析，介绍 Master 启动以及调度流程。

通过这些分析，开发者可以更好地理解 DolphinScheduler 的工作机制，并在实际使用中更高效地进行二次开发或优化。

Master Server启动

启动流程图

Master调度工作流流程图

MasterServer启动方法

public void run() {
    // init remoting server
    NettyServerConfig serverConfig = new NettyServerConfig();
    serverConfig.setListenPort(masterConfig.getListenPort());
    this.nettyRemotingServer = new NettyRemotingServer(serverConfig);
    this.nettyRemotingServer.registerProcessor(CommandType.TASK_EXECUTE_RESPONSE, new TaskResponseProcessor());
    this.nettyRemotingServer.registerProcessor(CommandType.TASK_EXECUTE_ACK, new TaskAckProcessor());
    this.nettyRemotingServer.registerProcessor(CommandType.TASK_KILL_RESPONSE, new TaskKillResponseProcessor());
    this.nettyRemotingServer.start();

    // self tolerant
    this.zkMasterClient.start();
    this.zkMasterClient.setStoppable(this);

    // scheduler start
    this.masterSchedulerService.start();

    // start QuartzExecutors
    // what system should do if exception
    try {
        logger.info("start Quartz server...");
        QuartzExecutors.getInstance().start();
    } catch (Exception e) {
        try {
            QuartzExecutors.getInstance().shutdown();
        } catch (SchedulerException e1) {
            logger.error("QuartzExecutors shutdown failed : " + e1.getMessage(), e1);
        }
        logger.error("start Quartz failed", e);
    }

    /**
     * register hooks, which are called before the process exits
     */
    Runtime.getRuntime().addShutdownHook(new Thread(() -> {
        if (Stopper.isRunning()) {
            close("shutdownHook");
        }
    }));

}

nettyServer会注册三种Command

TASK_EXECUTE_ACK：Worker在接收到Master执行任务的请求后，会给Master发送一条Ack Command，告诉Master已经开始执行Task了。
TASK_EXECUTE_RESPONSE：Worker在执行完Task之后，会给Master发送一条Response Command，告诉Master任务调度/执行结果。
TASK_KILL_RESPONSE：Master接收到Task停止的请求会，会给Worker发送TASK_KILL_REQUEST Command，之后Worker会把Task_KILL_RESPONSE Command返回给Master。

启动调度和定时器。
添加ShutdownHook，关闭资源。

Master 配置文件

master.listen.port=5678

# 限制Process Instance并发调度的线程数
master.exec.threads=100

# 限制每个ProcessInstance可以执行的任务数
master.exec.task.num=20

# 每一批次可以分发的任务数
master.dispatch.task.num=3

# master需要选择一个稳定的worker去执行任务
# 算法有：Random，RoundRobin，LowerWeight。默认是LowerWeight
master.host.selector=LowerWeight

# master需要向Zookeeper发送心跳，单位：秒
master.heartbeat.interval=10

# master提交任务失败，重试次数
master.task.commit.retryTimes=5

# master提交任务失败，重试时间间隔
master.task.commit.interval=1000

# master最大cpu平均负载，只有当系统cpu平均负载还没有达到这个值，master才能调度任务
# 默认值为-1，系统cpu核数 * 2
master.max.cpuload.avg=-1

# master为其他进程保留内存，只有当系统可用内存大于这个值，master才能调度
# 默认值0.3G
master.reserved.memory=0.3

Master Scheduler启动

MasterSchedulerService初始化方法

public void init(){
    // masterConfig.getMasterExecThreads()，master.properties里master.exec.threads=100
    // 该线程池的核心线程数和最大线程数为100
    this.masterExecService = (ThreadPoolExecutor)ThreadUtils.newDaemonFixedThreadExecutor("Master-Exec-Thread", masterConfig.getMasterExecThreads());
    NettyClientConfig clientConfig = new NettyClientConfig();
    this.nettyRemotingClient = new NettyRemotingClient(clientConfig);
}

MasterSchedulerService启动方法

public void run() {
    logger.info("master scheduler started");
    while (Stopper.isRunning()){
        try {
            // 这个方法是用来检查master cpu load和memory，判断master是否还有资源进行调度
            // 如果不能调度，Sleep 1 秒种
            boolean runCheckFlag = OSUtils.checkResource(masterConfig.getMasterMaxCpuloadAvg(), masterConfig.getMasterReservedMemory());
            if(!runCheckFlag) {
                Thread.sleep(Constants.SLEEP_TIME_MILLIS);
                continue;
            }
            if (zkMasterClient.getZkClient().getState() == CuratorFrameworkState.STARTED) {
                // 这里才是真正去执行调度的方法
                scheduleProcess();
            }
        } catch (Exception e) {
            logger.error("master scheduler thread error", e);
        }
    }
}

MasterSchedulerService调度方法

private void scheduleProcess() throws Exception {
    InterProcessMutex mutex = null;
    try {
        // 阻塞式获取分布式锁
        mutex = zkMasterClient.blockAcquireMutex();
        // 获取线程池的活跃线程数
        int activeCount = masterExecService.getActiveCount();
        // make sure to scan and delete command  table in one transaction
        // 获取其中一个command，必须保证操作都在一个事务里
        Command command = processService.findOneCommand();
        if (command != null) {
            logger.info("find one command: id: {}, type: {}", command.getId(),command.getCommandType());

            try{
                // 获取ProcessInstance，
                // 这个方法会根据master.exec.threads配置和活跃线程数来判断是否可以调度processInstance
                ProcessInstance processInstance = processService.handleCommand(logger,
                        getLocalAddress(),
                        this.masterConfig.getMasterExecThreads() - activeCount, command);
                if (processInstance != null) {
                    logger.info("start master exec thread , split DAG ...");
                    masterExecService.execute(
                    new MasterExecThread(
                            processInstance
                            , processService
                            , nettyRemotingClient
                            ));
                }
            }catch (Exception e){
                logger.error("scan command error ", e);
                processService.moveToErrorCommand(command, e.toString());
            }
        } else{
            //indicate that no command ,sleep for 1s
            Thread.sleep(Constants.SLEEP_TIME_MILLIS);
        }
    } finally{
        // 释放分布式锁
        zkMasterClient.releaseMutex(mutex);