Apache DolphinScheduler-1.3.9源码分析(二)

引言

随着大数据的发展,任务调度系统成为了数据处理和管理中至关重要的部分。Apache DolphinScheduler 是一款优秀的开源分布式工作流调度平台,在大数据场景中得到广泛应用。

在本文中,我们将对 Apache DolphinScheduler 1.3.9 版本的源码进行深入分析,主要分析一下Master和Worker的交互设计。

感兴趣的朋友也可以回顾我们上一篇文章:Apache DolphinScheduler-1.3.9源码分析(一)

Worker配置文件

# worker listener port
worker.listen.port=1234

# worker execute thread number to limit task instances in parallel
# worker可并行的任务数限制
worker.exec.threads=100

# worker heartbeat interval, the unit is second
# worker发送心跳间隔
worker.heartbeat.interval=10

# worker max cpuload avg, only higher than the system cpu load average, worker server can be dispatched tasks. default value -1: the number of cpu cores * 2
# worker最大cpu平均负载,只有系统cpu平均负载低于该值,才能执行任务
# 默认值为-1,则最大cpu平均负载=系统cpu核数 * 2
worker.max.cpuload.avg=-1

# worker reserved memory, only lower than system available memory, worker server can be dispatched tasks. default value 0.3, the unit is G
# worker的预留内存,只有当系统可用内存大于等于该值,才能执行任务,单位为GB
# 默认0.3G
worker.reserved.memory=0.3

# default worker groups separated by comma, like 'worker.groups=default,test'
# 工作组名称,多个用,隔开
worker.groups=default

WorkerServer启动

public void run() {
    // init remoting server
    NettyServerConfig serverConfig = new NettyServerConfig();
    serverConfig.setListenPort(workerConfig.getListenPort());
    this.nettyRemotingServer = new NettyRemotingServer(serverConfig);
    this.nettyRemotingServer.registerProcessor(CommandType.TASK_EXECUTE_REQUEST, new TaskExecuteProcessor());
    this.nettyRemotingServer.registerProcessor(CommandType.TASK_KILL_REQUEST, new TaskKillProcessor());
    this.nettyRemotingServer.registerProcessor(CommandType.DB_TASK_ACK, new DBTaskAckProcessor());
    this.nettyRemotingServer.registerProcessor(CommandType.DB_TASK_RESPONSE, new DBTaskResponseProcessor());
    this.nettyRemotingServer.start();

    // worker registry
    try {
        this.workerRegistry.registry();
        this.workerRegistry.getZookeeperRegistryCenter().setStoppable(this);
        Set<String> workerZkPaths = this.workerRegistry.getWorkerZkPaths();
        this.workerRegistry.getZookeeperRegistryCenter().getRegisterOperator().handleDeadServer(workerZkPaths, ZKNodeType.WORKER, Constants.DELETE_ZK_OP);
    } catch (Exception e) {
        logger.error(e.getMessage(), e);
        throw new RuntimeException(e);
    }

    // retry report task status
    this.retryReportTaskStatusThread.start();

    /**
     * register hooks, which are called before the process exits
     */
    Runtime.getRuntime().addShutdownHook(new Thread(() -> {
        if (Stopper.isRunning()) {
            close("shutdownHook");
        }
    }));
}
注册四个Command:
  1. TASK_EXECUTE_REQUEST:task执行请求
  2. TASK_KILL_REQUEST:task停止请求
  3. DB_TASK_ACK:Worker接受到Master的调度请求,回应master
  4. DB_TASK_RESPONSE:
  • 注册WorkerServer到Zookeeper,并发送心跳
  • 报告Task执行状态

RetryReportTaskStatusThread

这是一个兜底机制,主要负责定时轮询向Master汇报任务的状态,直到Master回复状态的ACK,避免任务状态丢失;

每隔5分钟,检查一下responceCache中的ACK Cache和Response Cache是否为空,如果不为空则向Master发送ack_commandresponse command请求。

public void run() {
    ResponceCache responceCache = ResponceCache.get();

    while (Stopper.isRunning()){

        // sleep 5 minutes
        ThreadUtils.sleep(RETRY_REPORT_TASK_STATUS_INTERVAL);

        try {
            if (!responceCache.getAckCache().isEmpty()){
                Map<Integer,Command> ackCache =  responceCache.getAckCache();
                for (Map.Entry<Integer, Command> entry : ackCache.entrySet()){
                    Integer taskInstanceId = entry.getKey();
                    Command ackCommand = entry.getValue();
                    taskCallbackService.sendAck(taskInstanceId,ackCommand);
                }
            }

            if (!responceCache.getResponseCache().isEmpty()){
                Map<Integer,Command> responseCache =  responceCache.getResponseCache();
                for (Map.Entry<Integer, Command> entry : responseCache.entrySet()){
                    Integer taskInstanceId = entry.getKey();
                    Command responseCommand = entry.getValue();
                    taskCallbackService.sendResult(taskInstanceId,responseCommand);
                }
            }
        }catch (Exception e){
            logger.warn("retry report task status error", e);
        }
    }
}

Master与Worker的交互设计

Apache DolphinScheduler Master和Worker模块是两个独立的JVM进程,可以部署在不同的服务器上,Master与Worker的通信都是通过Netty实现RPC交互的,一共用到7种处理器。

模块 处理器 作用
master masterTaskResponseProcessor 处理TaskExecuteResponseCommand消息,将消息添加到TaskResponseService的任务响应队列中
master masterTaskAckProcessor 处理TaskExecuteAckCommand消息,将消息添加到TaskResponseService的任务响应队列中
master masterTaskKillResponseProcessor 处理TaskKillResponseCommand消息,并在日志中打印消息内容
worker workerTaskExecuteProcessor 处理TaskExecuteRequestCommand消息,并发送TaskExecuteAckCommand到master,提交任务执行
worker workerTaskKillProcessor 处理TaskKillRequestCommand消息,调用kill -9 pid杀死任务对应的进程,并向master发送TaskKillResponseCommand消息
worker workerDBTaskAckProcessor 处理DBTaskAckCommand消息,针对执行成功的任务,从ResponseCache中删除
worker workerDBTaskResponseProcessor 处理DBTaskResponseCommand消息,针对执行成功的任务,从ResponseCache中删除

分发任务如何交互

master#TaskPriorityQueueConsumer

Master任务里有一个TaskPriorityQueueConsumer,会从TaskPriorityQueue里每次取3个Task分发给Worker执行,这里会创建TaskExecuteRequestCommand

TaskPriorityQueueConsumer#run()


                
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包

打赏作者

DolphinScheduler社区

你的鼓励将是我创作的最大动力

¥1 ¥2 ¥4 ¥6 ¥10 ¥20
扫码支付:¥1
获取中
扫码支付

您的余额不足,请更换扫码支付或充值

打赏作者

实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值