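In the previous section we saw that once the JobMaster starts up, it converts the JobGraph into an ExecutionGraph, passes the checkpoint-related configuration to the ExecutionGraph, and creates the CheckpointCoordinator. Here we pick up where that section left off.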
1.start
After the JobMaster has started and leader election has completed, its start method is called. start in turn calls startJobExecution, which begins execution of the job.

2.startJobExecution
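Inside startJobExecution, the JobMaster calls two more of its own methods:
startJobMasterServices: this is where the JobMaster's services are actually started
resetAndStartScheduler: resets and starts the scheduler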
private Acknowledge startJobExecution(JobMasterId newJobMasterId) throws Exception {
    validateRunsInMainThread();
    checkNotNull(newJobMasterId, "The new JobMasterId must not be null.");
    // Already running under this JobMasterId: nothing to do
    if (Objects.equals(getFencingToken(), newJobMasterId)) {
        log.info("Already started the job execution with JobMasterId {}.", newJobMasterId);
        return Acknowledge.get();
    }
    setNewFencingToken(newJobMasterId);
    /* TODO: actually start the JobMaster services */
    startJobMasterServices();
    log.info("Starting execution of job {} ({}) under job master id {}.", jobGraph.getName(), jobGraph.getJobID(), newJobMasterId);
    /* TODO: reset and start the scheduler */
    resetAndStartScheduler();
    return Acknowledge.get();
}
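The early return on a matching fencing token makes startJobExecution idempotent: a repeated call with the same JobMasterId does not start the services a second time. The following is a minimal sketch of that guard only, not Flink's implementation. The class FencedStartSketch, the serviceStarts counter, and the use of UUID in place of JobMasterId are all assumptions made for illustration.

```java
import java.util.Objects;
import java.util.UUID;

/** Toy model of the fencing-token guard; hypothetical class, not part of Flink. */
class FencedStartSketch {
    private UUID fencingToken;      // stands in for the JobMasterId fencing token
    private int serviceStarts = 0;  // counts how often the "services" were started

    /** Returns true if this call started execution, false if it was a no-op. */
    boolean startJobExecution(UUID newToken) {
        Objects.requireNonNull(newToken, "The new token must not be null.");
        if (Objects.equals(fencingToken, newToken)) {
            return false; // already started under this token
        }
        fencingToken = newToken;
        // Stands in for startJobMasterServices() and resetAndStartScheduler()
        serviceStarts++;
        return true;
    }

    int getServiceStarts() {
        return serviceStarts;
    }

    public static void main(String[] args) {
        FencedStartSketch jobMaster = new FencedStartSketch();
        UUID id = UUID.randomUUID();
        System.out.println(jobMaster.startJobExecution(id)); // true: first start
        System.out.println(jobMaster.startJobExecution(id)); // false: same token, no-op
        System.out.println(jobMaster.getServiceStarts());    // 1
    }
}
```

A call with a different token would pass the guard and start the services again, which corresponds to the case where the JobMaster is granted leadership under a new id.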
3.startJobMasterServices
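This method does the following:
1. Starts the heartbeat services with the TaskManagers and the ResourceManager.
2. Starts the slotPool. The slotPool is the JobMaster-side component that manages slot resources and records the slots the job currently holds. When slots run short, the slotPool requests more from the ResourceManager. This is the Flink cluster's own ResourceManager, not YARN's ResourceManager. When it receives a slot request from the slotPool, the ResourceManager in turn asks a TaskManager for a free slot.
3. Establishes the connection to the ResourceManager, the same ResourceManager described in item 2.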
private void startJobMasterServices() throws Exception {
    /* TODO: start the heartbeat services: taskmanager, resourcemanager */
    startHeartbeatServices();
    // start the slot pool make sure the slot pool now accepts messages for this leader
    /* TODO: start the slotpool */
    slotPool.start(getFencingToken(), getAddress(), getMainThreadExecutor());
    //TODO: Remove once the ZooKeeperLeaderRetrieval returns the
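
The slot request path described in item 2 can be pictured with a small synchronous model. This is only a sketch under stated assumptions: SlotPoolSketch and ToyResourceManager are hypothetical classes, slots are plain strings, and the real Flink components communicate asynchronously over RPC, which this model does not attempt to reproduce.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Toy slot pool: serves slots it already holds, otherwise asks the resource manager. */
class SlotPoolSketch {

    /** Hypothetical stand-in for Flink's ResourceManager, tracking free TaskManager slots. */
    static class ToyResourceManager {
        private final Deque<String> freeSlots = new ArrayDeque<>();

        ToyResourceManager(List<String> slots) {
            freeSlots.addAll(slots);
        }

        /** Returns a free slot id, or null when no TaskManager has a slot left. */
        String requestSlot() {
            return freeSlots.poll();
        }
    }

    private final Deque<String> available = new ArrayDeque<>(); // slots held but unused
    private final List<String> allocated = new ArrayList<>();   // slots in use by the job
    private final ToyResourceManager resourceManager;
    private int resourceManagerRequests = 0;

    SlotPoolSketch(ToyResourceManager resourceManager) {
        this.resourceManager = resourceManager;
    }

    /** Adds a slot that the pool already holds for this job. */
    void offerSlot(String slotId) {
        available.add(slotId);
    }

    /** Serves a slot from the pool, falling back to the resource manager when empty. */
    String allocate() {
        String slot = available.poll();
        if (slot == null) {
            resourceManagerRequests++;
            slot = resourceManager.requestSlot();
        }
        if (slot == null) {
            throw new IllegalStateException("No slot available in the pool or the cluster");
        }
        allocated.add(slot);
        return slot;
    }

    int getResourceManagerRequests() {
        return resourceManagerRequests;
    }

    public static void main(String[] args) {
        SlotPoolSketch pool = new SlotPoolSketch(new ToyResourceManager(List.of("tm2-slot0")));
        pool.offerSlot("tm1-slot0");
        System.out.println(pool.allocate()); // tm1-slot0, served from the pool
        System.out.println(pool.allocate()); // tm2-slot0, obtained via the resource manager
    }
}
```

The point of the model is the ordering: the pool is consulted first, and the resource manager is contacted only when the job's own slots are exhausted.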
