Flink Source Code: How Executions Are Generated

License: CC 4.0 BY-SA

Tags: #flink #big-data

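This article walks through the Flink JobMaster startup process: converting the JobGraph into an ExecutionGraph, creating the checkpoint coordinator, and the key steps of starting job execution, initializing services, resetting the scheduler, and deploying tasks.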

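In the previous part we saw that once the JobMaster starts, it converts the JobGraph into an ExecutionGraph, passes the checkpoint-related configuration to the ExecutionGraph, and creates the CheckpointCoordinator. This part continues from there.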

1.start

After the JobMaster has started and leader election has completed, its start method is called. start in turn calls startJobExecution, which begins execution of the job.

2.startJobExecution

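Inside startJobExecution, two of the JobMaster's own methods are called:

startJobMasterServices: this is where the JobMaster's services are actually started

resetAndStartScheduler: resets and starts the scheduler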

private Acknowledge startJobExecution(JobMasterId newJobMasterId) throws Exception {

		validateRunsInMainThread();

		checkNotNull(newJobMasterId, "The new JobMasterId must not be null.");

		if (Objects.equals(getFencingToken(), newJobMasterId)) {
			log.info("Already started the job execution with JobMasterId {}.", newJobMasterId);

			return Acknowledge.get();
		}

		setNewFencingToken(newJobMasterId);

		/* TODO: actually start the JobMaster services */
		startJobMasterServices();

		log.info("Starting execution of job {} ({}) under job master id {}.", jobGraph.getName(), jobGraph.getJobID(), newJobMasterId);

		/* TODO: reset and start the scheduler */
		resetAndStartScheduler();

		return Acknowledge.get();
	}
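The fencing-token check at the top of the method makes a repeated call with the same JobMasterId a no-op. A minimal sketch of that idea follows, under stated assumptions: FencedJobMasterSketch is a hypothetical toy class, a plain UUID stands in for JobMasterId, and the method returns a boolean instead of Acknowledge purely so the effect is observable. It is not Flink's implementation.

```java
import java.util.Objects;
import java.util.UUID;

// Hypothetical, simplified stand-in for JobMaster; not Flink's actual class.
class FencedJobMasterSketch {
    private UUID fencingToken;        // null until an id has been accepted
    private int servicesStarted = 0;  // counts how often services were started

    // Returns true if this call started execution, false if it was a repeat.
    // (Assumption: the real method returns Acknowledge in both cases.)
    boolean startJobExecution(UUID newJobMasterId) {
        Objects.requireNonNull(newJobMasterId, "The new JobMasterId must not be null.");
        if (Objects.equals(fencingToken, newJobMasterId)) {
            return false; // already started under this id, nothing to do
        }
        fencingToken = newJobMasterId; // corresponds to setNewFencingToken
        servicesStarted++;             // placeholder for starting services and scheduler
        return true;
    }

    int getServicesStarted() {
        return servicesStarted;
    }

    public static void main(String[] args) {
        FencedJobMasterSketch jobMaster = new FencedJobMasterSketch();
        UUID id = UUID.randomUUID();
        System.out.println(jobMaster.startJobExecution(id)); // prints "true"
        System.out.println(jobMaster.startJobExecution(id)); // prints "false"
        System.out.println(jobMaster.getServicesStarted());  // prints "1"
    }
}
```

Calling the method twice with the same id starts the services only once, which is the behavior the early return in startJobExecution is there to guarantee.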

3.startJobMasterServices

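This method does the following:

1. Starts the heartbeat services with the TaskManagers and the ResourceManager.

2. Starts the slotPool. The slotPool is the component on the JobMaster side that manages slot resources; it keeps track of the slots held by this job. When it does not have enough slots, the slotPool requests more from the ResourceManager. This is the Flink cluster's own ResourceManager, not YARN's ResourceManager. When the ResourceManager receives a slot request from the slotPool, it in turn asks a TaskManager for a free slot.

3. Establishes the connection to the ResourceManager, the same ResourceManager described in point 2.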
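The slot-request flow from point 2 can be pictured with a toy model. The sketch below is only an illustration under stated assumptions: TaskManagerSketch, ResourceManagerSketch, and SlotPoolSketch are hypothetical classes, slots are plain strings, and every call is synchronous, whereas the real components communicate asynchronously over RPC.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Optional;

// Hypothetical toy model of a TaskManager with a fixed number of free slots.
class TaskManagerSketch {
    private final String name;
    private int freeSlots;

    TaskManagerSketch(String name, int freeSlots) {
        this.name = name;
        this.freeSlots = freeSlots;
    }

    // Hands out one free slot, or an empty Optional if none are left.
    Optional<String> offerSlot() {
        if (freeSlots == 0) {
            return Optional.empty();
        }
        freeSlots--;
        return Optional.of(name + "-slot-" + freeSlots);
    }
}

// Hypothetical toy model of the cluster ResourceManager (not YARN's).
class ResourceManagerSketch {
    private final List<TaskManagerSketch> taskManagers = new ArrayList<>();

    void register(TaskManagerSketch taskManager) {
        taskManagers.add(taskManager);
    }

    // Asks each registered TaskManager in turn for a free slot.
    Optional<String> requestSlot() {
        for (TaskManagerSketch taskManager : taskManagers) {
            Optional<String> slot = taskManager.offerSlot();
            if (slot.isPresent()) {
                return slot;
            }
        }
        return Optional.empty();
    }
}

// Hypothetical toy model of the JobMaster-side slot pool.
class SlotPoolSketch {
    private final Deque<String> availableSlots = new ArrayDeque<>();
    private final ResourceManagerSketch resourceManager;

    SlotPoolSketch(ResourceManagerSketch resourceManager) {
        this.resourceManager = resourceManager;
    }

    // Serves from the slots the job already holds; otherwise asks the ResourceManager.
    Optional<String> allocateSlot() {
        if (!availableSlots.isEmpty()) {
            return Optional.of(availableSlots.pop());
        }
        return resourceManager.requestSlot();
    }

    // Returns a slot to the pool so the same job can reuse it later.
    void releaseSlot(String slot) {
        availableSlots.push(slot);
    }

    public static void main(String[] args) {
        ResourceManagerSketch rm = new ResourceManagerSketch();
        rm.register(new TaskManagerSketch("tm1", 1));
        SlotPoolSketch pool = new SlotPoolSketch(rm);
        System.out.println(pool.allocateSlot()); // prints "Optional[tm1-slot-0]"
        System.out.println(pool.allocateSlot()); // prints "Optional.empty"
    }
}
```

The pool only goes to the ResourceManager when it has no slot of its own available, which mirrors the description above of the slotPool as the job-local owner of slot resources.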

private void startJobMasterServices() throws Exception {
		/* TODO: start the heartbeat services: TaskManager, ResourceManager */
		startHeartbeatServices();

		// start the slot pool make sure the slot pool now accepts messages for this leader
		/* TODO: start the slot pool */
		slotPool.start(getFencingToken(), getAddress(), getMainThreadExecutor());

		//TODO: Remove once the ZooKeeperLeaderRetrieval returns the