Flink on YARN Startup Process

Core entry points of Flink on YARN

Jobs that run on a Flink YARN cluster are submitted through the Flink CLI; the corresponding class is org.apache.flink.yarn.cli.FlinkYarnSessionCli. Inside the FlinkYarnSessionCli object, an org.apache.flink.yarn.YarnClusterDescriptor object is created, which encapsulates the logic for creating a Flink YARN session.

The org.apache.flink.yarn.YarnClusterDescriptor#deploySessionCluster method contains the flow of creating/obtaining the Flink YARN cluster and starting the task managers, so the method that deserves the most attention is YarnClusterDescriptor#deploySessionCluster.


Creating the ApplicationMaster and Submitting the Application

The deploySessionCluster code is shown below. The getYarnSessionClusterEntrypoint method specifies YarnSessionClusterEntrypoint as the entry class for starting the Flink YARN cluster. When the ApplicationMaster starts up in the container it was allocated, it invokes the main method of this entry class, which here means starting the Flink cluster. (See below for details.)

@Override
public ClusterClientProvider<ApplicationId> deploySessionCluster(ClusterSpecification clusterSpecification) throws ClusterDeploymentException {
	try {
		return deployInternal(
				clusterSpecification,
				"Flink session cluster",
				getYarnSessionClusterEntrypoint(),
				null,
				false);
	} catch (Exception e) {
		throw new ClusterDeploymentException("Couldn't deploy Yarn session cluster", e);
	}
}

protected String getYarnSessionClusterEntrypoint() {
	return YarnSessionClusterEntrypoint.class.getName();
}

The deployInternal method creates the ApplicationMaster (which hosts the JobManager). In the code below, a YarnClientApplication object is created through the YARN client, the available YARN resources are checked against the current requirements, and finally startAppMaster launches the ApplicationMaster. startAppMaster configures the ApplicationMaster's launch context; put simply, it tells the ApplicationMaster how to start and what to run when it does.

private ClusterClientProvider<ApplicationId> deployInternal(
		ClusterSpecification clusterSpecification,
		String applicationName,
		String yarnClusterEntrypoint,
		@Nullable JobGraph jobGraph,
		boolean detached) throws Exception {
    ...
    
	isReadyForDeployment(clusterSpecification);

	// ------------------ Check if the specified queue exists --------------------

	checkYarnQueues(yarnClient);

	// ------------------ Check if the YARN ClusterClient has the requested resources --------------

	// Create application via yarnClient
	final YarnClientApplication yarnApplication = yarnClient.createApplication();
	final GetNewApplicationResponse appResponse = yarnApplication.getNewApplicationResponse();

    ...
    
	final ClusterSpecification validClusterSpecification;
	try {
		validClusterSpecification = validateClusterResources(
				clusterSpecification,
				yarnMinAllocationMB,
				maxRes,
				freeClusterMem);
	} catch (YarnDeploymentException yde) {
		failSessionDuringDeployment(yarnClient, yarnApplication);
		throw yde;
	}

	...

	flinkConfiguration.setString(ClusterEntrypoint.EXECUTION_MODE, executionMode.toString());

	ApplicationReport report = startAppMaster(
			flinkConfiguration,
			applicationName,
			yarnClusterEntrypoint,
			jobGraph,
			yarnClient,
			yarnApplication,
			validClusterSpecification);
	...
}
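
The resource check in deployInternal can be illustrated with a small sketch. The method names and the rounding rule below are illustrative assumptions, modeled on how YARN's scheduler normalizes container memory up to a multiple of the minimum allocation; this is not Flink's actual validateClusterResources implementation.

```java
// Illustrative sketch of the kind of check validateClusterResources performs.
// Names and the exact rule are hypothetical, not copied from Flink.
public class ResourceCheckSketch {

    // Round the requested memory up to the next multiple of yarnMinAllocationMB,
    // mirroring YARN's container-size normalization.
    static int normalize(int requestedMB, int yarnMinAllocationMB) {
        return ((requestedMB + yarnMinAllocationMB - 1) / yarnMinAllocationMB) * yarnMinAllocationMB;
    }

    // Fail fast if the normalized request exceeds the cluster maximum,
    // analogous to throwing YarnDeploymentException above.
    static int validate(int requestedMB, int yarnMinAllocationMB, int maxAllocationMB) {
        int normalized = normalize(requestedMB, yarnMinAllocationMB);
        if (normalized > maxAllocationMB) {
            throw new IllegalArgumentException(
                "Requested " + normalized + " MB exceeds cluster max " + maxAllocationMB + " MB");
        }
        return normalized;
    }

    public static void main(String[] args) {
        // 1000 MB rounds up to one 1024 MB container
        System.out.println(validate(1000, 1024, 8192)); // prints 1024
    }
}
```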

The startAppMaster method is very long, so the code is omitted here; its main flow is:

  1. Build an HDFS FileSystem client.
  2. Upload the Flink job jar and its dependency files to HDFS, recording the file locations.
  3. Record every classpath entry needed to run the Flink job; these end up in the ApplicationMaster's environment variables under the key _FLINK_CLASSPATH.
  4. setupApplicationMasterContainer configures the ApplicationMaster container, including its launch command. This step uses the entry class YarnSessionClusterEntrypoint obtained from the getYarnSessionClusterEntrypoint method described above, and specifies the startup parameters for that entry class.
  5. Fill in the application context: the configuration from step 4 is written into the context, along with related environment variables, the YARN queue, and so on.
  6. Finally, submit the application context through the YARN client and loop until the ApplicationMaster has finished starting.
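
The final step, waiting for the ApplicationMaster to come up, is essentially a poll loop over the application report. The sketch below simulates that loop against a fake state sequence; the enum values mirror org.apache.hadoop.yarn.api.records.YarnApplicationState, but the polling source is a stand-in, not the real yarnClient.getApplicationReport API.

```java
import java.util.Arrays;
import java.util.Iterator;

// Sketch of the "loop until the ApplicationMaster is up" step.
// State names mirror YARN's YarnApplicationState; the report source is fake.
public class AmWaitLoopSketch {

    enum AppState { NEW, SUBMITTED, ACCEPTED, RUNNING, FAILED, KILLED }

    // Poll successive reports until the app is RUNNING or reaches a terminal state.
    static AppState waitForCluster(Iterator<AppState> reports) {
        while (reports.hasNext()) {
            AppState state = reports.next();
            switch (state) {
                case RUNNING:
                    return state; // ApplicationMaster is up
                case FAILED:
                case KILLED:
                    throw new IllegalStateException("Deployment failed: " + state);
                default:
                    // NEW / SUBMITTED / ACCEPTED: keep polling
            }
        }
        throw new IllegalStateException("No more reports");
    }

    public static void main(String[] args) {
        Iterator<AppState> fake = Arrays.asList(
            AppState.NEW, AppState.SUBMITTED, AppState.ACCEPTED, AppState.RUNNING).iterator();
        System.out.println(waitForCluster(fake)); // prints RUNNING
    }
}
```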

Once the application has been submitted and started, the Flink job is brought up through the various ApplicationMaster callbacks; see the next section.


Running the Flink Job

Each stage of a YARN application has corresponding callback notifications.
After Flink submits the application to YARN via the CLI, the entry class YarnSessionClusterEntrypoint eventually brings up a YarnResourceManager object. This class implements the org.apache.hadoop.yarn.client.api.async.AMRMClientAsync.CallbackHandler interface, so it can manage the Flink-on-YARN job at each stage of the YARN ApplicationMaster lifecycle. For example, when YARN containers have been allocated, it starts TaskExecutors in them:

@Override
public void onContainersAllocated(List<Container> containers) {
	runAsync(() -> {
		...
		final List<Container> requiredContainers = containers.subList(0, numAcceptedContainers);
		...
		// run a TaskExecutor in each container
		requiredContainers.forEach(this::startTaskExecutorInContainer);

		// if we are waiting for no further containers, we can go to the
		// regular heartbeat interval
		if (numPendingContainerRequests <= 0) {
			resourceManagerClient.setHeartbeatInterval(yarnHeartbeatIntervalMillis);
		}
	});
}
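
The subList call in onContainersAllocated keeps only as many containers as there are pending requests; the excess would be released back to YARN. A minimal sketch of that split, using plain strings in place of org.apache.hadoop.yarn.api.records.Container objects:

```java
import java.util.Arrays;
import java.util.List;

// Sketch of how onContainersAllocated splits allocations: accept only as many
// containers as there are pending requests. Strings stand in for YARN Container objects.
public class ContainerSplitSketch {

    static List<String> accept(List<String> allocated, int numPendingRequests) {
        int numAccepted = Math.min(allocated.size(), numPendingRequests);
        // the remainder, allocated.subList(numAccepted, allocated.size()),
        // would be returned to YARN
        return allocated.subList(0, numAccepted);
    }

    public static void main(String[] args) {
        List<String> allocated = Arrays.asList("c1", "c2", "c3");
        // only 2 containers were requested, so c3 would be released
        System.out.println(accept(allocated, 2)); // prints [c1, c2]
    }
}
```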

Inside startTaskExecutorInContainer, the createTaskExecutorLaunchContext method initializes the context for running the YARN container. This method does quite a lot; in brief, it builds a ContainerLaunchContext object, which describes the context used to launch the container. For details, read org.apache.flink.yarn.Utils#createTaskExecutorContext yourself. The most important part is that YarnTaskExecutorRunner.class is set as the container's entry class.

Finally the container is started. The TaskManager startup logic is all encapsulated in the YarnTaskExecutorRunner class; that already touches on the TaskManager startup process, which is not covered in this post.

private void startTaskExecutorInContainer(Container container) {
	final String containerIdStr = container.getId().toString();
	final ResourceID resourceId = new ResourceID(containerIdStr);

	workerNodeMap.put(resourceId, new YarnWorkerNode(container));

	try {
		// Context information used to start a TaskExecutor Java process
		// build the context the task manager runs in
		ContainerLaunchContext taskExecutorLaunchContext = createTaskExecutorLaunchContext(
			containerIdStr,
			container.getNodeId().getHost());

        // start the container
		nodeManagerClient.startContainerAsync(container, taskExecutorLaunchContext);
	} catch (Throwable t) {
		releaseFailedContainerAndRequestNewContainerIfRequired(container.getId(), t);
	}
}

private ContainerLaunchContext createTaskExecutorLaunchContext(
	String containerId,
	String host) throws Exception {
	...

    // build the container launch context
	ContainerLaunchContext taskExecutorLaunchContext = Utils.createTaskExecutorContext(
		flinkConfig,
		yarnConfig,
		env,
		taskManagerParameters,
		taskManagerDynamicProperties,
		currDir,
		YarnTaskExecutorRunner.class,
		log);

	// set a special environment variable to uniquely identify this container
	taskExecutorLaunchContext.getEnvironment()
			.put(ENV_FLINK_CONTAINER_ID, containerId);
	taskExecutorLaunchContext.getEnvironment()
			.put(ENV_FLINK_NODE_ID, host);
	return taskExecutorLaunchContext;
}
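
The ContainerLaunchContext built by Utils.createTaskExecutorContext ultimately boils down to a shell command that the NodeManager executes to start the TaskExecutor JVM, plus an environment map. The sketch below approximates both; the JVM flags, log file names, and literal environment key strings are assumptions for illustration, not Flink's exact output.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of assembling a container launch command and environment, roughly
// what a ContainerLaunchContext carries. Flag values, file names, and the
// env key strings here are illustrative assumptions.
public class LaunchContextSketch {

    // Approximate shape of the NodeManager launch command for the TaskExecutor JVM.
    static String buildCommand(String mainClass, int heapMB) {
        return String.join(" ",
            "$JAVA_HOME/bin/java",
            "-Xmx" + heapMB + "m",
            mainClass,
            "1> <LOG_DIR>/taskmanager.out",
            "2> <LOG_DIR>/taskmanager.err");
    }

    // Mirrors the two env vars set at the end of createTaskExecutorLaunchContext
    // (assumed key strings).
    static Map<String, String> buildEnvironment(String containerId, String host) {
        Map<String, String> env = new LinkedHashMap<>();
        env.put("_FLINK_CONTAINER_ID", containerId);
        env.put("_FLINK_NODE_ID", host);
        return env;
    }

    public static void main(String[] args) {
        System.out.println(buildCommand("org.apache.flink.yarn.YarnTaskExecutorRunner", 1024));
        System.out.println(buildEnvironment("container_0001_01", "node-1"));
    }
}
```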

At this point, how Flink on YARN submits and starts jobs is basically clear: callbacks at each stage of Flink YARN cluster initialization build different objects and complete the submission of the job to YARN. The job's own jar is placed on HDFS, and at runtime classes are loaded from HDFS through a custom classloader so that the job can run.
