Flink Source Code Series No.8 - Job Submission: Starting the TaskManager (per-job on YARN)

Chapter 1: Registering the Callback

In the previous article, when the resourceManagerClient and nodeManagerClient were started, the same callback handler, yarnContainerEventHandler, was registered with both of them:

org.apache.flink.yarn.YarnResourceManagerDriver#initializeInternal

@Override
protected void initializeInternal() throws Exception {
	final YarnContainerEventHandler yarnContainerEventHandler = new YarnContainerEventHandler();
	try {
	// TODO Create and start the YARN ResourceManager client, used to request resources from YARN
		resourceManagerClient = yarnResourceManagerClientFactory.createResourceManagerClient(
			yarnHeartbeatIntervalMillis,
			yarnContainerEventHandler);
		resourceManagerClient.init(yarnConfig);
		resourceManagerClient.start();

		// TODO Register the ApplicationMaster
		final RegisterApplicationMasterResponse registerApplicationMasterResponse = registerApplicationMaster();
		getContainersFromPreviousAttempts(registerApplicationMasterResponse);
		taskExecutorProcessSpecContainerResourcePriorityAdapter =
			new TaskExecutorProcessSpecContainerResourcePriorityAdapter(
				registerApplicationMasterResponse.getMaximumResourceCapability(),
				ExternalResourceUtils.getExternalResources(flinkConfig, YarnConfigOptions.EXTERNAL_RESOURCE_YARN_CONFIG_KEY_SUFFIX));
	} catch (Exception e) {
		throw new ResourceManagerException("Could not start resource manager client.", e);
	}

	// TODO Create and start the YARN NodeManager client, used to launch TaskManagers
	nodeManagerClient = yarnNodeManagerClientFactory.createNodeManagerClient(yarnContainerEventHandler);
	nodeManagerClient.init(yarnConfig);
	nodeManagerClient.start();
}
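
As a reference point, resourceManagerClient is Hadoop's AMRMClientAsync, so yarnContainerEventHandler has to satisfy its CallbackHandler contract (and, for the NodeManager side, the NMClientAsync callbacks as well). Below is a minimal, self-contained sketch of that wiring using the raw Hadoop YARN API; the handler body and the heartbeat interval are illustrative only, not Flink's actual implementation:

```java
import java.util.List;

import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.api.records.ContainerStatus;
import org.apache.hadoop.yarn.api.records.NodeReport;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.async.AMRMClientAsync;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class AmRmCallbackSketch {

	// A bare-bones CallbackHandler; YarnContainerEventHandler implements this same
	// contract and forwards each event onto the resource manager's main thread.
	static class SimpleHandler implements AMRMClientAsync.CallbackHandler {
		@Override public void onContainersAllocated(List<Container> containers) {
			System.out.println("Allocated " + containers.size() + " containers");
		}
		@Override public void onContainersCompleted(List<ContainerStatus> statuses) { }
		@Override public void onShutdownRequest() { }
		@Override public void onNodesUpdated(List<NodeReport> updatedNodes) { }
		@Override public void onError(Throwable e) { }
		@Override public float getProgress() { return 0.0f; }
	}

	public static void main(String[] args) {
		// The 1000 ms heartbeat interval is an arbitrary example value.
		AMRMClientAsync<AMRMClient.ContainerRequest> resourceManagerClient =
			AMRMClientAsync.createAMRMClientAsync(1000, new SimpleHandler());
		resourceManagerClient.init(new YarnConfiguration());
		resourceManagerClient.start();
	}
}
```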

Chapter 2: Registering the TaskManager Entry Class

Inside yarnContainerEventHandler there is a callback method that is invoked when YARN allocates containers:

org.apache.flink.yarn.YarnResourceManagerDriver.YarnContainerEventHandler#onContainersAllocated

@Override
public void onContainersAllocated(List<Container> containers) {
	runAsyncWithFatalHandler(() -> {
		checkInitialized();
		log.info("Received {} containers.", containers.size());

		for (Map.Entry<Priority, List<Container>> entry : groupContainerByPriority(containers).entrySet()) {
			// TODO Handle the allocated containers, grouped by priority
			onContainersOfPriorityAllocated(entry.getKey(), entry.getValue());
		}

		// if we are waiting for no further containers, we can go to the
		// regular heartbeat interval
		if (getNumRequestedNotAllocatedWorkers() <= 0) {
			resourceManagerClient.setHeartbeatInterval(yarnHeartbeatIntervalMillis);
		}
	});
}
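
groupContainerByPriority is not shown above; it simply buckets the newly allocated containers by the Priority of the request they satisfy, so that each bucket can be matched against pending requests of the same priority. A plausible sketch of that grouping (the real method lives in Flink's driver; this version only illustrates the idea using Hadoop's Container#getPriority accessor):

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.api.records.Priority;

final class ContainerGroupingSketch {

	// Bucket the containers YARN handed back by the priority of the original request.
	static Map<Priority, List<Container>> groupContainerByPriority(List<Container> containers) {
		return containers.stream()
			.collect(Collectors.groupingBy(Container::getPriority));
	}
}
```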

org.apache.flink.yarn.YarnResourceManagerDriver#onContainersOfPriorityAllocated

private void onContainersOfPriorityAllocated(Priority priority, List<Container> containers) {
	//...
	
	int numAccepted = 0;
	while (containerIterator.hasNext() && pendingContainerRequestIterator.hasNext()) {
		final Container container = containerIterator.next();
		final AMRMClient.ContainerRequest pendingRequest = pendingContainerRequestIterator.next();
		final ResourceID resourceId = getContainerResourceId(container);

		final CompletableFuture<YarnWorkerNode> requestResourceFuture = pendingRequestResourceFutures.poll();
		Preconditions.checkState(requestResourceFuture != null);

		if (pendingRequestResourceFutures.isEmpty()) {
			requestResourceFutures.remove(taskExecutorProcessSpec);
		}

		// TODO Asynchronously start the TaskExecutor in this container
		startTaskExecutorInContainerAsync(container, taskExecutorProcessSpec, resourceId, requestResourceFuture);
		removeContainerRequest(pendingRequest);

		numAccepted++;
	}

	// ...
}
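
The removeContainerRequest call at the end of the loop matters: once a container has been matched to a pending request, that request must be removed from the AMRM client, otherwise YARN would keep fulfilling it on subsequent heartbeats. At the Hadoop API level, the request/accept/remove cycle looks roughly like this (resource and priority values are illustrative only):

```java
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.async.AMRMClientAsync;

final class ContainerRequestCycleSketch {

	static void requestAndRelease(AMRMClientAsync<AMRMClient.ContainerRequest> client) {
		// Ask YARN for one container: 4 GB, 2 vcores, priority 1 (example values only).
		AMRMClient.ContainerRequest request = new AMRMClient.ContainerRequest(
			Resource.newInstance(4096, 2), null, null, Priority.newInstance(1));
		client.addContainerRequest(request);

		// ...later, when a matching container has been allocated and accepted,
		// remove the pending request so it is not fulfilled a second time.
		client.removeContainerRequest(request);
	}
}
```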

org.apache.flink.yarn.YarnResourceManagerDriver#startTaskExecutorInContainerAsync

private void startTaskExecutorInContainerAsync(
		Container container,
		TaskExecutorProcessSpec taskExecutorProcessSpec,
		ResourceID resourceId,
		CompletableFuture<YarnWorkerNode> requestResourceFuture) {
	// TODO Create the TaskExecutor launch context (asynchronously, on the I/O executor)
	final CompletableFuture<ContainerLaunchContext> containerLaunchContextFuture =
		FutureUtils.supplyAsync(() -> createTaskExecutorLaunchContext(
			resourceId, container.getNodeId().getHost(), taskExecutorProcessSpec), getIoExecutor());

	FutureUtils.assertNoException(
		containerLaunchContextFuture.handleAsync((context, exception) -> {
			if (exception == null) {
				nodeManagerClient.startContainerAsync(container, context);
				requestResourceFuture.complete(new YarnWorkerNode(container, resourceId));
			} else {
				requestResourceFuture.completeExceptionally(exception);
			}
			return null;
		}, getMainThreadExecutor()));
}
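
The interesting part of startTaskExecutorInContainerAsync is the thread hand-off: building the launch context may be slow (it can touch the file system), so it runs on the I/O executor, while the result is handled back on the resource manager's main thread executor. A stand-alone sketch of that CompletableFuture pattern with plain JDK executors (the executors here are stand-ins for getIoExecutor() and getMainThreadExecutor(), and the "launch context" is just a string):

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class AsyncHandOffSketch {

	public static void main(String[] args) throws Exception {
		ExecutorService ioExecutor = Executors.newCachedThreadPool();             // stand-in for getIoExecutor()
		ExecutorService mainThreadExecutor = Executors.newSingleThreadExecutor(); // stand-in for getMainThreadExecutor()

		CompletableFuture<String> requestResourceFuture = new CompletableFuture<>();

		// Build the "launch context" off the main thread...
		CompletableFuture<String> launchContextFuture =
			CompletableFuture.supplyAsync(() -> "container-launch-context", ioExecutor);

		// ...and complete the caller-visible future back on the main thread.
		launchContextFuture.handleAsync((context, exception) -> {
			if (exception == null) {
				requestResourceFuture.complete(context);
			} else {
				requestResourceFuture.completeExceptionally(exception);
			}
			return null;
		}, mainThreadExecutor);

		System.out.println(requestResourceFuture.get());
		ioExecutor.shutdown();
		mainThreadExecutor.shutdown();
	}
}
```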

org.apache.flink.yarn.YarnResourceManagerDriver#createTaskExecutorLaunchContext

private ContainerLaunchContext createTaskExecutorLaunchContext(
	ResourceID containerId,
	String host,
	TaskExecutorProcessSpec taskExecutorProcessSpec) throws Exception {

	// init the ContainerLaunchContext
	final String currDir = configuration.getCurrentDir();

	final ContaineredTaskManagerParameters taskManagerParameters =
		ContaineredTaskManagerParameters.create(flinkConfig, taskExecutorProcessSpec);

	log.info("TaskExecutor {} will be started on {} with {}.",
		containerId.getStringWithMetadata(),
		host,
		taskExecutorProcessSpec);

	final Configuration taskManagerConfig = BootstrapTools.cloneConfiguration(flinkConfig);
	taskManagerConfig.set(TaskManagerOptions.TASK_MANAGER_RESOURCE_ID, containerId.getResourceIdString());
	taskManagerConfig.set(TaskManagerOptionsInternal.TASK_MANAGER_RESOURCE_ID_METADATA, containerId.getMetadata());

	final String taskManagerDynamicProperties =
		BootstrapTools.getDynamicPropertiesAsString(flinkClientConfig, taskManagerConfig);

	log.debug("TaskManager configuration: {}", taskManagerConfig);

	// TODO Create the TaskExecutor launch context
	// TODO Register the TaskManager main class: inside the container, the main method of YarnTaskExecutorRunner.class is invoked to start the Flink TaskManager
	final ContainerLaunchContext taskExecutorLaunchContext = Utils.createTaskExecutorContext(
		flinkConfig,
		yarnConfig,
		configuration,
		taskManagerParameters,
		taskManagerDynamicProperties,
		currDir,
		YarnTaskExecutorRunner.class,
		log);

	taskExecutorLaunchContext.getEnvironment()
		.put(ENV_FLINK_NODE_ID, host);
	return taskExecutorLaunchContext;
}

When YARN launches the container, it invokes YarnTaskExecutorRunner's main method, which starts the TaskManager.
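
Utils.createTaskExecutorContext does a lot of Flink-specific work (shipping files, tokens, JVM memory flags), but at its core it builds a YARN ContainerLaunchContext whose launch command invokes YarnTaskExecutorRunner. A heavily simplified sketch with the raw YARN API; the command string and the "_FLINK_NODE_ID" environment key are illustrative assumptions, and the real command is assembled from the TaskExecutorProcessSpec and the dynamic properties:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.util.Records;

final class LaunchContextSketch {

	static ContainerLaunchContext sketchTaskExecutorContext(String host) {
		ContainerLaunchContext ctx = Records.newRecord(ContainerLaunchContext.class);

		// The container essentially runs "java ... YarnTaskExecutorRunner"; classpath,
		// memory flags and dynamic properties are filled in by Flink in the real code.
		ctx.setCommands(Collections.singletonList(
			"$JAVA_HOME/bin/java org.apache.flink.yarn.YarnTaskExecutorRunner"
				+ " 1> <LOG_DIR>/taskmanager.out 2> <LOG_DIR>/taskmanager.err"));

		// Environment passed to the TaskManager process, e.g. the node it runs on
		// (assumed value of the ENV_FLINK_NODE_ID constant used above).
		Map<String, String> env = new HashMap<>();
		env.put("_FLINK_NODE_ID", host);
		ctx.setEnvironment(env);

		return ctx;
	}
}
```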

Chapter 3: Starting the TaskManager

YarnTaskExecutorRunner is the TaskManager entry class in YARN mode. Let's follow it:

org.apache.flink.yarn.YarnTaskExecutorRunner#main

/**
 * The entry point for the YARN task executor runner.
 *
 * @param args The command line arguments.
 */
public static void main(String[] args) {
	EnvironmentInformation.logEnvironmentInfo(LOG, "YARN TaskExecutor runner", args);
	SignalHandler.register(LOG);
	JvmShutdownSafeguard.installAsShutdownHook(LOG);

	// TODO Run the TaskManager
	runTaskManagerSecurely(args);
}

org.apache.flink.yarn.YarnTaskExecutorRunner#runTaskManagerSecurely

private static void runTaskManagerSecurely(String[] args) {
	try {
		LOG.debug("All environment variables: {}", ENV);

		// Get the current working directory from the environment
		final String currDir = ENV.get(Environment.PWD.key());
		LOG.info("Current working Directory: {}", currDir);

		// Load the configuration
		final Configuration configuration = TaskManagerRunner.loadConfiguration(args);
		setupAndModifyConfiguration(configuration, currDir, ENV);

		// TODO Run the TaskManager
		TaskManagerRunner.runTaskManagerSecurely(configuration);
	}
	catch (Throwable t) {
		final Throwable strippedThrowable = ExceptionUtils.stripException(t, UndeclaredThrowableException.class);
		// make sure that everything whatever ends up in the log
		LOG.error("YARN TaskManager initialization failed.", strippedThrowable);
		System.exit(INIT_ERROR_EXIT_CODE);
	}
}

org.apache.flink.runtime.taskexecutor.TaskManagerRunner#runTaskManagerSecurely(org.apache.flink.configuration.Configuration)

public static void runTaskManagerSecurely(Configuration configuration) throws Exception {
	replaceGracefulExitWithHaltIfConfigured(configuration);
	final PluginManager pluginManager = PluginUtils.createPluginManagerFromRootFolder(configuration);
	// Initialize the file systems
	FileSystem.initialize(configuration, pluginManager);

	// Install the security configuration
	SecurityUtils.install(new SecurityConfiguration(configuration));

	// TODO Run the TaskManager
	SecurityUtils.getInstalledContext().runSecured(() -> {
		runTaskManager(configuration, pluginManager);
		return null;
	});
}
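
In a Kerberos-secured Hadoop setup, the installed security context essentially wraps the callable in Hadoop's UserGroupInformation#doAs, so everything the TaskManager does runs as the authenticated user. A minimal sketch of that underlying pattern using the plain Hadoop API (not Flink's SecurityContext classes):

```java
import java.security.PrivilegedExceptionAction;

import org.apache.hadoop.security.UserGroupInformation;

final class RunSecuredSketch {

	// Run the given action as the currently authenticated Hadoop user,
	// e.g. the Kerberos principal the container was launched with.
	static <T> T runSecured(PrivilegedExceptionAction<T> action) throws Exception {
		UserGroupInformation ugi = UserGroupInformation.getCurrentUser();
		return ugi.doAs(action);
	}

	public static void main(String[] args) throws Exception {
		String user = runSecured(() -> UserGroupInformation.getCurrentUser().getUserName());
		System.out.println("Running as: " + user);
	}
}
```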

org.apache.flink.runtime.taskexecutor.TaskManagerRunner#runTaskManager

public static void runTaskManager(Configuration configuration, PluginManager pluginManager) throws Exception {
	final TaskManagerRunner taskManagerRunner = new TaskManagerRunner(configuration, pluginManager, TaskManagerRunner::createTaskExecutorService);

	// TODO Start the TaskManager
	taskManagerRunner.start();
}

org.apache.flink.runtime.taskexecutor.TaskManagerRunner#start

public void start() throws Exception {
	taskExecutorService.start();
}

org.apache.flink.runtime.taskexecutor.TaskManagerRunner.TaskExecutorService#start

The implementation class is org.apache.flink.runtime.taskexecutor.TaskExecutorToServiceAdapter:

org.apache.flink.runtime.taskexecutor.TaskExecutorToServiceAdapter#start

@Override
public void start() {
	// TODO Start the TaskExecutor via RPC => this triggers TaskExecutor#onStart
	taskExecutor.start();
}
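
Note that taskExecutor.start() does not call onStart() directly: it starts the RPC server, which then invokes onStart() on the endpoint's single main thread. As a simplified analogy only (plain JDK classes, not Flink's RpcEndpoint machinery), the dispatch looks roughly like this:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class StartDispatchSketch {

	// All lifecycle callbacks of the "endpoint" run on this single main thread.
	private final ExecutorService mainThreadExecutor = Executors.newSingleThreadExecutor();

	public void start() {
		// start() only schedules the transition; onStart() runs on the main thread.
		mainThreadExecutor.execute(this::onStart);
	}

	protected void onStart() {
		System.out.println("startTaskExecutorServices() would run here");
	}

	public static void main(String[] args) {
		StartDispatchSketch endpoint = new StartDispatchSketch();
		endpoint.start();
		endpoint.mainThreadExecutor.shutdown();
	}
}
```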

org.apache.flink.runtime.taskexecutor.TaskExecutor#onStart

@Override
public void onStart() throws Exception {
	try {
		// TODO Start the TaskExecutor services
		startTaskExecutorServices();
	} catch (Throwable t) {
		final TaskManagerException exception = new TaskManagerException(String.format("Could not start the TaskExecutor %s", getAddress()), t);
		onFatalError(exception);
		throw exception;
	}

	startRegistrationTimeout();
}

At this point, the TaskManager has been started by YARN. What happens inside the TaskManager after it starts? We'll continue in the next article.

### Configuring `flink-conf.yaml` for Apache Flink on YARN

#### Understanding the configuration environment

For Apache Flink to run properly on YARN, `flink-conf.yaml` needs a few specific settings. The file lives in the `conf` subdirectory of the Flink installation[^1].

#### Setting the required properties

For YARN integration, several options matter:

- **yarn.application.classpath**: the application classpath string passed to the YARN containers, which ensures all dependency libraries can be loaded.
- **fs.hdfs.hadoopconf**: the absolute local path to the Hadoop configuration files, needed for access to the distributed file system.
- **jobmanager.memory.process.size** and **taskmanager.memory.process.size**: the total memory of the JobManager and TaskManager processes. Tuning these values improves performance and prevents out-of-memory errors.

For temporary data, it is also advisable to set `io.tmp.dirs` to several directories on different disks so temporary files are rotated across them; by default the JVM's system temp directory is used[^2].

```yaml
# Example configuration entries for running Flink on YARN
yarn.application.classpath: /etc/hadoop/conf,/opt/flink/lib/*:/opt/flink/plugins/*
fs.hdfs.hadoopconf: /etc/hadoop/conf/
jobmanager.memory.process.size: 2048m
taskmanager.memory.process.size: 4096m
io.tmp.dirs: /mnt/disk1/tmp,/mnt/disk2/tmp
```

Editing these entries in `flink-conf.yaml` prepares a Flink instance for deployment on a YARN cluster.