YARN overview

The fundamental idea of YARN is to split up the functionalities of resource management and job scheduling/monitoring into separate daemons. The idea is to have a global ResourceManager (RM) and per-application ApplicationMaster (AM). An application is either a single job or a DAG of jobs.


(Figure: YARN architecture)

The ResourceManager and the NodeManager form the data-computation framework. The ResourceManager is the ultimate authority that arbitrates resources among all the applications in the system. The NodeManager is the per-machine framework agent who is responsible for containers, monitoring their resource usage (cpu, memory, disk, network) and reporting the same to the ResourceManager/Scheduler.


The per-application ApplicationMaster is, in effect, a framework specific library and is tasked with negotiating resources from the ResourceManager and working with the NodeManager(s) to execute and monitor the tasks.


The ResourceManager has two main components: Scheduler and ApplicationsManager.


The Scheduler is responsible for allocating resources to the various running applications subject to familiar constraints of capacities, queues etc. The Scheduler is a pure scheduler in the sense that it performs no monitoring or tracking of status for the application. Also, it offers no guarantees about restarting failed tasks, whether they fail due to application failure or hardware failures. The Scheduler performs its scheduling function based on the resource requirements of the applications; it does so based on the abstract notion of a resource Container which incorporates elements such as memory, cpu, disk, network etc.


The Scheduler has a pluggable policy which is responsible for partitioning the cluster resources among the various queues, applications etc. The current schedulers such as the CapacityScheduler and the FairScheduler would be some examples of plug-ins.

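As a minimal configuration sketch (the yarn.resourcemanager.scheduler.class property and the class names below come from standard Hadoop configuration conventions, not from this text), the scheduler plug-in is typically selected in conf/yarn-site.xml; swapping the class swaps the policy without touching applications:

```xml
<!-- conf/yarn-site.xml: select the pluggable scheduler policy (sketch; property per standard Hadoop configuration) -->
<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <!-- or org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler for the FairScheduler -->
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
</property>
```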

The ApplicationsManager is responsible for accepting job-submissions, negotiating the first container for executing the application-specific ApplicationMaster, and providing the service for restarting the ApplicationMaster container on failure. The per-application ApplicationMaster has the responsibility of negotiating appropriate resource containers from the Scheduler, tracking their status and monitoring for progress.


MapReduce in hadoop-2.x maintains API compatibility with the previous stable release (hadoop-1.x). This means that all MapReduce jobs should still run unchanged on top of YARN with just a recompile.


YARN supports the notion of resource reservation via the ReservationSystem, a component that allows users to specify a profile of resources over time and temporal constraints (e.g., deadlines), and reserve resources to ensure the predictable execution of important jobs. The ReservationSystem tracks resources over time, performs admission control for reservations, and dynamically instructs the underlying scheduler to ensure that the reservation is fulfilled.


In order to scale YARN beyond a few thousand nodes, YARN supports the notion of Federation via the YARN Federation feature. Federation allows multiple YARN (sub-)clusters to be transparently wired together so that they appear as a single massive cluster. This can be used to achieve larger scale, and/or to allow multiple independent clusters to be used together for very large jobs, or for tenants who have capacity across all of them.

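As a hedged sketch only: Federation is a cluster-configuration concern rather than something applications see. Assuming the yarn.federation.enabled switch described in the YARN Federation documentation (it is not stated in this text), each sub-cluster's ResourceManager and NodeManagers would opt in roughly like this; a full deployment additionally needs a federation state store and Router processes:

```xml
<!-- conf/yarn-site.xml: opt a (sub-)cluster into YARN Federation (sketch; assumes the property name from the Federation docs) -->
<property>
  <name>yarn.federation.enabled</name>
  <value>true</value>
</property>
```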

NodeManager

The NodeManager is responsible for launching and managing containers on a node. Containers execute tasks as specified by the AppMaster.


Health Checker Service

The NodeManager runs services to determine the health of the node it is executing on. The services perform checks on the disk as well as any user-specified tests. If any health check fails, the NodeManager marks the node as unhealthy and communicates this to the ResourceManager, which then stops assigning containers to the node. Communication of the node status is done as part of the heartbeat between the NodeManager and the ResourceManager. The intervals at which the disk checker and health monitor (described below) run don't affect the heartbeat intervals. When the heartbeat takes place, the status of both checks is used to determine the health of the node.


Disk Checker

The disk checker checks the state of the disks that the NodeManager is configured to use (local-dirs and log-dirs, configured using yarn.nodemanager.local-dirs and yarn.nodemanager.log-dirs respectively). The checks include permissions and free disk space. It also checks that the filesystem isn't in a read-only state. The checks are run at 2-minute intervals by default but can be configured to run as often as the user desires. If a disk fails the check, the NodeManager stops using that particular disk but still reports the node status as healthy. However, if a number of disks fail the check (the number can be configured, as explained below), then the node is reported as unhealthy to the ResourceManager and new containers will not be assigned to the node.


The following configuration parameters can be used to modify the disk checks:


| Configuration Name | Allowed Values | Description |
| --- | --- | --- |
| yarn.nodemanager.disk-health-checker.enable | true, false | Enable or disable the disk health checker service |
| yarn.nodemanager.disk-health-checker.interval-ms | Positive integer | The interval, in milliseconds, at which the disk checker should run; the default value is 2 minutes |
| yarn.nodemanager.disk-health-checker.min-healthy-disks | Float between 0-1 | The minimum fraction of disks that must pass the check for the NodeManager to mark the node as healthy; the default is 0.25 |
| yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage | Float between 0-100 | The maximum percentage of disk space that may be utilized before a disk is marked as unhealthy by the disk checker service. This check is run for every disk used by the NodeManager. The default value is 90, i.e. 90% of the disk can be used. |
| yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb | Integer | The minimum amount of free space that must be available on the disk for the disk checker service to mark the disk as healthy. This check is run for every disk used by the NodeManager. The default value is 0, i.e. the entire disk can be used. |

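For illustration only, a yarn-site.xml fragment that tightens these checks might look like the following; the property names are taken from the table above, while the values are arbitrary examples rather than recommendations:

```xml
<!-- conf/yarn-site.xml: example disk checker tuning (illustrative values only) -->
<property>
  <name>yarn.nodemanager.disk-health-checker.interval-ms</name>
  <value>60000</value> <!-- run the disk checker every minute instead of the 2-minute default -->
</property>
<property>
  <name>yarn.nodemanager.disk-health-checker.min-healthy-disks</name>
  <value>0.5</value> <!-- require at least half of the configured disks to pass the check -->
</property>
<property>
  <name>yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage</name>
  <value>80.0</value> <!-- mark a disk unhealthy once it is 80% full, rather than the 90% default -->
</property>
```
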
External Health Script

Users may specify their own health checker script that will be invoked by the health checker service. Users may specify a timeout as well as options to be passed to the script. If the script exits with a non-zero exit code, times out or results in an exception being thrown, the node is marked as unhealthy. Please note that if the script cannot be executed due to permissions, an incorrect path, etc., then it counts as a failure and the node will be reported as unhealthy. Also note that specifying a health check script is not mandatory. If no script is specified, only the disk checker status will be used to determine the health of the node.


The following configuration parameters can be used to set the health script:

| Configuration Name | Allowed Values | Description |
| --- | --- | --- |
| yarn.nodemanager.health-checker.interval-ms | Positive integer | The interval, in milliseconds, at which the health checker service runs; the default value is 10 minutes. |
| yarn.nodemanager.health-checker.script.timeout-ms | Positive integer | The timeout for the health script that's executed; the default value is 20 minutes. |
| yarn.nodemanager.health-checker.script.path | String | Absolute path to the health check script to be run. |
| yarn.nodemanager.health-checker.script.opts | String | Arguments to be passed to the script when the script is executed. |

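As a sketch, wiring in a custom script through yarn-site.xml could look like this; the property names come from the table above, while the script path and options are hypothetical placeholders:

```xml
<!-- conf/yarn-site.xml: register an external health check script (hypothetical path and options) -->
<property>
  <name>yarn.nodemanager.health-checker.script.path</name>
  <value>/usr/local/bin/nm-health-check.sh</value> <!-- hypothetical absolute path to the admin-provided script -->
</property>
<property>
  <name>yarn.nodemanager.health-checker.script.opts</name>
  <value>--min-free-mem-mb 1024</value> <!-- hypothetical arguments passed to the script -->
</property>
<property>
  <name>yarn.nodemanager.health-checker.script.timeout-ms</name>
  <value>600000</value> <!-- treat the check as failed if the script runs longer than 10 minutes -->
</property>
```
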
NodeManager Restart

This document gives an overview of NodeManager (NM) restart, a feature that enables the NodeManager to be restarted without losing the active containers running on the node. At a high level, the NM stores any necessary state to a local state-store as it processes container-management requests. When the NM restarts, it recovers by first loading state for various subsystems and then letting those subsystems perform recovery using the loaded state.


Enabling NM Restart
1. To enable NM Restart functionality, set the following property in conf/yarn-site.xml to true.

| Property | Value |
| --- | --- |
| yarn.nodemanager.recovery.enabled | true (the default value is false) |

2. Configure a path to the local file-system directory where the NodeManager can save its run state.

| Property | Description |
| --- | --- |
| yarn.nodemanager.recovery.dir | The local filesystem directory in which the node manager will store state when recovery is enabled. The default value is $hadoop.tmp.dir/yarn-nm-recovery. |

3. Configure a valid RPC address for the NodeManager.

yarn.nodemanager.address: Ephemeral ports (port 0, which is the default) cannot be used for the NodeManager's RPC server specified via yarn.nodemanager.address, as they can make the NM use different ports before and after a restart. This will break any previously running clients that were communicating with the NM before the restart. Explicitly setting yarn.nodemanager.address to an address with a specific port number (e.g. 0.0.0.0:45454) is a precondition for enabling NM restart. (A combined yarn-site.xml sketch covering steps 1-3 is given at the end of this section.)


4. Auxiliary services.

NodeManagers in a YARN cluster can be configured to run auxiliary services. For a completely functional NM restart, YARN relies on any auxiliary service configured to also support recovery. This usually includes (1) avoiding usage of ephemeral ports so that previously running clients (in this case, usually containers) are not disrupted after restart and (2) having the auxiliary service itself support recoverability by reloading any previous state when NodeManager restarts and reinitializes the auxiliary service.


A simple example of the above is the auxiliary service ‘ShuffleHandler’ for MapReduce (MR). ShuffleHandler already respects the above two requirements, so users/admins don’t have to do anything for it to support NM restart: (1) The configuration property mapreduce.shuffle.port controls which port the ShuffleHandler on a NodeManager host binds to, and it defaults to a non-ephemeral port. (2) The ShuffleHandler service also already supports recovery of previous state after NM restarts.

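As referenced in the steps above, a combined yarn-site.xml sketch covering steps 1 through 3 might look like the following; the recovery directory is a hypothetical path, and the port mirrors the 0.0.0.0:45454 example from step 3:

```xml
<!-- conf/yarn-site.xml: enable NodeManager recovery (sketch based on steps 1-3 above) -->
<property>
  <name>yarn.nodemanager.recovery.enabled</name>
  <value>true</value> <!-- the default is false -->
</property>
<property>
  <name>yarn.nodemanager.recovery.dir</name>
  <value>/var/lib/hadoop-yarn/nm-recovery</value> <!-- hypothetical local directory for the NM run state -->
</property>
<property>
  <name>yarn.nodemanager.address</name>
  <value>0.0.0.0:45454</value> <!-- fixed, non-ephemeral RPC port, as required for NM restart -->
</property>
```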
