YARN overview

The fundamental idea of YARN is to split up the functionalities of resource management and job scheduling/monitoring into separate daemons. The idea is to have a global ResourceManager (RM) and per-application ApplicationMaster (AM). An application is either a single job or a DAG of jobs.


(Figure: YARN architecture)

The ResourceManager and the NodeManager form the data-computation framework. The ResourceManager is the ultimate authority that arbitrates resources among all the applications in the system. The NodeManager is the per-machine framework agent who is responsible for containers, monitoring their resource usage (cpu, memory, disk, network) and reporting the same to the ResourceManager/Scheduler.


The per-application ApplicationMaster is, in effect, a framework specific library and is tasked with negotiating resources from the ResourceManager and working with the NodeManager(s) to execute and monitor the tasks.


The ResourceManager has two main components: Scheduler and ApplicationsManager.


The Scheduler is responsible for allocating resources to the various running applications subject to familiar constraints of capacities, queues etc. The Scheduler is a pure scheduler in the sense that it performs no monitoring or tracking of status for the application. Also, it offers no guarantees about restarting failed tasks, whether they fail due to application failure or hardware failures. The Scheduler performs its scheduling function based on the resource requirements of the applications; it does so based on the abstract notion of a resource Container which incorporates elements such as memory, cpu, disk, network etc.


The Scheduler has a pluggable policy which is responsible for partitioning the cluster resources among the various queues, applications etc. The current schedulers such as the CapacityScheduler and the FairScheduler would be some examples of plug-ins.

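As a minimal configuration sketch (the yarn.resourcemanager.scheduler.class property and the class names below come from standard Hadoop configuration conventions, not from this text), the scheduler plug-in is typically selected in conf/yarn-site.xml; swapping the class swaps the policy without touching applications:

```xml
<!-- conf/yarn-site.xml: select the pluggable scheduler policy (sketch; property per standard Hadoop configuration) -->
<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <!-- or org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler for the FairScheduler -->
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
</property>
```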

The ApplicationsManager is responsible for accepting job-submissions, negotiating the first container for executing the application-specific ApplicationMaster, and providing the service for restarting the ApplicationMaster container on failure. The per-application ApplicationMaster has the responsibility of negotiating appropriate resource containers from the Scheduler, tracking their status and monitoring for progress.


MapReduce in hadoop-2.x maintains API compatibility with the previous stable release (hadoop-1.x). This means that all MapReduce jobs should still run unchanged on top of YARN with just a recompile.


YARN supports the notion of resource reservation via the ReservationSystem, a component that allows users to specify a profile of resources over time and temporal constraints (e.g., deadlines), and reserve resources to ensure the predictable execution of important jobs. The ReservationSystem tracks resources over time, performs admission control for reservations, and dynamically instructs the underlying scheduler to ensure that the reservation is fulfilled.


In order to scale YARN beyond a few thousand nodes, YARN supports the notion of Federation via the YARN Federation feature. Federation allows multiple YARN (sub-)clusters to be transparently wired together so that they appear as a single massive cluster. This can be used to achieve larger scale, and/or to allow multiple independent clusters to be used together for very large jobs, or for tenants who have capacity across all of them.

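As a hedged sketch only: Federation is a cluster-configuration concern rather than something applications see. Assuming the yarn.federation.enabled switch described in the YARN Federation documentation (it is not stated in this text), each sub-cluster's ResourceManager and NodeManagers would opt in roughly like this; a full deployment additionally needs a federation state store and Router processes:

```xml
<!-- conf/yarn-site.xml: opt a (sub-)cluster into YARN Federation (sketch; assumes the property name from the Federation docs) -->
<property>
  <name>yarn.federation.enabled</name>
  <value>true</value>
</property>
```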

NodeManager

The NodeManager is responsible for launching and managing containers on a node. Containers execute tasks as specified by the AppMaster.


Health Checker Service

The NodeManager runs services to determine the health of the node it is executing on. The services perform checks on the disk as well as any user-specified tests. If any health check fails, the NodeManager marks the node as unhealthy and communicates this to the ResourceManager, which then stops assigning containers to the node. Communication of the node status is done as part of the heartbeat between the NodeManager and the ResourceManager. The intervals at which the disk checker and health monitor (described below) run don't affect the heartbeat intervals. When the heartbeat takes place, the status of both checks is used to determine the health of the node.


Disk Checker

The disk checker checks the state of the disks that the NodeManager is configured to use (local-dirs and log-dirs, configured using yarn.nodemanager.local-dirs and yarn.nodemanager.log-dirs respectively). The checks include permissions and free disk space. It also checks that the filesystem isn't in a read-only state. The checks are run at 2-minute intervals by default but can be configured to run as often as the user desires. If a disk fails the check, the NodeManager stops using that particular disk but still reports the node status as healthy. However, if a number of disks fail the check (the number can be configured, as explained below), then the node is reported as unhealthy to the ResourceManager and new containers will not be assigned to the node.


The following configuration parameters can be used to modify the disk checks:


| Configuration Name | Allowed Values | Description |
| --- | --- | --- |
| yarn.nodemanager.disk-health-checker.enable | true, false | Enable or disable the disk health checker service |
| yarn.nodemanager.disk-health-checker.interval-ms | Positive integer | The interval, in milliseconds, at which the disk checker should run; the default value is 2 minutes |
| yarn.nodemanager.disk-health-checker.min-healthy-disks | Float between 0-1 | The minimum fraction of disks that must pass the check for the NodeManager to mark the node as healthy; the default is 0.25 |
| yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage | Float between 0-100 | The maximum percentage of disk space that may be utilized before a disk is marked as unhealthy by the disk checker service. This check is run for every disk used by the NodeManager. The default value is 90, i.e. 90% of the disk can be used. |
| yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb | Integer | The minimum amount of free space that must be available on the disk for the disk checker service to mark the disk as healthy. This check is run for every disk used by the NodeManager. The default value is 0, i.e. the entire disk can be used. |

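For illustration only, a yarn-site.xml fragment that tightens these checks might look like the following; the property names are taken from the table above, while the values are arbitrary examples rather than recommendations:

```xml
<!-- conf/yarn-site.xml: example disk checker tuning (illustrative values only) -->
<property>
  <name>yarn.nodemanager.disk-health-checker.interval-ms</name>
  <value>60000</value> <!-- run the disk checker every minute instead of the 2-minute default -->
</property>
<property>
  <name>yarn.nodemanager.disk-health-checker.min-healthy-disks</name>
  <value>0.5</value> <!-- require at least half of the configured disks to pass the check -->
</property>
<property>
  <name>yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage</name>
  <value>80.0</value> <!-- mark a disk unhealthy once it is 80% full, rather than the 90% default -->
</property>
```
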
External Health Script

Users may specify their own health checker script that will be invoked by the health checker service. Users may specify a timeout as well as options to be passed to the script. If the script exits with a non-zero exit code, times out or results in an exception being thrown, the node is marked as unhealthy. Please note that if the script cannot be executed due to permissions, an incorrect path, etc., then it counts as a failure and the node will be reported as unhealthy. Also note that specifying a health check script is not mandatory. If no script is specified, only the disk checker status will be used to determine the health of the node.


The following configuration parameters can be used to set the health script:

| Configuration Name | Allowed Values | Description |
| --- | --- | --- |
| yarn.nodemanager.health-checker.interval-ms | Positive integer | The interval, in milliseconds, at which the health checker service runs; the default value is 10 minutes. |
| yarn.nodemanager.health-checker.script.timeout-ms | Positive integer | The timeout for the health script that's executed; the default value is 20 minutes. |
| yarn.nodemanager.health-checker.script.path | String | Absolute path to the health check script to be run. |
| yarn.nodemanager.health-checker.script.opts | String | Arguments to be passed to the script when the script is executed. |

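As a sketch, wiring in a custom script through yarn-site.xml could look like this; the property names come from the table above, while the script path and options are hypothetical placeholders:

```xml
<!-- conf/yarn-site.xml: register an external health check script (hypothetical path and options) -->
<property>
  <name>yarn.nodemanager.health-checker.script.path</name>
  <value>/usr/local/bin/nm-health-check.sh</value> <!-- hypothetical absolute path to the admin-provided script -->
</property>
<property>
  <name>yarn.nodemanager.health-checker.script.opts</name>
  <value>--min-free-mem-mb 1024</value> <!-- hypothetical arguments passed to the script -->
</property>
<property>
  <name>yarn.nodemanager.health-checker.script.timeout-ms</name>
  <value>600000</value> <!-- treat the check as failed if the script runs longer than 10 minutes -->
</property>
```
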
NodeManager Restart

This document gives an overview of NodeManager (NM) restart, a feature that enables the NodeManager to be restarted without losing the active containers running on the node. At a high level, the NM stores any necessary state to a local state-store as it processes container-management requests. When the NM restarts, it recovers by first loading state for various subsystems and then letting those subsystems perform recovery using the loaded state.


Enabling NM Restart
1. To enable NM Restart functionality, set the following property in conf/yarn-site.xml to true.

| Property | Value |
| --- | --- |
| yarn.nodemanager.recovery.enabled | true (the default value is false) |

2. Configure a path to the local file-system directory where the NodeManager can save its run state.

| Property | Description |
| --- | --- |
| yarn.nodemanager.recovery.dir | The local filesystem directory in which the node manager will store state when recovery is enabled. The default value is $hadoop.tmp.dir/yarn-nm-recovery. |

3. Configure a valid RPC address for the NodeManager.

yarn.nodemanager.address: Ephemeral ports (port 0, which is the default) cannot be used for the NodeManager's RPC server specified via yarn.nodemanager.address, as they can make the NM use different ports before and after a restart. This will break any previously running clients that were communicating with the NM before the restart. Explicitly setting yarn.nodemanager.address to an address with a specific port number (e.g. 0.0.0.0:45454) is a precondition for enabling NM restart. (A combined yarn-site.xml sketch covering steps 1-3 is given at the end of this section.)


4. Auxiliary services.

NodeManagers in a YARN cluster can be configured to run auxiliary services. For a completely functional NM restart, YARN relies on any auxiliary service configured to also support recovery. This usually includes (1) avoiding usage of ephemeral ports so that previously running clients (in this case, usually containers) are not disrupted after restart and (2) having the auxiliary service itself support recoverability by reloading any previous state when NodeManager restarts and reinitializes the auxiliary service.


A simple example of the above is the auxiliary service ‘ShuffleHandler’ for MapReduce (MR). ShuffleHandler already respects the above two requirements, so users/admins don’t have to do anything for it to support NM restart: (1) The configuration property mapreduce.shuffle.port controls which port the ShuffleHandler on a NodeManager host binds to, and it defaults to a non-ephemeral port. (2) The ShuffleHandler service also already supports recovery of previous state after NM restarts.

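As referenced in the steps above, a combined yarn-site.xml sketch covering steps 1 through 3 might look like the following; the recovery directory is a hypothetical path, and the port mirrors the 0.0.0.0:45454 example from step 3:

```xml
<!-- conf/yarn-site.xml: enable NodeManager recovery (sketch based on steps 1-3 above) -->
<property>
  <name>yarn.nodemanager.recovery.enabled</name>
  <value>true</value> <!-- the default is false -->
</property>
<property>
  <name>yarn.nodemanager.recovery.dir</name>
  <value>/var/lib/hadoop-yarn/nm-recovery</value> <!-- hypothetical local directory for the NM run state -->
</property>
<property>
  <name>yarn.nodemanager.address</name>
  <value>0.0.0.0:45454</value> <!-- fixed, non-ephemeral RPC port, as required for NM restart -->
</property>
```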
