YARN Label Scheduling

This article describes how to use labels in YARN for resource management and job scheduling, covering the concrete steps and configuration for creating YARN labels, assigning labels to cluster nodes, and creating queues associated with those labels.

1. Creating YARN Labels

To ensure that YARN labels remain available after the ResourceManager is restarted, a storage path for the label data must be configured on HDFS. The steps are as follows:

  1. Create a directory on HDFS to store the YARN label data (note that the directory must be accessible to the yarn user); the following commands can be used:
hdfs dfs -mkdir -p /yarn/node-labels
hdfs dfs -chown -R yarn:yarn /yarn/node-labels
hdfs dfs -chmod -R 700 /yarn/node-labels
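# Optional sanity check: confirm the owner and permissions of the new directory
hdfs dfs -ls -d /yarn/node-labels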
  2. Enable the node label feature for the cluster (in yarn-site.xml):
<property>
    <name>yarn.node-labels.enabled</name>
    <value>true</value>
</property>
<property>
    <name>yarn.node-labels.fs-store.root-dir</name>
    <value>hdfs://<host>:<port>/<absolute_path_to_node_label_directory></value>
</property>
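<!-- Example value (hypothetical NameNode address), pointing at the directory created in step 1:
     hdfs://namenode:8020/yarn/node-labels -->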
<property>
    <name>yarn.resourcemanager.scheduler.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
</property>

Restart the ResourceManager for the configuration to take effect:

yarn-daemon.sh stop  resourcemanager
yarn-daemon.sh start resourcemanager
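# On Hadoop 3.x, where yarn-daemon.sh is deprecated, the equivalent commands are:
# yarn --daemon stop resourcemanager
# yarn --daemon start resourcemanager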
  3. Create cluster node labels:
yarn rmadmin -addToClusterNodeLabels "<label1>(exclusive=<true|false>),<label2>(exclusive=<true|false>)"

Note: exclusive is an optional parameter; it defaults to true. (A concrete example is shown after this list.)
  4. Remove cluster node labels

yarn rmadmin -removeFromClusterNodeLabels "<label1>,<label2>"

Note: a label that is already associated with a queue cannot be removed from the cluster.
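As a concrete example of step 3 (using the label names cpu and mem that the queue configuration in section 3 relies on), create the labels and then list them to confirm they were registered:

yarn rmadmin -addToClusterNodeLabels "cpu,mem"
yarn cluster --list-node-labels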

2. Assigning Labels to Cluster Nodes

  1. Associate labels with nodes
yarn rmadmin -replaceLabelsOnNode "<node1>:<port>=<label1> <node2>:<port>=<label2>"

Note: the labels used here must already have been defined at the cluster level; this step associates cluster labels with nodes (i.e., labels the nodes).
  2. Remove a node's labels (set the node's labels to empty)

yarn rmadmin -replaceLabelsOnNode "<node_name>"
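A minimal sketch with hypothetical node addresses (node1.example.com and node2.example.com; substitute your own NodeManager host names, and note the port below is a placeholder): assign the cpu label to one node and the mem label to another, then check a node's labels in its status output:

yarn rmadmin -replaceLabelsOnNode "node1.example.com=cpu node2.example.com=mem"
# Look up the node ID (host:port) with "yarn node -list", then check the Node-Labels field
yarn node -status node1.example.com:45454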

3. Creating Queues and Associating Labels (capacity-scheduler.xml)

<configuration>
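    <!-- Per-partition settings follow the pattern
         yarn.scheduler.capacity.<queue-path>.accessible-node-labels.<label>.capacity;
         for every partition, the capacities of the direct child queues must sum to 100
         (here: default partition 40+30+30, mem partition 50+50, cpu partition 100 on root.pro). -->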
    <!-- Defines three queues under root: root.pro, root.dev, root.def -->
    <property>
        <name>yarn.scheduler.capacity.root.queues</name>
        <value>pro,dev,def</value>
    </property>
    <!-- Capacity of the root queue in the default partition: 100% -->
    <property>
        <name>yarn.scheduler.capacity.root.capacity</name>
        <value>100</value>
    </property>
    <!-- Maximum capacity of the root queue in the default partition: 100% -->
    <property>
        <name>yarn.scheduler.capacity.root.maximum-capacity</name>
        <value>100</value>
    </property>
    <!-- Labels accessible to the root queue: * means all partitions; an empty value means only the default partition -->
    <property>
        <name>yarn.scheduler.capacity.root.accessible-node-labels</name>
        <value>*</value>
    </property>
    <!-- Capacity of the root queue in the cpu partition -->
    <property>
        <name>yarn.scheduler.capacity.root.accessible-node-labels.cpu.capacity</name>
        <value>100</value>
    </property>
    <!-- Maximum capacity of the root queue in the cpu partition -->
    <property>
        <name>yarn.scheduler.capacity.root.accessible-node-labels.cpu.maximum-capacity</name>
        <value>100</value>
    </property>
    <!-- Capacity of the root queue in the mem partition -->
    <property>
        <name>yarn.scheduler.capacity.root.accessible-node-labels.mem.capacity</name>
        <value>100</value>
    </property>
    <!-- Maximum capacity of the root queue in the mem partition -->
    <property>
        <name>yarn.scheduler.capacity.root.accessible-node-labels.mem.maximum-capacity</name>
        <value>100</value>
    </property>

    <!-- Capacity of root.pro in the default partition: 40% -->
    <property>
        <name>yarn.scheduler.capacity.root.pro.capacity</name>
        <value>40</value>
    </property>
    <!-- Maximum capacity of root.pro in the default partition: 100% -->
    <property>
        <name>yarn.scheduler.capacity.root.pro.maximum-capacity</name>
        <value>100</value>
    </property>
    <!-- Partitions accessible to root.pro: cpu and mem -->
    <property>
        <name>yarn.scheduler.capacity.root.pro.accessible-node-labels</name>
        <value>cpu,mem</value>
    </property>
    <!-- root.pro submits to the cpu partition by default -->
    <property>
        <name>yarn.scheduler.capacity.root.pro.default-node-label-expression</name>
        <value>cpu</value>
    </property>
    <!-- Capacity of root.pro in the cpu partition: 100% -->
    <property>
        <name>yarn.scheduler.capacity.root.pro.accessible-node-labels.cpu.capacity</name>
        <value>100</value>
    </property>
    <!-- Maximum capacity of root.pro in the cpu partition: 100% -->
    <property>
        <name>yarn.scheduler.capacity.root.pro.accessible-node-labels.cpu.maximum-capacity</name>
        <value>100</value>
    </property>
    <!-- Capacity of root.pro in the mem partition: 50% -->
    <property>
        <name>yarn.scheduler.capacity.root.pro.accessible-node-labels.mem.capacity</name>
        <value>50</value>
    </property>
    <!-- Maximum capacity of root.pro in the mem partition: 100% -->
    <property>
        <name>yarn.scheduler.capacity.root.pro.accessible-node-labels.mem.maximum-capacity</name>
        <value>100</value>
    </property>
    <!-- Capacity of root.dev in the default partition: 30% -->
    <property>
        <name>yarn.scheduler.capacity.root.dev.capacity</name>
        <value>30</value>
    </property>
    <!-- Maximum capacity of root.dev in the default partition: 100% -->
    <property>
        <name>yarn.scheduler.capacity.root.dev.maximum-capacity</name>
        <value>100</value>
    </property>
    <!-- Partition accessible to root.dev: mem -->
    <property>
        <name>yarn.scheduler.capacity.root.dev.accessible-node-labels</name>
        <value>mem</value>
    </property>
    <!-- Capacity of root.dev in the mem partition: 50% -->
    <property>
        <name>yarn.scheduler.capacity.root.dev.accessible-node-labels.mem.capacity</name>
        <value>50</value>
    </property>
    <!-- Maximum capacity of root.dev in the mem partition: 100% -->
    <property>
        <name>yarn.scheduler.capacity.root.dev.accessible-node-labels.mem.maximum-capacity</name>
        <value>100</value>
    </property>

    <!-- Capacity of root.def in the default partition: 30% -->
    <property>
        <name>yarn.scheduler.capacity.root.def.capacity</name>
        <value>30</value>
    </property>
    <!-- Maximum capacity of root.def in the default partition: 100% -->
    <property>
        <name>yarn.scheduler.capacity.root.def.maximum-capacity</name>
        <value>100</value>
    </property>
</configuration>

Note: after editing capacity-scheduler.xml, the queues must be refreshed for the changes to take effect; this is important. The same applies when removing cluster labels: update this file first, then refresh the queues:

yarn rmadmin -refreshQueues
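
Once the queues have been refreshed, the queue settings can be inspected and a job can be pinned to a partition. The commands below are a minimal sketch: they assume the stock MapReduce examples jar under $HADOOP_HOME and use the MapReduce job properties mapreduce.job.queuename and mapreduce.job.node-label-expression (available in Hadoop 2.8+; verify the property names against your Hadoop version, and note other frameworks expose their own equivalents).

# Inspect the pro queue (capacity, state, accessible node labels)
yarn queue -status pro

# Submit an example job to root.pro, requesting containers on the cpu partition.
# Because root.pro sets default-node-label-expression=cpu, the explicit
# node-label-expression property is redundant here and shown only for illustration.
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar pi \
    -Dmapreduce.job.queuename=pro \
    -Dmapreduce.job.node-label-expression=cpu \
    10 100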