[Configuring CDH and Managed Services] Tuning HDFS Prior to Decommissioning DataNodes

Junjie_M

Categories: CDH CM  Tags: Cloudera Manager Decommissioning Data hadoop hdfs management

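Configuring CDH and Managed Services

Tuning HDFS Prior to Decommissioning DataNodes

Required roles: Configurator, Cluster Administrator, Full Administrator

When a DataNode is decommissioned, the NameNode ensures that every block on that DataNode remains available across the cluster, as dictated by the replication factor. This involves copying blocks off the DataNode in small batches. When a DataNode holds thousands of blocks, restoring the replica count across the cluster can take several hours. Before decommissioning hosts that run DataNodes, first tune HDFS: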

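1. Raise the DataNode heap size. Each DataNode should have at least 4 GB of heap to allow for the increase in iterations and max streams.

a. Go to the HDFS service page;

b. Click the Configuration tab;

c. Under each DataNode role group (DataNode Default Group and any additional DataNode role groups), go to the Resource Management category and set Java Heap Size of DataNode in Bytes;

d. Click Save Changes to commit the changes.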

2. Set the DataNode balancing bandwidth

a. Expand the DataNode Default Group > Performance category;

b. Set DataNode Balancing Bandwidth according to the bandwidth your disks and network can sustain;

c. Click Save Changes to commit the changes.

3. Increase the replication work multiplier per iteration (the default is 2; 10 is recommended)

a. Expand the NameNode Default Group > Advanced category;

b. Set Replication Work Multiplier Per Iteration to 10;

c. Click Save Changes to commit the changes.

4. Increase the maximum number of replication threads and the hard limit on replication threads

a. Expand the NameNode Default Group > Advanced category;

b. Set Maximum number of replication threads on a Datanode and Hard limit on the number of replication threads on a Datanode to 50 and 100, respectively;

c. Click Save Changes to commit the changes.

5. Restart the HDFS service.
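
To get a feel for why the work multiplier in step 3 matters, here is a rough back-of-the-envelope sketch in Python. It rests on assumptions that come from my understanding of the NameNode rather than from the Cloudera text: each replication pass schedules at most (live DataNodes × multiplier) blocks, and passes run every 3 seconds (the stock default of dfs.namenode.replication.interval). The function name, cluster size, and block count are made up for illustration, and the model ignores real copy throughput, so the result is only a lower bound on decommissioning time.

```python
import math


def scheduling_floor_seconds(blocks, live_datanodes, multiplier, interval_s=3.0):
    """Lower bound on the time the NameNode needs just to *schedule* re-replication.

    Assumed model (not from the source article): one pass every `interval_s`
    seconds, each pass scheduling at most live_datanodes * multiplier blocks.
    """
    per_pass = live_datanodes * multiplier
    passes = math.ceil(blocks / per_pass)
    return passes * interval_s


if __name__ == "__main__":
    # Hypothetical cluster: 20 live DataNodes, 500,000 blocks on the node being removed.
    for multiplier in (2, 10):
        hours = scheduling_floor_seconds(500_000, 20, multiplier) / 3600
        print(f"multiplier={multiplier}: at least {hours:.1f} h")
    # multiplier=2: at least 10.4 h
    # multiplier=10: at least 2.1 h
```

Under these assumptions, raising the multiplier from 2 to 10 cuts the scheduling floor by a factor of five; the thread limits in step 4 then determine whether the DataNodes can actually keep up with the work being handed out.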

My translation skills are limited, so here is the English original, typed by hand:

Configuring CDH and Managed Services

Tuning HDFS Prior to Decommissioning DataNodes

Required Role: Configurator, Cluster Administrator, Full Administrator

When a DataNode is decommissioned, the NameNode ensures that every block from the DataNode will still be available across the cluster as dictated by the replication factor. This procedure involves copying blocks off the DataNode in small batches. In cases where a DataNode has thousands of blocks, decommissioning can take several hours. Before decommissioning hosts with DataNodes, you should first tune HDFS:

1. Raise the heap size of the DataNodes. DataNodes should be configured with at least 4 GB heap size to allow for the increase in iterations and max streams.

a. Go to the HDFS service page.

b. Click the Configuration tab.

c. Under each DataNode role group (DataNode Default Group and additional DataNode role groups) go to the Resource Management category, and set the Java Heap Size of DataNode in Bytes property as recommended.

d. Click Save Changes to commit the changes.

2. Set the DataNode balancing bandwidth:

a. Expand the DataNode Default Group > Performance category.

b. Configure the DataNode Balancing Bandwidth property to the bandwidth you have on your disks and network.

c. Click Save Changes to commit the changes.

3. Increase the replication work multiplier per iteration to a larger number (the default is 2, however 10 is recommended):

a. Expand the NameNode Default Group > Advanced category.

b. Configure the Replication Work Multiplier Per Iteration property to a value such as 10.

c. Click Save Changes to commit the changes.

4. Increase the replication maximum threads and maximum replication thread hard limits:

a. Expand the NameNode Default Group > Advanced category.

b. Configure the Maximum number of replication threads on a Datanode and Hard limit on the number of replication threads on a Datanode properties to 50 and 100 respectively.

c. Click Save Changes to commit the changes.

5. Restart the HDFS service.
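
If you want to double-check the NameNode-side values outside Cloudera Manager, a small script can compare an hdfs-site.xml against the recommendations in steps 3 and 4. This is only a sketch: the property names are my assumed mapping from the Cloudera Manager labels to stock Hadoop keys, so confirm them for your CDH version. The function names and the sample XML are hypothetical, and reading "4 GB" as 4 GiB for the heap field (which is entered in bytes) is also an assumption.

```python
import xml.etree.ElementTree as ET

# Assumed mapping from the Cloudera Manager labels to hdfs-site.xml keys.
RECOMMENDED = {
    "dfs.namenode.replication.work.multiplier.per.iteration": 10,
    "dfs.namenode.replication.max-streams": 50,
    "dfs.namenode.replication.max-streams-hard-limit": 100,
}

# Step 1 asks for at least 4 GB of DataNode heap; the field takes bytes.
MIN_DATANODE_HEAP_BYTES = 4 * 1024**3  # 4294967296, reading "4 GB" as 4 GiB


def parse_hdfs_site(xml_text):
    """Return {name: value} for every <property> in a Hadoop configuration XML."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None and value is not None:
            props[name.strip()] = value.strip()
    return props


def find_untuned(props):
    """Return {name: (current, recommended)} for settings that are unset or too low."""
    issues = {}
    for name, want in RECOMMENDED.items():
        raw = props.get(name)
        current = int(raw) if raw is not None else None
        if current is None or current < want:
            issues[name] = (current, want)
    return issues


# Hypothetical sample: multiplier still at its default, hard limit not set.
SAMPLE = """<configuration>
  <property>
    <name>dfs.namenode.replication.work.multiplier.per.iteration</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.namenode.replication.max-streams</name>
    <value>50</value>
  </property>
</configuration>"""

if __name__ == "__main__":
    for name, (current, want) in find_untuned(parse_hdfs_site(SAMPLE)).items():
        print(f"{name}: current={current}, recommended>={want}")
```

A key that is missing from the file is reported with current=None, which only means it is not overridden there; the effective value is then whatever default your Hadoop build ships with.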
