Reading Notes on the HDFS User Guide

This note covers the operation and administration of the Checkpoint node and Backup node in a Hadoop cluster, including how to start them, their configuration parameters, importing a checkpoint, balancing DataNode load, recovery mode, and upgrade and rollback. It also describes how to start the balancer, the HDFS upgrade procedure, and data recovery methods.

CheckPoint Node:

The Checkpoint node's memory requirements are on the same order as the NameNode's. The Checkpoint node is started by running the following command on the Checkpoint node:

bin/hdfs namenode -checkpoint 

Two configuration parameters control checkpointing:

  • dfs.namenode.checkpoint.period, set to 1 hour by default, specifies the maximum delay between two consecutive checkpoints
  • dfs.namenode.checkpoint.txns, set to 1 million by default, defines the number of uncheckpointed transactions on the NameNode which will force an urgent checkpoint, even if the checkpoint period has not been reached.
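
As a sketch, both parameters could be set explicitly in hdfs-site.xml; the values below simply restate the defaults:

```xml
<!-- hdfs-site.xml: checkpoint tuning (values shown restate the defaults) -->
<property>
  <name>dfs.namenode.checkpoint.period</name>
  <value>3600</value> <!-- seconds between consecutive checkpoints (1 hour) -->
</property>
<property>
  <name>dfs.namenode.checkpoint.txns</name>
  <value>1000000</value> <!-- uncheckpointed transactions that force a checkpoint -->
</property>
```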

Backup Node:

As the Backup node maintains a copy of the namespace in memory, its RAM requirements are the same as the NameNode's.

The NameNode supports one Backup node at a time. No Checkpoint nodes may be registered while a Backup node is in use.

The Backup node is started by running the following command on the Backup node:

bin/hdfs namenode -backup
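
The Backup node's RPC and HTTP addresses are set via dfs.namenode.backup.address and dfs.namenode.backup.http-address; in the sketch below the hostname is a placeholder, and the ports are the usual defaults:

```xml
<!-- hdfs-site.xml: where the Backup node listens ("backup-host" is an example) -->
<property>
  <name>dfs.namenode.backup.address</name>
  <value>backup-host:50100</value>
</property>
<property>
  <name>dfs.namenode.backup.http-address</name>
  <value>backup-host:50105</value>
</property>
```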

Import CheckPoint

The latest checkpoint can be imported to the NameNode if all other copies of the image and the edits files are lost. In order to do that one should:

  • Create an empty directory specified in the dfs.namenode.name.dir configuration variable;
  • Specify the location of the checkpoint directory in the configuration variable dfs.namenode.checkpoint.dir;
  • and start the NameNode with the -importCheckpoint option (see the example below).
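
A minimal sketch of the sequence, assuming /data/dfs/name is dfs.namenode.name.dir and /data/dfs/namesecondary is dfs.namenode.checkpoint.dir (both paths are placeholders):

```bash
mkdir -p /data/dfs/name              # empty dir named by dfs.namenode.name.dir (assumed path)
# dfs.namenode.checkpoint.dir must already contain the checkpoint to import,
# e.g. /data/dfs/namesecondary (assumed path)
bin/hdfs namenode -importCheckpoint
```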


Balancer (balancing DataNodes)

To start:

       bin/hadoop-daemon.sh start balancer [-threshold <threshold>]

       Examples:

bin/hadoop-daemon.sh start balancer
    starts the balancer with the default threshold of 10%, i.e. each DataNode's disk usage may deviate from the overall cluster usage by at most 10%
bin/hadoop-daemon.sh start balancer -threshold 5
    starts the balancer with a threshold of 5%

To stop: 

bin/hadoop-daemon.sh stop balancer

Recovery mode

Typically, you configure multiple metadata storage locations, so that if one copy is corrupt the metadata can be read from another. But what can you do if the only storage locations available are corrupt? In this case, a special NameNode startup mode called recovery mode may allow you to recover most of your data.

You can start the NameNode in recovery mode like so:

bin/hdfs namenode -recover

Because recovery mode can cause you to lose data, you should always back up your edit log and fsimage before using it.
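
A cautious sketch of a recovery run, assuming /data/dfs/name is the metadata directory (the path is a placeholder):

```bash
# Back up the fsimage and edit log before attempting recovery
cp -r /data/dfs/name /data/dfs/name.bak   # assumed metadata path
# Start the NameNode in recovery mode; it prompts interactively when it
# hits corrupt or inconsistent edit log entries
bin/hdfs namenode -recover
```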


HDFS Upgrade and RollBack

Before upgrading, administrators need to remove any existing backup using the bin/hadoop dfsadmin -finalizeUpgrade command. The following briefly describes the typical upgrade procedure (a command-level sketch follows the list):

  • Before upgrading the Hadoop software, finalize the previous upgrade if an existing backup is present. dfsadmin -upgradeProgress status can tell whether the cluster needs to be finalized.
  • Stop the cluster and distribute new version of Hadoop.
  • Run the new version with the -upgrade option (bin/start-dfs.sh -upgrade).
  • Most of the time, the cluster works just fine. Once the new HDFS is considered to be working well (perhaps after a few days of operation), finalize the upgrade. Note that until the cluster is finalized, deleting files that existed before the upgrade does not free up real disk space on the DataNodes.
  • If there is a need to move back to the old version,
    • stop the cluster and distribute earlier version of Hadoop.
    • start the cluster with the rollback option (bin/start-dfs.sh -rollback).
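
As a minimal sketch, the procedure above maps onto the following commands (run as the HDFS superuser; the classic bin/ scripts are assumed):

```bash
# 1. Check whether a previous upgrade still needs finalizing
bin/hadoop dfsadmin -upgradeProgress status
bin/hadoop dfsadmin -finalizeUpgrade      # only if a backup exists

# 2. Stop the cluster, then distribute the new Hadoop version
bin/stop-dfs.sh

# 3. Start the new version with the upgrade option
bin/start-dfs.sh -upgrade

# 4. Once the new HDFS has proven stable, finalize
bin/hadoop dfsadmin -finalizeUpgrade

# To roll back instead: stop the cluster, redeploy the old version, then
bin/start-dfs.sh -rollback
```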

### HDFS user configuration file locations and user directory structure

HDFS user-related configuration falls into a few areas: configuration file paths, the user directory structure, and permission management. The details are as follows.

#### 1. Configuration file locations

The core HDFS configuration files are `core-site.xml` and `hdfs-site.xml`, which normally live under `etc/hadoop` in the Hadoop installation directory. For example, if Hadoop is installed in `/opt/module/hadoop-3.1.3`, the paths are:

```bash
/opt/module/hadoop-3.1.3/etc/hadoop/core-site.xml
/opt/module/hadoop-3.1.3/etc/hadoop/hdfs-site.xml
```

In addition, user-related environment variables may need to be set in `hadoop-env.sh`, for example the Java environment variable or user-specific permissions.

#### 2. User directory structure

HDFS maintains a tree-shaped namespace, similar to a local file system, rooted at "/". Each user typically has a dedicated directory of the form `/user/<username>`; for a user named `alice`, the default directory is `/user/alice`.

A user directory can come into existence in two ways:

- In some setups the corresponding user directory is created automatically the first time a user accesses HDFS.
- It can be created manually:

```bash
hadoop fs -mkdir /user/alice
hadoop fs -chown alice:alice /user/alice
```

#### 3. Permission management

HDFS supports per-user permission management, ensuring that different users can only access their own data. Permissions are read (r), write (w), and execute (x). Permissions and ownership can be changed with the following commands:

- Change permissions:

```bash
hadoop fs -chmod 700 /user/alice
```

- Change the owner:

```bash
hadoop fs -chown alice:alice /user/alice
```

#### 4. DataNode storage layout

On a DataNode, the actual block storage path is determined by the `dfs.datanode.data.dir` configuration property. For example, if it is set to `/cloud/data1/hadoop/dfs/dn`, that path holds all blocks assigned to the DataNode.

### Example

The following example checks whether a user directory exists and creates it if not:

```bash
# Check whether the user directory exists
hadoop fs -test -d /user/alice
# Create it if it does not
if [ $? -ne 0 ]; then
  hadoop fs -mkdir /user/alice
  hadoop fs -chown alice:alice /user/alice
fi
```