Separating the NameNode and SecondaryNameNode on Hadoop 0.20.2

This post describes how to safely separate the NameNode and SecondaryNameNode in a Hadoop cluster, avoiding the risk that a single node failure leaves the cluster unrecoverable. It walks through the configuration process, including cloning a node, editing the configuration files, and tuning parameters.

When setting up a cluster, we often put the NameNode and SecondaryNameNode on the same node. This is actually quite dangerous: if that node crashes, the whole cluster cannot be recovered. Below is a method for separating the NameNode and SecondaryNameNode. There is certainly plenty of room for improvement; comments and corrections are welcome.

 

Important note: I originally assumed that the hostname in the masters configuration file referred to the NameNode, but it actually refers to the SecondaryNameNode. The slaves file lists all nodes that run a DataNode and TaskTracker (usually the same node). Furthermore, these two files are only read on the node that runs the NameNode and JobTracker (usually both on the NameNode node; the NameNode is specified by fs.default.name in core-site.xml, and the JobTracker by mapred.job.tracker in mapred-site.xml), so the other nodes do not need to configure them.

So be sure not to forget to update the masters file on the NameNode node.
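To make the note above concrete, here is a sketch of what the two files on the NameNode node might look like for this setup. The hostname secondary comes from this post; the DataNode hostnames are hypothetical placeholders, and the scratch directory is only for illustration.

```shell
# Write example masters/slaves files into a scratch conf directory
# (a real cluster would use $HADOOP_HOME/conf instead).
mkdir -p /tmp/hadoop-conf-demo

# masters names the SecondaryNameNode host, NOT the NameNode
cat > /tmp/hadoop-conf-demo/masters <<'EOF'
secondary
EOF

# slaves lists every host running a DataNode/TaskTracker
# (datanode1/datanode2 are made-up names for this sketch)
cat > /tmp/hadoop-conf-demo/slaves <<'EOF'
datanode1
datanode2
EOF
```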

Back to the topic. (This experiment builds on the cluster environment set up earlier in this blog.)

1 Clone the NameNode's node, i.e. create a new node whose files, directory structure, environment variables, and conf directory contents are all identical. See the post on adding a new node to the cluster. The relevant settings here are:

Hostname: secondary

IP: 192.168.5.16

hosts file:

192.168.5.13 namenode
192.168.5.16 secondary

Passwordless SSH login

Regarding the hosts file and SSH: the SecondaryNameNode communicates only with the NameNode, so it only needs a passwordless connection to the NameNode node, and its hosts file only needs entries for the NameNode and itself. Note that the NameNode's hosts file must also gain an entry for the SecondaryNameNode node.
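As a sketch, passwordless login from the secondary node to the NameNode could be set up roughly as follows. The key path is an example, and the username zhang and hostname namenode follow this post's setup; adjust them for your cluster.

```shell
# Generate an RSA key pair without a passphrase (example path)
ssh-keygen -t rsa -N "" -f /tmp/demo_id_rsa

# On a real cluster, append the public key to ~/.ssh/authorized_keys
# on the namenode host, e.g.:
#   ssh-copy-id -i /tmp/demo_id_rsa.pub zhang@namenode
# then verify the passwordless connection:
#   ssh -i /tmp/demo_id_rsa zhang@namenode hostname
```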


2 Configuration files

(1) On the NameNode node, edit hdfs-site.xml:

<property>
<name>dfs.secondary.http.address</name>
<value>192.168.5.16:50090</value>
<description>The NameNode fetches the newest merged fsimage from the SecondaryNameNode at dfs.secondary.http.address.</description>
</property>

In the masters file, change the content to secondary.

(2) On the SecondaryNameNode node, edit hdfs-site.xml:

<property>
<name>dfs.http.address</name>
<value>192.168.5.13:50070</value>
<description>The SecondaryNameNode fetches the fsimage and edits log from the NameNode at dfs.http.address.</description>
</property>

Edit core-site.xml:
<property>
<name>fs.checkpoint.period</name>
<value>3600</value>
<description>The number of seconds between two periodic checkpoints.</description>
</property>  


<property>
<name>fs.checkpoint.size</name>
<value>67108864</value>
</property>  


<property>
<name>fs.checkpoint.dir</name>
<value>/home/zhang/hadoop0202/secondaryname</value>
</property>

Here fs.checkpoint.period and fs.checkpoint.size define the two conditions for the SecondaryNameNode to begin a checkpoint; a checkpoint starts when either one is met. The first is that the interval set by fs.checkpoint.period, in seconds (default one hour), has elapsed; the second is that the edits log has grown to the size threshold, in bytes, set by fs.checkpoint.size (default 64 MB).


3 Restart Hadoop, or start the SecondaryNameNode directly on the secondary node with:

hadoop-daemon.sh start secondarynamenode


After the restart we can see that:

The SecondaryNameNode Java process is gone from the namenode node (apologies, I forgot to take a screenshot before the separation, but before it there really was a SecondaryNameNode Java process on the namenode node).

A SecondaryNameNode Java process now appears on the secondary node.


Verification: check whether image files have appeared in the secondaryname directory on the secondary node. (Since fs.checkpoint.period in core-site.xml is set to 3600, i.e. one hour, we need to lower this parameter to see the effect of the experiment quickly; see the post "How to control the frequency of NameNode checkpoints" on this blog.)
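For a quick test, one might lower the checkpoint interval in core-site.xml on the secondary node, for example (60 seconds is an arbitrary value chosen only for the experiment, not a production setting):

<property>
<name>fs.checkpoint.period</name>
<value>60</value>
<description>Checkpoint every 60 seconds, for testing only.</description>
</property>

Remember to restore the value afterwards; the one-hour default exists to keep checkpoint traffic between the two nodes low.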
