Problems Encountered When Converting a Single-Node Hadoop Setup to HA: A Summary (Part 2)

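This post covers two problems in Hadoop HA mode: a failed NameNode format and the Incompatible namespaceID exception. The fixes are to start the JournalNode service, follow the correct format sequence, and make the version information consistent.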

I. NameNode format error

Running bin/hadoop namenode -format fails with the following error: org.apache.hadoop.hdfs.qjournal.client.QuorumException: Unable to check if JNs are ready for formatting.
$$
15/12/04 04:52:50 WARN namenode.NameNode: Encountered exception during format:
org.apache.hadoop.hdfs.qjournal.client.QuorumException: Unable to check if JNs are ready for formatting. 3 exceptions thrown:
10.165.114.138:8485: Call From hd1/10.172.153.46 to hd3:8485 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
10.172.218.18:8485: Call From hd1/10.172.153.46 to hd2:8485 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
10.172.153.46:8485: Call From hd1/10.172.153.46 to hd1:8485 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
    at org.apache.hadoop.hdfs.qjournal.client.QuorumException.create(QuorumException.java:81)
    at org.apache.hadoop.hdfs.qjournal.client.QuorumCall.rethrowException(QuorumCall.java:223)
    at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.hasSomeData(QuorumJournalManager.java:232)
    at org.apache.hadoop.hdfs.server.common.Storage.confirmFormat(Storage.java:884)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.confirmFormat(FSImage.java:171)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:937)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1379)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1504)
15/12/04 04:52:50 FATAL namenode.NameNode: Failed to start namenode.
org.apache.hadoop.hdfs.qjournal.client.QuorumException: Unable to check if JNs are ready for formatting. 3 exceptions thrown:
10.165.114.138:8485: Call From hd1/10.172.153.46 to hd3:8485 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
10.172.218.18:8485: Call From hd1/10.172.153.46 to hd2:8485 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
10.172.153.46:8485: Call From hd1/10.172.153.46 to hd1:8485 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
    at org.apache.hadoop.hdfs.qjournal.client.QuorumException.create(QuorumException.java:81)
    at org.apache.hadoop.hdfs.qjournal.client.QuorumCall.rethrowException(QuorumCall.java:223)
    at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.hasSomeData(QuorumJournalManager.java:232)
    at org.apache.hadoop.hdfs.server.common.Storage.confirmFormat(Storage.java:884)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.confirmFormat(FSImage.java:171)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:937)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1379)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1504)
$$

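Solution 1:

On each JournalNode host, start the journalnode service with the following command:
sbin/hadoop-daemon.sh start journalnode
Once the JournalNodes are running, the format completes without the error.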
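
The log shows connection refusals on port 8485 for hd1, hd2, and hd3, so before formatting it can help to confirm which JournalNodes the NameNode will try to contact. The following is a minimal sketch, assuming the quorum is configured through `dfs.namenode.shared.edits.dir` in the usual `qjournal://host:port;host:port/journalId` form. The function name `list_journalnodes` and the journal ID `mycluster` are illustrative and do not come from the original setup.

```shell
# Hypothetical helper: print one host:port per line from a qjournal URI.
list_journalnodes() {
  rest="${1#qjournal://}"      # drop the scheme
  hosts="${rest%%/*}"          # drop the trailing /journalId
  printf '%s\n' "$hosts" | tr ';' '\n'
}

# Example with the hostnames from the log (the journal ID "mycluster" is assumed):
# list_journalnodes 'qjournal://hd1:8485;hd2:8485;hd3:8485/mycluster'
```

On a live cluster the URI could be read with `hdfs getconf -confKey dfs.namenode.shared.edits.dir`, and each printed host:port pair probed (for example with `nc -z`, where available) before running the format.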

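Solution 2:

The same error appears when formatting for the first time in HA mode, or when reformatting after the previously formatted data has been deleted.
In that case, run ./start-dfs.sh first (in an HA configuration this script also starts the JournalNodes) and then format; the format then succeeds.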
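
Because the format only succeeds once the JournalNodes accept connections, one option is to poll for readiness between starting the services and formatting. This is a rough sketch rather than part of the original procedure: `wait_until` is a hypothetical helper, and the retry count and delay in the usage comment are arbitrary.

```shell
# Hypothetical helper: retry a command up to $1 times, sleeping $2 seconds
# between attempts; returns 0 as soon as the command succeeds, 1 otherwise.
wait_until() {
  tries="$1"; delay="$2"; shift 2
  while [ "$tries" -gt 0 ]; do
    "$@" && return 0
    tries=$((tries - 1))
    if [ "$tries" -gt 0 ]; then sleep "$delay"; fi
  done
  return 1
}

# Possible usage (hd1 and port 8485 are taken from the log; nc must be installed):
# wait_until 30 2 nc -z hd1 8485 && bin/hadoop namenode -format
```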

II. Handling the "Incompatible namespaceID for journal Storage Directory" exception in Hadoop

Reference link: https://blog.youkuaiyun.com/shifenglov/article/details/38583971

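The cause is that the version information stored on the JournalNodes (the namespaceID) does not match that of the NameNode. See also the three ways of resolving the similar "Incompatible namespaceID" version conflict.

In many cases this happens under CDH, when the CDH version is upgraded without first disabling HA, which leaves the JournalNode data inconsistent with the NameNode data.
It can also happen when the cluster was not shut down cleanly, leaving the JournalNodes and the NameNode with mismatched data and versions.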

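Steps to resolve:
Change the NameNode's version number to match the JournalNodes': vi /home/rimi/bigData/hadoop-2.2.0/tmp/dfs/name/current/VERSION
Start ZooKeeper.
Restart the cluster with start-dfs.sh.
Start zkfc.
One of the NameNodes should then work normally and switch to active.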
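
Before editing VERSION by hand, it may be worth confirming that the two files really disagree. The sketch below assumes VERSION is a properties-style file containing a line such as `namespaceID=...`. The helper names `version_field` and `same_namespace` are hypothetical, and the JournalNode VERSION path depends on `dfs.journalnode.edits.dir` and the journal ID, so it is left as a placeholder.

```shell
# Hypothetical helper: print the value of key $2 from the VERSION file $1.
version_field() {
  sed -n "s/^$2=//p" "$1"
}

# Hypothetical helper: succeed only if both VERSION files carry the same,
# non-empty namespaceID.
same_namespace() {
  a="$(version_field "$1" namespaceID)"
  b="$(version_field "$2" namespaceID)"
  [ -n "$a" ] && [ "$a" = "$b" ]
}

# Possible usage (NameNode path from the post; JournalNode path is a placeholder):
# same_namespace /home/rimi/bigData/hadoop-2.2.0/tmp/dfs/name/current/VERSION \
#   "<journalnode-edits-dir>/<journalId>/current/VERSION" || echo "namespaceID mismatch"
```

Keeping the check read-only means it can be run safely on a stopped cluster; back up VERSION before making the edit itself.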

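Possible issue: the other NameNode does not work properly. It can be reformatted and synchronized with the active node by running the following command on the standby node: hdfs namenode -bootstrapStandby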

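Possible issue: the standby node cannot synchronize properly. In that case, format the JournalNode; the current approach is to replace the data on the faulty JournalNode with the data from a healthy JournalNode in the cluster.

Alternatively: hdfs namenode -initializeSharedEdits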
