2. Hadoop HA Cluster: Repairing a Damaged Standby NameNode

1. Check the state of the Standby NameNode.

[hadoop@big82 current]$ hdfs dfs -ls /       --- my HDFS has one directory: /test1
Found 1 items
drwxr-xr-x   - hadoop supergroup          0 2022-04-16 18:47 /test1
[hadoop@big82 current]$ hdfs haadmin -getAllServiceState   
big81:9000                                         active    
big82:9000                                         standby    -- state is normal.
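If you only need the state of a single NameNode, hdfs haadmin -getServiceState takes a NameNode ID. A minimal sketch, assuming the NameNode IDs fgedunn1 (big81) and fgedunn2 (big82) that appear in the bootstrap output of step 4:

[hadoop@big82 current]$ hdfs haadmin -getServiceState fgedunn1   -- should print: active
[hadoop@big82 current]$ hdfs haadmin -getServiceState fgedunn2   -- should print: standby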

2. Simulate damage to the Standby NameNode.

[hadoop@big82 current]$ pwd
/data02/current
[hadoop@big82 current]$ ll
total 16
-rw-rw-r-- 1 hadoop hadoop 388 Apr 17 09:50 fsimage_0000000000000000000
-rw-rw-r-- 1 hadoop hadoop  62 Apr 17 09:50 fsimage_0000000000000000000.md5
-rw-rw-r-- 1 hadoop hadoop   2 Apr 17 09:50 seen_txid
-rw-rw-r-- 1 hadoop hadoop 216 Apr 17 09:50 VERSION
[hadoop@big82 current]$
rm -rf *     -- delete all of the Standby NameNode's metadata.
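The directory being wiped here is the NameNode metadata directory (dfs.namenode.name.dir), which on this cluster is rooted at /data02. If you are not sure where it lives on your own cluster, you can query the configuration; a minimal sketch:

[hadoop@big82 current]$ hdfs getconf -confKey dfs.namenode.name.dir   -- prints the configured metadata directory (here /data02)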

At this point the cluster has not yet noticed that the Standby NameNode is broken, so we restart the Standby NameNode to make the cluster aware that it is gone.

3. Restart the Standby NameNode.

[hadoop@big82 current]$ hdfs --daemon stop namenode
[hadoop@big82 current]$ hdfs --daemon start namenode
[hadoop@big82 current]$ jps
18594 ResourceManager
22088 Jps
17803 DFSZKFailoverController
[hadoop@big82 current]$ hdfs haadmin -getAllServiceState
big81:9000                                         active    
2022-04-17 10:12:43,122 INFO ipc.Client: Retrying connect to server: big82/192.168.1.82:9000. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1000 MILLISECONDS)

big82:9000                                         Failed to connect: Call From big82/192.168.1.82 to big82:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused

As you can see, the Standby NameNode cannot start. At the same time, IPC connection errors appear in the DataNode logs:

2022-04-17 10:12:38,881 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: big82/192.168.1.82:9000. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

2022-04-17 10:15:30,301 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: IOException in offerService
java.net.ConnectException: Call From big91/192.168.1.91 to big82:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
    at sun.reflect.GeneratedConstructorAccessor11.newInstance(Unknown Source)
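To confirm that the other DataNodes see the same problem, you can grep their logs for the retry and connection-refused messages. A minimal sketch, assuming the logs are written under /data02/logs/hadoop (the gc.log path in step 4 suggests this) and that the DataNode log follows the default hadoop-<user>-datanode-<host>.log naming:

[hadoop@big91 ~]$ grep -E 'Retrying connect to server: big82|ConnectException' /data02/logs/hadoop/hadoop-hadoop-datanode-big91.log | tail -5   -- last few connection errors against big82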

4. Repair the Standby NameNode.

[hadoop@big82 current]$ hdfs namenode -bootstrapStandby       --- re-format (bootstrap) the Standby NameNode
Java HotSpot(TM) 64-Bit Server VM warning: Cannot open file /data02/logs/hadoop/gc.log due to No such file or directory

log4j:ERROR Could not find value for key log4j.appender.DRFAAUDIT
log4j:ERROR Could not instantiate appender named "DRFAAUDIT".
2022-04-17 10:17:00,047 INFO namenode.NameNode: STARTUP_MSG: 
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = big82/192.168.1.82
STARTUP_MSG:   args = [-bootstrapStandby]
STARTUP_MSG:   version = 3.1.1
..........................................................................................

STARTUP_MSG:   build = https://github.com/apache/hadoop -r 2b9a8c1d3a2caf1e733d57f346af3ff0d5ba529c; compiled by 'leftnoteasy' on 2018-08-02T04:26Z
STARTUP_MSG:   java = 1.8.0_261
************************************************************/
2022-04-17 10:17:00,059 INFO namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]
2022-04-17 10:17:00,066 INFO namenode.NameNode: createNameNode [-bootstrapStandby]
2022-04-17 10:17:00,279 INFO ha.BootstrapStandby: Found nn: fgedunn1, ipc: big81/192.168.1.81:9000
=====================================================
About to bootstrap Standby ID fgedunn2 from:
           Nameservice ID: fgeduns
        Other Namenode ID: fgedunn1
  Other NN's HTTP address: http://big81:50070
  Other NN's IPC  address: big81/192.168.1.81:9000
             Namespace ID: 166178331
            Block pool ID: BP-1145621526-192.168.1.81-1650104239176
               Cluster ID: CID-6e446eb6-01fa-4f97-9e08-9f5842bf335a
           Layout version: -64
       isUpgradeFinalized: true
=====================================================
Re-format filesystem in Storage Directory root= /data02; location= null ? (Y or N) y  -- confirm the re-format
2022-04-17 10:17:34,117 INFO common.Storage: Will remove files: []
2022-04-17 10:17:34,129 INFO common.Storage: Storage directory /data02 has been successfully formatted.   -- format succeeded.
2022-04-17 10:17:34,760 INFO namenode.FSEditLog: Edit logging is async:true  -- edit logging initialized (async mode).
2022-04-17 10:17:35,015 INFO namenode.TransferFsImage: Opening connection to http://big81:50070/imagetransfer?getimage=1&txid=0&storageInfo=-64:166178331:1650104239176:CID-6e446eb6-01fa-4f97-9e08-9f5842bf335a&bootstrapstandby=true
2022-04-17 10:17:35,060 INFO common.Util: Combined time for file download and fsync to all disks took 0.00s. The file download took 0.00s at 0.00 KB/s. Synchronous (fsync) write to disk of /data02/current/fsimage.ckpt_0000000000000000000 took 0.00s.
2022-04-17 10:17:35,060 INFO namenode.TransferFsImage: Downloaded file fsimage.ckpt_0000000000000000000 size 388 bytes.
2022-04-17 10:17:35,077 INFO namenode.NameNode: SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at big82/192.168.1.82
************************************************************/
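If you need to script this repair, hdfs namenode -bootstrapStandby also accepts -force and -nonInteractive so that it does not stop at the Y/N prompt above; a sketch, with the semantics as I understand them from the namenode usage help:

[hadoop@big82 current]$ hdfs namenode -bootstrapStandby -force            -- re-format without asking for confirmation
[hadoop@big82 current]$ hdfs namenode -bootstrapStandby -nonInteractive   -- never prompt; abort instead if confirmation would be needed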

5. Start the Standby NameNode.

[hadoop@big82 data02]$ cd current
[hadoop@big82 current]$ ll   --- after the re-format, the fsimage has been regenerated
total 16
-rw-rw-r-- 1 hadoop hadoop 388 Apr 17 10:17 fsimage_0000000000000000000
-rw-rw-r-- 1 hadoop hadoop  62 Apr 17 10:17 fsimage_0000000000000000000.md5
-rw-rw-r-- 1 hadoop hadoop   2 Apr 17 10:17 seen_txid
-rw-rw-r-- 1 hadoop hadoop 216 Apr 17 10:17 VERSION
[hadoop@big82 current]$
hdfs --daemon start namenode
[hadoop@big82 current]$ jps
18594 ResourceManager
17803 DFSZKFailoverController
22475 Jps

22396 NameNode      -- the Standby NameNode is running again.
[hadoop@big82 current]$ hdfs haadmin -getAllServiceState
big81:9000                                         active    

big82:9000                                         standby     -- back in the standby role.

6. Check whether the DataNodes still report errors.

2022-04-17 10:19:20,659 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Block pool Block pool BP-1145621526-192.168.1.81-1650104239176 (Datanode Uuid 31d0fd01-959e-434b-998e-016345cf3d7e) service to big82/192.168.1.82:9000 successfully registered with NN
2022-04-17 10:19:20,687 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Successfully sent block report 0x95b5123c058e9e1f,  containing 3 storage report(s), of which we sent 3. The reports had 0 total blocks and used 1 RPC(s). This took 1 msec to generate and 19 msecs for RPC and NN processing. Got back no commands

This shows that the Standby NameNode is handling requests normally again.
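As an extra check, hdfs haadmin -checkHealth and hdfs dfsadmin -report can confirm that the repaired NameNode is healthy and that every DataNode has re-registered; a sketch, again assuming the NameNode ID fgedunn2 for big82:

[hadoop@big82 current]$ hdfs haadmin -checkHealth fgedunn2    -- returns 0 (no output) if the standby NameNode reports healthy
[hadoop@big82 current]$ hdfs dfsadmin -report | head -20      -- live/dead DataNode summary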

7. Check the HDFS directory.

[hadoop@big82 current]$ hdfs dfs -ls /
Found 1 items
drwxr-xr-x   - hadoop supergroup          0 2022-04-16 18:47 /test1

At this point, the repair is complete.
