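Original title: HA cluster restart failed, standby NameNode does not come up
Background
On December 3 I went through the cluster logs and fixed the ERROR entries I found. After the changes were done I restarted the cluster, and the restart failed: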
2019-12-04 00:23:30,522 - call['ambari-sudo.sh su hdfs -l -s /bin/bash -c 'curl -s '"'"'http://hostname:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem'"'"' 1>/tmp/tmp0F22GO 2>/tmp/tmph111qW''] {'quiet': False}
2019-12-04 00:23:30,574 - call returned (7, '')
2019-12-04 00:23:30,574 - call['hdfs haadmin -ns vpc-cluster -getServiceState nn2'] {'logoutput': True, 'user': 'hdfs'}
19/12/04 00:23:32 INFO ipc.Client: Retrying connect to server: hostname/172.00.00.00:8020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1000 MILLISECONDS)
Operation failed: Call From vpc-hostname/172.00.00.00 to vpc-hostname:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
2019-12-04 00:23:32,544 - call returned (255, '19/12/04 00:23:32 INFO ipc.Client: Retrying connect to server: vpc-hostname/172.00.00.00:8020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1000 MILLISECONDS)\nOperation failed: Call From hostname/172.00.00.00 to hostname:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused')
2019-12-04 00:23:32,544 - NameNode HA states: active_namenodes = [(u'nn1', 'vpc-hostname1:50070')], standby_namenodes = [], unknown_namenodes = [(u'nn2', 'vpc-hostname2:50070')]
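Cause of the error
The namenode stores its fsimage under [.../hadoop/hdfs/namenode/current].
When the namenodes start, both of them have to sync the corresponding edits files from the QJM, and only then is the active node elected.
In this case the zookeeper01 node found a gap in the data while syncing edits: the transaction ID at the end of the fsimage file name was not contiguous with the edits synced from the QJM. As a result, the namenode on that node failed to start.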
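The gap described above can be checked for before a restart. The following is a minimal sketch, not the check the NameNode itself performs. It assumes the usual HDFS file naming in the `current` directory (`fsimage_<txid>` and `edits_<first>-<last>`); the function name `find_edit_gaps` is made up for this example, and in-progress segments (`edits_inprogress_<txid>`) are ignored.

```python
import re

# Assumed HDFS naming: fsimage_<txid> and edits_<first_txid>-<last_txid>.
FSIMAGE_RE = re.compile(r"^fsimage_(\d+)$")
EDITS_RE = re.compile(r"^edits_(\d+)-(\d+)$")


def find_edit_gaps(filenames):
    """Return (first_missing, last_missing) txid ranges after the newest fsimage."""
    image_txids = []
    segments = []
    for name in filenames:
        match = FSIMAGE_RE.match(name)
        if match:
            image_txids.append(int(match.group(1)))
            continue
        match = EDITS_RE.match(name)
        if match:
            segments.append((int(match.group(1)), int(match.group(2))))
    if not image_txids:
        raise ValueError("no fsimage_<txid> file found")

    # The first transaction that must be replayed on top of the newest image.
    expected = max(image_txids) + 1
    gaps = []
    for start, end in sorted(segments):
        if end < expected:
            continue  # segment is already covered by the fsimage
        if start > expected:
            gaps.append((expected, start - 1))
        expected = end + 1
    return gaps


if __name__ == "__main__":
    listing = [
        "fsimage_0000000000000000100",
        "fsimage_0000000000000000100.md5",
        "edits_0000000000000000101-0000000000000000150",
        "edits_0000000000000000201-0000000000000000250",
        "VERSION",
    ]
    print(find_edit_gaps(listing))
```

In practice the input would be a directory listing (for example from `os.listdir`) of the namenode's `current` directory or of a JournalNode's edits directory. An empty result only means the finalized segments in that listing are contiguous.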
After the failure, the cluster reported this HA state: NameNode HA states: active_namenodes = [(u'nn1', 'hostname:50070')], standby_namenodes = [], unknown_namenodes = [(u'nn2', 'hostname:50070')]
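When scanning Ambari output for this condition, the HA state line can be parsed mechanically. The sketch below assumes the line keeps the format shown above (three `*_namenodes = [...]` lists written as Python literals); `parse_ha_states` and `standby_missing` are hypothetical helper names, not part of Ambari.

```python
import ast
import re

# Matches e.g. "standby_namenodes = []" and captures the list literal.
STATE_RE = re.compile(r"(\w+_namenodes) = (\[[^\]]*\])")


def parse_ha_states(line):
    """Turn a 'NameNode HA states: ...' log line into a dict of lists."""
    return {key: ast.literal_eval(value) for key, value in STATE_RE.findall(line)}


def standby_missing(states):
    """True when no standby is reported but at least one NameNode is unknown."""
    return not states.get("standby_namenodes") and bool(states.get("unknown_namenodes"))


if __name__ == "__main__":
    line = (
        "NameNode HA states: active_namenodes = [(u'nn1', 'hostname:50070')], "
        "standby_namenodes = [], unknown_namenodes = [(u'nn2', 'hostname:50070')]"
    )
    states = parse_ha_states(line)
    print(states["unknown_namenodes"], standby_missing(states))
```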
To fix it, run
hdfs namenode -bootstrapStandby
on the failed standby so that it re-syncs its metadata from the active NameNode. Never copy editlog files over by hand.
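After the standby has been re-synced and started again, its state can be re-checked with the same `hdfs haadmin` call that appears in the log above. This is a rough sketch under a few assumptions: the nameservice `vpc-cluster` and the ID `nn2` are taken from that log, the command is assumed to print `active` or `standby` on stdout when it succeeds, and `get_service_state` is a hypothetical wrapper. The `run` parameter exists only so the function can be exercised without a cluster.

```python
import subprocess


def get_service_state(nn_id, nameservice="vpc-cluster", run=subprocess.run):
    """Return 'active', 'standby', or 'unknown' for one NameNode ID."""
    cmd = ["hdfs", "haadmin", "-ns", nameservice, "-getServiceState", nn_id]
    result = run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        # In the log above a refused connection surfaced as return code 255.
        return "unknown"
    lines = [ln.strip() for ln in result.stdout.splitlines() if ln.strip()]
    state = lines[-1] if lines else ""
    return state if state in ("active", "standby") else "unknown"


if __name__ == "__main__":
    try:
        print(get_service_state("nn2"))
    except FileNotFoundError:
        # The hdfs client is not installed on this machine.
        print("unknown")
```

A result of `standby` for nn2 would indicate the node has rejoined the HA pair; `unknown` means the problem is still there and the namenode log should be checked again.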
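Lessons learned:
When Hadoop reports errors on ports 50070 and 8020, check the fsimage files.
When fixing a problem, prefer tools in this order: Ambari commands, then Hadoop commands, then Linux commands.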