2014-07-19 21:55:49,823 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 15414 ms (timeout=20000 ms) for a response for sendEdits. Succeeded so far: [192.168.1.202:8485]
2014-07-19 21:56:11,660 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 37251 ms (timeout=20000 ms) for a response for sendEdits. Succeeded so far: [192.168.1.202:8485]
2014-07-19 21:56:22,652 FATAL org.apache.hadoop.hdfs.server.namenode.FSEditLog: Error: flush failed for required journal (JournalAndStream(mgr=QJM to [192.168.1.200:8485, 192.168.1.201:8485, 192.168.1.202:8485], stream=QuorumOutputStream starting at txid 190))
java.io.IOException: Timed out waiting 20000ms for a quorum of nodes to respond.
at org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.waitForWriteQuorum(AsyncLoggerSet.java:137)
at org.apache.hadoop.hdfs.qjournal.client.QuorumOutputStream.flushAndSync(QuorumOutputStream.java:107)
at org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:113)
at org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:107)
at org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream$8.apply(JournalSet.java:492)
at org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:352)
at org.apache.hadoop.hdfs.server.namenode.JournalSet.access$100(JournalSet.java:55)
at org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream.flush(JournalSet.java:488)
at org.apache.hadoop.hdfs.server.namenode.FSEditLog.logSync(FSEditLog.java:613)
at org.apache.hadoop.hdfs.server.namenode.FSEditLog.endCurrentLogSegment(FSEditLog.java:1057)
at org.apache.hadoop.hdfs.server.namenode.FSEditLog.rollEditLog(FSEditLog.java:995)
at org.apache.hadoop.hdfs.server.namenode.FSImage.rollEditLog(FSImage.java:1082)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.rollEditLog(FSNamesystem.java:5050)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.rollEditLog(NameNodeRpcServer.java:832)
at org.apache.hadoop.hdfs.protocolPB.NamenodeProtocolServerSideTranslatorPB.rollEditLog(NamenodeProtocolServerSideTranslatorPB.java:139)
at org.apache.hadoop.hdfs.protocol.proto.NamenodeProtocolProtos$NamenodeProtocolService$2.callBlockingMethod(NamenodeProtocolProtos.java:11214)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2042)
2014-07-19 21:56:24,074 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Aborting QuorumOutputStream starting at txid 190
2014-07-19 21:56:28,232 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Took 55983ms to send a batch of 1 edits (13 bytes) to remote journal 192.168.1.200:8485
2014-07-19 21:56:30,303 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Took 55896ms to send a batch of 1 edits (13 bytes) to remote journal 192.168.1.201:8485
2014-07-19 21:57:09,300 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1
2014-07-19 21:57:14,382 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at slave01/192.168.1.200
************************************************************/
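After the active node died, no failover or switchover took place: the standby node did not automatically transition to the active state, so the entire cluster went down.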
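In a Hadoop 2.2.0 high-availability (HA) setup, the expected failover did not happen when the Active NameNode went down unexpectedly. The Standby NameNode failed to take over and become Active, which left the whole Hadoop cluster unavailable.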
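To make the log easier to read, here is a rough sketch that pulls the key numbers out of the QuorumJournalManager warnings: which JournalNodes acknowledged `sendEdits` within the timeout, and how long the slow ones took. The function name `summarize`, the `JOURNAL_NODES` list, and the regular expressions are my own illustration, not part of Hadoop. It assumes a write needs a majority of the JournalNodes (2 of 3 here). It only summarizes the log and does not by itself explain why the standby never became active.

```python
import re

# Sample lines copied from the NameNode log above.
LOG = """\
2014-07-19 21:56:11,660 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 37251 ms (timeout=20000 ms) for a response for sendEdits. Succeeded so far: [192.168.1.202:8485]
2014-07-19 21:56:28,232 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Took 55983ms to send a batch of 1 edits (13 bytes) to remote journal 192.168.1.200:8485
2014-07-19 21:56:30,303 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Took 55896ms to send a batch of 1 edits (13 bytes) to remote journal 192.168.1.201:8485
"""

# JournalNode addresses as listed in the FATAL line (mgr=QJM to [...]).
JOURNAL_NODES = ["192.168.1.200:8485", "192.168.1.201:8485", "192.168.1.202:8485"]

# Hypothetical patterns written against the message text shown in this log.
WAIT_RE = re.compile(
    r"Waited (\d+) ms \(timeout=(\d+) ms\).*Succeeded so far: \[([^\]]*)\]"
)
SLOW_RE = re.compile(
    r"Took (\d+)ms to send a batch of \d+ edits \(\d+ bytes\) to remote journal (\S+)"
)


def summarize(log, journal_nodes):
    """Report quorum size, timeout, nodes that acked, and slow-send times."""
    quorum = len(journal_nodes) // 2 + 1  # assumed majority quorum
    timeout_ms = None
    acked = set()
    slow_ms = {}
    for line in log.splitlines():
        wait = WAIT_RE.search(line)
        if wait:
            timeout_ms = int(wait.group(2))
            # Keep the most recent "Succeeded so far" list.
            acked = {a.strip() for a in wait.group(3).split(",") if a.strip()}
        slow = SLOW_RE.search(line)
        if slow:
            slow_ms[slow.group(2)] = int(slow.group(1))
    return {
        "quorum": quorum,
        "timeout_ms": timeout_ms,
        "acked": sorted(acked),
        "slow_ms": slow_ms,
        "quorum_reached": len(acked) >= quorum,
    }


if __name__ == "__main__":
    for key, value in summarize(LOG, JOURNAL_NODES).items():
        print(key, "=", value)
```

Read this way, the log shows only 192.168.1.202 answering inside the 20000 ms window, while the sends to 192.168.1.200 and 192.168.1.201 took about 56 seconds each, so the NameNode could not reach a write quorum and exited.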