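When it rains, it pours.
Right at the critical moment, when we were already too busy to cope, the UPS in the server room died, and it quite literally burned out.
The whole server room was full of smoke.
Most of the nodes in my two poor hadoop clusters lost power and crashed.
After an emergency scramble they were back up. On hadoop 1.x every service started without problems and the cluster went back online smoothly.
On the hadoop 2.0 cluster one namenode was broken: it reported an fsimage format error and could not start.
That namenode had to be rebuilt. Because the cluster uses the HDFS HA feature, the other namenode started without problems.
So what to do? Take the fsimage from the healthy node, copy it to the failed node, and overwrite the old one.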
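1. Stop the cluster.
2. Copy the fsimage set from the healthy node to the failed node, overwriting the old files.
3. Start the qjournal (JournalNode) daemons manually.
4. On the failed node, run: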
hdfs namenode -bootstrapStandby
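The command then asks whether to re-format the storage directories. Answer Y here. This only formats the local fsimage storage and then downloads a fresh copy. As the log below shows, the image is fetched from the other namenode (nn1) over HTTP, so nn1 has to be reachable at this point; the edits still come from the qjournal.
It does not damage the files stored in HDFS.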
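The four steps above can be sketched as a small shell script. This is only a sketch under assumptions, not the exact commands used here: `GOOD_NN`, `NAME_DIRS`, `DRY_RUN` and the `run` helper are names made up for illustration, the address and directories are taken from the log in this post, and it assumes the healthy node uses the same directory layout, that passwordless ssh/rsync between the namenodes works, and that the standard Hadoop 2.x `bin/` and `sbin/` scripts are on `PATH`. By default it only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch of the recovery steps, meant to be run on the failed namenode host.
# GOOD_NN, NAME_DIRS, DRY_RUN and run() are illustrative names (assumptions).

DRY_RUN="${DRY_RUN:-1}"                 # 1 = print the commands only
GOOD_NN="${GOOD_NN:-192.168.8.51}"      # healthy namenode (nn1 in the log)
NAME_DIRS="${NAME_DIRS:-/data/hadoop/dfs/name /data1/hadoop/dfs/name}"

# Print a command; execute it only when DRY_RUN is not 1.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" != "1" ]; then
        "$@"
    fi
}

recover_standby() {
    # 1. Stop the cluster.
    run stop-dfs.sh
    # 2. Copy the fsimage set from the healthy node over the local one
    #    (assumes the same directory layout on both namenodes).
    for d in $NAME_DIRS; do
        run rsync -a "${GOOD_NN}:${d}/current/" "${d}/current/"
    done
    # 3. Start the JournalNode by hand (repeat on every journal host).
    run hadoop-daemon.sh start journalnode
    # 4. Re-bootstrap this namenode and answer Y at the re-format prompt.
    #    The image is pulled from the other namenode over HTTP, so that
    #    namenode has to be running before this step.
    run hdfs namenode -bootstrapStandby
}

recover_standby
```

Run it once as is to review the printed plan, then with `DRY_RUN=0` to actually execute the commands.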
Screen output:
hdfs namenode -bootstrapStandby
************************************************************/
16/03/02 16:25:14 INFO namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]
16/03/02 16:25:14 INFO namenode.NameNode: createNameNode [-bootstrapStandby]
=====================================================
About to bootstrap Standby ID nn2 from:
Nameservice ID: clusterpc
Other Namenode ID: nn1
Other NN's HTTP address: http://192.168.8.51:50070
Other NN's IPC address: 192.168.8.51:8020
Namespace ID: 1056927543
Block pool ID: BP-246106129-192.168.8.51-1446005986553
Cluster ID: CID-fd414ff4-872a-4882-b304-989560cca1dc
Layout version: -63
isUpgradeFinalized: true
=====================================================
Re-format filesystem in Storage Directory /data/hadoop/dfs/name ? (Y or N) Y
Re-format filesystem in Storage Directory /data1/hadoop/dfs/name ? (Y or N) Y
16/03/02 16:28:05 INFO common.Storage: Storage directory /data/hadoop/dfs/name has been successfully formatted.
16/03/02 16:28:05 INFO common.Storage: Storage directory /data1/hadoop/dfs/name has been successfully formatted.
16/03/02 16:28:06 INFO namenode.TransferFsImage: Opening connection to http://192.168.8.51:50070/imagetransfer?getimage=1&txid=23586912&storageInfo=-63:1056927543:0:CID-fd414ff4-872a-4882-b304-989560cca1dc
16/03/02 16:28:06 INFO namenode.TransferFsImage: Image Transfer timeout configured to 60000 milliseconds
16/03/02 16:28:07 INFO namenode.TransferFsImage: Transfer took 0.97s at 99329.21 KB/s
16/03/02 16:28:07 INFO namenode.TransferFsImage: Downloaded file fsimage.ckpt_0000000000023586912 size 98560063 bytes.
16/03/02 16:28:07 INFO util.ExitUtil: Exiting with status 0
16/03/02 16:28:07 INFO namenode.NameNode: SHUTDOWN_MSG:
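If the output above is captured to a file, the two lines that matter (the fsimage download and the exit status) can be checked mechanically. A minimal sketch, assuming the output was saved to a file; `bootstrap_ok` and the file name are hypothetical, and the matched strings are simply the ones that appear in this log, so they may differ on other Hadoop versions.

```shell
#!/usr/bin/env bash
# bootstrap_ok LOGFILE  (hypothetical helper, not part of Hadoop)
# Returns 0 only if the captured bootstrapStandby output shows that an
# fsimage was downloaded and that the process exited with status 0.
bootstrap_ok() {
    local log="$1"
    grep -q 'TransferFsImage: Downloaded file fsimage' "$log" &&
        grep -q 'ExitUtil: Exiting with status 0' "$log"
}
```

For example, after something like `hdfs namenode -bootstrapStandby 2>&1 | tee bootstrap.log`, calling `bootstrap_ok bootstrap.log && echo "bootstrap looks fine"` would confirm both lines are present.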
After that, just start the whole cluster the normal way.
Then check the log output. OK, both namenodes are up.
[hadoop@hadoop-8-52 sbin]$ hdfs haadmin -getServiceState nn1
active
[hadoop@hadoop-8-52 sbin]$ hdfs haadmin -getServiceState nn2
standby
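The same check can be wrapped in a small function that fails unless exactly one namenode is active and the other is standby. This is a sketch only: `check_ha_pair` is a hypothetical helper name, and the only real command it calls is `hdfs haadmin -getServiceState`, as used above.

```shell
#!/usr/bin/env bash
# check_ha_pair NN_A NN_B  (hypothetical helper, not part of Hadoop)
# Succeeds only when one namenode reports "active" and the other "standby".
check_ha_pair() {
    local a b
    a="$(hdfs haadmin -getServiceState "$1")" || return 1
    b="$(hdfs haadmin -getServiceState "$2")" || return 1
    echo "$1=$a $2=$b"
    case "$a/$b" in
        active/standby|standby/active) return 0 ;;
        *) return 1 ;;
    esac
}
```

With the IDs from this cluster it would be called as `check_ha_pair nn1 nn2`; a non-zero exit status means the pair is not in a healthy active/standby state (for example both standby).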
From the ITPUB blog, link: http://blog.itpub.net/133735/viewspace-2024696/. If you repost this, please credit the source; otherwise legal action may be pursued.