DataNode: Bad connect ack with firstBadLink

Troubleshooting slow Hadoop job startup and DataNode errors

1. Every job launch is slow and fails with the following exceptions:

ERROR - java.io.IOException: Bad connect ack with firstBadLink as 10.21.232.114:50010
23-08-2016 14:13:21 CST import_ucord01_order_discount ERROR -   at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1401)
23-08-2016 14:13:21 CST import_ucord01_order_discount ERROR -   at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1302)
23-08-2016 14:13:21 CST import_ucord01_order_discount ERROR -   at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:536)
java.net.SocketTimeoutException: 70000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.21.193.201:60450 remote=/10.21.232.116:50010]
23-08-2016 14:18:09 CST import_ucord01_order_discount ERROR -   at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
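
Before blaming the cluster, it is worth confirming that the firstBadLink address is reachable at the TCP level at all. A minimal sketch in Python (the host and port come from the error above; the 5-second timeout is an arbitrary choice):

```python
import socket

def check_datanode_port(host, port, timeout=5):
    """Try to open a TCP connection to a DataNode's data-transfer port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            print(f"{host}:{port} is reachable")
            return True
    except OSError as e:
        print(f"{host}:{port} is NOT reachable: {e}")
        return False

# firstBadLink address reported in the job log above
check_datanode_port("10.21.232.114", 50010)
```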


2. After ruling out the firewall and network bandwidth, I logged in to 10.201.232.114 and found the DataNode log being flooded with the following messages:


2016-08-23 14:13:04,862 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving BP-1905901049-10.21.232.111-1430129177585:blk_1310291686_237019930 src: /10.201.129.163:43292 dest: /10.201.232.114:50010
2016-08-23 14:13:04,862 INFO org.apache.hadoop.util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 4786ms
GC pool 'ConcurrentMarkSweep' had collection(s): count=1 time=5032ms
2016-08-23 14:13:04,863 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: cdh14.idc1.fn:50010:DataXceiver error processing unknown operation  src: /10.21.194.130:58540 dst: /10.21.232.114:50010
java.io.EOFException
        at java.io.DataInputStream.readShort(DataInputStream.java:315)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.readOp(Receiver.java:56)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:202)
        at java.lang.Thread.run(Thread.java:745)
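
The JvmPauseMonitor line is the key clue here: a single CMS collection froze the whole process for roughly 5 seconds, during which the DataXceiver/EOFException errors pile up. To see how frequent and how long these pauses are, the DataNode log can be scanned for the stock JvmPauseMonitor message. A minimal sketch (the regex matches the message format shown above; the log path is illustrative):

```python
import re

# Matches the JvmPauseMonitor message seen in the DataNode log above
PAUSE_RE = re.compile(r"pause of approximately (\d+)ms")

def summarize_gc_pauses(log_path, threshold_ms=1000):
    """Count JVM pauses reported in a DataNode log and report the worst one."""
    pauses = []
    with open(log_path, errors="replace") as f:
        for line in f:
            m = PAUSE_RE.search(line)
            if m:
                pauses.append(int(m.group(1)))
    long_pauses = [p for p in pauses if p >= threshold_ms]
    print(f"{len(pauses)} pauses total, {len(long_pauses)} of {threshold_ms}ms or more")
    if pauses:
        print(f"worst pause: {max(pauses)}ms")

# Log path is illustrative; point it at the actual DataNode log
summarize_gc_pauses("/var/log/hadoop-hdfs/hadoop-hdfs-datanode.log")
```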




3. The logs show that frequent full GCs were stalling the DataNode process and effectively taking it out of service. While the JVM is paused, the DataNode cannot acknowledge pipeline setup requests, so the writing client times out and reports the node as the firstBadLink seen in step 1.
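
The usual remedy is to give the DataNode a larger heap and keep GC logging on so the pauses can be verified to disappear. A sketch of the relevant hadoop-env.sh line, assuming a Hadoop 2.x / JDK 7-8 CMS setup; the heap size and GC log path are illustrative, not values taken from this cluster:

```sh
# hadoop-env.sh -- illustrative values, tune to the node's actual memory
export HADOOP_DATANODE_OPTS="-Xms4g -Xmx4g \
  -XX:+UseConcMarkSweepGC -XX:+UseParNewGC \
  -XX:+PrintGCDetails -XX:+PrintGCDateStamps \
  -Xloggc:/var/log/hadoop-hdfs/datanode-gc.log \
  $HADOOP_DATANODE_OPTS"
```

Note that a bigger heap only helps if the pauses really come from collection pressure; JvmPauseMonitor prints the GC pool counts (as in the log above) precisely so that GC pauses can be told apart from host-level stalls such as swapping.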
