hadoop: Failed fetch notification Too many fetch-failures

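This post documents a problem encountered in a Hadoop job: the map phase ran normally, but the reduce phase stalled at 18%. It shows the relevant log output and the troubleshooting process. The root cause turned out to be that the datanodes could not communicate with each other, which left the reduce tasks unable to fetch the map output.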

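I am new to Hadoop, and one of my jobs ran into a configuration problem: the map phase ran normally, but the reduce phase stalled at 18% and needed nearly an hour to finish. The JobTracker log showed the following: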
2011-10-03 09:45:58,330 INFO org.apache.hadoop.mapred.JobInProgress: Choosing rack-local task task_201110022127_0003_m_000011
2011-10-03 09:46:01,334 INFO org.apache.hadoop.mapred.JobInProgress: Task 'attempt_201110022127_0003_m_000011_0' has completed task_201110022127_0003_m_000011 successfully.
2011-10-03 09:46:01,334 INFO org.apache.hadoop.mapred.ResourceEstimator: completedMapsUpdates:9  completedMapsInputSize:437327225  completedMapsOutputSize:193
2011-10-03 09:46:01,737 INFO org.apache.hadoop.mapred.JobInProgress: Task 'attempt_201110022127_0003_m_000006_0' has completed task_201110022127_0003_m_000006 successfully.
2011-10-03 09:46:01,737 INFO org.apache.hadoop.mapred.ResourceEstimator: completedMapsUpdates:10  completedMapsInputSize:504436090  completedMapsOutputSize:215
2011-10-03 09:46:01,738 INFO org.apache.hadoop.mapred.JobInProgress: Task 'attempt_201110022127_0003_m_000007_0' has completed task_201110022127_0003_m_000007 successfully.
2011-10-03 09:46:01,738 INFO org.apache.hadoop.mapred.ResourceEstimator: completedMapsUpdates:11  completedMapsInputSize:571544955  completedMapsOutputSize:237
2011-10-03 09:46:04,007 INFO org.apache.hadoop.mapred.JobInProgress: Task 'attempt_201110022127_0003_m_000009_0' has completed task_201110022127_0003_m_000009 successfully.
2011-10-03 09:46:04,007 INFO org.apache.hadoop.mapred.ResourceEstimator: completedMapsUpdates:12  completedMapsInputSize:593329451  completedMapsOutputSize:258
2011-10-03 09:46:04,008 INFO org.apache.hadoop.mapred.JobInProgress: Task 'attempt_201110022127_0003_m_000008_0' has completed task_201110022127_0003_m_000008 successfully.
2011-10-03 09:46:04,008 INFO org.apache.hadoop.mapred.ResourceEstimator: completedMapsUpdates:13  completedMapsInputSize:615113993  completedMapsOutputSize:279
2011-10-03 09:46:13,349 INFO org.apache.hadoop.mapred.JobInProgress: Failed fetch notification #1 for task attempt_201110022127_0003_m_000000_0
2011-10-03 09:48:49,450 INFO org.apache.hadoop.mapred.JobInProgress: Failed fetch notification #2 for task attempt_201110022127_0003_m_000000_0
2011-10-03 09:53:52,659 INFO org.apache.hadoop.mapred.JobInProgress: Failed fetch notification #3 for task attempt_201110022127_0003_m_000000_0
2011-10-03 09:53:52,659 INFO org.apache.hadoop.mapred.JobInProgress: Too many fetch-failures for output of task: attempt_201110022127_0003_m_000000_0 ... killing it
2011-10-03 09:53:52,659 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_201110022127_0003_m_000000_0: Too many fetch-failures
2011-10-03 09:53:52,661 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'attempt_201110022127_0003_m_000000_1' to tip task_201110022127_0003_m_000000, for tracker 'tracker_ubuntu-server:127.0.0.1/127.0.0.1:49740'
2011-10-03 09:53:52,661 INFO org.apache.hadoop.mapred.JobInProgress: Choosing rack-local task task_201110022127_0003_m_000000
2011-10-03 09:53:53,107 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_201110022127_0003_m_000000_0' from 'tracker_fangfei-desktop:127.0.0.1/127.0.0.1:54181'
2011-10-03 09:53:58,264 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'attempt_201110022127_0003_m_000000_2' to tip task_201110022127_0003_m_000000, for tracker 'tracker_huangzhongyuan-desktop:127.0.0.1/127.0.0.1:48184'
2011-10-03 09:53:58,264 INFO org.apache.hadoop.mapred.JobInProgress: Choosing rack-local task task_201110022127_0003_m_000000
2011-10-03 09:54:01,668 INFO org.apache.hadoop.mapred.JobInProgress: Task 'attempt_201110022127_0003_m_000000_1' has completed task_201110022127_0003_m_000000 successfully.
2011-10-03 09:54:01,668 INFO org.apache.hadoop.mapred.ResourceEstimator: completedMapsUpdates:14  completedMapsInputSize:682222858  completedMapsOutputSize:301
2011-10-03 09:54:07,282 INFO org.apache.hadoop.mapred.JobTracker: Adding task (cleanup)'attempt_201110022127_0003_m_000000_2' to tip task_201110022127_0003_m_000000, for tracker 'tracker_huangzhongyuan-desktop:127.0.0.1/127.0.0.1:48184'
2011-10-03 09:54:10,285 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_201110022127_0003_m_000000_2' from 'tracker_huangzhongyuan

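I searched Google for a long time without finding an answer. Some posts suggested removing the 127.0.0.1 hostname mapping from /etc/hosts; I tried that and it did not help. It was not until the next morning, while searching again, that something clicked: the problem occurs in the reduce phase, and the message means that the reducer cannot fetch the output of the map phase. The line "Failed fetch notification #1 for task attempt_201110022127_0003_m_000000_0" names the map attempt whose output could not be fetched, so the master node is working properly: the JobTracker has told the reduce node to run the reduce, but the reduce node cannot retrieve the map output. That can only mean the reduce node cannot communicate with the nodes holding that output. Because I had configured Hadoop with hostnames rather than IP addresses, I figured the fix should be to add the hostname mappings of all the datanode nodes to each other's /etc/hosts. I tried it, and it worked, so I am recording it here.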
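
The diagnosis above can be checked from any worker node before editing anything. The following is a minimal Python sketch, not something from the original setup: it resolves each cluster hostname and reports the names that fail to resolve or that resolve to a loopback address. The function name `find_bad_hostnames` is hypothetical, and the `resolve` parameter is only there so the check can be tried without a real network.

```python
import ipaddress
import socket


def find_bad_hostnames(hostnames, resolve=socket.gethostbyname):
    """Return {hostname: reason} for names that are unresolvable or loopback.

    `resolve` maps a hostname to an IPv4 address string; it defaults to the
    system resolver, which consults /etc/hosts on a typical Linux setup.
    """
    problems = {}
    for name in hostnames:
        try:
            addr = resolve(name)
        except OSError as exc:  # socket.gaierror is a subclass of OSError
            problems[name] = f"cannot resolve: {exc}"
            continue
        # Any address in 127.0.0.0/8 is useless to a remote reducer.
        if ipaddress.ip_address(addr).is_loopback:
            problems[name] = f"resolves to loopback address {addr}"
    return problems
```

Calling `find_bad_hostnames(["ubuntu-server", "fangfei-desktop", "huangzhongyuan-desktop"])` on each node (the hostnames are the ones that appear in the tracker names in the log) should return an empty dict once every node can resolve every other node to a real address.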

Solution: add the hostname-to-IP mappings of all the datanode nodes to /etc/hosts on every datanode node.
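
To illustrate the fix, here is a small Python sketch that parses the text of an /etc/hosts file and lists the worker hostnames that are still missing or mapped only to a loopback address. The helper names `parse_hosts` and `missing_host_entries` are hypothetical, and the 192.168.1.x addresses in the sample are invented for illustration, since the real addresses are not given here.

```python
import ipaddress


def parse_hosts(text):
    """Map each hostname or alias in hosts-file text to its first listed address."""
    mapping = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        addr, *names = line.split()
        for name in names:
            mapping.setdefault(name, addr)  # the first entry for a name wins
    return mapping


def missing_host_entries(text, workers):
    """Return the workers that have no entry, or only a loopback entry."""
    mapping = parse_hosts(text)
    bad = []
    for name in workers:
        addr = mapping.get(name)
        if addr is None or ipaddress.ip_address(addr).is_loopback:
            bad.append(name)
    return bad


# Sample hosts file; the 192.168.1.x addresses are made up for illustration.
SAMPLE_HOSTS = """\
127.0.0.1     localhost
192.168.1.10  ubuntu-server
192.168.1.11  fangfei-desktop
192.168.1.12  huangzhongyuan-desktop
"""

# Hostnames taken from the tracker names in the JobTracker log.
WORKERS = ["ubuntu-server", "fangfei-desktop", "huangzhongyuan-desktop"]
```

On a real node, one would pass the contents of /etc/hosts (for example `open("/etc/hosts").read()`) instead of `SAMPLE_HOSTS`; any name that `missing_host_entries` returns still needs a mapping on that node.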
