Personal Hadoop Error List

This post walks through common failures of Hadoop Reduce tasks, in particular tasks being killed because of too many fetch failures. It explains how the fetch step works, how the failure threshold is computed, and what happens once that threshold is reached. It also covers the "Task attempt failed to report status" timeout, including which configuration parameter to raise and how to check for bugs in the code, and finally describes the case of a TaskTracker being blacklisted and how to resolve it.


Error 1: Too many fetch-failures

 

The first phase after a Reduce task starts is the shuffle, in which it fetches data from the map outputs. Each fetch may fail because of a connect timeout, a read timeout, a checksum error, and so on. The Reduce task keeps a counter per map recording how many times fetching that map's output has failed. When the count reaches a threshold, the Reduce task reports to the JobTracker that fetching this map's output has failed too many times, and prints a log like:

Failed to fetch map-output from attempt_201105261254_102769_m_001802_0 even after MAX_FETCH_RETRIES_PER_MAP retries... reporting to the JobTracker

The threshold is computed as:

max(MIN_FETCH_RETRIES_PER_MAP,
    getClosestPowerOf2((this.maxBackoff * 1000 / BACKOFF_INIT) + 1));

By default MIN_FETCH_RETRIES_PER_MAP=2, maxBackoff=300 and BACKOFF_INIT=4000, so the default threshold is 6. It can be tuned via the mapred.reduce.copy.backoff parameter.
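A minimal sketch of that tuning, assuming the setting goes into conf/mapred-site.xml and using 600 purely as an illustrative value (the formula above multiplies maxBackoff by 1000, i.e. treats it as seconds):

<property>
  <name>mapred.reduce.copy.backoff</name>
  <!-- maximum fetch backoff; a larger value raises the per-map failure threshold -->
  <value>600</value>
</property>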

Once the threshold is reached, the Reduce task tells its TaskTracker over the umbilical protocol, and the TaskTracker passes the report on to the JobTracker at the next heartbeat. When the JobTracker finds that more than 50% of the Reduces have reported repeated fetch failures for the same map's output, it fails that map and reschedules it, printing a log like:

"Too many fetch-failures for output of task: attempt_201105261254_102769_m_001802_0 ... killing it"

 

 

Error 2: Task attempt failed to report status for 622 seconds. Killing

The description of mapred.task.timeout, which defaults to 600 s (600000 ms), says: "The number of milliseconds before a task will be terminated if it neither reads an input, writes an output, nor updates its status string."

Increasing the value of mapred.task.timeout might solve the problem, but you need to figure out if more than 600s is actually required for the map task to complete processing the input data or if there is a bug in the code which needs to be debugged.
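If a longer timeout really is needed, a minimal sketch of the change (assuming it goes into conf/mapred-site.xml; 1200000 ms, i.e. 20 minutes, is only an illustrative value) looks like:

<property>
  <name>mapred.task.timeout</name>
  <!-- milliseconds a task may run without reading input, writing output,
       or updating its status before it is killed; the default is 600000 -->
  <value>1200000</value>
</property>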

According to the Hadoop best practices, on average a map task should take a minute or so to process an InputSplit.

 

Error 3: Hadoop: Blacklisted tasktracker

 

Put the following config in conf/hdfs-site.xml:

<property>
  <name>dfs.hosts</name>
  <value>/full/path/to/whitelisted/node/file</value>
</property>

Use the following command to ask Hadoop to refresh node status based on the configuration:

./bin/hadoop dfsadmin -refreshNodes
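The file referenced by dfs.hosts is a plain text list of the nodes allowed to connect, one hostname (or IP) per line; the names below are placeholders:

node01.example.com
node02.example.com
node03.example.com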

 

http://serverfault.com/questions/288440/hadoop-blacklisted-tasktracker

 
