Hadoop remote debugging

When a Hadoop task fails and the logs are insufficient to diagnose the error, this article covers several effective debugging approaches, including reproducing the failure locally, using JVM debugging options, profiling tasks, IsolationRunner, and keeping a failed task's intermediate files for later inspection.


When a task fails and there is not enough information logged to diagnose the error, you may want to resort to running a debugger for that task. This is hard to arrange when running the job on a cluster, as you don’t know which node is going to process which part of the input, so you can’t set up your debugger ahead of the failure. However, there are a few other options available:

  • Reproduce the failure locally

Often the failing task fails consistently on a particular input. You can try to reproduce the problem locally by downloading the file that the task is failing on and running the job locally, possibly using a debugger such as Java's VisualVM (see the first sketch after this list).

  • Use JVM debugging options

A common cause of failure is a Java out of memory error in the task JVM. You can set mapred.child.java.opts to include -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/path/to/dumps. This setting produces a heap dump that can be examined afterward with tools such as jhat or the Eclipse Memory Analyzer. Note that the JVM options should be added to the existing memory settings specified by mapred.child.java.opts (see the second sketch after this list).

  • Use task profiling

Java profilers give a lot of insight into the JVM, and Hadoop provides a mechanism to profile a subset of the tasks in a job (see the third sketch after this list).

  • Use IsolationRunner

Older versions of Hadoop provided a special task runner called IsolationRunner that could rerun failed tasks in situ on the cluster. Unfortunately, it is no longer available in recent versions, but you can track its replacement at https://issues.apache.org/jira/browse/MAPREDUCE-2637.
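
For the first option, here is a minimal sketch of a driver that forces the local job runner, assuming the old (pre-YARN) property names used elsewhere in this article; the class name, input/output paths, and job wiring are placeholders to adapt to the failing job:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class LocalRepro {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            conf.set("mapred.job.tracker", "local"); // run all tasks in a single local JVM
            conf.set("fs.default.name", "file:///"); // read the downloaded input from the local filesystem

            Job job = new Job(conf, "local-repro");
            job.setJarByClass(LocalRepro.class);
            // job.setMapperClass(...), job.setReducerClass(...), and the key/value
            // types should be wired up exactly as in the failing job.
            FileInputFormat.addInputPath(job, new Path("/tmp/failing-input"));
            FileOutputFormat.setOutputPath(job, new Path("/tmp/local-repro-output"));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

Because the driver and the task now run in a single local JVM, you can attach VisualVM (or any other debugger) to that one process.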

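For the second option, a sketch of setting the heap-dump flags from the job driver; the -Xmx value stands in for whatever memory settings the job already uses:

    import org.apache.hadoop.conf.Configuration;

    public class HeapDumpOptions {
        public static void configure(Configuration conf) {
            // Keep the existing memory settings (-Xmx200m is a placeholder) and
            // append the heap-dump flags rather than replacing them.
            conf.set("mapred.child.java.opts",
                "-Xmx200m -XX:+HeapDumpOnOutOfMemoryError"
                + " -XX:HeapDumpPath=/path/to/dumps");
        }
    }
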
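For the third option, a sketch using the classic profiling properties; the task ranges here are illustrative, and mapred.task.profile.params can also be set to change the arguments passed to the profiler:

    import org.apache.hadoop.conf.Configuration;

    public class ProfilingOptions {
        public static void configure(Configuration conf) {
            conf.setBoolean("mapred.task.profile", true);   // turn profiling on
            conf.set("mapred.task.profile.maps", "0-2");    // profile only the first three map tasks
            conf.set("mapred.task.profile.reduces", "0-2"); // and the first three reduce tasks
        }
    }
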
In some cases it’s useful to keep the intermediate files for a failed task attempt for later inspection, particularly if supplementary dump or profile files are created in the task’s working directory. You can set keep.failed.task.files to true to keep a failed task’s files.

You can keep the intermediate files for successful tasks, too, which may be handy if you want to examine a task that isn’t failing. In this case, set the property keep.task.files.pattern to a regular expression that matches the IDs of the tasks you want to keep.
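
A sketch that sets both properties from the job driver; the regular expression is illustrative and should be replaced with a pattern matching the IDs of the tasks you actually want to keep:

    import org.apache.hadoop.conf.Configuration;

    public class KeepTaskFiles {
        public static void configure(Configuration conf) {
            conf.setBoolean("keep.failed.task.files", true);      // keep files of failed task attempts
            conf.set("keep.task.files.pattern", ".*_m_000001.*"); // also keep matching successful tasks
        }
    }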

To examine the intermediate files, log into the node that the task failed on and look for the directory for that task attempt. It will be under one of the local MapReduce directories, as set by the mapred.local.dir property. If this property is a comma-separated list of directories (to spread load across the physical disks on a machine), you may need to look in all of the directories before you find the directory for that particular task attempt. The task attempt directory is in the following location:

          mapred.local.dir/taskTracker/jobcache/job-ID/task-attempt-ID
