Hadoop remote debugging

When a Hadoop task fails and the logs are insufficient to diagnose the error, this article covers several effective debugging approaches, including reproducing the failure locally, using JVM debugging options, profiling tasks, IsolationRunner, and keeping a failed task's intermediate files for later inspection.


When a task fails and there is not enough information logged to diagnose the error, you may want to resort to running a debugger for that task. This is hard to arrange when running the job on a cluster, as you don’t know which node is going to process which part of the input, so you can’t set up your debugger ahead of the failure. However, there are a few other options available:

  • Reproduce the failure locally

Often the failing task fails consistently on a particular input. You can try to reproduce the problem locally by downloading the file that the task is failing on and running the job locally, possibly using a debugger such as Java's VisualVM (see the first sketch after this list).

  • Use JVM debugging options

A common cause of failure is a Java out of memory error in the task JVM. You can set mapred.child.java.opts to include -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/path/to/dumps. This setting produces a heap dump that can be examined afterward with tools such as jhat or the Eclipse Memory Analyzer. Note that the JVM options should be added to the existing memory settings specified by mapred.child.java.opts (see the second sketch after this list).

  • Use task profiling

Java profilers give a lot of insight into the JVM, and Hadoop provides a mechanism to profile a subset of the tasks in a job (see the third sketch after this list).

  • Use IsolationRunner

Older versions of Hadoop provided a special task runner called IsolationRunner that could rerun failed tasks in situ on the cluster. Unfortunately, it is no longer available in recent versions, but you can track its replacement at https://issues.apache.org/jira/browse/MAPREDUCE-2637.
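
For the first option, here is a minimal sketch of a driver that forces the local job runner, assuming the old (pre-YARN) property names used elsewhere in this article; the class name, input/output paths, and job wiring are placeholders to adapt to the failing job:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class LocalRepro {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            conf.set("mapred.job.tracker", "local"); // run all tasks in a single local JVM
            conf.set("fs.default.name", "file:///"); // read the downloaded input from the local filesystem

            Job job = new Job(conf, "local-repro");
            job.setJarByClass(LocalRepro.class);
            // job.setMapperClass(...), job.setReducerClass(...), and the key/value
            // types should be wired up exactly as in the failing job.
            FileInputFormat.addInputPath(job, new Path("/tmp/failing-input"));
            FileOutputFormat.setOutputPath(job, new Path("/tmp/local-repro-output"));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

Because the driver and the task now run in a single local JVM, you can attach VisualVM (or any other debugger) to that one process.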

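For the second option, a sketch of setting the heap-dump flags from the job driver; the -Xmx value stands in for whatever memory settings the job already uses:

    import org.apache.hadoop.conf.Configuration;

    public class HeapDumpOptions {
        public static void configure(Configuration conf) {
            // Keep the existing memory settings (-Xmx200m is a placeholder) and
            // append the heap-dump flags rather than replacing them.
            conf.set("mapred.child.java.opts",
                "-Xmx200m -XX:+HeapDumpOnOutOfMemoryError"
                + " -XX:HeapDumpPath=/path/to/dumps");
        }
    }
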
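For the third option, a sketch using the classic profiling properties; the task ranges here are illustrative, and mapred.task.profile.params can also be set to change the arguments passed to the profiler:

    import org.apache.hadoop.conf.Configuration;

    public class ProfilingOptions {
        public static void configure(Configuration conf) {
            conf.setBoolean("mapred.task.profile", true);   // turn profiling on
            conf.set("mapred.task.profile.maps", "0-2");    // profile only the first three map tasks
            conf.set("mapred.task.profile.reduces", "0-2"); // and the first three reduce tasks
        }
    }
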
In some cases it’s useful to keep the intermediate files for a failed task attempt for later inspection, particularly if supplementary dump or profile files are created in the task’s working directory. You can set keep.failed.task.files to true to keep a failed task’s files.

You can keep the intermediate files for successful tasks, too, which may be handy if you want to examine a task that isn’t failing. In this case, set the property keep.task.files.pattern to a regular expression that matches the IDs of the tasks you want to keep.
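
A sketch that sets both properties from the job driver; the regular expression is illustrative and should be replaced with a pattern matching the IDs of the tasks you actually want to keep:

    import org.apache.hadoop.conf.Configuration;

    public class KeepTaskFiles {
        public static void configure(Configuration conf) {
            conf.setBoolean("keep.failed.task.files", true);      // keep files of failed task attempts
            conf.set("keep.task.files.pattern", ".*_m_000001.*"); // also keep matching successful tasks
        }
    }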

To examine the intermediate files, log into the node that the task failed on and look for the directory for that task attempt. It will be under one of the local MapReduce directories, as set by the mapred.local.dir property. If this property is a comma-separated list of directories (to spread load across the physical disks on a machine), you may need to look in all of the directories before you find the directory for that particular task attempt. The task attempt directory is in the following location:

          mapred.local.dir/taskTracker/jobcache/job-ID/task-attempt-ID
