Part 11: How Map/Reduce Works - The Error Handling Mechanism

This article surveys the common failure types in a Hadoop cluster and how they are handled. It covers JobTracker and TaskTracker node failures and explains how the heartbeat mechanism keeps jobs running despite them.

Preface

Node failures are a very common occurrence in a Hadoop cluster.

One of Hadoop's defining strengths is that the failure of a single node does not bring down the whole distributed job.

Below we look at how the Hadoop platform achieves this.

Hardware Failures

Hardware failures fall into two categories: JobTracker node failure and TaskTracker node failure.

1. JobTracker Node Failure

This is the most serious error in a Hadoop cluster, because the JobTracker is a single point of failure.

When it occurs, the only remedy is to bring up a new JobTracker node. While that is happening, every job must stop, and work that had already completed within the running jobs has to be redone from the beginning.

2. TaskTracker Node Failure

This is the most common error in a Hadoop cluster, and Hadoop has a mature error handling mechanism for it.

The heartbeat protocol between the JobTracker and the TaskTrackers requires each TaskTracker to report its progress to the JobTracker at regular intervals.

If the JobTracker goes without a report for longer than the expiry interval (`mapred.tasktracker.expiry.interval`, 10 minutes by default), it removes that TaskTracker from the set of trackers eligible for scheduling.

If it instead receives a report that a task failed, it moves the TaskTracker to the tail of the scheduling queue to be retried later. A TaskTracker that reports four task failures in a row (the default for `mapred.max.tracker.failures`) is blacklisted and removed from the waiting queue as well.
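The bookkeeping described above can be sketched as a small simulation. This is an illustrative model of the JobTracker-side logic only, not Hadoop's actual implementation; the class name `HeartbeatMonitor` and its parameters are invented for this example, while the defaults mirror `mapred.tasktracker.expiry.interval` (600 s) and `mapred.max.tracker.failures` (4).

```python
import time

class HeartbeatMonitor:
    """Toy model of JobTracker-side TaskTracker bookkeeping (not real Hadoop code)."""

    def __init__(self, expiry_interval=600.0, max_failures=4):
        self.expiry_interval = expiry_interval  # seconds without a heartbeat before eviction
        self.max_failures = max_failures        # task failures before blacklisting
        self.queue = []                         # scheduling queue of tracker names
        self.last_seen = {}                     # tracker -> timestamp of last heartbeat
        self.failures = {}                      # tracker -> task-failure count

    def heartbeat(self, tracker, now=None):
        """A TaskTracker checked in: record the time and make it schedulable."""
        now = time.time() if now is None else now
        self.last_seen[tracker] = now
        if tracker not in self.queue:
            self.queue.append(tracker)
            self.failures.setdefault(tracker, 0)

    def report_task_failure(self, tracker):
        """A failed task demotes the tracker to the tail of the queue;
        reaching max_failures blacklists it entirely."""
        self.failures[tracker] = self.failures.get(tracker, 0) + 1
        if tracker in self.queue:
            self.queue.remove(tracker)
            if self.failures[tracker] < self.max_failures:
                self.queue.append(tracker)  # re-queue at the tail

    def expire_lost_trackers(self, now=None):
        """Drop any tracker whose last heartbeat is older than the expiry interval."""
        now = time.time() if now is None else now
        self.queue = [t for t in self.queue
                      if now - self.last_seen.get(t, now) <= self.expiry_interval]
```

In this sketch a single task failure only pushes the tracker to the back of the line, while a silent tracker or a repeat offender disappears from the queue altogether, which is the essence of the mechanism the article describes.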

Summary

Day-to-day failure handling and maintenance is usually the job of a dedicated operations team.

So we will not go any deeper into that topic here.
