MapOutputBuffer Explained

This article takes a close look at how MapOutputBuffer works internally: how the circular buffer is managed, how data is spilled to disk, and how sorting is performed.

An analysis of the MapOutputBuffer used in MapTask


MapOutputBuffer is a circular (ring) buffer. Every incoming key/value pair, together with its index record, is written into this buffer. When the buffer is nearly full, a background daemon thread sorts the data and spills it to disk.


Members:
1. kvbuffer: a byte array; both the raw data and its index records are stored in this one array
2. kvmeta: merely a view over the index portion of kvbuffer; index fields are stored as 4-byte ints, so kvmeta reinterprets those bytes as ints
3. equator: the dividing line inside the buffer, separating raw data from index records
4. kvindex: the position where the next index record will be inserted
5. kvstart: the start position of the index records at spill time
6. kvend: the end position of the index records at spill time
7. bufindex: the position where the next piece of raw data will be written
8. bufstart: the start position of the raw data at spill time
9. bufend: the end position of the raw data at spill time
10. spillper: the usage ratio beyond which a spill is triggered
11. sortmb: the total memory of kvbuffer, 100 MB by default, configurable
12. indexCacheMemoryLimit: the size of the cache holding spill-file index information, 1 MB by default, configurable
13. bufferRemaining: remaining buffer space, in bytes
14. softLimit: the spill threshold; once usage exceeds it a spill starts, i.e. sortmb * spillper
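As a sanity check on these members, the spill threshold can be derived directly from the two configurable knobs (a minimal sketch; the corresponding Hadoop config keys are mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent):

```python
# Sketch: deriving softLimit from sortmb and spillper, using Hadoop's defaults.
SORT_MB = 100          # sortmb: mapreduce.task.io.sort.mb, in MB
SPILL_PERCENT = 0.8    # spillper: mapreduce.map.sort.spill.percent

bufvoid = SORT_MB * 1024 * 1024          # total kvbuffer size in bytes
soft_limit = int(bufvoid * SPILL_PERCENT)

print(bufvoid)      # 104857600
print(soft_limit)   # 83886080 -- matches the "soft limit at ..." line MapTask logs at startup
```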


Layout of a metadata record


valstart: offset of the value in the map output <key, value>
keystart: offset of the key in the map output <key, value>
partition: the partition number
vallen: the length of the value
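The four-int record can be sketched with Python's struct module (illustrative only; Hadoop stores these ints directly into kvbuffer through the kvmeta int view rather than packing bytes like this):

```python
import struct

# One metadata record = 4 ints (16 bytes): VALSTART, KEYSTART, PARTITION, VALLEN.
METASIZE = 4 * 4

def pack_meta(valstart, keystart, partition, vallen):
    return struct.pack(">4i", valstart, keystart, partition, vallen)

def unpack_meta(raw):
    return struct.unpack(">4i", raw)

# A 5-byte key at offset 0, followed by a 7-byte value at offset 5, partition 2.
rec = pack_meta(5, 0, 2, 7)
assert len(rec) == METASIZE

valstart, keystart, partition, vallen = unpack_meta(rec)
keylen = valstart - keystart   # the key length is derived, never stored
```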

The write process

Initial state:
the equator position is 0;
raw data is written forward ("clockwise") starting from 0;
metadata is written backward ("counter-clockwise") from the end of the array. Since one metadata record is 4 ints (16 bytes), initially kvstart = kvend = kvindex = kvbuffer.length - METASIZE.
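With the default 100 MB buffer, the initial positions can be reproduced (a sketch; note that the kv* fields are expressed in int units, i.e. byte offsets divided by 4):

```python
# Sketch: initial layout of the ring buffer with Hadoop's defaults.
NMETA = 4                      # ints per metadata record
METASIZE = NMETA * 4           # 16 bytes
bufvoid = 100 * 1024 * 1024    # kvbuffer.length with sortmb = 100

equator = 0
bufindex = equator                    # raw data grows forward from 0
kvindex = (bufvoid - METASIZE) // 4   # metadata grows backward; offset in ints
kvstart = kvend = kvindex

print(kvindex)   # 26214396 with these defaults
```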

Writing the first <key, value>

First the key is written and bufindex advances by the number of key bytes; then the value is written and bufindex advances again.
Next the metadata record is written. It consists of 4 ints: the first holds valstart, the second keystart, the third the partition, and the fourth the value length.
Why store only the value's length and not the key's? My understanding is that the key's length can be derived as valstart - keystart, so no extra space is needed to store it.
Note that bufindex and kvindex have both moved, each now pointing at where the next record goes, while bufstart, bufend, kvstart and kvend are all unchanged; bufferRemaining has shrunk by the space taken up by the metadata and the raw data.
  
(Shouldn't bufend and kvstart have changed? No: they mark the boundaries of the region being spilled, so they are only updated when a spill actually starts.)
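The bookkeeping for a single record can be mimicked like this (a toy model, not Hadoop's code; it ignores wrap-around, the equator, and concurrent spilling, and tracks kvindex as a byte offset for simplicity):

```python
import struct

METASIZE = 16  # one metadata record: 4 ints

class ToyCollector:
    def __init__(self, size):
        self.kvbuffer = bytearray(size)
        self.bufindex = 0                       # next raw byte to write
        self.kvindex = size - METASIZE          # next metadata slot (byte offset here)
        self.bufstart = self.bufend = 0         # spill boundaries: untouched until a spill
        self.buffer_remaining = int(size * 0.8)

    def collect(self, key: bytes, value: bytes, partition: int):
        keystart = self.bufindex                # key goes in first
        self.kvbuffer[keystart:keystart + len(key)] = key
        self.bufindex += len(key)
        valstart = self.bufindex                # value follows immediately
        self.kvbuffer[valstart:valstart + len(value)] = value
        self.bufindex += len(value)
        # metadata record: VALSTART, KEYSTART, PARTITION, VALLEN
        struct.pack_into(">4i", self.kvbuffer, self.kvindex,
                         valstart, keystart, partition, len(value))
        self.kvindex -= METASIZE                # metadata grows backward
        self.buffer_remaining -= len(key) + len(value) + METASIZE

c = ToyCollector(1024)
c.collect(b"hello", b"world!", 0)
# bufindex advanced by 11, kvindex retreated by 16; bufstart/bufend are still 0
```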


MapOutputBuffer implements the IndexedSortable interface. QuickSort drives its compare and swap methods to order the metadata records by the raw byte order of the keys,
and the sorted records are then written out to the spill file one by one.
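The effect of that compare/swap contract can be illustrated by sorting record indices first by partition and then by raw key bytes (a sketch of what QuickSort achieves through IndexedSortable, using Python's sort for brevity; the data is made up):

```python
# Each entry stands for one metadata record: (partition, raw key bytes).
records = [
    (1, b"banana"),
    (0, b"pear"),
    (1, b"apple"),
    (0, b"fig"),
]

# compare() orders records by partition first, then by the key's byte order;
# swap() exchanges two metadata records.  sorted() emulates the combined effect.
order = sorted(range(len(records)),
               key=lambda i: (records[i][0], records[i][1]))

sorted_records = [records[i] for i in order]
# partition 0 first (fig < pear), then partition 1 (apple < banana)
```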


 
