Map-Reduce: Shuffle and Sort

This article walks through the Hadoop MapReduce shuffle and sort workflow: how the mapper buffers its writes, how data is partitioned and sorted, the roles of the partitioner and combiner, how the reducers copy and merge the map output, and the memory management and disk operations involved along the way, ending with the final reduce phase.

Introduction

MapReduce guarantees that the input to every reducer is sorted by key. The process by which the map output is sorted and transferred across to the reducers is known as the shuffle.

The following figure (taken from Hadoop: The Definitive Guide) illustrates the shuffle and sort phase:




Buffering Map Writes

The mapper does not write directly to disk; instead it buffers its writes. Each map task has a circular memory buffer, 100 MB by default, which can be tuned via the io.sort.mb property. The flush is handled in a smart manner: when the buffer fills up to a certain threshold (80% by default, configurable through the io.sort.spill.percent property), a separate thread is triggered that spills the buffer contents to disk.
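As a rough illustration, the sketch below shows how these two knobs might be set on a job's Configuration. The io.sort.mb and io.sort.spill.percent names are the classic ones used above; newer Hadoop releases renamed them (mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent), and the 200 MB / 90% values and the class and job names are arbitrary examples, not recommendations.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class SortBufferTuning {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Enlarge the map-side sort buffer from the 100 MB default to 200 MB.
        conf.setInt("io.sort.mb", 200);

        // Start spilling when the buffer is 90% full instead of the 80% default.
        conf.setFloat("io.sort.spill.percent", 0.90f);

        Job job = Job.getInstance(conf, "shuffle-tuning-example");
        // ... set mapper, reducer, input/output paths as usual ...
    }
}
```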

Role of the Partitioner & Combiner

Before the spill is written to disk, the spill thread first partitions the data according to the reducer each record must ultimately go to, and a background thread then performs an in-memory sort by key within each partition. If a combiner is configured, it consumes the output of this in-memory sort, reducing the amount of data written to disk. Several spill files may be generated over the course of the map task, so at the end of the map phase an on-disk merge combines them into larger partitions (larger in size and fewer in number, the number depending on the number of reducers); the sort order is preserved during the merge.
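To make the partitioning contract concrete, here is a minimal sketch of a custom partitioner that behaves like Hadoop's default HashPartitioner: every record with a given key is routed to the same reduce partition. The class name KeyHashPartitioner and the Text/IntWritable types are illustrative assumptions, not part of the original article.

```java
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

/**
 * A hash-based partitioner: records with the same key always land in the
 * same reduce partition, which is what the shuffle relies on.
 */
public class KeyHashPartitioner extends Partitioner<Text, IntWritable> {
    @Override
    public int getPartition(Text key, IntWritable value, int numPartitions) {
        // Mask off the sign bit so the result is always a valid partition index.
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }
}
```

In the driver this would be registered with job.setPartitionerClass(KeyHashPartitioner.class); a combiner is wired in the same way with job.setCombinerClass(...), often reusing the reducer class when the reduce function is commutative and associative.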

Copy phase to the Reducer

Now, the output of the various map tasks sits on different nodes and needs to be copied over to the node on which the reduce task will run so that it can consume them. If a piece of map output is small enough to fit into the reduce task's memory, it is merged in memory with the other sorted map outputs copied so far; as soon as a threshold is reached, the merged output is written to disk, and the process is repeated until every map task's output for this reducer's partition has been accounted for. The on-disk files are then merged in groups, and the final group of files is fed directly into the reducer via an in-memory merge while feeding, thus saving an extra trip to the disk.
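The thresholds mentioned above are configurable. The sketch below lists the classic (Hadoop 1.x era) property names that govern the reduce-side buffer and merges, with typical default values; newer releases expose equivalents under the mapreduce.reduce.shuffle.* prefix, and the helper class and method names here are purely illustrative.

```java
import org.apache.hadoop.conf.Configuration;

// A minimal sketch of the reduce-side shuffle knobs, using the classic
// (Hadoop 1.x era) property names discussed in the text.
public class ReduceShuffleTuning {
    public static Configuration tunedConf() {
        Configuration conf = new Configuration();

        // Fraction of the reduce task's heap used to buffer incoming map output.
        conf.setFloat("mapred.job.shuffle.input.buffer.percent", 0.70f);

        // Start merging from memory to disk once the buffer is this full ...
        conf.setFloat("mapred.job.shuffle.merge.percent", 0.66f);
        // ... or once this many map outputs have accumulated in memory.
        conf.setInt("mapred.inmem.merge.threshold", 1000);

        // How many streams are merged at once during the on-disk merges.
        conf.setInt("io.sort.factor", 10);

        return conf;
    }
}
```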

Final Step: The Reduce Phase

From the final merge (a mixture of in-memory and on-disk merging), the data is fed to the reduce phase, which may optionally perform further processing; the resulting output is then written to HDFS.
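For completeness, here is a minimal, word-count style reducer sketch (the SumReducer name and the Text/IntWritable types are assumptions for illustration): the framework delivers each key with its values already grouped and sorted by the shuffle described above, and whatever the reduce function writes to the context becomes the job's output on HDFS.

```java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Sums the counts for each key; the output records are written to the
// job's output path on HDFS by the framework.
public class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable total = new IntWritable();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable value : values) {
            sum += value.get();
        }
        total.set(sum);
        context.write(key, total);
    }
}
```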