Map Execution Process

This article dissects the workflow of a MapReduce job: how tasks are assigned, how a Map Task executes through its phases (initialization, execution, spilling, shuffle), and how the number of Map Tasks and Reduce Tasks is determined.

Anatomy of a MapReduce Job

 

In MapReduce, a YARN application is called a Job. The implementation of the Application Master provided by the MapReduce framework is called MRAppMaster.

Timeline of a MapReduce Job

This is the timeline of a MapReduce Job execution:

  • Map Phase: several Map Tasks are executed
  • Reduce Phase: several Reduce Tasks are executed

Notice that the Reduce Phase may start before the end of the Map Phase. Hence, an interleaving between them is possible.

Map Phase

We now focus our discussion on the Map Phase. A key decision is how many MapTasks the Application Master needs to start for the current job.

What does the user give us?

Let’s take a step back. When a client submits an application, several kinds of information are provided to the YARN infrastructure. In particular:

  • a configuration: this may be partial (some parameters may be left unspecified by the user), in which case the default values are used for the job. Notice that these default values may be the ones chosen by a Hadoop provider like Amazon.
  • a JAR containing:
    • map() implementation
    • a combiner implementation
    • reduce() implementation
  • input and output information:
    • input directory: where does the input live? On HDFS? On S3? How many files does it contain?
    • output directory: where will the output be stored? On HDFS? On S3?

The number of files inside the input directory is used for deciding the number of Map Tasks of a job.
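
To make the above concrete, here is a minimal driver sketch that submits exactly this information (configuration, map/combiner/reduce classes, input and output directories). The class names WordCountDriver, WordCountMapper, and SumReducer are illustrative placeholders, not part of the original article:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();     // unspecified parameters fall back to defaults
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCountDriver.class);     // the JAR shipped to the cluster

        job.setMapperClass(WordCountMapper.class);    // map() implementation (placeholder class)
        job.setCombinerClass(SumReducer.class);       // optional combiner (placeholder class)
        job.setReducerClass(SumReducer.class);        // reduce() implementation (placeholder class)

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));    // input directory (HDFS, S3, ...)
        FileOutputFormat.setOutputPath(job, new Path(args[1]));  // output directory
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}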

How many Map Tasks?

The Application Master will launch one MapTask for each map split. Typically, there is a map split for each input file. If an input file is too big (bigger than the HDFS block size), then we have two or more map splits associated with the same input file. This is the pseudocode used inside the getSplits() method of the FileInputFormat class:

num_splits = 0
for each input file f:
   remaining = f.length
   while remaining / split_size > split_slope:
      num_splits += 1
      remaining -= split_size
   if remaining != 0:
      num_splits += 1   # the last, possibly smaller, split

where:

split_slope = 1.1
split_size =~ dfs.blocksize
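
For comparison, here is a hedged Java sketch of the same logic: computeSplitSize mirrors the helper used by FileInputFormat (the split size is normally just dfs.blocksize, unless the min/max split-size parameters override it), while countSplits is a hypothetical standalone version of the pseudocode above, including the final, possibly smaller, split:

public class SplitMath {
    static final double SPLIT_SLOP = 1.1;   // the split_slope constant

    // Mirrors FileInputFormat.computeSplitSize(): usually this returns dfs.blocksize,
    // unless mapreduce.input.fileinputformat.split.minsize/maxsize override it.
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Hypothetical helper replaying the pseudocode above for a single input file.
    static int countSplits(long fileLength, long splitSize) {
        int numSplits = 0;
        long remaining = fileLength;
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            numSplits++;
            remaining -= splitSize;
        }
        if (remaining != 0) {
            numSplits++;   // the last, possibly smaller, split
        }
        return numSplits;
    }

    public static void main(String[] args) {
        long splitSize = computeSplitSize(128L << 20, 1L, Long.MAX_VALUE); // 128MB block size
        System.out.println(countSplits(300L << 20, splitSize));            // prints 3
    }
}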

Notice that the configuration parameter mapreduce.job.maps is ignored in MRv2 (in the past it was just a hint).

MapTask Launch

The MapReduce Application Master asks the Resource Manager for the containers needed by the job: one MapTask container request for each MapTask (map split).

A container request for a MapTask tries to exploit data locality of the map split. The Application Master asks for:

  • a container located on the same Node Manager where the map split is stored (a map split may be stored on multiple nodes due to the HDFS replication factor);
  • otherwise, a container located on a Node Manager in the same rack where the map split is stored;
  • otherwise, a container on any other Node Manager of the cluster

This is just a hint to the Resource Scheduler. The Resource Scheduler is free to ignore data locality if the suggested assignment conflicts with its own goals.

When a Container is assigned to the Application Master, the MapTask is launched.
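
As a rough illustration (not the actual MRAppMaster code path), an Application Master built on the YARN client library could express this locality preference roughly as follows; the resource sizes, priority value, and helper name are assumptions:

import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient;

public class MapTaskRequests {
    // Hypothetical helper: build one container request for a map split whose HDFS
    // blocks live on the given hosts/racks. The scheduler treats these as hints only.
    static AMRMClient.ContainerRequest forSplit(String[] hosts, String[] racks) {
        Resource capability = Resource.newInstance(1024, 1); // e.g., 1GB RAM, 1 vcore per MapTask
        Priority priority = Priority.newInstance(20);        // example priority value
        boolean relaxLocality = true;                        // allow rack-local or off-rack fallback
        return new AMRMClient.ContainerRequest(capability, hosts, racks, priority, relaxLocality);
    }
}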

Map Phase: example of an execution scenario

Map Phase execution

This is a possible execution scenario of the Map Phase:

  • there are two Node Managers: each Node Manager has 2GB of RAM (NM capacity) and each MapTask requires 1GB, so we can run two containers in parallel on each Node Manager (this is the best-case scenario; the Resource Scheduler may decide differently)
  • there are no other YARN applications running in the cluster
  • our job has 8 map splits (e.g., there are 7 files inside the input directory, but only one of them is bigger than the HDFS block size, so we split it into 2 map splits): we need to run 8 Map Tasks.

Map Task Execution Timeline

Let’s now focus on a single Map Task. This is the Map Task execution timeline:

  • INIT phase: we setup the Map Task
  • EXECUTION phase: for each (key, value) tuple inside the map split we run the map() function
  • SPILLING phase: the map output is stored in an in-memory buffer; when this buffer is almost full, we start (in parallel) the spilling phase in order to remove data from it
  • SHUFFLE phase: at the end of the spilling phase, we merge all the map outputs and package them for the reduce phase

MapTask: INIT

During the INIT phase, we:

  1. create a context (TaskAttemptContext.class)
  2. create an instance of the user Mapper.class
  3. setup the input (e.g., InputFormat.class, InputSplit.class, RecordReader.class)
  4. setup the output (NewOutputCollector.class)
  5. create a mapper context (MapContext.class, Mapper.Context.class)
  6. initialize the input, e.g.:
    • create a SplitLineReader.class object
    • create a HdfsDataInputStream.class object

MapTask: EXECUTION

MapTask execution

The EXECUTION phase is performed by the run method of the Mapper class. The user can override it, but by default it starts by calling the setup method: by default this function does nothing useful, but it can be overridden by the user in order to set up the Task (e.g., initialize class variables). After the setup, map() is invoked for each (key, value) tuple contained in the map split. Therefore, map() receives a key, a value, and a mapper context. Using the context, the map stores its output into an in-memory buffer.

Notice that the map split is fetched chunk by chunk (e.g., 64KB) and each chunk is split into several (key, value) tuples (e.g., using SplitLineReader.class). This is done inside the Mapper.Context.nextKeyValue method.

When the map split has been completely processed, the run function calls the cleanup method: by default, no action is performed, but the user may decide to override it.
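
Putting the three steps together, the default Mapper.run() essentially does the following (a simplified sketch; recent Hadoop versions wrap the loop in a try/finally, and the subclass shown here only restates the default behaviour):

import java.io.IOException;
import org.apache.hadoop.mapreduce.Mapper;

public class SketchMapper<KEYIN, VALUEIN, KEYOUT, VALUEOUT>
        extends Mapper<KEYIN, VALUEIN, KEYOUT, VALUEOUT> {

    @Override
    public void run(Context context) throws IOException, InterruptedException {
        setup(context);                   // user hook: initialize class variables, etc.
        while (context.nextKeyValue()) {  // next (key, value) tuple of the map split
            map(context.getCurrentKey(), context.getCurrentValue(), context);
        }
        cleanup(context);                 // user hook: release resources, etc.
    }
}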

MapTask: SPILLING

Spilling phase

As seen in the EXECUTION phase, the map writes (using Mapper.Context.write()) its output into a circular in-memory buffer (MapTask.MapOutputBuffer). The size of this buffer is fixed and determined by the configuration parameter mapreduce.task.io.sort.mb (default: 100MB).

Whenever this circular buffer is almost full (mapreduce.map.sort.spill.percent: 80% by default), the SPILLING phase is performed (in parallel, using a separate thread). Notice that if the spilling thread is too slow and the buffer becomes 100% full, then map() cannot be executed and thus has to wait.
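
Both thresholds can be tuned from the job configuration; a small hedged example (the values are arbitrary, and the default spill percentage is only made explicit here):

import org.apache.hadoop.conf.Configuration;

public class SpillTuning {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        conf.setInt("mapreduce.task.io.sort.mb", 200);             // circular buffer size in MB (default 100)
        conf.setFloat("mapreduce.map.sort.spill.percent", 0.80f);  // spill threshold (default 0.80)
    }
}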

The SPILLING thread performs the following actions:

  1. it creates a SpillRecord and an FSOutputStream (on the local filesystem)
  2. it sorts, in memory, the used chunk of the buffer: the output tuples are sorted by (partitionIdx, key) using a quicksort algorithm
  3. it splits the sorted output into partitions: one partition for each ReduceTask of the job (see later)
  4. it writes the partitions sequentially into the local file

How Many Reduce Tasks?

The number of ReduceTasks for the job is decided by the configuration parameter mapreduce.job.reduces (default: 1).

What is the partitionIdx associated with an output tuple?

The partitionIdx of an output tuple is the index of a partition. It is decided inside Mapper.Context.write():

partitionIdx = (key.hashCode() & Integer.MAX_VALUE) % numReducers

It is stored as metadata in the circular buffer alongside the output tuple. The user can customize the partitioner by setting the configuration parameter mapreduce.job.partitioner.class.
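
The default partitioner (HashPartitioner) implements exactly the formula above; a custom partitioner only needs to override getPartition. The class name below is illustrative:

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

public class MyHashPartitioner extends Partitioner<Text, IntWritable> {
    @Override
    public int getPartition(Text key, IntWritable value, int numReduceTasks) {
        // Same formula as above: mask the sign bit, then take the modulo of the number of reducers.
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }
}

It would be registered with job.setPartitionerClass(MyHashPartitioner.class), which sets mapreduce.job.partitioner.class.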

When do we apply the combiner?

If the user specifies a combiner, then the SPILLING thread, before writing the tuples to the file (step 4 above), executes the combiner on the tuples contained in each partition. Basically, we:

  1. create an instance of the user Reducer.class (the one specified for the combiner!)
  2. create a Reducer.Context: the output will be stored on the local filesystem
  3. execute Reducer.run(): see the Reduce Task description

The combiner typically uses the same implementation as the standard reduce() function and thus can be seen as a local reducer.
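
As an illustration of such a “local reducer”, a typical sum combiner looks like the sketch below (class name illustrative); it would be registered with job.setCombinerClass(SumReducer.class):

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable v : values) {   // pre-aggregate the map output for this key
            sum += v.get();
        }
        result.set(sum);
        context.write(key, result);      // written to the local filesystem during spilling
    }
}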

MapTask: end of EXECUTION

At the end of the EXECUTION phase, the SPILLING thread is triggered for the last time. In more detail, we:

  1. sort and spill the remaining unspilled tuples
  2. start the SHUFFLE phase

Notice that each time the buffer became almost full, we produced one spill file (SpillRecord + output file). Each spill file contains several partitions (segments).

MapTask: SHUFFLE

Reduce Phase

[…]

YARN and MapReduce interaction

[…]

Source: http://ercoppa.github.io/HadoopInternals/Container.html
