The number of map tasks for a given job is driven by the number of input splits, not by the mapred.map.tasks parameter. A map task is spawned for each input split, so over the lifetime of a MapReduce job the number of map tasks equals the number of input splits. mapred.map.tasks is just a hint to the InputFormat about the number of maps.
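To make the "hint" behavior concrete, here is a minimal Python sketch, not Hadoop's actual source, of how the old-API FileInputFormat derives a split size from the requested map count: goalSize = totalSize / requested maps, and splitSize = max(minSize, min(goalSize, blockSize)). The function names and defaults here are illustrative assumptions.

```python
def compute_split_size(goal_size, min_size, block_size):
    # Mirrors the old-API FileInputFormat formula:
    # splitSize = max(minSize, min(goalSize, blockSize))
    return max(min_size, min(goal_size, block_size))

def count_splits(file_sizes, num_maps_hint, block_size=128 * 2**20, min_size=1):
    """Rough sketch: estimate how many input splits (and hence map tasks)
    a set of files produces. file_sizes are in bytes."""
    total = sum(file_sizes)
    # The mapred.map.tasks hint only sets the *goal* size per split.
    goal = max(1, total // max(1, num_maps_hint))
    split_size = compute_split_size(goal, min_size, block_size)
    splits = 0
    for size in file_sizes:
        # Ceiling division: a 300 MB file with 128 MB splits yields 3 splits.
        splits += max(1, -(-size // split_size))
    return splits
```

With two files of 300 MB and 64 MB and a hint of 2 maps, the goal size (182 MB) is capped at the 128 MB block size, giving 4 splits, more than the hint asked for; a larger hint shrinks the goal size and can raise the split count instead.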
In your example, Hadoop has determined there are 24 input splits and will spawn 24 map tasks in total. However, you can control how many map tasks run in parallel on each TaskTracker.
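In classic MapReduce (MRv1) the per-node parallelism cap is the mapred.tasktracker.map.tasks.maximum property, set in mapred-site.xml on each TaskTracker node. A sketch, with the value 4 chosen only as an example:

```xml
<!-- mapred-site.xml on each TaskTracker node (MRv1).
     Caps how many map tasks this node runs concurrently;
     it does NOT change the job's total number of map tasks. -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>4</value>
</property>
```

With 24 splits and this setting on each of, say, 3 nodes, at most 12 map tasks run at once and the 24 tasks complete in waves.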
For more information on the number of map and reduce tasks, see:
http://wiki.apache.org/hadoop/HowManyMapsAndReduces
References
http://stackoverflow.com/questions/6885441/setting-the-number-of-map-tasks-and-reduce-tasks