The number of map tasks for a given job is driven by the number of input splits, not by the mapred.map.tasks parameter. A map task is spawned for each input split, so over the lifetime of a MapReduce job the number of map tasks equals the number of input splits. mapred.map.tasks is just a hint to the InputFormat about the number of maps.
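To make the "hint" behavior concrete, here is a minimal Python sketch, not Hadoop's actual source, of how the old-API FileInputFormat derives a split size from the requested map count: goalSize = totalSize / requested maps, and splitSize = max(minSize, min(goalSize, blockSize)). The function names and defaults here are illustrative assumptions.

```python
def compute_split_size(goal_size, min_size, block_size):
    # Mirrors the old-API FileInputFormat formula:
    # splitSize = max(minSize, min(goalSize, blockSize))
    return max(min_size, min(goal_size, block_size))

def count_splits(file_sizes, num_maps_hint, block_size=128 * 2**20, min_size=1):
    """Rough sketch: estimate how many input splits (and hence map tasks)
    a set of files produces. file_sizes are in bytes."""
    total = sum(file_sizes)
    # The mapred.map.tasks hint only sets the *goal* size per split.
    goal = max(1, total // max(1, num_maps_hint))
    split_size = compute_split_size(goal, min_size, block_size)
    splits = 0
    for size in file_sizes:
        # Ceiling division: a 300 MB file with 128 MB splits yields 3 splits.
        splits += max(1, -(-size // split_size))
    return splits
```

With two files of 300 MB and 64 MB and a hint of 2 maps, the goal size (182 MB) is capped at the 128 MB block size, giving 4 splits, more than the hint asked for; a larger hint shrinks the goal size and can raise the split count instead.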
In your example, Hadoop has determined there are 24 input splits and will spawn 24 map tasks in total. However, you can control how many map tasks run in parallel on each TaskTracker.
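In classic MapReduce (MRv1) the per-node parallelism cap is the mapred.tasktracker.map.tasks.maximum property, set in mapred-site.xml on each TaskTracker node. A sketch, with the value 4 chosen only as an example:

```xml
<!-- mapred-site.xml on each TaskTracker node (MRv1).
     Caps how many map tasks this node runs concurrently;
     it does NOT change the job's total number of map tasks. -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>4</value>
</property>
```

With 24 splits and this setting on each of, say, 3 nodes, at most 12 map tasks run at once and the 24 tasks complete in waves.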
For more information on the number of map and reduce tasks, see:
http://wiki.apache.org/hadoop/HowManyMapsAndReduces
References
http://stackoverflow.com/questions/6885441/setting-the-number-of-map-tasks-and-reduce-tasks