Number of Maps and Reduces

This article explains how, in Hadoop, the number of map tasks for a job is determined by the number of input splits, and notes that the mapred.map.tasks parameter is only a hint to the InputFormat about the number of maps. It also covers how to control how many map tasks execute in parallel on each task tracker.


The number of map tasks for a given job is driven by the number of input splits, not by the mapred.map.tasks parameter. One map task is spawned for each input split, so over the lifetime of a MapReduce job the number of map tasks equals the number of input splits. mapred.map.tasks is only a hint to the InputFormat about the number of maps.

In your example, Hadoop determined there are 24 input splits and will spawn 24 map tasks in total. You can, however, control how many map tasks run in parallel on each task tracker.
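Why the hint usually loses to the split count can be sketched from the old-API (mapred package) FileInputFormat arithmetic: the hint only feeds a goal size, which the HDFS block size typically caps. The figures below (a 3 GB input, 128 MB blocks) are illustrative assumptions, not from the original question:

```java
// Sketch of the old-API FileInputFormat split arithmetic.
// The mapred.map.tasks "hint" only sets goalSize; the block size
// usually caps the split size, which is why the hint is just a hint.
public class SplitCount {

    // splitSize = max(minSize, min(goalSize, blockSize))
    static long splitSize(long goalSize, long minSize, long blockSize) {
        return Math.max(minSize, Math.min(goalSize, blockSize));
    }

    // Number of splits for a single file of totalSize bytes.
    static long countSplits(long totalSize, int mapTasksHint,
                            long minSize, long blockSize) {
        long goalSize = totalSize / Math.max(1, mapTasksHint);
        long size = splitSize(goalSize, minSize, blockSize);
        return (totalSize + size - 1) / size;   // ceiling division
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024;
        long blockSize = 128 * mb;          // typical HDFS block size
        long totalSize = 3L * 1024 * mb;    // hypothetical 3 GB input

        // Hint of 2 maps: goalSize (1.5 GB) is capped by the block size,
        // so the job still gets 3 GB / 128 MB = 24 splits -> 24 map tasks.
        System.out.println(countSplits(totalSize, 2, 1, blockSize));   // 24

        // A large hint shrinks goalSize below the block size and can
        // therefore raise the split count.
        System.out.println(countSplits(totalSize, 96, 1, blockSize));  // 96
    }
}
```

Note that the hint can increase the number of maps (by shrinking the goal size) but cannot push it below what the block size dictates.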

For more information on the number of map and reduce tasks, please see:

http://wiki.apache.org/hadoop/HowManyMapsAndReduces
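The per-task-tracker parallelism mentioned above is capped in mapred-site.xml on each node under classic MRv1. A minimal fragment (the value 4 is an arbitrary example, not a recommendation):

```xml
<!-- mapred-site.xml on each TaskTracker node (classic MRv1) -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>4</value> <!-- at most 4 map tasks run concurrently on this node -->
</property>
```

This limits concurrency per node only; the total number of map tasks for the job is still the number of input splits.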

References

http://stackoverflow.com/questions/6885441/setting-the-number-of-map-tasks-and-reduce-tasks
