Testing Hadoop: Running a MapReduce Example Program

1. Locate the example jar

Go into the share directory under the Hadoop installation and find the example package hadoop-mapreduce-examples-2.10.1.jar.

# Target directory
/root/dong/program/hadoop-2.10.1/share/hadoop/mapreduce

# List the directory and find the example package hadoop-mapreduce-examples-2.10.1.jar
root@hecs-x-large-2-linux-20200618145835:~/dong/program/hadoop-2.10.1/share/hadoop/mapreduce# ll
total 5256
drwxr-xr-x 6 1000 qa    4096 Sep 14 21:39 ./
drwxr-xr-x 9 1000 qa    4096 Sep 14 21:39 ../
-rw-r--r-- 1 1000 qa  586815 Sep 14 21:39 hadoop-mapreduce-client-app-2.10.1.jar
-rw-r--r-- 1 1000 qa  787989 Sep 14 21:39 hadoop-mapreduce-client-common-2.10.1.jar
-rw-r--r-- 1 1000 qa 1613911 Sep 14 21:39 hadoop-mapreduce-client-core-2.10.1.jar
-rw-r--r-- 1 1000 qa  199675 Sep 14 21:39 hadoop-mapreduce-client-hs-2.10.1.jar
-rw-r--r-- 1 1000 qa   32779 Sep 14 21:39 hadoop-mapreduce-client-hs-plugins-2.10.1.jar
-rw-r--r-- 1 1000 qa   72212 Sep 14 21:39 hadoop-mapreduce-client-jobclient-2.10.1.jar
-rw-r--r-- 1 1000 qa 1652223 Sep 14 21:39 hadoop-mapreduce-client-jobclient-2.10.1-tests.jar
-rw-r--r-- 1 1000 qa   84008 Sep 14 21:39 hadoop-mapreduce-client-shuffle-2.10.1.jar
-rw-r--r-- 1 1000 qa  303324 Sep 14 21:39 hadoop-mapreduce-examples-2.10.1.jar
drwxr-xr-x 2 1000 qa    4096 Sep 14 21:39 jdiff/
drwxr-xr-x 2 1000 qa    4096 Sep 14 21:39 lib/
drwxr-xr-x 2 1000 qa    4096 Sep 14 21:39 lib-examples/
drwxr-xr-x 2 1000 qa    4096 Sep 14 21:39 sources/
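
If your Hadoop version or install path differs, the jar can be located generically. A minimal sketch, assuming HADOOP_HOME points at the install directory (the fallback path below is the one from this machine):

# Locate the examples jar regardless of version
find ${HADOOP_HOME:-/root/dong/program/hadoop-2.10.1} -name 'hadoop-mapreduce-examples-*.jar'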

2. Run the example jar

# Run the jar: pi selects the example program, the first 3 is the number of map tasks, the second 3 is the number of samples per map
# hadoop jar hadoop-mapreduce-examples-2.10.1.jar pi 3 3
Number of Maps  = 3
Samples per Map = 3
Wrote input for Map #0
Wrote input for Map #1
Wrote input for Map #2
Starting Job
21/01/16 11:03:45 INFO client.RMProxy: Connecting to ResourceManager at localhost/127.0.0.1:8032
21/01/16 11:03:46 INFO input.FileInputFormat: Total input files to process : 3
21/01/16 11:03:46 INFO mapreduce.JobSubmitter: number of splits:3
21/01/16 11:03:46 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1610510670587_0001
21/01/16 11:03:47 INFO conf.Configuration: resource-types.xml not found
21/01/16 11:03:47 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
21/01/16 11:03:47 INFO resource.ResourceUtils: Adding resource type - name = memory-mb, units = Mi, type = COUNTABLE
21/01/16 11:03:47 INFO resource.ResourceUtils: Adding resource type - name = vcores, units = , type = COUNTABLE
21/01/16 11:03:47 INFO impl.YarnClientImpl: Submitted application application_1610510670587_0001
21/01/16 11:03:47 INFO mapreduce.Job: The url to track the job: http://localhost.vm:8088/proxy/application_1610510670587_0001/
21/01/16 11:03:47 INFO mapreduce.Job: Running job: job_1610510670587_0001
21/01/16 11:03:55 INFO mapreduce.Job: Job job_1610510670587_0001 running in uber mode : false
21/01/16 11:03:55 INFO mapreduce.Job:  map 0% reduce 0%
21/01/16 11:04:03 INFO mapreduce.Job:  map 100% reduce 0%
21/01/16 11:04:15 INFO mapreduce.Job:  map 100% reduce 100%
21/01/16 11:04:16 INFO mapreduce.Job: Job job_1610510670587_0001 completed successfully
21/01/16 11:04:17 INFO mapreduce.Job: Counters: 49
        File System Counters
                FILE: Number of bytes read=72
                FILE: Number of bytes written=835625
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=792
                HDFS: Number of bytes written=215
                HDFS: Number of read operations=15
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=3
        Job Counters 
                Launched map tasks=3
                Launched reduce tasks=1
                Data-local map tasks=3
                Total time spent by all maps in occupied slots (ms)=17266
                Total time spent by all reduces in occupied slots (ms)=8882
                Total time spent by all map tasks (ms)=17266
                Total time spent by all reduce tasks (ms)=8882
                Total vcore-milliseconds taken by all map tasks=17266
                Total vcore-milliseconds taken by all reduce tasks=8882
                Total megabyte-milliseconds taken by all map tasks=17680384
                Total megabyte-milliseconds taken by all reduce tasks=9095168
        Map-Reduce Framework
                Map input records=3
                Map output records=6
                Map output bytes=54
                Map output materialized bytes=84
                Input split bytes=438
                Combine input records=0
                Combine output records=0
                Reduce input groups=2
                Reduce shuffle bytes=84
                Reduce input records=6
                Reduce output records=0
                Spilled Records=12
                Shuffled Maps =3
                Failed Shuffles=0
                Merged Map outputs=3
                GC time elapsed (ms)=486
                CPU time spent (ms)=2020
                Physical memory (bytes) snapshot=1026621440
                Virtual memory (bytes) snapshot=7678922752
                Total committed heap usage (bytes)=701497344
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters 
                Bytes Read=354
        File Output Format Counters 
                Bytes Written=97
Job Finished in 31.276 seconds
Estimated value of Pi is 3.5555555555555555555
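
Why 3.5555...? The pi example is a quasi-Monte Carlo estimator: each map generates sample points in the unit square and counts how many fall inside the inscribed quarter circle. With 3 maps × 3 samples = 9 points, the printed result implies 8 of the 9 points landed inside:

Estimated Pi = 4 × (points inside / total points)
             = 4 × 8 / 9
             ≈ 3.5556

Nine samples are far too few for accuracy; larger map and sample counts converge toward 3.14159.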

3. Observations

To get a feel for what Hadoop can do, we first run a normal Hadoop MR example. What can we discover from it?

1. Observation: the number of map tasks can be customized

The command specified 3 map tasks and 3 samples per map.
This shows that the number of maps can be chosen to fit the workload (see the sketch after the quoted output below).

Number of Maps  = 3
Samples per Map = 3
Wrote input for Map #0
Wrote input for Map #1
Wrote input for Map #2
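
A minimal sketch of the same command with more parallelism and more samples (the numbers 10 and 1000 are illustrative; runtime grows with the sample count, and the estimate gets correspondingly better):

# 10 map tasks, 1000 samples per map
hadoop jar hadoop-mapreduce-examples-2.10.1.jar pi 10 1000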

2. Observation: processing steps after job submission

After the job starts:
Step 1: the client proxy connects to the ResourceManager.
Step 2: FileInputFormat reports three input files to process.
Step 3: JobSubmitter computes three input splits.
Step 4: a job ID is generated and job_1610510670587_0001 is submitted (a CLI check follows the quoted log below).

Starting Job
21/01/16 11:03:45 INFO client.RMProxy: Connecting to ResourceManager at localhost/127.0.0.1:8032
21/01/16 11:03:46 INFO input.FileInputFormat: Total input files to process : 3
21/01/16 11:03:46 INFO mapreduce.JobSubmitter: number of splits:3
21/01/16 11:03:46 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1610510670587_0001
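
The submission can also be confirmed outside the job's own log. A sketch using the standard YARN CLI from a second terminal (requires the ResourceManager to be up):

# List applications known to the ResourceManager
yarn application -list

# Detailed status for the application submitted above
yarn application -status application_1610510670587_0001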

3. Observation: job execution flow

Job execution flow:
Step 1: the YARN client submits the application.
Step 2: MapReduce prints a URL for tracking the job.
Step 3: MR runs job_1610510670587_0001.
Step 4: in this run, map and reduce executed as a hand-off (reduce only made progress after the maps finished) rather than simultaneously. The same progress can be checked from the CLI, as shown after the quoted log below.

21/01/16 11:03:47 INFO impl.YarnClientImpl: Submitted application application_1610510670587_0001
21/01/16 11:03:47 INFO mapreduce.Job: The url to track the job: http://localhost.vm:8088/proxy/application_1610510670587_0001/
21/01/16 11:03:47 INFO mapreduce.Job: Running job: job_1610510670587_0001
21/01/16 11:03:55 INFO mapreduce.Job: Job job_1610510670587_0001 running in uber mode : false
21/01/16 11:03:55 INFO mapreduce.Job:  map 0% reduce 0%
21/01/16 11:04:03 INFO mapreduce.Job:  map 100% reduce 0%
21/01/16 11:04:15 INFO mapreduce.Job:  map 100% reduce 100%
21/01/16 11:04:16 INFO mapreduce.Job: Job job_1610510670587_0001 completed successfully
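
Besides the tracking URL, progress is available from the command line. A sketch using the standard mapred CLI (for a job that has already finished, this needs the JobHistory Server running):

# Show state, progress, and counters for the job
mapred job -status job_1610510670587_0001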

4. Observation: which counter groups take part from start to finish

1. File System Counters

2. Job Counters

3. Map-Reduce Framework

4. Shuffle Errors

5. File Input Format Counters

6. File Output Format Counters

(An example of querying one of these counters from the CLI follows the quoted output below.)

21/01/16 11:04:17 INFO mapreduce.Job: Counters: 49
        File System Counters
                FILE: Number of bytes read=72
                FILE: Number of bytes written=835625
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=792
                HDFS: Number of bytes written=215
                HDFS: Number of read operations=15
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=3
        Job Counters 
                Launched map tasks=3
                Launched reduce tasks=1
                Data-local map tasks=3
                Total time spent by all maps in occupied slots (ms)=17266
                Total time spent by all reduces in occupied slots (ms)=8882
                Total time spent by all map tasks (ms)=17266
                Total time spent by all reduce tasks (ms)=8882
                Total vcore-milliseconds taken by all map tasks=17266
                Total vcore-milliseconds taken by all reduce tasks=8882
                Total megabyte-milliseconds taken by all map tasks=17680384
                Total megabyte-milliseconds taken by all reduce tasks=9095168
        Map-Reduce Framework
                Map input records=3
                Map output records=6
                Map output bytes=54
                Map output materialized bytes=84
                Input split bytes=438
                Combine input records=0
                Combine output records=0
                Reduce input groups=2
                Reduce shuffle bytes=84
                Reduce input records=6
                Reduce output records=0
                Spilled Records=12
                Shuffled Maps =3
                Failed Shuffles=0
                Merged Map outputs=3
                GC time elapsed (ms)=486
                CPU time spent (ms)=2020
                Physical memory (bytes) snapshot=1026621440
                Virtual memory (bytes) snapshot=7678922752
                Total committed heap usage (bytes)=701497344
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters 
                Bytes Read=354
        File Output Format Counters 
                Bytes Written=97
Job Finished in 31.276 seconds
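
Individual counters can also be queried directly instead of scraping the log. A sketch using the mapred CLI; the group and counter names are the Java enum identifiers, e.g. MAP_INPUT_RECORDS from the Map-Reduce Framework group (again, a completed job needs the JobHistory Server):

# Read a single counter for the finished job
mapred job -counter job_1610510670587_0001 org.apache.hadoop.mapreduce.TaskCounter MAP_INPUT_RECORDS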