Testing Hadoop: Running a MapReduce Example Program

1. Locate the example jar

Go into the share directory under the Hadoop installation and find the example package hadoop-mapreduce-examples-2.10.1.jar.

# Target directory
/root/dong/program/hadoop-2.10.1/share/hadoop/mapreduce

# List the directory and find the example package hadoop-mapreduce-examples-2.10.1.jar
root@hecs-x-large-2-linux-20200618145835:~/dong/program/hadoop-2.10.1/share/hadoop/mapreduce# ll
total 5256
drwxr-xr-x 6 1000 qa    4096 Sep 14 21:39 ./
drwxr-xr-x 9 1000 qa    4096 Sep 14 21:39 ../
-rw-r--r-- 1 1000 qa  586815 Sep 14 21:39 hadoop-mapreduce-client-app-2.10.1.jar
-rw-r--r-- 1 1000 qa  787989 Sep 14 21:39 hadoop-mapreduce-client-common-2.10.1.jar
-rw-r--r-- 1 1000 qa 1613911 Sep 14 21:39 hadoop-mapreduce-client-core-2.10.1.jar
-rw-r--r-- 1 1000 qa  199675 Sep 14 21:39 hadoop-mapreduce-client-hs-2.10.1.jar
-rw-r--r-- 1 1000 qa   32779 Sep 14 21:39 hadoop-mapreduce-client-hs-plugins-2.10.1.jar
-rw-r--r-- 1 1000 qa   72212 Sep 14 21:39 hadoop-mapreduce-client-jobclient-2.10.1.jar
-rw-r--r-- 1 1000 qa 1652223 Sep 14 21:39 hadoop-mapreduce-client-jobclient-2.10.1-tests.jar
-rw-r--r-- 1 1000 qa   84008 Sep 14 21:39 hadoop-mapreduce-client-shuffle-2.10.1.jar
-rw-r--r-- 1 1000 qa  303324 Sep 14 21:39 hadoop-mapreduce-examples-2.10.1.jar
drwxr-xr-x 2 1000 qa    4096 Sep 14 21:39 jdiff/
drwxr-xr-x 2 1000 qa    4096 Sep 14 21:39 lib/
drwxr-xr-x 2 1000 qa    4096 Sep 14 21:39 lib-examples/
drwxr-xr-x 2 1000 qa    4096 Sep 14 21:39 sources/
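
If your Hadoop version or install path differs, the jar can be located generically. A minimal sketch, assuming HADOOP_HOME points at the install directory (the fallback path below is the one from this machine):

# Locate the examples jar regardless of version
find ${HADOOP_HOME:-/root/dong/program/hadoop-2.10.1} -name 'hadoop-mapreduce-examples-*.jar'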

2. Run the example jar

# Run the jar: pi selects the example program, the first 3 is the number of map tasks, the second 3 is the number of samples per map
# hadoop jar hadoop-mapreduce-examples-2.10.1.jar pi 3 3
Number of Maps  = 3
Samples per Map = 3
Wrote input for Map #0
Wrote input for Map #1
Wrote input for Map #2
Starting Job
21/01/16 11:03:45 INFO client.RMProxy: Connecting to ResourceManager at localhost/127.0.0.1:8032
21/01/16 11:03:46 INFO input.FileInputFormat: Total input files to process : 3
21/01/16 11:03:46 INFO mapreduce.JobSubmitter: number of splits:3
21/01/16 11:03:46 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1610510670587_0001
21/01/16 11:03:47 INFO conf.Configuration: resource-types.xml not found
21/01/16 11:03:47 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
21/01/16 11:03:47 INFO resource.ResourceUtils: Adding resource type - name = memory-mb, units = Mi, type = COUNTABLE
21/01/16 11:03:47 INFO resource.ResourceUtils: Adding resource type - name = vcores, units = , type = COUNTABLE
21/01/16 11:03:47 INFO impl.YarnClientImpl: Submitted application application_1610510670587_0001
21/01/16 11:03:47 INFO mapreduce.Job: The url to track the job: http://localhost.vm:8088/proxy/application_1610510670587_0001/
21/01/16 11:03:47 INFO mapreduce.Job: Running job: job_1610510670587_0001
21/01/16 11:03:55 INFO mapreduce.Job: Job job_1610510670587_0001 running in uber mode : false
21/01/16 11:03:55 INFO mapreduce.Job:  map 0% reduce 0%
21/01/16 11:04:03 INFO mapreduce.Job:  map 100% reduce 0%
21/01/16 11:04:15 INFO mapreduce.Job:  map 100% reduce 100%
21/01/16 11:04:16 INFO mapreduce.Job: Job job_1610510670587_0001 completed successfully
21/01/16 11:04:17 INFO mapreduce.Job: Counters: 49
        File System Counters
                FILE: Number of bytes read=72
                FILE: Number of bytes written=835625
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=792
                HDFS: Number of bytes written=215
                HDFS: Number of read operations=15
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=3
        Job Counters 
                Launched map tasks=3
                Launched reduce tasks=1
                Data-local map tasks=3
                Total time spent by all maps in occupied slots (ms)=17266
                Total time spent by all reduces in occupied slots (ms)=8882
                Total time spent by all map tasks (ms)=17266
                Total time spent by all reduce tasks (ms)=8882
                Total vcore-milliseconds taken by all map tasks=17266
                Total vcore-milliseconds taken by all reduce tasks=8882
                Total megabyte-milliseconds taken by all map tasks=17680384
                Total megabyte-milliseconds taken by all reduce tasks=9095168
        Map-Reduce Framework
                Map input records=3
                Map output records=6
                Map output bytes=54
                Map output materialized bytes=84
                Input split bytes=438
                Combine input records=0
                Combine output records=0
                Reduce input groups=2
                Reduce shuffle bytes=84
                Reduce input records=6
                Reduce output records=0
                Spilled Records=12
                Shuffled Maps =3
                Failed Shuffles=0
                Merged Map outputs=3
                GC time elapsed (ms)=486
                CPU time spent (ms)=2020
                Physical memory (bytes) snapshot=1026621440
                Virtual memory (bytes) snapshot=7678922752
                Total committed heap usage (bytes)=701497344
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters 
                Bytes Read=354
        File Output Format Counters 
                Bytes Written=97
Job Finished in 31.276 seconds
Estimated value of Pi is 3.5555555555555555555
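
Why 3.5555...? The pi example is a quasi-Monte Carlo estimator: each map generates sample points in the unit square and counts how many fall inside the inscribed quarter circle. With 3 maps × 3 samples = 9 points, the printed result implies 8 of the 9 points landed inside:

Estimated Pi = 4 × (points inside / total points)
             = 4 × 8 / 9
             ≈ 3.5556

Nine samples are far too few for accuracy; larger map and sample counts converge toward 3.14159.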

3. Observations

To get a feel for what Hadoop can do, we first run a normal Hadoop MR example. What can we discover from it?

1. Observation: the number of map tasks can be customized

The command specified 3 map tasks and 3 samples per map.
This shows that the number of maps can be chosen to fit the workload (see the sketch after the quoted output below).

Number of Maps  = 3
Samples per Map = 3
Wrote input for Map #0
Wrote input for Map #1
Wrote input for Map #2
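
A minimal sketch of the same command with more parallelism and more samples (the numbers 10 and 1000 are illustrative; runtime grows with the sample count, and the estimate gets correspondingly better):

# 10 map tasks, 1000 samples per map
hadoop jar hadoop-mapreduce-examples-2.10.1.jar pi 10 1000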

2. Observation: processing steps after job submission

After the job starts:
Step 1: the client proxy connects to the ResourceManager.
Step 2: FileInputFormat reports three input files to process.
Step 3: JobSubmitter computes three input splits.
Step 4: a job ID is generated and job_1610510670587_0001 is submitted (a CLI check follows the quoted log below).

Starting Job
21/01/16 11:03:45 INFO client.RMProxy: Connecting to ResourceManager at localhost/127.0.0.1:8032
21/01/16 11:03:46 INFO input.FileInputFormat: Total input files to process : 3
21/01/16 11:03:46 INFO mapreduce.JobSubmitter: number of splits:3
21/01/16 11:03:46 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1610510670587_0001
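
The submission can also be confirmed outside the job's own log. A sketch using the standard YARN CLI from a second terminal (requires the ResourceManager to be up):

# List applications known to the ResourceManager
yarn application -list

# Detailed status for the application submitted above
yarn application -status application_1610510670587_0001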

3. Observation: job execution flow

Job execution flow:
Step 1: the YARN client submits the application.
Step 2: MapReduce prints a URL for tracking the job.
Step 3: MR runs job_1610510670587_0001.
Step 4: in this run, map and reduce executed as a hand-off (reduce only made progress after the maps finished) rather than simultaneously. The same progress can be checked from the CLI, as shown after the quoted log below.

21/01/16 11:03:47 INFO impl.YarnClientImpl: Submitted application application_1610510670587_0001
21/01/16 11:03:47 INFO mapreduce.Job: The url to track the job: http://localhost.vm:8088/proxy/application_1610510670587_0001/
21/01/16 11:03:47 INFO mapreduce.Job: Running job: job_1610510670587_0001
21/01/16 11:03:55 INFO mapreduce.Job: Job job_1610510670587_0001 running in uber mode : false
21/01/16 11:03:55 INFO mapreduce.Job:  map 0% reduce 0%
21/01/16 11:04:03 INFO mapreduce.Job:  map 100% reduce 0%
21/01/16 11:04:15 INFO mapreduce.Job:  map 100% reduce 100%
21/01/16 11:04:16 INFO mapreduce.Job: Job job_1610510670587_0001 completed successfully
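
Besides the tracking URL, progress is available from the command line. A sketch using the standard mapred CLI (for a job that has already finished, this needs the JobHistory Server running):

# Show state, progress, and counters for the job
mapred job -status job_1610510670587_0001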

4. Observation: which counter groups take part from start to finish

1. File System Counters

2. Job Counters

3. Map-Reduce Framework

4. Shuffle Errors

5. File Input Format Counters

6. File Output Format Counters

(An example of querying one of these counters from the CLI follows the quoted output below.)

21/01/16 11:04:17 INFO mapreduce.Job: Counters: 49
        File System Counters
                FILE: Number of bytes read=72
                FILE: Number of bytes written=835625
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=792
                HDFS: Number of bytes written=215
                HDFS: Number of read operations=15
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=3
        Job Counters 
                Launched map tasks=3
                Launched reduce tasks=1
                Data-local map tasks=3
                Total time spent by all maps in occupied slots (ms)=17266
                Total time spent by all reduces in occupied slots (ms)=8882
                Total time spent by all map tasks (ms)=17266
                Total time spent by all reduce tasks (ms)=8882
                Total vcore-milliseconds taken by all map tasks=17266
                Total vcore-milliseconds taken by all reduce tasks=8882
                Total megabyte-milliseconds taken by all map tasks=17680384
                Total megabyte-milliseconds taken by all reduce tasks=9095168
        Map-Reduce Framework
                Map input records=3
                Map output records=6
                Map output bytes=54
                Map output materialized bytes=84
                Input split bytes=438
                Combine input records=0
                Combine output records=0
                Reduce input groups=2
                Reduce shuffle bytes=84
                Reduce input records=6
                Reduce output records=0
                Spilled Records=12
                Shuffled Maps =3
                Failed Shuffles=0
                Merged Map outputs=3
                GC time elapsed (ms)=486
                CPU time spent (ms)=2020
                Physical memory (bytes) snapshot=1026621440
                Virtual memory (bytes) snapshot=7678922752
                Total committed heap usage (bytes)=701497344
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters 
                Bytes Read=354
        File Output Format Counters 
                Bytes Written=97
Job Finished in 31.276 seconds
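
Individual counters can also be queried directly instead of scraping the log. A sketch using the mapred CLI; the group and counter names are the Java enum identifiers, e.g. MAP_INPUT_RECORDS from the Map-Reduce Framework group (again, a completed job needs the JobHistory Server):

# Read a single counter for the finished job
mapred job -counter job_1610510670587_0001 org.apache.hadoop.mapreduce.TaskCounter MAP_INPUT_RECORDS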