Hadoop ships with hadoop-mapreduce-examples-2.6.0.jar under the share/hadoop/mapreduce directory, which includes a WordCount job.
This walkthrough uses Hadoop 2.6.0; the examples jar is named after the release, and its path differs between Hadoop versions.
The detailed steps are as follows:
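Since the jar's location varies by release, `find(1)` is a quick way to locate it. The sketch below mocks the 2.6.0 layout in a temporary directory so it runs anywhere; on a real install, point `find` at your Hadoop home instead (the mock path is purely illustrative):

```shell
# Mock the Hadoop 2.6.0 directory layout (stand-in for a real install).
demo="$(mktemp -d)"
mkdir -p "$demo/share/hadoop/mapreduce"
touch "$demo/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar"

# Locate the examples jar whatever the version suffix is.
find "$demo" -name 'hadoop-mapreduce-examples-*.jar'
```

On an actual cluster, replace `"$demo"` with the Hadoop installation directory (here, /home/sky/hadoop).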
1. Create a test folder on the local disk; I created mine under /home/sky
<span style="font-size:18px;">[root@localhost sky]# cd /home/sky/
[root@localhost sky]# mkdir test
[root@localhost sky]# ls
Desktop eclipse Music Public spark Videos
Documents hadoop mysql pycharm Templates workspace
Downloads hive Pictures scala test
[root@localhost sky]# </span>
2. Create the two documents to be counted, test1.txt and test2.txt
<span style="font-size:18px;">[root@localhost sky]# cd test/
[root@localhost test]# echo "hello world,hello hadoop" > test1.txt
[root@localhost test]# echo "hello world,hello spark" > test2.txt
[root@localhost test]# cat test1.txt
hello world,hello hadoop
</span>
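Before running the job, the expected result can be previewed locally. The example's mapper tokenizes on whitespace only (via Java's StringTokenizer), so the same counts can be sketched with a plain shell pipeline:

```shell
# Local preview of WordCount: split on whitespace, sort, count duplicates.
printf 'hello world,hello hadoop\nhello world,hello spark\n' \
  | tr -s ' ' '\n' | sort | uniq -c
```

Note that "world,hello" survives as a single token because the comma is not a delimiter; the real job's output in step 6 shows the same effect.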
3. Create an input directory on HDFS and list it
<span style="font-size:18px;">[root@localhost test]# cd /home/sky/hadoop/
[root@localhost hadoop]# bin/hadoop fs -mkdir /input
[root@localhost hadoop]# bin/hadoop fs -ls /
Found 5 items
drwxr-xr-x - root supergroup 0 2015-10-26 19:47 /home
drwxr-xr-x - root supergroup 0 2015-10-29 21:56 /input
drwxr-xr-x - root supergroup 0 2015-10-29 20:54 /output
drwx-wx-wx - root supergroup 0 2015-10-29 20:57 /tmp
drwxr-xr-x - root supergroup 0 2015-10-28 21:32 /user
</span>
4. Upload the test files to the HDFS input directory and verify
<span style="font-size:18px;">[root@localhost hadoop]# bin/hadoop fs -put /home/sky/test/test*.txt /input
[root@localhost hadoop]# bin/hadoop fs -ls /input
Found 2 items
-rw-r--r-- 1 root supergroup 25 2015-10-29 22:02 /input/test1.txt
-rw-r--r-- 1 root supergroup 24 2015-10-29 22:02 /input/test2.txt
</span>
5. Run the job and observe its progress
<span style="font-size:18px;">[root@localhost hadoop]# bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar wordcount /input /output/wordcount2
15/10/29 22:07:43 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
15/10/29 22:07:46 INFO input.FileInputFormat: Total input paths to process : 2
15/10/29 22:07:46 INFO mapreduce.JobSubmitter: number of splits:2
15/10/29 22:07:47 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1445687376888_0005
15/10/29 22:07:48 INFO impl.YarnClientImpl: Submitted application application_1445687376888_0005
15/10/29 22:07:48 INFO mapreduce.Job: The url to track the job: http://localhost:8088/proxy/application_1445687376888_0005/
15/10/29 22:07:48 INFO mapreduce.Job: Running job: job_1445687376888_0005
15/10/29 22:08:03 INFO mapreduce.Job: Job job_1445687376888_0005 running in uber mode : false
15/10/29 22:08:03 INFO mapreduce.Job: map 0% reduce 0%
15/10/29 22:08:23 INFO mapreduce.Job: map 100% reduce 0%
15/10/29 22:08:34 INFO mapreduce.Job: map 100% reduce 100%
15/10/29 22:08:35 INFO mapreduce.Job: Job job_1445687376888_0005 completed successfully
15/10/29 22:08:35 INFO mapreduce.Job: Counters: 49
File System Counters
FILE: Number of bytes read=91
FILE: Number of bytes written=316819
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=253
HDFS: Number of bytes written=39
HDFS: Number of read operations=9
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=2
Launched reduce tasks=1
Data-local map tasks=2
Total time spent by all maps in occupied slots (ms)=34517
Total time spent by all reduces in occupied slots (ms)=8834
Total time spent by all map tasks (ms)=34517
Total time spent by all reduce tasks (ms)=8834
Total vcore-seconds taken by all map tasks=34517
Total vcore-seconds taken by all reduce tasks=8834
Total megabyte-seconds taken by all map tasks=35345408
Total megabyte-seconds taken by all reduce tasks=9046016
Map-Reduce Framework
Map input records=2
Map output records=6
Map output bytes=73
Map output materialized bytes=97
Input split bytes=204
Combine input records=6
Combine output records=6
Reduce input groups=4
Reduce shuffle bytes=97
Reduce input records=6
Reduce output records=4
Spilled Records=12
Shuffled Maps =2
Failed Shuffles=0
Merged Map outputs=2
GC time elapsed (ms)=645
CPU time spent (ms)=3820
Physical memory (bytes) snapshot=505044992
Virtual memory (bytes) snapshot=6221623296
Total committed heap usage (bytes)=355999744
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=49
File Output Format Counters
Bytes Written=39
[root@localhost hadoop]#
</span>
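The framework counters above can be cross-checked against the input by hand: two one-line files give Map input records=2, whitespace tokenization gives six tokens (Map output records=6), and four of those tokens are distinct (Reduce output records=4). A local sketch of that arithmetic:

```shell
input='hello world,hello hadoop
hello world,hello spark'

printf '%s\n' "$input" | wc -l                              # input lines    -> 2
printf '%s\n' "$input" | tr -s ' ' '\n' | wc -l             # tokens emitted -> 6
printf '%s\n' "$input" | tr -s ' ' '\n' | sort -u | wc -l   # distinct words -> 4
```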
6. View the results (note: WordCount tokenizes on whitespace only, which is why "world,hello" appears as a single word)
<span style="font-size:18px;">[root@localhost hadoop]# bin/hdfs dfs -cat /output/wordcount2/*
hadoop 1
hello 2
spark 1
world,hello 2
</span>
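If "world" and "hello" should be counted separately, the text must be split on commas as well as whitespace. The WordCount example itself would need a custom tokenizer for that; a local shell sketch of the desired behavior:

```shell
# Split on both spaces and commas before counting (local sketch only;
# the Hadoop example's mapper splits on whitespace alone).
printf 'hello world,hello hadoop\nhello world,hello spark\n' \
  | tr -s ' ,' '\n' | sort | uniq -c
```

With commas treated as delimiters, "hello" is counted four times and "world" twice, with no "world,hello" token left over.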