Hadoop ships with hadoop-mapreduce-examples-2.6.0.jar under the share/hadoop/mapreduce directory, which includes a WordCount job.
This walkthrough uses Hadoop 2.6.0; the examples jar is named after the release, and its path differs between Hadoop versions.
The detailed steps are as follows:
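Since the jar's location varies by release, `find(1)` is a quick way to locate it. The sketch below mocks the 2.6.0 layout in a temporary directory so it runs anywhere; on a real install, point `find` at your Hadoop home instead (the mock path is purely illustrative):

```shell
# Mock the Hadoop 2.6.0 directory layout (stand-in for a real install).
demo="$(mktemp -d)"
mkdir -p "$demo/share/hadoop/mapreduce"
touch "$demo/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar"

# Locate the examples jar whatever the version suffix is.
find "$demo" -name 'hadoop-mapreduce-examples-*.jar'
```

On an actual cluster, replace `"$demo"` with the Hadoop installation directory (here, /home/sky/hadoop).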
1. Create a test folder on the local disk; I created mine under /home/sky
<span style="font-size:18px;">[root@localhost sky]# cd /home/sky/
[root@localhost sky]# mkdir test
[root@localhost sky]# ls
Desktop eclipse Music Public spark Videos
Documents hadoop mysql pycharm Templates workspace
Downloads hive Pictures scala test
[root@localhost sky]# </span>
2. Create the two documents to be counted, test1.txt and test2.txt
<span style="font-size:18px;">[root@localhost sky]# cd test/
[root@localhost test]# echo "hello world,hello hadoop" > test1.txt
[root@localhost test]# echo "hello world,hello spark" > test2.txt
[root@localhost test]# cat test1.txt
hello world,hello hadoop
</span>
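Before running the job, the expected result can be previewed locally. The example's mapper tokenizes on whitespace only (via Java's StringTokenizer), so the same counts can be sketched with a plain shell pipeline:

```shell
# Local preview of WordCount: split on whitespace, sort, count duplicates.
printf 'hello world,hello hadoop\nhello world,hello spark\n' \
  | tr -s ' ' '\n' | sort | uniq -c
```

Note that "world,hello" survives as a single token because the comma is not a delimiter; the real job's output in step 6 shows the same effect.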
3. Create an input directory on HDFS and list it
<span style="font-size:18px;">[root@localhost test]# cd /home/sky/hadoop/
[root@localhost hadoop]# bin/hadoop fs -mkdir /input
[root@localhost hadoop]# bin/hadoop fs -ls /
Found 5 items
drwxr-xr-x - root supergroup 0 2015-10-26 19:47 /home
drwxr-xr-x - root supergroup 0 2015-10-29 21:56 /input
drwxr-xr-x - root supergroup 0 2015-10-29 20:54 /output
drwx-wx-wx - root supergroup 0 2015-10-29 20:57 /tmp
drwxr-xr-x - root supergroup 0 2015-10-28 21:32 /user
</span>
4. Upload the test files to the HDFS input directory and verify
<span style="font-size:18px;">[root@localhost hadoop]# bin/hadoop fs -put /home/sky/test/test*.txt /input
[root@localhost hadoop]# bin/hadoop fs -ls /input
Found 2 items
-rw-r--r-- 1 root supergroup 25 2015-10-29 22:02 /input/test1.txt
-rw-r--r-- 1 root supergroup 24 2015-10-29 22:02 /input/test2.txt
</span>
5. Run the job and observe its progress
<span style="font-size:18px;">[root@localhost hadoop]# bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar wordcount /input /output/wordcount2
15/10/29 22:07:43 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
15/10/29 22:07:46 INFO input.FileInputFormat: Total input paths to process : 2
15/10/29 22:07:46 INFO mapreduce.JobSubmitter: number of splits:2
15/10/29 22:07:47 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1445687376888_0005
15/10/29 22:07:48 INFO impl.YarnClientImpl: Submitted application application_1445687376888_0005
15/10/29 22:07:48 INFO mapreduce.Job: The url to track the job: http://localhost:8088/proxy/application_1445687376888_0005/
15/10/29 22:07:48 INFO mapreduce.Job: Running job: job_1445687376888_0005
15/10/29 22:08:03 INFO mapreduce.Job: Job job_1445687376888_0005 running in uber mode : false
15/10/29 22:08:03 INFO mapreduce.Job: map 0% reduce 0%
15/10/29 22:08:23 INFO mapreduce.Job: map 100% reduce 0%
15/10/29 22:08:34 INFO mapreduce.Job: map 100% reduce 100%
15/10/29 22:08:35 INFO mapreduce.Job: Job job_1445687376888_0005 completed successfully
15/10/29 22:08:35 INFO mapreduce.Job: Counters: 49
File System Counters
FILE: Number of bytes read=91
FILE: Number of bytes written=316819
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=253
HDFS: Number of bytes written=39
HDFS: Number of read operations=9
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=2
Launched reduce tasks=1
Data-local map tasks=2
Total time spent by all maps in occupied slots (ms)=34517
Total time spent by all reduces in occupied slots (ms)=8834
Total time spent by all map tasks (ms)=34517
Total time spent by all reduce tasks (ms)=8834
Total vcore-seconds taken by all map tasks=34517
Total vcore-seconds taken by all reduce tasks=8834
Total megabyte-seconds taken by all map tasks=35345408
Total megabyte-seconds taken by all reduce tasks=9046016
Map-Reduce Framework
Map input records=2
Map output records=6
Map output bytes=73
Map output materialized bytes=97
Input split bytes=204
Combine input records=6
Combine output records=6
Reduce input groups=4
Reduce shuffle bytes=97
Reduce input records=6
Reduce output records=4
Spilled Records=12
Shuffled Maps =2
Failed Shuffles=0
Merged Map outputs=2
GC time elapsed (ms)=645
CPU time spent (ms)=3820
Physical memory (bytes) snapshot=505044992
Virtual memory (bytes) snapshot=6221623296
Total committed heap usage (bytes)=355999744
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=49
File Output Format Counters
Bytes Written=39
[root@localhost hadoop]#
</span>
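The framework counters above can be cross-checked against the input by hand: two one-line files give Map input records=2, whitespace tokenization gives six tokens (Map output records=6), and four of those tokens are distinct (Reduce output records=4). A local sketch of that arithmetic:

```shell
input='hello world,hello hadoop
hello world,hello spark'

printf '%s\n' "$input" | wc -l                              # input lines    -> 2
printf '%s\n' "$input" | tr -s ' ' '\n' | wc -l             # tokens emitted -> 6
printf '%s\n' "$input" | tr -s ' ' '\n' | sort -u | wc -l   # distinct words -> 4
```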
6. View the results (note: WordCount tokenizes on whitespace only, which is why "world,hello" appears as a single word)
<span style="font-size:18px;">[root@localhost hadoop]# bin/hdfs dfs -cat /output/wordcount2/*
hadoop 1
hello 2
spark 1
world,hello 2
</span>
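If "world" and "hello" should be counted separately, the text must be split on commas as well as whitespace. The WordCount example itself would need a custom tokenizer for that; a local shell sketch of the desired behavior:

```shell
# Split on both spaces and commas before counting (local sketch only;
# the Hadoop example's mapper splits on whitespace alone).
printf 'hello world,hello hadoop\nhello world,hello spark\n' \
  | tr -s ' ,' '\n' | sort | uniq -c
```

With commas treated as delimiters, "hello" is counted four times and "world" twice, with no "world,hello" token left over.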