Running Hadoop WordCount
1. Start Hadoop
/root/hadoop/hadoop-2.6.0/sbin/start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Or use:
/root/hadoop/hadoop-2.6.0/sbin/start-dfs.sh
/root/hadoop/hadoop-2.6.0/sbin/start-yarn.sh
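To verify that the daemons came up, run jps; on a healthy single-node setup it typically lists NameNode, DataNode, SecondaryNameNode, ResourceManager and NodeManager:
[root@localhost /]# jps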
2. Prepare test files: create a couple of small text files in a local directory
[root@localhost /]# mkdir /root/testFile
[root@localhost /]# echo "Hello Hadoop" > /root/testFile/hello.txt
[root@localhost /]# echo "Hello Java" > /root/testFile/hello2.txt
3. Create the input directory /input on HDFS
[root@localhost /]# cd /root/hadoop/hadoop-2.6.0/bin
[root@localhost bin]# hadoop fs -mkdir /input
- Upload the files created on the local disk into /input
[root@localhost bin]# hadoop fs -put /root/testFile/hello*.txt /input
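To confirm the upload, list the directory:
[root@localhost bin]# hadoop fs -ls /input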
- Location of the wordcount example jar that ships with Hadoop (the WordCount class inside it is sketched right after the path)
/root/hadoop/hadoop-2.6.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar
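For reference, the WordCount class bundled in this jar follows the canonical example from the Apache Hadoop MapReduce tutorial. The sketch below is that canonical version; the exact 2.6.0 source may differ in minor details:

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Mapper: splits each input line into tokens and emits (word, 1).
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // Reducer (also registered as combiner): sums the counts for each word.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {

    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class); // map-side partial sums
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Because the same IntSumReducer is registered as a combiner, partial sums are computed on the map side, which is what the Combine input/output records counters in the job log below reflect.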
- Run wordcount:
[root@localhost bin]# hadoop jar /root/hadoop/hadoop-2.6.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar wordcount /input/ /output/wordcount1
17/02/05 19:48:34 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
17/02/05 19:48:39 INFO input.FileInputFormat: Total input paths to process : 2
17/02/05 19:48:39 INFO mapreduce.JobSubmitter: number of splits:2
17/02/05 19:48:39 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1486108015974_0001
17/02/05 19:48:43 INFO impl.YarnClientImpl: Submitted application application_1486108015974_0001
17/02/05 19:48:44 INFO mapreduce.Job: The url to track the job: http://localhost:8099/proxy/application_1486108015974_0001/
17/02/05 19:48:44 INFO mapreduce.Job: Running job: job_1486108015974_0001
17/02/05 19:49:20 INFO mapreduce.Job: Job job_1486108015974_0001 running in uber mode : false
17/02/05 19:49:20 INFO mapreduce.Job: map 0% reduce 0%
17/02/05 19:49:47 INFO mapreduce.Job: map 50% reduce 0%
17/02/05 19:49:49 INFO mapreduce.Job: map 100% reduce 0%
17/02/05 19:49:58 INFO mapreduce.Job: map 100% reduce 100%
17/02/05 19:49:59 INFO mapreduce.Job: Job job_1486108015974_0001 completed successfully
17/02/05 19:49:59 INFO mapreduce.Job: Counters: 49
    File System Counters
        FILE: Number of bytes read=54
        FILE: Number of bytes written=316700
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=229
        HDFS: Number of bytes written=24
        HDFS: Number of read operations=9
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters
        Launched map tasks=2
        Launched reduce tasks=1
        Data-local map tasks=2
        Total time spent by all maps in occupied slots (ms)=52251
        Total time spent by all reduces in occupied slots (ms)=6032
        Total time spent by all map tasks (ms)=52251
        Total time spent by all reduce tasks (ms)=6032
        Total vcore-seconds taken by all map tasks=52251
        Total vcore-seconds taken by all reduce tasks=6032
        Total megabyte-seconds taken by all map tasks=53505024
        Total megabyte-seconds taken by all reduce tasks=6176768
    Map-Reduce Framework
        Map input records=2
        Map output records=4
        Map output bytes=40
        Map output materialized bytes=60
        Input split bytes=205
        Combine input records=4
        Combine output records=4
        Reduce input groups=3
        Reduce shuffle bytes=60
        Reduce input records=4
        Reduce output records=3
        Spilled Records=8
        Shuffled Maps =2
        Failed Shuffles=0
        Merged Map outputs=2
        GC time elapsed (ms)=679
        CPU time spent (ms)=9280
        Physical memory (bytes) snapshot=707444736
        Virtual memory (bytes) snapshot=2677784576
        Total committed heap usage (bytes)=516423680
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=24
    File Output Format Counters
        Bytes Written=24
[root@localhost bin]#
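Before printing the result, it can help to list the output directory; a successfully completed job leaves an empty _SUCCESS marker plus one part file per reducer (a single part-r-00000 here, since one reduce task ran):
[root@localhost bin]# hdfs dfs -ls /output/wordcount1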
- View the results:
[root@localhost bin]# hdfs dfs -cat /output/wordcount1/*
Hadoop 1
Hello 2
Java 1
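Note that MapReduce will not overwrite an existing output directory; to rerun the job, either pass a fresh output path or delete the old one first:
[root@localhost bin]# hdfs dfs -rm -r /output/wordcount1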
Reference: http://www.itnose.net/detail/6197823.html