Hadoop's First WordCount Program

This article describes how to run the WordCount program on a Hadoop cluster: create a test file, upload it to HDFS, run the WordCount job, and view the results.

Let's run the WordCount program bundled in hadoop-examples-1.0.3.jar, which counts how many times each word occurs in the input.
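If you want to confirm which examples the jar ships with, running it without arguments should print the list of bundled programs, wordcount among them (this assumes Hadoop 1.0.3 installed under /opt/hadoop-1.0.3, as in the steps below).
$ hadoop jar /opt/hadoop-1.0.3/hadoop-examples-1.0.3.jar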

1) On Ubuntu1, create a test.txt file in the Hadoop home directory with the following content (one way to create the file is shown right after the listing).
Hello world
Hello world
Hello world
Hello world
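For example, the file can be created from the shell like this (assuming Hadoop is installed under /opt/hadoop-1.0.3, as in the later steps):
$ cd /opt/hadoop-1.0.3
$ cat > test.txt << 'EOF'
Hello world
Hello world
Hello world
Hello world
EOF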
2) Create an input directory in HDFS with the following command.
$ hadoop fs -mkdir /user/hadoop/input
3) Upload the test.txt file you just created to the input directory in HDFS.
$ hadoop fs -put /opt/hadoop-1.0.3/test.txt /user/hadoop/input/   (where /opt/hadoop-1.0.3 is the path where you installed Hadoop)

4) Check that the file was uploaded successfully; the result is shown in Figure 1, and the command that produces such a listing is given below the caption.

                                      Figure 1: Listing of the uploaded file in HDFS
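The check in step 4 amounts to listing the input directory; an entry for test.txt should appear.
$ hadoop fs -ls /user/hadoop/input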
5) Run the word-count example in hadoop-examples-1.0.3.jar with the following commands.
$ cd /opt/hadoop-1.0.3
$ hadoop jar hadoop-examples-1.0.3.jar wordcount /user/hadoop/input/test.txt /user/hadoop/output
13/04/20 00:47:07 INFO input.FileInputFormat: Total input paths to process : 1
13/04/20 00:47:07 INFO util.NativeCodeLoader: Loaded the native-hadoop library
13/04/20 00:47:07 WARN snappy.LoadSnappy: Snappy native library not loaded
13/04/20 00:47:08 INFO mapred.JobClient: Running job: job_201304200039_0001
13/04/20 00:47:09 INFO mapred.JobClient: map 0% reduce 0%
13/04/20 00:47:45 INFO mapred.JobClient: map 100% reduce 0%
13/04/20 00:48:08 INFO mapred.JobClient: map 100% reduce 100%
13/04/20 00:48:13 INFO mapred.JobClient: Job complete: job_201304200039_0001
13/04/20 00:48:13 INFO mapred.JobClient: Counters: 29
13/04/20 00:48:13 INFO mapred.JobClient: Job Counters
13/04/20 00:48:13 INFO mapred.JobClient: Launched reduce tasks=1
13/04/20 00:48:13 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=28822
13/04/20 00:48:13 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
13/04/20 00:48:13 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
13/04/20 00:48:13 INFO mapred.JobClient: Launched map tasks=1
13/04/20 00:48:13 INFO mapred.JobClient: Data-local map tasks=1
13/04/20 00:48:13 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=18236
13/04/20 00:48:13 INFO mapred.JobClient: File Output Format Counters
13/04/20 00:48:13 INFO mapred.JobClient: Bytes Written=16
13/04/20 00:48:13 INFO mapred.JobClient: FileSystemCounters
13/04/20 00:48:13 INFO mapred.JobClient: FILE_BYTES_READ=30
13/04/20 00:48:13 INFO mapred.JobClient: HDFS_BYTES_READ=159
13/04/20 00:48:13 INFO mapred.JobClient: FILE_BYTES_WRITTEN=43053
13/04/20 00:48:13 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=16
13/04/20 00:48:13 INFO mapred.JobClient: File Input Format Counters
13/04/20 00:48:13 INFO mapred.JobClient: Bytes Read=48
13/04/20 00:48:13 INFO mapred.JobClient: Map-Reduce Framework
13/04/20 00:48:13 INFO mapred.JobClient: Map output materialized bytes=30
13/04/20 00:48:13 INFO mapred.JobClient: Map input records=4
13/04/20 00:48:13 INFO mapred.JobClient: Reduce shuffle bytes=30
13/04/20 00:48:13 INFO mapred.JobClient: Spilled Records=4
13/04/20 00:48:13 INFO mapred.JobClient: Map output bytes=80
13/04/20 00:48:13 INFO mapred.JobClient: CPU time spent (ms)=2870
13/04/20 00:48:13 INFO mapred.JobClient: Total committed heap usage (bytes)=210698240
13/04/20 00:48:13 INFO mapred.JobClient: Combine input records=8
13/04/20 00:48:13 INFO mapred.JobClient: SPLIT_RAW_BYTES=111
13/04/20 00:48:13 INFO mapred.JobClient: Reduce input records=2
13/04/20 00:48:13 INFO mapred.JobClient: Reduce input groups=2
13/04/20 00:48:13 INFO mapred.JobClient: Combine output records=2
13/04/20 00:48:13 INFO mapred.JobClient: Physical memory (bytes) snapshot=180101120
13/04/20 00:48:13 INFO mapred.JobClient: Reduce output records=2
13/04/20 00:48:13 INFO mapred.JobClient: Virtual memory (bytes) snapshot=749068288
13/04/20 00:48:13 INFO mapred.JobClient: Map output records=8
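A quick sanity check on these counters: Map input records=4 matches the four lines of test.txt, Map output records=8 is one (word, 1) pair per word, the combiner collapses them to Combine output records=2 (one per distinct word), and the reducer emits Reduce output records=2, i.e. one total per distinct word.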

6) View the results, as shown in Figure 2; the command to print them is given after the caption.

                                        Figure 2: WordCount results
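The job writes its output to /user/hadoop/output. Printing the part file should show the per-word counts, consistent with the counters above (2 reduce output records, 16 bytes written); note that the exact part-file name can vary between Hadoop versions.
$ hadoop fs -cat /user/hadoop/output/part-r-00000
Hello   4
world   4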
OK! At this point, the three-node Hadoop cluster has been installed and tested successfully.