A First Taste of MapReduce

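This article walks through deploying and running the WordCount MapReduce job in a Hadoop environment, including troubleshooting errors, handling the input and output files, and reading the execution log. The example eventually runs successfully and produces an output directory whose part-r-00000 file contains the results.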

With single-node Hadoop deployed, let's see how MapReduce analyzes and processes data.

Word Count

Word Count does what the name says: it counts words. It is the most classic MapReduce job. Given a text file, it tallies how many times each distinct word occurs.

Hadoop ships with many classic MapReduce example programs, and Word Count is one of them.

Prepare the demo file input.txt

# cat input.txt 
I LOVE GG
I LIKE YY
I LOVE UU
I LIKE RR
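
Before handing the file to Hadoop, it may help to see what Word Count is expected to compute. The following is only a minimal in-memory sketch of the map, shuffle, and reduce steps in plain Python, using the four lines of input.txt above. The function names (map_phase, shuffle_phase, reduce_phase, word_count) are made up for illustration and are not part of any Hadoop API. A real job distributes these steps across tasks instead of running them in one process.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every whitespace-separated token in a line."""
    return [(word, 1) for word in line.split()]


def shuffle_phase(pairs):
    """Group the emitted counts by word, mimicking the step between map and reduce."""
    grouped = defaultdict(list)
    for word, count in pairs:
        grouped[word].append(count)
    return grouped


def reduce_phase(word, counts):
    """Sum all the counts collected for a single word."""
    return word, sum(counts)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence and return {word: total}."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(pairs)
    # Process keys in sorted order so the result is deterministic.
    return dict(reduce_phase(word, counts) for word, counts in sorted(grouped.items()))


if __name__ == "__main__":
    # Same content as the demo file input.txt.
    sample = ["I LOVE GG", "I LIKE YY", "I LOVE UU", "I LIKE RR"]
    for word, total in word_count(sample).items():
        print(f"{word}\t{total}")
```

For this input the sketch reports I four times, LOVE and LIKE twice each, and GG, YY, UU, and RR once each.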

Copy input.txt into HDFS

# hdfs dfs -put input.txt /test
# hdfs dfs -ls /test
Found 4 items
drwxr-xr-x   - yunwei supergroup          0 2023-02-24 17:14 /test/a
-rw-r--r--   2 yunwei supergroup         51 2023-02-24 17:25 /test/b.txt
-rw-r--r--   2 yunwei supergroup         51 2023-02-24 17:21 /test/hello-hadoop.txt
-rw-r--r--   2 yunwei supergroup         40 2023-02-27 15:59 /test/input.txt

List the MapReduce jars shipped with Hadoop

# ll $HADOOP_HOME/share/hadoop/mapreduce/
total 4876
-rw-rw-r-- 1 yunwei yunwei  526732 Oct  3  2016 hadoop-mapreduce-client-app-2.6.5.jar
-rw-rw-r-- 1 yunwei yunwei  686773 Oct  3  2016 hadoop-mapreduce-client-common-2.6.5.jar
-rw-rw-r-- 1 yunwei yunwei 1535776 Oct  3  2016 hadoop-mapreduce-client-core-2.6.5.jar
-rw-rw-r-- 1 yunwei yunwei  259326 Oct  3  2016 hadoop-mapreduce-client-hs-2.6.5.jar
-rw-rw-r-- 1 yunwei yunwei   27489 Oct  3  2016 hadoop-mapreduce-client-hs-plugins-2.6.5.jar
-rw-rw-r-- 1 yunwei yunwei   61309 Oct  3  2016 hadoop-mapreduce-client-jobclient-2.6.5.jar
-rw-rw-r-- 1 yunwei yunwei 1514166 Oct  3  2016 hadoop-mapreduce-client-jobclient-2.6.5-tests.jar
-rw-rw-r-- 1 yunwei yunwei   67762 Oct  3  2016 hadoop-mapreduce-client-shuffle-2.6.5.jar
-rw-rw-r-- 1 yunwei yunwei  292710 Oct  3  2016 hadoop-mapreduce-examples-2.6.5.jar
drwxrwxr-x 2 yunwei yunwei    4096 Oct  3  2016 lib
drwxrwxr-x 2 yunwei yunwei      30 Oct  3  2016 lib-examples
drwxrwxr-x 2 yunwei yunwei    4096 Oct  3  2016 sources

# vi hadoop-mapreduce-examples-2.6.5.jar

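Run the jar with the hadoop command. The first attempt passes the program name as WordCount and is rejected: program names are case-sensitive, and the valid names listed in the output are all lowercase.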

# hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.5.jar  WordCount  input.txt output      
Unknown program 'WordCount' chosen.
Valid program names are:
  aggregatewordcount: An Aggregate based map/reduce program that counts the words in the input files.
  aggregatewordhist: An Aggregate based map/reduce program that computes the histogram of the words in the input files.
  bbp: A map/reduce program that uses Bailey-Borwein-Plouffe to compute exact digits of Pi.
  dbcount: An example job that count the pageview counts from a database.
  distbbp: A map/reduce program that uses a BBP-type formula to compute exact bits of Pi.
  grep: A map/reduce program that counts the matches of a regex in the input.
  join: A job that effects a join over sorted, equally partitioned datasets
  multifilewc: A job that counts words from several files.
  pentomino: A map/reduce tile laying program to find solutions to pentomino problems.
  pi: A map/reduce program that estimates Pi using a quasi-Monte Carlo method.
  randomtextwriter: A map/reduce program that writes 10GB of random textual data per node.
  randomwriter: A map/reduce program that writes 10GB of random data per node.
  secondaryso