A First Taste of MapReduce

First published 2023-02-27 16:22:48; last modified 2023-02-27 16:29:01

License: CC 4.0 BY-SA

Tags: #mapreduce #hadoop #bigdata

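This article walks through deploying and running the WordCount MapReduce job on a Hadoop installation, including troubleshooting an error, handling the input and output files, and reading the job log. The example eventually runs successfully and produces an output directory whose part-r-00000 file holds the results.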

With a single-node Hadoop deployment in place, let's try out how MapReduce analyzes and processes data.

Word Count

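Word Count is the most classic of all MapReduce jobs. Its task is to tally the words in a text file, counting how many times each distinct word occurs.

Hadoop ships with a number of classic MapReduce example programs, and Word Count is one of them.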

Prepare the demo file input.txt

# cat input.txt 
I LOVE GG
I LIKE YY
I LOVE UU
I LIKE RR
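
To make the counting concrete before handing the file to Hadoop, here is a minimal Python sketch of the three phases that Word Count goes through (map, shuffle, reduce), applied to the four lines above. It is an illustration only, not Hadoop's implementation: the function names `map_phase`, `shuffle_phase`, `reduce_phase` and `word_count` are made up for this example, and it assumes words are separated by whitespace.

```python
from collections import defaultdict

# The four lines of input.txt shown above.
LINES = ["I LOVE GG", "I LIKE YY", "I LOVE UU", "I LIKE RR"]


def map_phase(line):
    """Emit a (word, 1) pair for every whitespace-separated token."""
    return [(word, 1) for word in line.split()]


def shuffle_phase(pairs):
    """Group emitted values by key, as happens between map and reduce."""
    grouped = defaultdict(list)
    for word, one in pairs:
        grouped[word].append(one)
    return grouped


def reduce_phase(grouped):
    """Sum the values of each key; keys are visited in sorted order."""
    return {word: sum(grouped[word]) for word in sorted(grouped)}


def word_count(lines):
    """Run map, shuffle and reduce in sequence over the given lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return reduce_phase(shuffle_phase(pairs))


if __name__ == "__main__":
    # Print one "word<TAB>count" line per distinct word.
    for word, count in word_count(LINES).items():
        print(f"{word}\t{count}")
```

For this input the sketch counts I four times, LIKE and LOVE twice each, and GG, RR, UU and YY once each.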

Copy input.txt into Hadoop (HDFS)

# hdfs dfs -put input.txt /test
# hdfs dfs -ls /test
Found 4 items
drwxr-xr-x   - yunwei supergroup          0 2023-02-24 17:14 /test/a
-rw-r--r--   2 yunwei supergroup         51 2023-02-24 17:25 /test/b.txt
-rw-r--r--   2 yunwei supergroup         51 2023-02-24 17:21 /test/hello-hadoop.txt
-rw-r--r--   2 yunwei supergroup         40 2023-02-27 15:59 /test/input.txt

Look at the MapReduce jars that ship with Hadoop

# ll $HADOOP_HOME/share/hadoop/mapreduce/
total 4876
-rw-rw-r-- 1 yunwei yunwei  526732 Oct  3  2016 hadoop-mapreduce-client-app-2.6.5.jar
-rw-rw-r-- 1 yunwei yunwei  686773 Oct  3  2016 hadoop-mapreduce-client-common-2.6.5.jar
-rw-rw-r-- 1 yunwei yunwei 1535776 Oct  3  2016 hadoop-mapreduce-client-core-2.6.5.jar
-rw-rw-r-- 1 yunwei yunwei  259326 Oct  3  2016 hadoop-mapreduce-client-hs-2.6.5.jar
-rw-rw-r-- 1 yunwei yunwei   27489 Oct  3  2016 hadoop-mapreduce-client-hs-plugins-2.6.5.jar
-rw-rw-r-- 1 yunwei yunwei   61309 Oct  3  2016 hadoop-mapreduce-client-jobclient-2.6.5.jar
-rw-rw-r-- 1 yunwei yunwei 1514166 Oct  3  2016 hadoop-mapreduce-client-jobclient-2.6.5-tests.jar
-rw-rw-r-- 1 yunwei yunwei   67762 Oct  3  2016 hadoop-mapreduce-client-shuffle-2.6.5.jar
-rw-rw-r-- 1 yunwei yunwei  292710 Oct  3  2016 hadoop-mapreduce-examples-2.6.5.jar
drwxrwxr-x 2 yunwei yunwei    4096 Oct  3  2016 lib
drwxrwxr-x 2 yunwei yunwei      30 Oct  3  2016 lib-examples
drwxrwxr-x 2 yunwei yunwei    4096 Oct  3  2016 sources

# vi hadoop-mapreduce-examples-2.6.5.jar

Run the jar with the hadoop command

# hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.5.jar  WordCount  input.txt output      
Unknown program 'WordCount' chosen.
Valid program names are:
  aggregatewordcount: An Aggregate based map/reduce program that counts the words in the input files.
  aggregatewordhist: An Aggregate based map/reduce program that computes the histogram of the words in the input files.
  bbp: A map/reduce program that uses Bailey-Borwein-Plouffe to compute exact digits of Pi.
  dbcount: An example job that count the pageview counts from a database.
  distbbp: A map/reduce program that uses a BBP-type formula to compute exact bits of Pi.
  grep: A map/reduce program that counts the matches of a regex in the input.
  join: A job that effects a join over sorted, equally partitioned datasets
  multifilewc: A job that counts words from several files.
  pentomino: A map/reduce tile laying program to find solutions to pentomino problems.
  pi: A map/reduce program that estimates Pi using a quasi-Monte Carlo method.
  randomtextwriter: A map/reduce program that writes 10GB of random textual data per node.
  randomwriter: A map/reduce program that writes 10GB of random data per node.
  secondaryso