With a single-node Hadoop deployment in place, let's try out how MapReduce analyzes and processes data.
Word Count
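Word Count is the most classic kind of MapReduce job. Its task is to tally the words in a text file and report how many times each distinct word occurs in total.
Hadoop ships with many classic MapReduce example programs, and Word Count is one of them.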
Prepare the demo file input.txt
# cat input.txt
I LOVE GG
I LIKE YY
I LOVE UU
I LIKE RR
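Before handing the file to Hadoop, it can help to know what result to expect. The following is a minimal local sketch, not the Hadoop implementation: it recreates the four demo lines in a temporary directory (the `workdir` and `counts` names are assumptions made for this example) and imitates the map, shuffle, and reduce phases with standard Unix tools.

```shell
# Recreate the demo file in a scratch directory (location is an assumption of this sketch).
workdir="$(mktemp -d)"
printf 'I LOVE GG\nI LIKE YY\nI LOVE UU\nI LIKE RR\n' > "$workdir/input.txt"

# "Map":     tr emits one word per line.
# "Shuffle": sort brings identical words next to each other.
# "Reduce":  uniq -c counts each run; awk reorders to "word<TAB>count".
counts="$(tr -s ' ' '\n' < "$workdir/input.txt" \
    | LC_ALL=C sort \
    | uniq -c \
    | awk '{print $2 "\t" $1}')"

printf '%s\n' "$counts"
```

For this input the pipeline reports I four times, LOVE and LIKE twice each, and GG, YY, UU, RR once each. The Word Count job should arrive at the same totals, although the way it splits and orders words is determined by the job itself.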
Copy input.txt into Hadoop (HDFS)
# hdfs dfs -put input.txt /test
# hdfs dfs -ls /test
Found 4 items
drwxr-xr-x - yunwei supergroup 0 2023-02-24 17:14 /test/a
-rw-r--r-- 2 yunwei supergroup 51 2023-02-24 17:25 /test/b.txt
-rw-r--r-- 2 yunwei supergroup 51 2023-02-24 17:21 /test/hello-hadoop.txt
-rw-r--r-- 2 yunwei supergroup 40 2023-02-27 15:59 /test/input.txt
Look at the MapReduce packages that ship with Hadoop
# ll $HADOOP_HOME/share/hadoop/mapreduce/
total 4876
-rw-rw-r-- 1 yunwei yunwei 526732 Oct 3 2016 hadoop-mapreduce-client-app-2.6.5.jar
-rw-rw-r-- 1 yunwei yunwei 686773 Oct 3 2016 hadoop-mapreduce-client-common-2.6.5.jar
-rw-rw-r-- 1 yunwei yunwei 1535776 Oct 3 2016 hadoop-mapreduce-client-core-2.6.5.jar
-rw-rw-r-- 1 yunwei yunwei 259326 Oct 3 2016 hadoop-mapreduce-client-hs-2.6.5.jar
-rw-rw-r-- 1 yunwei yunwei 27489 Oct 3 2016 hadoop-mapreduce-client-hs-plugins-2.6.5.jar
-rw-rw-r-- 1 yunwei yunwei 61309 Oct 3 2016 hadoop-mapreduce-client-jobclient-2.6.5.jar
-rw-rw-r-- 1 yunwei yunwei 1514166 Oct 3 2016 hadoop-mapreduce-client-jobclient-2.6.5-tests.jar
-rw-rw-r-- 1 yunwei yunwei 67762 Oct 3 2016 hadoop-mapreduce-client-shuffle-2.6.5.jar
-rw-rw-r-- 1 yunwei yunwei 292710 Oct 3 2016 hadoop-mapreduce-examples-2.6.5.jar
drwxrwxr-x 2 yunwei yunwei 4096 Oct 3 2016 lib
drwxrwxr-x 2 yunwei yunwei 30 Oct 3 2016 lib-examples
drwxrwxr-x 2 yunwei yunwei 4096 Oct 3 2016 sources
# vi hadoop-mapreduce-examples-2.6.5.jar
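Opening the jar in vi works because Vim can browse zip archives, but listing the archive entries is a quicker way to see which example classes are inside. A small sketch, assuming `unzip` is installed and the jar sits at the path shown in the listing above; `jar_path` and `pattern` are names chosen for this example.

```shell
# Path taken from the directory listing above.
jar_path="${HADOOP_HOME:-}/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.5.jar"
pattern="WordCount"

if [ -f "$jar_path" ]; then
    # A jar is a zip archive, so unzip -l prints its table of contents.
    unzip -l "$jar_path" | grep "$pattern"
else
    echo "jar not found at $jar_path" >&2
fi
```

The entries found this way are class files. The name passed to `hadoop jar` is a separate program name registered by the examples jar, which is not necessarily spelled the same as the class.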

Run the jar with the hadoop command
# hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.5.jar WordCount input.txt output
Unknown program 'WordCount' chosen.
Valid program names are:
aggregatewordcount: An Aggregate based map/reduce program that counts the words in the input files.
aggregatewordhist: An Aggregate based map/reduce program that computes the histogram of the words in the input files.
bbp: A map/reduce program that uses Bailey-Borwein-Plouffe to compute exact digits of Pi.
dbcount: An example job that count the pageview counts from a database.
distbbp: A map/reduce program that uses a BBP-type formula to compute exact bits of Pi.
grep: A map/reduce program that counts the matches of a regex in the input.
join: A job that effects a join over sorted, equally partitioned datasets
multifilewc: A job that counts words from several files.
pentomino: A map/reduce tile laying program to find solutions to pentomino problems.
pi: A map/reduce program that estimates Pi using a quasi-Monte Carlo method.
randomtextwriter: A map/reduce program that writes 10GB of random textual data per node.
randomwriter: A map/reduce program that writes 10GB of random data per node.
secondaryso
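The listing is cut off here, but the error already points at the cause: program names are matched exactly, and every valid name is lowercase. The examples jar registers the word count program as `wordcount`. A second thing to watch is that a relative path such as `input.txt` is resolved against the user's HDFS home directory, not against /test where the file was uploaded. The following is a sketch of a corrected run under those assumptions; the output path /test/output is a choice made for this example and must not exist before the job starts.

```shell
# Program names are lowercase; "wordcount" is the name the examples jar registers.
PROGRAM="wordcount"
INPUT="/test/input.txt"   # uploaded earlier with hdfs dfs -put
OUTPUT="/test/output"     # assumed path; the job refuses to overwrite an existing directory

if command -v hadoop >/dev/null 2>&1; then
    EXAMPLES_JAR="$HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.5.jar"
    hadoop jar "$EXAMPLES_JAR" "$PROGRAM" "$INPUT" "$OUTPUT"
    # Reducer output is written to part-r-* files inside the output directory.
    hdfs dfs -cat "$OUTPUT/part-r-00000"
else
    echo "hadoop is not on PATH; skipping the run" >&2
fi
```

To rerun the job, remove the output directory first with `hdfs dfs -rm -r /test/output`, or pick a new output path.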

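This article walks through deploying and running the WordCount MapReduce job in a Hadoop environment, including troubleshooting errors, handling the input and output files, and reading the execution log. The example eventually runs successfully and produces an output directory whose part-r-00000 file contains the results.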