Introduction to MapReduce
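The MapReduce idea shows up everywhere in daily life, and most people have met it in some form. Its core is "divide and conquer": a large job is split into independent pieces that are processed in parallel (Map), and the partial results are then merged (Reduce). This makes it well suited to large-scale data processing.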
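To make "divide and conquer" concrete, here is a minimal plain-Java sketch with no Hadoop involved. The class and method names (DivideAndConquerSketch, countTokens, totalTokens) are hypothetical, and the sample lines are the visible lines of the wordfile.txt used later. It cuts the input into chunks, processes the chunks in parallel, and adds up the partial results.

```java
import java.util.ArrayList;
import java.util.List;

class DivideAndConquerSketch {
    // "Map" step: process one independent chunk of lines and return a partial result
    // (here, the number of whitespace-separated tokens in that chunk).
    static long countTokens(List<String> chunk) {
        long n = 0;
        for (String line : chunk) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                n += trimmed.split("\\s+").length;
            }
        }
        return n;
    }

    // Divide: cut the input into fixed-size chunks.
    // Conquer: process the chunks in parallel.
    // Combine ("reduce"): add up the partial results.
    static long totalTokens(List<String> lines, int chunkSize) {
        List<List<String>> chunks = new ArrayList<>();
        for (int i = 0; i < lines.size(); i += chunkSize) {
            chunks.add(lines.subList(i, Math.min(i + chunkSize, lines.size())));
        }
        return chunks.parallelStream()
                .mapToLong(DivideAndConquerSketch::countTokens)
                .sum();
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "hello world hadoop",
                "hive sqoop flume hello",
                "kitty tom jerry world",
                "hadoop");
        System.out.println(totalTokens(lines, 2)); // prints 12
    }
}
```

A real MapReduce job distributes the chunks (input splits) across machines instead of threads, but the split, process, merge shape is the same.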
MapReduce example: WordCount
Requirement: given a set of text files, count and output the total number of occurrences of each word.
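The requirement maps onto the three MapReduce phases as sketched below in plain Java. This is an illustration only, not the WordCount.java used in this tutorial: the class and method names are hypothetical, and the map and reduce roles presumably correspond to the TokenizerMapper and IntSumReducer classes that appear after compilation in step 5. Map emits (word, 1) for each token, shuffle groups the pairs by word, and reduce sums each group.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

class WordCountSketch {
    // Map: emit (word, 1) for every whitespace-separated token in one line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        StringTokenizer itr = new StringTokenizer(line);
        while (itr.hasMoreTokens()) {
            pairs.add(Map.entry(itr.nextToken(), 1));
        }
        return pairs;
    }

    // Shuffle: group the emitted values by word; a TreeMap keeps the words sorted.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> groups = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            groups.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return groups;
    }

    // Reduce: sum the 1s collected for each word.
    static Map<String, Integer> reduce(Map<String, List<Integer>> groups) {
        Map<String, Integer> counts = new TreeMap<>();
        groups.forEach((word, ones) ->
                counts.put(word, ones.stream().mapToInt(Integer::intValue).sum()));
        return counts;
    }

    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        return reduce(shuffle(pairs));
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "hello world hadoop",
                "hive sqoop flume hello",
                "kitty tom jerry world",
                "hadoop");
        System.out.println(wordCount(lines));
        // {flume=1, hadoop=2, hello=2, hive=1, jerry=1, kitty=1, sqoop=1, tom=1, world=2}
    }
}
```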
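1. Copy and paste the source code into a Java file, then upload it to the virtual machine with a file-transfer tool. Source code location: --
A new folder, wc00, was created here to hold WordCount.java.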
2. Add Hadoop's classpath to the CLASSPATH variable
Command: vi /etc/profile, and append the line:
export CLASSPATH=$($HADOOP_HOME/bin/hadoop classpath):$CLASSPATH
Command: source /etc/profile
[root@demo wc00]# vi /etc/profile
[root@demo wc00]# source /etc/profile
[root@demo wc00]# echo $CLASSPATH
/opt/hadoop-3.1.4/etc/hadoop:/opt/hadoop-3.1.4/share/hadoop/common/lib/*:/opt/hadoop-3.1.4/share/hadoop/common/*:/opt/hadoop-3.1.4/share/hadoop/hdfs:/opt/hadoop-3.1.4/share/hadoop/hdfs/lib/*:/opt/hadoop-3.1.4/share/hadoop/hdfs/*:/opt/hadoop-3.1.4/share/hadoop/mapreduce/lib/*:/opt/hadoop-3.1.4/share/hadoop/mapreduce/*:/opt/hadoop-3.1.4/share/hadoop/yarn:/opt/hadoop-3.1.4/share/hadoop/yarn/lib/*:/opt/hadoop-3.1.4/share/hadoop/yarn/*:
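The echoed value is a colon-separated list, and the trailing ":" comes from appending $CLASSPATH, which was presumably empty before. Entries ending in /* are directory wildcards: Java's class path rules treat such an entry as every .jar file in that directory. The sketch below (hypothetical class ClasspathSketch, assuming Linux-style ':' separators) splits the value and lists the wildcard directories.

```java
import java.util.ArrayList;
import java.util.List;

class ClasspathSketch {
    // Split a Linux-style classpath on ':' and drop empty entries
    // (such as the one left by a trailing ':').
    static List<String> entries(String classpath) {
        List<String> out = new ArrayList<>();
        for (String e : classpath.split(":")) {
            if (!e.isEmpty()) {
                out.add(e);
            }
        }
        return out;
    }

    // Entries ending in "/*" stand for all .jar files in that directory;
    // return the directory part of each such entry.
    static List<String> wildcardDirs(String classpath) {
        List<String> dirs = new ArrayList<>();
        for (String e : entries(classpath)) {
            if (e.endsWith("/*")) {
                dirs.add(e.substring(0, e.length() - 2));
            }
        }
        return dirs;
    }

    public static void main(String[] args) {
        // Reads the CLASSPATH exported in /etc/profile, if any.
        String cp = System.getenv().getOrDefault("CLASSPATH", "");
        for (String dir : wildcardDirs(cp)) {
            System.out.println("jars from: " + dir);
        }
    }
}
```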
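3. Compile and package the Hadoop MapReduce program (javac finds the Hadoop classes through the CLASSPATH set in step 2):
Command: javac WordCount.java
Command: jar -cvf WordCount.jar ./WordCount*.class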
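The jar command uses the WordCount*.class glob because javac writes every nested class to its own Outer$Inner.class file, which is why step 5 lists WordCount$TokenizerMapper.class and WordCount$IntSumReducer.class next to WordCount.class. The sketch below uses hypothetical empty stand-ins to show the naming; in the Apache Hadoop tutorial's WordCount the real nested classes extend Hadoop's Mapper and Reducer, and the code here presumably does the same.

```java
class WordCountNaming {
    // Hypothetical empty stand-ins for the nested classes in WordCount.java.
    static class TokenizerMapper { }

    static class IntSumReducer { }

    public static void main(String[] args) {
        // The binary name of a nested class is Outer$Inner, and javac writes it
        // to a separate file with that name, e.g. WordCountNaming$TokenizerMapper.class.
        System.out.println(TokenizerMapper.class.getName()); // prints WordCountNaming$TokenizerMapper
        System.out.println(IntSumReducer.class.getName());   // prints WordCountNaming$IntSumReducer
    }
}
```

Running jar -tf WordCount.jar lists the packaged files, which is a quick way to confirm that the nested classes made it into the jar.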
4. Configure yarn-site.xml by adding the following property inside its <configuration> element:
Command: vi /opt/hadoop-3.1.4/etc/hadoop/yarn-site.xml
<property>
<name>yarn.application.classpath</name>
<value>
/opt/hadoop-3.1.4/etc/hadoop:/opt/hadoop-3.1.4/share/hadoop/common/lib/*:/opt/hadoop-3.1.4/share/hadoop/common/*:/opt/hadoop-3.1.4/share/hadoop/hdfs:/opt/hadoop-3.1.4/share/hadoop/hdfs/lib/*:/opt/hadoop-3.1.4/share/hadoop/hdfs/*:/opt/hadoop-3.1.4/share/hadoop/mapreduce/lib/*:/opt/hadoop-3.1.4/share/hadoop/mapreduce/*:/opt/hadoop-3.1.4/share/hadoop/yarn:/opt/hadoop-3.1.4/share/hadoop/yarn/lib/*:/opt/hadoop-3.1.4/share/hadoop/yarn/*
</value>
</property>
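Hadoop configuration files nest each setting as <configuration> → <property> → <name>/<value>. To illustrate that structure, the sketch below uses the JDK's DOM parser to look up one property by name and split its value into classpath entries. The class name YarnSiteSketch and the shortened sample value are hypothetical, and trimming the whitespace around the multi-line <value> is this sketch's own choice, not a statement about how YARN reads the file.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class YarnSiteSketch {
    // Return the trimmed <value> of the <property> whose <name> matches, or null.
    static String lookup(String xml, String name) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList props = doc.getElementsByTagName("property");
            for (int i = 0; i < props.getLength(); i++) {
                Element p = (Element) props.item(i);
                String n = p.getElementsByTagName("name").item(0).getTextContent().trim();
                if (n.equals(name)) {
                    return p.getElementsByTagName("value").item(0).getTextContent().trim();
                }
            }
            return null;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse configuration XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<configuration><property>"
                + "<name>yarn.application.classpath</name>"
                + "<value>\n/opt/hadoop-3.1.4/etc/hadoop:/opt/hadoop-3.1.4/share/hadoop/common/*\n</value>"
                + "</property></configuration>";
        String value = lookup(xml, "yarn.application.classpath");
        for (String entry : value.split(":")) {
            System.out.println(entry);
        }
    }
}
```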
View Hadoop's classpath (its output is the value used in <value> above):
[root@demo sbin]# cd /opt/hadoop-3.1.4
[root@demo hadoop-3.1.4]# hadoop classpath
/opt/hadoop-3.1.4/etc/hadoop:/opt/hadoop-3.1.4/share/hadoop/common/lib/*:/opt/hadoop-3.1.4/share/hadoop/common/*:/opt/hadoop-3.1.4/share/hadoop/hdfs:/opt/hadoop-3.1.4/share/hadoop/hdfs/lib/*:/opt/hadoop-3.1.4/share/hadoop/hdfs/*:/opt/hadoop-3.1.4/share/hadoop/mapreduce/lib/*:/opt/hadoop-3.1.4/share/hadoop/mapreduce/*:/opt/hadoop-3.1.4/share/hadoop/yarn:/opt/hadoop-3.1.4/share/hadoop/yarn/lib/*:/opt/hadoop-3.1.4/share/hadoop/yarn/*
Edit yarn-site.xml:
[root@demo hadoop-3.1.4]# cd etc/hadoop
[root@demo hadoop]# vi yarn-site.xml
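5. Start the cluster and upload files to it:
cd into the sbin folder (/opt/hadoop-3.1.4/sbin) and mind the start order:
Command: ./start-dfs.sh
Command: ./start-yarn.sh
Command: ./mr-jobhistory-daemon.sh start historyserver or mapred --daemon start historyserver (in Hadoop 3 the script form is deprecated in favor of mapred --daemon)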
In the wc00 folder created at the start, create a .txt file holding the text to be counted.
Create wordfile.txt and add the following words to it:
Command: vi wordfile.txt
hello world hadoop
hive sqoop flume hello
kitty tom jerry world
hadoop
...
[root@demo hadoop-3.1.4]# cd sbin
[root@demo sbin]# ./start-dfs.sh
[root@demo sbin]# ./start-yarn.sh
[root@demo sbin]# ./mr-jobhistory-daemon.sh start historyserver
[root@demo sbin]# hdfs dfs -mkdir /input
[root@demo sbin]# cd
[root@demo ~]# hdfs dfs -ls /
Found 2 items
drwxr-xr-x - root supergroup 0 2022-11-02 11:26 /input
drwxrwx--- - root supergroup 0 2022-10-19 16:12 /tmp
[root@demo ~]# cd ~/wc00
[root@demo wc00]# ls
WordCount.class WordCount.jar WordCount$TokenizerMapper.class
WordCount$IntSumReducer.class WordCount.java wordfile.txt
Upload wordfile.txt to /input:
Command: hdfs dfs -put wordfile.txt /input
HDFS web UI: http://192.168.199.150:50070/dfshealth.html#tab-overview
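Command: hadoop jar WordCount.jar WordCount /input /results (the /results output directory must not exist before the job runs)
The word count job completed successfully!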
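With the default single reducer, the job writes its result to /results/part-r-00000, one record per line as word, a tab, then the count, sorted by word. The sketch below (hypothetical class ResultFormatSketch) reproduces that layout from an in-memory map. One caveat: Hadoop sorts Text keys by raw bytes while TreeMap uses String.compareTo; the two orders agree for plain ASCII words like these.

```java
import java.util.Map;
import java.util.TreeMap;

class ResultFormatSketch {
    // Lay out (word, count) pairs the way TextOutputFormat writes each record:
    // key, a tab, the value, newline. The TreeMap stands in for the sort
    // that happens before the single reducer sees the keys.
    static String render(Map<String, Integer> counts) {
        StringBuilder sb = new StringBuilder();
        new TreeMap<>(counts).forEach((word, count) ->
                sb.append(word).append('\t').append(count).append('\n'));
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(render(Map.of("world", 2, "hello", 2, "tom", 1)));
    }
}
```

The real file can be printed with hdfs dfs -cat /results/part-r-00000.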
On master, cd into the sbin folder (/opt/hadoop-3.1.4/sbin) and mind the shutdown order:
[root@master sbin]# ./stop-dfs.sh
[root@master sbin]# ./stop-yarn.sh
[root@master sbin]# ./mr-jobhistory-daemon.sh stop historyserver (or: mapred --daemon stop historyserver)
Shut down the machine:
poweroff