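Configuring the Hadoop Environment
Download and Installation
Download the Oracle JDK: JDK download
This gives: jdk-14.0.2_linux-x64_bin.tar.gz
Download Hadoop: Hadoop download
This gives: hadoop-3.3.0.tar.gz
Move both archives to /usr/local and extract each one there: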
$ sudo tar xzf hadoop-3.3.0.tar.gz
$ sudo mv hadoop-3.3.0 hadoop
$ sudo tar xzf jdk-14.0.2_linux-x64_bin.tar.gz
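Before extracting, it can help to confirm which directory an archive will create, since JAVA_HOME and HADOOP_HOME later depend on those names. The following is a minimal sketch, not part of the original steps; top_level_dir is a hypothetical helper name, and it assumes the archive's entries do not start with "./".

```shell
# Hypothetical helper: print the top-level directory stored in a .tar.gz archive.
top_level_dir() {
    # List the archive without extracting it and keep the first path component.
    tar tzf "$1" | head -n 1 | cut -d/ -f1
}

# Example calls (run in the directory that holds the downloaded archives):
# top_level_dir jdk-14.0.2_linux-x64_bin.tar.gz
# top_level_dir hadoop-3.3.0.tar.gz
```

The name printed for the JDK archive should match the directory used for JAVA_HOME in the next step.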
Configuring Environment Variables
$ vim ~/.bashrc
Append the following to the end of the file:
#set oracle jdk && hadoop environment
export JAVA_HOME=/usr/local/jdk-14.0.2
export HADOOP_HOME=/usr/local/hadoop
# JDK 9 and later no longer ship a separate jre/ directory, and the JDK finds
# its own class libraries, so JRE_HOME and CLASSPATH do not need to be set.
export PATH=${JAVA_HOME}/bin:${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin:$PATH
Run: source ~/.bashrc
so that the settings take effect in the current shell.
Check that the configuration is correct:
$ java -version
java version "14.0.2" 2020-07-14
Java(TM) SE Runtime Environment (build 14.0.2+12-46)
Java HotSpot(TM) 64-Bit Server VM (build 14.0.2+12-46, mixed mode, sharing)
$ hadoop version
Hadoop 3.3.0
Source code repository https://gitbox.apache.org/repos/asf/hadoop.git -r aa96f1871bfd858f9bac59cf2a81ec470da649af
Compiled by brahma on 2020-07-06T18:44Z
Compiled with protoc 3.7.1
From source with checksum 5dc29b802d6ccd77b262ef9d04d19c4
This command was run using /usr/local/hadoop/share/hadoop/common/hadoop-common-3.3.0.jar
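If either command reports "command not found", the usual cause is a wrong installation path in ~/.bashrc. The sketch below is one way to check this; check_tool is a hypothetical helper name, and it only assumes the bin/ layout used in the PATH setting above.

```shell
# Hypothetical helper: verify that an installation directory contains an
# executable with the given name under bin/.
check_tool() {
    # $1: installation directory, $2: executable name
    if [ -x "$1/bin/$2" ]; then
        echo "ok: $1/bin/$2"
    else
        echo "missing: $1/bin/$2"
        return 1
    fi
}

# Example calls:
# check_tool "$JAVA_HOME" java
# check_tool "$HADOOP_HOME" hadoop
```

A "missing" line points at the variable whose directory should be corrected.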
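A Simple Test
Use the example MapReduce jar that ships with the Hadoop installation to count how many times each word occurs in a set of text files.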
$ mkdir input
$ cp $HADOOP_HOME/*.txt input
$ hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.0.jar wordcount input output
The output directory now contains two files:
part-r-00000 _SUCCESS
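Running the same job a second time fails, because a MapReduce job refuses to write into an output directory that already exists. A minimal sketch of clearing it first, assuming the job runs in local (standalone) mode as here, so that the output is an ordinary local directory; fresh_output_dir is a hypothetical helper name.

```shell
# Hypothetical helper: remove a previous local output directory, if there is
# one, so that the next job run can create it again.
fresh_output_dir() {
    # $1: output directory of the previous run
    if [ -d "$1" ]; then
        rm -r "$1"
    fi
}

# Example: clear the old results, then rerun the wordcount job.
# fresh_output_dir output
# hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.0.jar wordcount input output
```

Since this deletes the earlier results, copy part-r-00000 elsewhere first if it is still needed.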
The word counts are stored in part-r-00000:
"AS 3
"Contribution" 1
"Contributor" 1
"Derivative 1
"Legal 1
"License" 1
"License"); 1
"Licensor" 1
"NOTICE" 1
"Not 1
"Object" 1
"Software"), 1
"Source" 1
"Work" 1
"You" 1
"Your") 1
…
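Tokens such as "AS or "License"); keep their punctuation because the wordcount example splits lines on whitespace only. As a rough cross-check, the same counting can be sketched with standard shell tools. This is an approximation for comparison, not how Hadoop computes the result; local_wordcount is a hypothetical helper name.

```shell
# Hypothetical helper: count whitespace-separated tokens in one file and print
# "token<TAB>count" lines, sorted by token in byte order.
local_wordcount() {
    # Put each token on its own line, drop empty lines, then count duplicates.
    tr -s '[:space:]' '\n' < "$1" \
        | grep -v '^$' \
        | LC_ALL=C sort \
        | uniq -c \
        | awk '{ printf "%s\t%s\n", $2, $1 }'
}

# Example call on one of the input files:
# local_wordcount input/LICENSE.txt
```

Note that the Hadoop job counts across all files in input, while this helper handles a single file per call.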
_SUCCESS is an empty marker file that the job writes to indicate that it completed successfully.
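A script that consumes the results can test for this marker before reading them. A minimal sketch, assuming a local output directory as in this test; job_succeeded is a hypothetical helper name.

```shell
# Hypothetical helper: succeed only if the job's output directory contains
# the _SUCCESS marker file.
job_succeeded() {
    # $1: job output directory
    [ -e "$1/_SUCCESS" ]
}

# Example: print the results only when the marker is present.
# if job_succeeded output; then
#     cat output/part-r-00000
# fi
```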