Required software
ssh -- installed by default on CentOS 7
pdsh -- not installed by default on CentOS 7; it has to be built manually (full sketch below):
--1. Install gcc first
--2. ./configure --with-ssh --without-rsh && make && make install
--3. pdsh -V (to verify the installation)
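A minimal end-to-end sketch of the build, assuming the pdsh source tarball has already been downloaded (the version number here is only an example):
```
# Build pdsh from source; pdsh-2.29.tar.gz is an assumed example tarball
yum install -y gcc
tar -xzf pdsh-2.29.tar.gz
cd pdsh-2.29
./configure --with-ssh --without-rsh
make && make install
pdsh -V    # should print the pdsh version and the available rcmd modules
```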
Planning
Since everything runs in virtual machines on a laptop, only three machines are planned:
192.168.142.110: hadoop1, ResourceManager
192.168.142.111: hadoop2, NameNode
192.168.142.112: hadoop3, NodeManager, DataNode, and other services (e.g. the Web App Proxy Server and the MapReduce Job History Server)
vi /etc/hosts
192.168.142.110 hadoop1
192.168.142.111 hadoop2
192.168.142.112 hadoop3
Disable the firewall:
systemctl stop firewalld
systemctl disable firewalld
Disable SELinux in /etc/selinux/config:
SELINUX=disabled
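With pdsh installed, the firewall and SELinux steps can be applied to all three nodes in one go. A sketch, assuming passwordless root SSH to hadoop1-3 (set up later in the Slaves File section):
```
# Run the same commands on every node via pdsh's host-range syntax
pdsh -w hadoop[1-3] "systemctl stop firewalld && systemctl disable firewalld"
pdsh -w hadoop[1-3] "sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config"
# The config change takes effect after a reboot; setenforce 0 switches
# SELinux to permissive mode immediately
pdsh -w hadoop[1-3] "setenforce 0"
```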
Configuring the Environment of the Hadoop Daemons (non-secure mode)
The HDFS daemons are the NameNode, SecondaryNameNode, and DataNode. The YARN daemons are the ResourceManager, NodeManager, and WebAppProxy. If MapReduce is to be used, the MapReduce Job History Server runs as well.
Relevant configuration files: etc/hadoop/hadoop-env.sh and optionally etc/hadoop/mapred-env.sh and etc/hadoop/yarn-env.sh
- etc/hadoop/hadoop-env.sh: mainly the Hadoop Java environment variables and JVM settings
(1) Install Java
#rpm -ivh jdk-8u92-linux-x64.rpm
(2) Edit the hadoop-env.sh file
#vi /opt/software/hadoop-3.2.0/etc/hadoop/hadoop-env.sh
export JAVA_HOME=/usr/java/jdk1.8.0_92
export HADOOP_HOME=/opt/software/hadoop-3.2.0
export HADOOP_LOG_DIR=${HADOOP_HOME}/logs (create the logs directory beforehand)
export HADOOP_PID_DIR=${HADOOP_HOME}/pid (create the pid directory beforehand)
export HDFS_SECURE_DN_USER=root
export HDFS_DATANODE_USER=root
export HDFS_NAMENODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_SECURE_DN_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root
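The log and pid directories referenced above do not exist in a freshly unpacked install; create them first:
```
# Create the directories HADOOP_LOG_DIR and HADOOP_PID_DIR point at
mkdir -p /opt/software/hadoop-3.2.0/logs /opt/software/hadoop-3.2.0/pid
```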
- Set the HADOOP_HOME variable
#vi /root/.bash_profile
export HADOOP_HOME=/opt/software/hadoop-3.2.0
PATH=$PATH:$HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export PATH
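Reload the profile so the new PATH takes effect in the current shell, then verify:
```
source /root/.bash_profile
```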
# hadoop version
Hadoop 3.2.0
Configuring the Hadoop Daemons
- etc/hadoop/core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop2:8020</value>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>131072</value>
  </property>
</configuration>
- etc/hadoop/hdfs-site.xml
<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/opt/software/hadoop-3.2.0/hdfs/namenode</value> (create beforehand)
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/opt/software/hadoop-3.2.0/hdfs/datanode</value> (create beforehand)
  </property>
</configuration>
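A sketch to create the two directories referenced above; in this plan the NameNode directory lives on hadoop2 and the DataNode directory on hadoop3:
```
# On hadoop2 (NameNode)
mkdir -p /opt/software/hadoop-3.2.0/hdfs/namenode
# On hadoop3 (DataNode)
mkdir -p /opt/software/hadoop-3.2.0/hdfs/datanode
```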
Note on dfs.hosts and dfs.hosts.exclude:
<property>
  <name>dfs.hosts</name>
  <value>/usr/local/hadoop/conf/datanode-allow-list</value>
</property>
<property>
  <name>dfs.hosts.exclude</name>
  <value>/usr/local/hadoop/conf/datanode-deny-list</value>
</property>
dfs.hosts lists the nodes allowed to connect to the NameNode. If it is empty, every DataNode may connect; if not, only the DataNodes listed in the file may connect. dfs.hosts.exclude lists the nodes forbidden to connect to the NameNode. If a node appears in both dfs.hosts and dfs.hosts.exclude, it is forbidden.
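If these lists are edited while the cluster is running, the NameNode can be told to re-read them without a restart:
```
# Re-read dfs.hosts / dfs.hosts.exclude on a running NameNode
$HADOOP_HOME/bin/hdfs dfsadmin -refreshNodes
```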
- etc/hadoop/yarn-site.xml
<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>hadoop1</value>
  </property>
  <property>
    <name>yarn.nodemanager.local-dirs</name>
    <value>/opt/software/hadoop-3.2.0/yarn/nodemanager/local</value> (create beforehand)
  </property>
  <property>
    <name>yarn.nodemanager.log-dirs</name>
    <value>/opt/software/hadoop-3.2.0/yarn/nodemanager/log</value> (create beforehand)
  </property>
</configuration>
- etc/hadoop/mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>hadoop3:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>hadoop3:19888</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.intermediate-done-dir</name>
    <value>/opt/software/hadoop-3.2.0/mapred/mr-history/tmp</value> (create beforehand)
  </property>
  <property>
    <name>mapreduce.jobhistory.done-dir</name>
    <value>/opt/software/hadoop-3.2.0/mapred/mr-history/done</value> (create beforehand)
  </property>
</configuration>
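A sketch for the directories referenced in the two snippets above. The NodeManager dirs are local paths on hadoop3; note that the two jobhistory *-done-dir values carry no filesystem scheme, so they resolve against fs.defaultFS, i.e. HDFS:
```
# On hadoop3: NodeManager working and log directories (local filesystem)
mkdir -p /opt/software/hadoop-3.2.0/yarn/nodemanager/local
mkdir -p /opt/software/hadoop-3.2.0/yarn/nodemanager/log
# The jobhistory directories live in HDFS; create them once HDFS is running
$HADOOP_HOME/bin/hdfs dfs -mkdir -p /opt/software/hadoop-3.2.0/mapred/mr-history/tmp
$HADOOP_HOME/bin/hdfs dfs -mkdir -p /opt/software/hadoop-3.2.0/mapred/mr-history/done
```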
Slaves File
The machines in the cluster that act as DataNodes and NodeManagers are the workers.
#vi etc/hadoop/workers
hadoop3
Also set up SSH trust between these machines; see https://blog.youkuaiyun.com/victory_lei/article/details/81776120. Note that for non-root users the permissions must be set explicitly: the .ssh directory must be 700 and ~/.ssh/authorized_keys must be 600. A key-setup sketch follows.
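A minimal sketch of the key setup, run on each node that has to reach the others (hadoop2 and hadoop1 here, since they run the start scripts):
```
# Generate a key pair (accept the defaults, empty passphrase), then push it out
ssh-keygen -t rsa -b 4096
ssh-copy-id root@hadoop1
ssh-copy-id root@hadoop2
ssh-copy-id root@hadoop3
# For non-root users, enforce the permissions mentioned above
chmod 700 ~/.ssh && chmod 600 ~/.ssh/authorized_keys
```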
Operating the Hadoop Cluster
Once all the necessary configuration is done, distribute the files to the HADOOP_CONF_DIR directory on all machines. This should be the same directory on every machine.
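A sketch of the distribution step, assuming the same install path on every node:
```
# Push the finished configuration to the other two nodes
scp -r /opt/software/hadoop-3.2.0/etc/hadoop root@hadoop2:/opt/software/hadoop-3.2.0/etc/
scp -r /opt/software/hadoop-3.2.0/etc/hadoop root@hadoop3:/opt/software/hadoop-3.2.0/etc/
```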
- Starting Hadoop
(1) The first time HDFS is brought up, it must be formatted. Format a new distributed filesystem:
$HADOOP_HOME/bin/hdfs namenode -format my_hadoop_cluster
On a successful format, the log prints "has been successfully formatted":
2019-04-06 03:56:55,410 INFO common.Storage: Storage directory /opt/software/hadoop-3.2.0/hdfs/namenode has been successfully formatted.
2019-04-06 03:56:55,424 INFO namenode.FSImageFormatProtobuf: Saving image file /opt/software/hadoop-3.2.0/hdfs/namenode/current/fsimage.ckpt_0000000000000000000 using no compression
2019-04-06 03:56:55,583 INFO namenode.FSImageFormatProtobuf: Image file /opt/software/hadoop-3.2.0/hdfs/namenode/current/fsimage.ckpt_0000000000000000000 of size 399 bytes saved in 0 seconds .
2019-04-06 03:56:55,593 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
2019-04-06 03:56:55,603 INFO namenode.NameNode: SHUTDOWN_MSG:
(2) Start all HDFS daemons
Prerequisite: etc/hadoop/workers is configured and passwordless SSH login works
Run on the NameNode host, i.e. hadoop2:
$HADOOP_HOME/sbin/start-dfs.sh
Check the running processes on hadoop2:
# jps
3538 SecondaryNameNode
3303 NameNode
3646 Jps
Check the running processes on hadoop3:
# jps
2258 DataNode
2299 Jps
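An optional check from hadoop2 that the DataNode on hadoop3 has registered with the NameNode:
```
# Should report one live datanode (hadoop3) with its capacity figures
$HADOOP_HOME/bin/hdfs dfsadmin -report
```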
(3) Start all YARN daemons
Prerequisite: etc/hadoop/workers is configured and passwordless SSH login works
Run on the ResourceManager host, i.e. hadoop1:
$HADOOP_HOME/sbin/start-yarn.sh
Check the running processes on hadoop1:
# jps
2441 ResourceManager
Check the running processes on hadoop3:
# jps
2612 NodeManager
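An optional check from hadoop1 that the NodeManager on hadoop3 has registered with the ResourceManager:
```
# Should list hadoop3 with state RUNNING
$HADOOP_HOME/bin/yarn node -list
```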
Web UIs
Once the Hadoop cluster is up and running, check the web UIs of its components as described below:
- NameNode
http://192.168.142.111:9870/ -- note that this port differs from the one in etc/hadoop/core-site.xml: 8020 is the RPC port of fs.defaultFS, while 9870 is the NameNode web UI port
- ResourceManager
http://192.168.142.110:8088/ -- 8088 is the default ResourceManager web UI port (yarn.resourcemanager.webapp.address)
- MapReduce JobHistory Server
The JobHistory Server has to be started separately:
$HADOOP_HOME/bin/mapred --daemon start historyserver
http://192.168.142.112:19888/ -- this port comes from the mapreduce.jobhistory.webapp.address setting in etc/hadoop/mapred-site.xml
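As a final smoke test, the example jar bundled with the distribution can be submitted; note that running it on YARN also requires mapreduce.framework.name=yarn in mapred-site.xml, which the snippet above does not set:
```
# Submit the bundled pi example; once it finishes, the job should appear
# in the JobHistory web UI at http://192.168.142.112:19888/
$HADOOP_HOME/bin/yarn jar \
  $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.0.jar pi 2 10
```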