Hadoop HA Deployment
1. Cluster Planning
hadoop01 | hadoop02 | hadoop03 |
---|---|---|
NameNode | NameNode | NameNode |
JournalNode | JournalNode | JournalNode |
ZKFC | ZKFC | ZKFC |
DataNode | DataNode | DataNode |
ZK | ZK | ZK |
ResourceManager | ||
NodeManager | NodeManager | NodeManager |
2. Configure ZooKeeper
1) Cluster plan: deploy ZooKeeper on the three nodes hadoop01, hadoop02, and hadoop03.
2) Extract and install
(1) Extract the ZooKeeper tarball into /opt/module/
[root@node1 ~]$ tar -zxvf apache-zookeeper-3.6.3-bin.tar.gz -C /opt/module/
(2) Create a zkData directory under /opt/module/apache-zookeeper-3.6.3-bin/
[root@node1 ~]$ cd /opt/module/apache-zookeeper-3.6.3-bin/
[root@node1 apache-zookeeper-3.6.3-bin]$ pwd
/opt/module/apache-zookeeper-3.6.3-bin
[root@node1 apache-zookeeper-3.6.3-bin]$ mkdir zkData/
[root@node1 apache-zookeeper-3.6.3-bin]$ ll
total 36
drwxr-xr-x. 2 myhadoop myhadoop 4096 Apr 9 2021 bin
drwxr-xr-x. 2 myhadoop myhadoop 77 Apr 9 2021 conf
drwxr-xr-x. 5 myhadoop myhadoop 4096 Apr 9 2021 docs
drwxrwxr-x. 2 myhadoop myhadoop 4096 Oct 19 08:40 lib
-rw-r--r--. 1 myhadoop myhadoop 11358 Apr 9 2021 LICENSE.txt
-rw-r--r--. 1 myhadoop myhadoop 432 Apr 9 2021 NOTICE.txt
-rw-r--r--. 1 myhadoop myhadoop 1963 Apr 9 2021 README.md
-rw-r--r--. 1 myhadoop myhadoop 3166 Apr 9 2021 README_packaging.md
drwxrwxr-x. 2 myhadoop myhadoop 6 Oct 19 08:43 zkData
(3) Rename zoo_sample.cfg under /opt/module/apache-zookeeper-3.6.3-bin/conf/ to zoo.cfg
[root@node1 apache-zookeeper-3.6.3-bin]$ cd conf/
[root@node1 conf]$ ll
total 12
-rw-r--r--. 1 myhadoop myhadoop 535 Apr 9 2021 configuration.xsl
-rw-r--r--. 1 myhadoop myhadoop 3435 Apr 9 2021 log4j.properties
-rw-r--r--. 1 myhadoop myhadoop 1148 Apr 9 2021 zoo_sample.cfg
[root@node1 conf]$ mv zoo_sample.cfg zoo.cfg
3) Configure zoo.cfg. Edit the file and change the dataDir path:
[root@node1 conf]$ vim zoo.cfg
dataDir=/opt/module/apache-zookeeper-3.6.3-bin/zkData
Append the following at the end of zoo.cfg:
#######################cluster##########################
server.1=hadoop01:2888:3888
server.2=hadoop02:2888:3888
server.3=hadoop03:2888:3888
Save and exit.
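The id in each `server.N` line must match the myid file that ZooKeeper later reads on that host. As a quick sanity check, a small helper (a sketch, not part of the original setup; it assumes plain hostnames without dots) can derive the expected id for a host straight from zoo.cfg:

```shell
#!/bin/bash
# Given a zoo.cfg and a hostname, print the server id that host's myid
# file must contain. Relies only on the server.N=host:2888:3888 lines
# added above; assumes hostnames contain no '.' characters.
myid_for_host() {
    local cfg="$1" host="$2"
    awk -F'[.=:]' -v h="$host" \
        '$1 == "server" && $3 == h { print $2 }' "$cfg"
}
```

For example, `myid_for_host zoo.cfg hadoop02` prints `2`, the value written into hadoop02's myid file in the next step.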
3. Cluster Operations
1) Create a file named myid in /opt/module/apache-zookeeper-3.6.3-bin/zkData
[root@node1 zkData]$ pwd
/opt/module/apache-zookeeper-3.6.3-bin/zkData
[root@node1 zkData]$ touch myid
[root@node1 zkData]$ ll
total 0
-rw-rw-r--. 1 myhadoop myhadoop 0 Oct 19 08:51 myid
2) Edit the myid file and add the ID corresponding to this server, e.g. 1
[root@node1 zkData]$ vim myid
1
3) Copy the configured ZooKeeper installation to hadoop02 and hadoop03 with the xsync distribution script
[root@node1 ~]$ xsync /opt/module/apache-zookeeper-3.6.3-bin/
On hadoop02 change the content of myid to 2; on hadoop03 change it to 3.
[root@hadoop02 ~]# su - myhadoop
Last login: Mon Oct 17 12:39:33 CST 2022 on pts/1
[myhadoop@node2 ~]$ vim /opt/module/apache-zookeeper-3.6.3-bin/zkData/myid
2
[root@hadoop03 ~]# su - myhadoop
Last login: Tue Oct 18 08:20:28 CST 2022 on pts/0
[myhadoop@node3 ~]$ vim /opt/module/apache-zookeeper-3.6.3-bin/zkData/myid
3
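The three myid files differ only in the number they contain. A tiny helper makes the mapping explicit (write_myid is a sketch, not part of the original tooling; in practice each host runs the equivalent echo locally, with ids 1, 2, 3 matching the server.N lines in zoo.cfg):

```shell
#!/bin/bash
# Write a server id into a zkData/myid file. Run on each node with
# that node's id: hadoop01 -> 1, hadoop02 -> 2, hadoop03 -> 3.
write_myid() {
    local zkdata="$1" id="$2"
    mkdir -p "$zkdata"
    echo "$id" > "$zkdata/myid"
}
```

On hadoop02, for example, this amounts to `echo 2 > /opt/module/apache-zookeeper-3.6.3-bin/zkData/myid`.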
4) Write a ZooKeeper cluster control script in the bin directory under the myhadoop user's home directory
[root@node1 ~]$ cd bin/
[root@node1 bin]$ pwd
/home/myhadoop/bin
[root@node1 bin]$ vim zoo.sh
#!/bin/bash
if [ $# -lt 1 ]
then
    echo "No args input..."
    exit 1
fi
for host in hadoop01 hadoop02 hadoop03
do
    case $1 in
    "start")
        echo " =================== Starting ZooKeeper on $host ==================="
        ssh "$host" "/opt/module/apache-zookeeper-3.6.3-bin/bin/zkServer.sh start"
    ;;
    "stop")
        echo " =================== Stopping ZooKeeper on $host ==================="
        ssh "$host" "/opt/module/apache-zookeeper-3.6.3-bin/bin/zkServer.sh stop"
    ;;
    "status")
        echo " =================== ZooKeeper status on $host ==================="
        ssh "$host" "/opt/module/apache-zookeeper-3.6.3-bin/bin/zkServer.sh status"
    ;;
    "restart")
        echo " =================== Restarting ZooKeeper on $host ==================="
        ssh "$host" "/opt/module/apache-zookeeper-3.6.3-bin/bin/zkServer.sh restart"
    ;;
    *)
        echo "Unknown command: $1"
        exit 1
    ;;
    esac
done
Make the script executable:
[root@node1 bin]$ chmod a+x zoo.sh
Distribute the script:
[root@node1 bin]$ xsync /home/myhadoop/bin/
5) Start the ZooKeeper cluster with the script
[root@node1 bin]$ zoo.sh start
=================== Starting ZooKeeper on hadoop01 ===================
ZooKeeper JMX enabled by default
Using config: /opt/module/apache-zookeeper-3.6.3-bin/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
=================== Starting ZooKeeper on hadoop02 ===================
ZooKeeper JMX enabled by default
Using config: /opt/module/apache-zookeeper-3.6.3-bin/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
=================== Starting ZooKeeper on hadoop03 ===================
ZooKeeper JMX enabled by default
Using config: /opt/module/apache-zookeeper-3.6.3-bin/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
6) Check status
[root@node1 ~]$ zoo.sh status
=================== ZooKeeper status on hadoop01 ===================
ZooKeeper JMX enabled by default
Using config: /opt/module/apache-zookeeper-3.6.3-bin/bin/../conf/zoo.cfg
Client port found: 2181. Client address: localhost. Client SSL: false.
Mode: follower
=================== ZooKeeper status on hadoop02 ===================
ZooKeeper JMX enabled by default
Using config: /opt/module/apache-zookeeper-3.6.3-bin/bin/../conf/zoo.cfg
Client port found: 2181. Client address: localhost. Client SSL: false.
Mode: leader
=================== ZooKeeper status on hadoop03 ===================
ZooKeeper JMX enabled by default
Using config: /opt/module/apache-zookeeper-3.6.3-bin/bin/../conf/zoo.cfg
Client port found: 2181. Client address: localhost. Client SSL: false.
Mode: follower
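A healthy ensemble shows exactly one leader, with the remaining nodes as followers. To script that check, a small filter (a sketch, not part of the original tooling) can pull the Mode line out of `zkServer.sh status` output:

```shell
#!/bin/bash
# Extract the "Mode:" value (leader/follower) from `zkServer.sh status`
# output, so a health check can count leaders, e.g.:
#   zoo.sh status 2>/dev/null | zk_mode | grep -c leader   # expect 1
zk_mode() {
    awk -F': ' '/^Mode:/ { print $2 }'
}
```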
4. Configure HDFS HA Automatic Failover
Note: this builds on the manual failover setup already in place.
1) Configuration
(1) Configure hdfs-site.xml
[root@node1 ~]$ vim /opt/ha/hadoop-3.3.1/etc/hadoop/hdfs-site.xml
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
(2) Configure core-site.xml
[root@node1 ~]$ vim /opt/ha/hadoop-3.3.1/etc/hadoop/core-site.xml
<property>
<name>ha.zookeeper.quorum</name>
<value>node1:2181,node2:2181,node3:2181</value>
</property>
Distribute the configuration files:
[root@node1 ~]$ xsync /opt/ha/hadoop-3.3.1/etc/hadoop/
2) Startup
(1) Stop all HDFS services:
[root@node1 ~]$ /opt/ha/hadoop-3.3.1/sbin/stop-dfs.sh
[root@node1 ~]$ jpsall
=============== hadoop01 ===============
5484 Jps
3661 QuorumPeerMain
=============== hadoop02 ===============
3370 QuorumPeerMain
6620 Jps
=============== hadoop03 ===============
3233 QuorumPeerMain
7915 Jps
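The jpsall command used above is a cluster helper script, not a stock Hadoop tool, and it is not defined in this section. A minimal sketch of what it does (assuming passwordless SSH between the nodes; the command runner is injectable only so the sketch can be exercised without a live cluster):

```shell
#!/bin/bash
# Minimal jpsall sketch: run jps on every node and print a banner per
# host. JPSALL_RUNNER defaults to ssh and exists purely for testing.
jpsall() {
    local runner="${JPSALL_RUNNER:-ssh}"
    local host
    for host in hadoop01 hadoop02 hadoop03
    do
        echo "=============== $host ==============="
        "$runner" "$host" jps
    done
}
```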
(2) Start the ZooKeeper cluster
[root@node1 ~]$ zoo.sh restart
(3) Initialize the HA state in ZooKeeper
[root@node1 ~]$ hdfs zkfc -formatZK
(4) Start the HDFS services
[root@node1 ~]$ /opt/ha/hadoop-3.3.1/sbin/start-dfs.sh
Starting namenodes on [hadoop01 hadoop02 hadoop03]
Starting datanodes
Starting journal nodes [hadoop03 hadoop02 hadoop01]
Starting ZK Failover Controllers on NN hosts [hadoop01 hadoop02 hadoop03]
[root@node1 ~]$ jpsall
=============== hadoop01 ===============
6534 Jps
6458 DFSZKFailoverController
5918 NameNode
6046 DataNode
6270 JournalNode
5583 QuorumPeerMain
=============== hadoop02 ===============
6944 DataNode
7042 JournalNode
7154 DFSZKFailoverController
6717 QuorumPeerMain
6861 NameNode
7247 Jps
=============== hadoop03 ===============
8243 DataNode
8342 JournalNode
8474 DFSZKFailoverController
8027 QuorumPeerMain
8524 Jps
8159 NameNode
3) Check the state on the web UI: one Active NameNode and two Standby NameNodes
Overview 'hadoop01:9000' (standby)
Overview 'hadoop02:9000' (active)
Overview 'hadoop03:9000' (standby)
4) Verify automatic failover by killing the Active NameNode
[root@node2 ~]$ jps
6944 DataNode
7042 JournalNode
7154 DFSZKFailoverController
7290 Jps
6717 QuorumPeerMain
6861 NameNode
[root@node2 ~]$ kill -9 6861
Overview 'hadoop03:9000' (active)
Overview 'hadoop01:9000' (standby)
The killed NameNode can be restarted later with `hdfs --daemon start namenode`; it rejoins the cluster as a standby.
YARN HA Configuration
1. Cluster Planning
hadoop01 | hadoop02 | hadoop03 |
---|---|---|
NameNode | NameNode | NameNode |
JournalNode | JournalNode | JournalNode |
ZKFC | ZKFC | ZKFC |
DataNode | DataNode | DataNode |
ZK | ZK | ZK |
ResourceManager | ResourceManager | ResourceManager |
NodeManager | NodeManager | NodeManager |
2. Configuration
(1) Configure yarn-site.xml
[root@node1 ~]$ vim /opt/ha/hadoop-3.3.1/etc/hadoop/yarn-site.xml
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<!-- Enable ResourceManager HA -->
<property>
<name>yarn.resourcemanager.ha.enabled</name>
<value>true</value>
</property>
<!-- Logical id of the HA ResourceManager cluster -->
<property>
<name>yarn.resourcemanager.cluster-id</name>
<value>cluster-yarn1</value>
</property>
<!-- Logical ids of all ResourceManagers -->
<property>
<name>yarn.resourcemanager.ha.rm-ids</name>
<value>rm1,rm2,rm3</value>
</property>
<!-- =========== rm1 configuration =========== -->
<!-- Hostname of rm1 -->
<property>
<name>yarn.resourcemanager.hostname.rm1</name>
<value>node1</value>
</property>
<!-- Web UI address of rm1 -->
<property>
<name>yarn.resourcemanager.webapp.address.rm1</name>
<value>node1:8088</value>
</property>
<!-- Internal RPC address of rm1 -->
<property>
<name>yarn.resourcemanager.address.rm1</name>
<value>node1:8032</value>
</property>
<!-- Address AMs use to request resources from rm1 -->
<property>
<name>yarn.resourcemanager.scheduler.address.rm1</name>
<value>node1:8030</value>
</property>
<!-- Address NodeManagers connect to -->
<property>
<name>yarn.resourcemanager.resource-tracker.address.rm1</name>
<value>node1:8031</value>
</property>
<!-- =========== rm2 configuration =========== -->
<property>
<name>yarn.resourcemanager.hostname.rm2</name>
<value>node2</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address.rm2</name>
<value>node2:8088</value>
</property>
<property>
<name>yarn.resourcemanager.address.rm2</name>
<value>node2:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address.rm2</name>
<value>node2:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address.rm2</name>
<value>node2:8031</value>
</property>
<!-- =========== rm3 configuration =========== -->
<property>
<name>yarn.resourcemanager.hostname.rm3</name>
<value>node3</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address.rm3</name>
<value>node3:8088</value>
</property>
<property>
<name>yarn.resourcemanager.address.rm3</name>
<value>node3:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address.rm3</name>
<value>node3:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address.rm3</name>
<value>node3:8031</value>
</property>
<!-- ZooKeeper ensemble address -->
<property>
<name>yarn.resourcemanager.zk-address</name>
<value>node1:2181,node2:2181,node3:2181</value>
</property>
<!-- Enable automatic recovery -->
<property>
<name>yarn.resourcemanager.recovery.enabled</name>
<value>true</value>
</property>
<!-- Store ResourceManager state in the ZooKeeper ensemble -->
<property>
<name>yarn.resourcemanager.store.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
</property>
<!-- Environment variable inheritance -->
<property>
<name>yarn.nodemanager.env-whitelist</name>
<value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
</property>
<!-- Enable log aggregation -->
<property>
<description>Whether to enable log aggregation</description>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<!-- How long to keep aggregated logs on HDFS; -1 keeps them forever -->
<property>
<name>yarn.log-aggregation.retain-seconds</name>
<value>-1</value>
</property>
<!-- HDFS directory that application logs are moved to after a run finishes (only when log aggregation is enabled) -->
<property>
<name>yarn.nodemanager.remote-app-log-dir</name>
<value>/tmp/logs</value>
</property>
</configuration>
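With three ResourceManagers, a typo in one `hostname.rmN` entry (e.g. node03 instead of node3) is easy to miss. A simple grep-based cross-check (a sketch; it assumes the one-property-per-line layout used above) verifies that every id in yarn.resourcemanager.ha.rm-ids has a matching hostname property:

```shell
#!/bin/bash
# Sanity-check a yarn-site.xml: every id listed in
# yarn.resourcemanager.ha.rm-ids must have a matching
# yarn.resourcemanager.hostname.<id> property.
check_rm_ids() {
    local cfg="$1"
    local ids id
    # The <value> line directly follows the <name> line in this layout.
    ids=$(grep -A1 '<name>yarn.resourcemanager.ha.rm-ids</name>' "$cfg" \
          | sed -n 's/.*<value>\(.*\)<\/value>.*/\1/p' | tr ',' ' ')
    for id in $ids
    do
        if ! grep -q "<name>yarn.resourcemanager.hostname.$id</name>" "$cfg"
        then
            echo "missing hostname for $id"
            return 1
        fi
    done
    echo "rm-ids OK"
}
```

Run as `check_rm_ids /opt/ha/hadoop-3.3.1/etc/hadoop/yarn-site.xml` before distributing the file.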
(2) Sync the configuration files to the other nodes
[root@node1 ~]$ xsync /opt/ha/hadoop-3.3.1/etc/hadoop/
(3) Restart HDFS
[root@node1 ~]$ /opt/ha/hadoop-3.3.1/sbin/stop-dfs.sh
Stopping namenodes on [hadoop01 hadoop02 hadoop03]
Stopping datanodes
Stopping journal nodes [hadoop03 hadoop02 hadoop01]
Stopping ZK Failover Controllers on NN hosts [hadoop01 hadoop02 hadoop03]
[root@node1 ~]$ /opt/ha/hadoop-3.3.1/sbin/start-dfs.sh
Starting namenodes on [hadoop01 hadoop02 hadoop03]
Starting datanodes
Starting journal nodes [hadoop03 hadoop02 hadoop01]
Starting ZK Failover Controllers on NN hosts [hadoop01 hadoop02 hadoop03]
(4) Start YARN
1) On hadoop02, run:
[root@node2 ~]$ /opt/ha/hadoop-3.3.1/sbin/start-yarn.sh
Starting resourcemanagers on [ hadoop01 hadoop02 hadoop03]
Starting nodemanagers
2) Check service state
[root@node2 ~]$ yarn rmadmin -getServiceState rm1
standby
[root@node2 ~]$ yarn rmadmin -getServiceState rm2
active
[root@node2 ~]$ yarn rmadmin -getServiceState rm3
standby
3) Check the YARN HA state on each ResourceManager's web UI:

RM | HA state |
---|---|
rm1 | standby |
rm2 | active |
rm3 | standby |