ZooKeeper Study 7: Adding ZooKeeper Nodes and Failure Drills


Adding ZooKeeper Nodes and Failure Drills


Environment
OS: Ubuntu 10.10 Server 64-bit
Servers:
hadoop-master:10.6.1.150
- namenode,jobtracker;hbase-master,hbase-thrift;
- secondarynamenode;
- hive-master,hive-metastore;
- zookeeper-server;
- flume-master
- flume-node
- datanode,tasktracker

hadoop-node-1:10.6.1.151
- datanode,tasktracker;hbase-regionserver;
- zookeeper-server;
- flume-node

hadoop-node-2:10.6.1.152
- datanode,tasktracker;hbase-regionserver;
- zookeeper-server;
- flume-node

For how the environment above was configured, see: Hadoop集群实践 之 (0) 完整架构设计 (Hadoop Cluster in Practice, Part 0: Overall Architecture Design).

Two additional zookeeper-servers will be added:
zookeeper-single-1:10.6.1.161
- zookeeper-server;

zookeeper-single-2:10.6.1.162
- zookeeper-server;

To avoid confusion when configuring multiple servers, this article uses the following conventions:
Any command starting with a bare $, with no hostname, must be executed on all servers, unless a //-prefixed note after it says otherwise.
A command prefixed with dongguo@zookeeper-single-1:~$ must also be executed on zookeeper-single-2, unless a dongguo@zookeeper-single-2:~$ command appears alongside it.

1. Configure /etc/hosts
$ sudo vim /etc/hosts

127.0.0.1   localhost

10.6.1.150 hadoop-master
10.6.1.151 hadoop-node-1
10.6.1.152 hadoop-node-2
10.6.1.153 hadoop-node-3
10.6.1.161 zookeeper-single-1
10.6.1.162 zookeeper-single-2
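A quick way to sanity-check these entries on each server is the sketch below (the host/IP pairs are the ones from this article; the checked file defaults to /etc/hosts):

```shell
#!/bin/sh
# Verify that every cluster hostname in a hosts-style file maps to the
# expected IP. The pairs below are the ones used in this article.
check_hosts() {
  hostsfile="${1:-/etc/hosts}"
  while read -r ip name; do
    found=$(awk -v h="$name" '$2 == h { print $1; exit }' "$hostsfile")
    if [ "$found" = "$ip" ]; then
      echo "OK   $name -> $ip"
    else
      echo "MISS $name (expected $ip, got ${found:-nothing})"
    fi
  done <<'EOF'
10.6.1.150 hadoop-master
10.6.1.151 hadoop-node-1
10.6.1.152 hadoop-node-2
10.6.1.153 hadoop-node-3
10.6.1.161 zookeeper-single-1
10.6.1.162 zookeeper-single-2
EOF
}
```

Run `check_hosts` with no argument to check the live /etc/hosts, or pass another file path while experimenting.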

2. Set the hostname
dongguo@zookeeper-single-1:~$ sudo vim /etc/hostname

zookeeper-single-1

dongguo@zookeeper-single-1:~$ sudo hostname zookeeper-single-1

dongguo@zookeeper-single-2:~$ sudo vim /etc/hostname

zookeeper-single-2

dongguo@zookeeper-single-2:~$ sudo hostname zookeeper-single-2

3. Install the Java environment

Add an APT source for the matching Java version
dongguo@zookeeper-single-1:~$ sudo apt-get install python-software-properties
dongguo@zookeeper-single-1:~$ sudo vim /etc/apt/sources.list.d/sun-java-community-team-sun-java6-maverick.list

Install sun-java6-jdk
dongguo@zookeeper-single-1:~$ sudo add-apt-repository ppa:sun-java-community-team/sun-java6
dongguo@zookeeper-single-1:~$ sudo apt-get update
dongguo@zookeeper-single-1:~$ sudo apt-get install sun-java6-jdk

4. Configure Cloudera's Hadoop package source
dongguo@zookeeper-single-1:~$ sudo vim /etc/apt/sources.list.d/cloudera.list

deb http://archive.cloudera.com/debian maverick-cdh3u3 contrib
deb-src http://archive.cloudera.com/debian maverick-cdh3u3 contrib

dongguo@zookeeper-single-1:~$ sudo apt-get install curl
dongguo@zookeeper-single-1:~$ curl -s http://archive.cloudera.com/debian/archive.key | sudo apt-key add -
dongguo@zookeeper-single-1:~$ sudo apt-get update

5. Install ZooKeeper
dongguo@zookeeper-single-1:~$ sudo apt-get install hadoop-zookeeper-server

6. Configure ZooKeeper
$ sudo vim /etc/zookeeper/zoo.cfg

tickTime=2000
initLimit=10
syncLimit=5
dataDir=/data/zookeeper
clientPort=2181
maxClientCnxns=0
server.1=hadoop-master:2888:3888
server.2=hadoop-node-1:2888:3888
server.3=hadoop-node-2:2888:3888
server.4=zookeeper-single-1:2888:3888
server.5=zookeeper-single-2:2888:3888

dongguo@zookeeper-single-1:~$ sudo mkdir -p /data/zookeeper
dongguo@zookeeper-single-1:~$ sudo chown zookeeper:zookeeper /data/zookeeper

Create the myid file
dongguo@zookeeper-single-1:~$ sudo -u zookeeper vim /data/zookeeper/myid

4

dongguo@zookeeper-single-2:~$ sudo -u zookeeper vim /data/zookeeper/myid

5
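Each myid must match the server.N index in zoo.cfg (4 for zookeeper-single-1, 5 for zookeeper-single-2). A sketch that derives the id from the hostname, so one snippet works on every node; the host-to-id map is taken from the zoo.cfg above, and the commented tee invocation is one assumed way to write the file with the zookeeper user:

```shell
#!/bin/sh
# Map each host to its server.N id from the zoo.cfg in this article.
myid_for_host() {
  case "$1" in
    hadoop-master)      echo 1 ;;
    hadoop-node-1)      echo 2 ;;
    hadoop-node-2)      echo 3 ;;
    zookeeper-single-1) echo 4 ;;
    zookeeper-single-2) echo 5 ;;
    *) echo "unknown host: $1" >&2; return 1 ;;
  esac
}

# On a real node you could write the file non-interactively, e.g.:
#   myid_for_host "$(hostname)" | sudo -u zookeeper tee /data/zookeeper/myid
myid_for_host zookeeper-single-1   # prints 4
```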

7. Update ZooKeeper-related configuration across the cluster
7.1 Update the HBase ZooKeeper configuration
$ sudo vim /etc/hbase/conf/hbase-site.xml //only on hadoop-master, hadoop-node-1 and hadoop-node-2

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
<property>
  <name>hbase.rootdir</name>
  <value>hdfs://hadoop-master:8020/hbase</value>
</property>
<property>
  <name>hbase.cluster.distributed</name>
  <value>true</value>
</property>

<property>
  <name>hbase.zookeeper.quorum</name>
  <value>hadoop-master,hadoop-node-1,hadoop-node-2,zookeeper-single-1,zookeeper-single-2</value>
</property>
</configuration>

7.2 Update the Flume ZooKeeper configuration
$ sudo vim /etc/flume/conf/flume-site.xml //only on hadoop-master, hadoop-node-1 and hadoop-node-2

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
<property>
  <name>flume.master.servers</name>
  <value>hadoop-master</value>
</property>
<property>
  <name>flume.master.store</name>
  <value>zookeeper</value>
</property>
<property>
  <name>flume.master.zk.use.external</name>
  <value>true</value>
</property>
<property>
  <name>flume.master.zk.servers</name>
  <value>hadoop-master:2181,hadoop-node-1:2181,hadoop-node-2:2181,zookeeper-single-1:2181,zookeeper-single-2:2181</value>
</property>
</configuration>

7.3 Update the Hive ZooKeeper configuration
dongguo@hadoop-master:~$ sudo vim /etc/hive/conf/hive-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>

<!-- Hive Configuration can either be stored in this file or in the hadoop configuration files  -->
<!-- that are implied by Hadoop setup variables.                                                -->
<!-- Aside from Hadoop setup variables - this file is provided as a convenience so that Hive    -->
<!-- users do not have to edit hadoop configuration files (that may be managed as a centralized -->
<!-- resource).                                                                                 -->

<!-- Hive Execution Parameters -->

<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://localhost:3306/metastore</value>
</property>

<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>

<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hiveuser</value>
</property>

<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>password</value>
</property>

<property>
  <name>datanucleus.autoCreateSchema</name>
  <value>false</value>
</property>

<property>
  <name>datanucleus.fixedDatastore</name>
  <value>true</value>
</property>

<property>
  <name>hive.aux.jars.path</name>

</property>

<property>
  <name>hbase.zookeeper.quorum</name>
  <value>hadoop-master,hadoop-node-1,hadoop-node-2,zookeeper-single-1,zookeeper-single-2</value>
</property>

</configuration>

8. Restart the ZooKeeper service
$ sudo /etc/init.d/hadoop-zookeeper-server restart
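After the restart, each server should answer the built-in four-letter ruok command on its client port with imok. A sketch of a check loop over all five servers (assumes netcat is installed; the hostnames and client port 2181 are from this article):

```shell
#!/bin/sh
# "ruok" is a ZooKeeper four-letter health command; a healthy server
# replies "imok" on its client port.
ruok_ok() { [ "$1" = "imok" ]; }

for h in hadoop-master hadoop-node-1 hadoop-node-2 \
         zookeeper-single-1 zookeeper-single-2; do
  resp=$(echo ruok | nc -w 2 "$h" 2181 2>/dev/null)
  if ruok_ok "$resp"; then
    echo "$h: imok"
  else
    echo "$h: no answer"
  fi
done
```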

9. Restart all ZooKeeper-related services
9.1 Restart the HBase services
dongguo@hadoop-master:~$ sudo /etc/init.d/hadoop-hbase-master restart
dongguo@hadoop-node-1:~$ sudo /etc/init.d/hadoop-hbase-regionserver restart
dongguo@hadoop-node-2:~$ sudo /etc/init.d/hadoop-hbase-regionserver restart

9.2 Restart the Flume services
dongguo@hadoop-master:~$ sudo /etc/init.d/flume-master restart
$ sudo /etc/init.d/flume-node restart //only on hadoop-master, hadoop-node-1 and hadoop-node-2

9.3 Restart the Hive services
dongguo@hadoop-master:~$ sudo /etc/init.d/hadoop-hive-server restart
dongguo@hadoop-master:~$ sudo /etc/init.d/hadoop-hive-metastore restart

10. Check ZooKeeper status
10.1 Check via the client that data has synchronized
dongguo@zookeeper-single-1:~$ zookeeper-client

Connecting to localhost:2181
[zk: localhost:2181(CONNECTED) 0] ls /
[flume-cfgs, counters-config_version, flume-chokemap, hbase, zookeeper, flume-nodes]
[zk: localhost:2181(CONNECTED) 1]

As shown, the znodes of all related systems have already synchronized to the newly added zookeeper-servers.

11. ZooKeeper cluster failure drill
The idea of the drill is to simulate zookeeper server failures and check whether the ensemble can keep serving while some of its servers are abruptly stopped.
An odd ensemble size is recommended for ZooKeeper (adding an even member gains no extra fault tolerance); with 5 zookeeper servers we meet that recommendation.
Each zookeeper server acts as either the leader or a follower, with the leader elected by vote among the servers. The ensemble keeps serving clients as long as a majority (quorum) of servers is alive and a leader holds office; once quorum is lost, the whole ensemble stops serving.

We will first identify each zookeeper-server's role, then pick servers at random and kill their processes one by one, checking at each step how roles change and whether the ensemble still works.

Check a zookeeper-server's state with the following command:
dongguo@zookeeper-single-2:~$ zookeeper-server status

JMX enabled by default
Using config: /etc/zookeeper/zoo.cfg
Mode: follower

As shown above, zookeeper-single-2 is currently a follower.
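The role is the Mode: line of the status output. A small helper to extract it (a sketch; the status text format is assumed from the output above), handy when looping over all servers via ssh:

```shell
#!/bin/sh
# Pull the "Mode:" value out of `zookeeper-server status` output.
zk_mode() { awk -F': ' '/^Mode:/ { print $2 }'; }

# On a live node you would run: zookeeper-server status 2>&1 | zk_mode
printf 'JMX enabled by default\nUsing config: /etc/zookeeper/zoo.cfg\nMode: follower\n' | zk_mode   # prints "follower"
```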

Get the zookeeper-server's process ID with the following command:
$ ps aux | grep /etc/zookeeper/zoo.cfg | grep -v grep | awk '{print $2}'

Then simply kill that process ID with sudo kill -9.
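The outcome of each kill is predictable from the quorum rule: the ensemble keeps serving only while the number of live servers exceeds half the total. A sketch of the bookkeeping for our 5-node drill (pure arithmetic; the actual kills are done with the ps/kill commands above):

```shell
#!/bin/sh
# Quorum rule: an ensemble of <total> servers keeps serving only while
# <alive> > <total>/2 (integer division).
quorum_ok() { [ "$1" -gt $(( $2 / 2 )) ]; }

total=5
for killed in 1 2 3; do
  alive=$(( total - killed ))
  if quorum_ok "$alive" "$total"; then
    echo "killed $killed of $total: $alive alive, still serving"
  else
    echo "killed $killed of $total: $alive alive, quorum lost"
  fi
done
```

With 5 servers, killing 1 or 2 leaves the ensemble serving; killing a 3rd loses quorum.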

Conclusion:
The course and results of the drill were as follows:
Simulating failures of arbitrary combinations of servers confirmed that the ensemble's fault tolerance matches its algorithm:
with a total of 2n+1 servers, up to n failed nodes can be tolerated.
In the actual tests, 3 servers tolerated 1 failure, 5 tolerated 2, and 7 tolerated 3.

Original article: http://heylinux.com/archives/2063.html
