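When we run real-time worker tasks, we want a standby node to take over if a single node goes down, so that the system stays highly available. In this scenario we need to guarantee the following:
1. The standby node must not work at the same time as the master node; at any moment only one node may be running the task.
2. The standby node must always know whether the master node is working properly. As soon as it detects that the master is down, it competes for the master role and takes over the work.
ZooKeeper, a distributed coordination service that began as a sub-project of Apache Hadoop and is now a top-level Apache project, handles this scenario well, although its capabilities go well beyond master/standby election. See: https://blog.youkuaiyun.com/duke370503/article/details/52623192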
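ZooKeeper basics:
ZooKeeper nodes (znodes) come in two types: persistent nodes and ephemeral nodes. An ephemeral node has one special property: when the client that created it loses its session with ZooKeeper (usually because the machine went down), ZooKeeper deletes the node. Master election relies on exactly this property. For example:
(1) When servers A, B and C start, they race to create an ephemeral node with the same name, say /ha/master, under the same path in ZooKeeper.
(2) Suppose A is the first to create the ephemeral node /ha/master. A then becomes the master. B and C find that /ha/master has already been registered by A, so they become standbys and keep watching /ha/master for changes.
(3) If A goes down, or is cut off from the network and loses its ZooKeeper session, the ephemeral node /ha/master is deleted. B and C are notified of the change and race to register again; whichever creates /ha/master first becomes the new master.
Master election is therefore nothing more than a race to create an agreed-upon ephemeral node in ZooKeeper: whoever holds that node is the master.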
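The three steps above can be sketched without a running ZooKeeper server. The following is a minimal single-threaded simulation, not ZooKeeper code: FakeEphemeralStore, ElectionDemo and their methods are hypothetical names introduced only for illustration, and the store models nothing but ephemeral nodes and one-shot delete watches.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory stand-in for ZooKeeper: ephemeral nodes and delete watches only. */
final class FakeEphemeralStore {
    private final Map<String, String> ownerByPath = new HashMap<>();        // path -> owning session
    private final Map<String, List<Runnable>> deleteWatchers = new HashMap<>();

    /** Creates the node if absent; returns false when another session already owns it. */
    synchronized boolean createEphemeral(String path, String sessionId) {
        return ownerByPath.putIfAbsent(path, sessionId) == null;
    }

    /** Registers a one-shot watcher that fires when the node at the path is deleted. */
    synchronized void watchDelete(String path, Runnable watcher) {
        deleteWatchers.computeIfAbsent(path, p -> new ArrayList<>()).add(watcher);
    }

    synchronized String ownerOf(String path) {
        return ownerByPath.get(path);
    }

    /** Simulates a session ending: nodes owned by it are removed, then their watchers fire. */
    void closeSession(String sessionId) {
        List<Runnable> toFire = new ArrayList<>();
        synchronized (this) {
            for (String path : new ArrayList<>(ownerByPath.keySet())) {
                if (sessionId.equals(ownerByPath.get(path))) {
                    ownerByPath.remove(path);
                    List<Runnable> watchers = deleteWatchers.remove(path);
                    if (watchers != null) {
                        toFire.addAll(watchers);
                    }
                }
            }
        }
        for (Runnable watcher : toFire) {
            watcher.run();
        }
    }
}

final class ElectionDemo {
    static final String MASTER_PATH = "/ha/master";

    /** Tries to become master; on failure, watches the node and retries once it is deleted. */
    static void enroll(FakeEphemeralStore store, String serverId) {
        if (!store.createEphemeral(MASTER_PATH, serverId)) {
            store.watchDelete(MASTER_PATH, () -> enroll(store, serverId));
        }
    }

    public static void main(String[] args) {
        FakeEphemeralStore store = new FakeEphemeralStore();
        enroll(store, "A");
        enroll(store, "B");
        enroll(store, "C");
        System.out.println("master: " + store.ownerOf(MASTER_PATH));
        store.closeSession("A"); // A crashes or loses its session
        System.out.println("master after A fails: " + store.ownerOf(MASTER_PATH));
    }
}
```

In this simulation B always wins the second round because its watcher was registered first. With a real ZooKeeper ensemble the winner depends on which standby's create request arrives first.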
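The rest of this article shows how to implement the master/standby scenario above with ZooKeeper's Java API. It assumes that ZooKeeper is already installed and running on your server; installation is not covered here, only the Java client side.
1. Add the following Maven dependency to pom.xml: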
<dependency>
<groupId>org.apache.zookeeper</groupId>
<artifactId>zookeeper</artifactId>
<version>3.4.8</version>
<exclusions>
<exclusion>
<groupId>com.sun.jmx</groupId>
<artifactId>jmxri</artifactId>
</exclusion>
<exclusion>
<groupId>com.sun.jdmk</groupId>
<artifactId>jmxtools</artifactId>
</exclusion>
<exclusion>
<groupId>javax.jms</groupId>
<artifactId>jms</artifactId>
</exclusion>
</exclusions>
</dependency>
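2. Log in to the ZooKeeper server, cd into the bin directory of the installation, start the command-line client and create the parent node with
create /stock_mot "stockmotnode". The new parent node /stock_mot then shows up in the listing: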
cd /opt/applications/zookeeper-3.4.10/bin
sh zkCli.sh
[zk: localhost:2181(CONNECTED) 3] ls /
[zookeeper, zk_demo, worker]
[zk: localhost:2181(CONNECTED) 4] create /stock_mot "stockmotnode"
Created /stock_mot
[zk: localhost:2181(CONNECTED) 5] ls /
[stock_mot, zookeeper, zk_demo, worker]
[zk: localhost:2181(CONNECTED) 6]
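There are two ways to obtain the result of node creation and the master/standby status from ZooKeeper:
1. Synchronously: query the status over and over in a while loop.
2. Asynchronously: register a listener callback and handle the result inside the callback.
Applications are usually driven by asynchronous change notifications, and a synchronous approach would block the application itself. We therefore implement the code with asynchronous callbacks.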
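The difference between the two styles can be illustrated with a small sketch that does not touch ZooKeeper at all. MasterStatus below is a hypothetical status source invented for this example; the point is only that polling keeps a thread busy asking, while a callback is registered once and invoked when the change happens.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical status source standing in for ZooKeeper; not a real API. */
final class MasterStatus {
    private volatile boolean alive = true;
    private final List<Runnable> listeners = new CopyOnWriteArrayList<>();
    final AtomicInteger queries = new AtomicInteger(); // counts how often the status was polled

    boolean isAlive() {
        queries.incrementAndGet();
        return alive;
    }

    void onMasterGone(Runnable listener) {
        listeners.add(listener);
    }

    void markGone() {
        alive = false;
        listeners.forEach(Runnable::run);
    }
}

final class PollVsCallbackDemo {
    /** Synchronous style: blocks the calling thread and queries repeatedly. */
    static void pollUntilGone(MasterStatus status) throws InterruptedException {
        while (status.isAlive()) {
            Thread.sleep(10);
        }
    }

    public static void main(String[] args) throws Exception {
        MasterStatus status = new MasterStatus();
        // Asynchronous style: register once; the caller stays free for other work.
        status.onMasterGone(() -> System.out.println("callback: master is gone"));

        Thread poller = new Thread(() -> {
            try {
                pollUntilGone(status);
                System.out.println("poller: master is gone");
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        poller.start();

        Thread.sleep(50);
        status.markGone();
        poller.join();
        System.out.println("status queries made by the poller: " + status.queries.get());
    }
}
```

The poller needs a dedicated thread and several queries to notice a single change, whereas the callback runs exactly once without any waiting loop.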
3. To make this easy to use from application code, we wrap it as follows:
(1) Create the configuration class
import lombok.Getter;
@Getter
public class ZkConfig {
private final String connectString; // ZooKeeper server address, e.g. host:port
private final int sessionTimeout; // session timeout in milliseconds
private final String znodeName; // path of the znode used for the election
public ZkConfig(String connectString, int sessionTimeout, String znodeName) {
this.connectString = connectString;
this.sessionTimeout = sessionTimeout;
this.znodeName = znodeName;
}
}
(2) Create a callback interface so that the application can receive the master/standby status asynchronously
public interface LeaderSelectorListener {
void isLeader();
void notLeader();
}
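One possible way to consume this interface is to record the current role in a thread-safe flag and check the flag before each unit of work. The sketch below is only an illustration: LeaderFlagListener and runIfLeader are hypothetical names, and the interface is repeated so that the snippet compiles on its own.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Repeated from the article so that this snippet compiles on its own.
interface LeaderSelectorListener {
    void isLeader();
    void notLeader();
}

/** Hypothetical listener that records the current role in a thread-safe flag. */
final class LeaderFlagListener implements LeaderSelectorListener {
    private final AtomicBoolean leader = new AtomicBoolean(false);

    @Override
    public void isLeader() {
        leader.set(true);
    }

    @Override
    public void notLeader() {
        leader.set(false);
    }

    /** Runs the task only while this process believes it is the leader; returns whether it ran. */
    boolean runIfLeader(Runnable task) {
        if (!leader.get()) {
            return false;
        }
        task.run();
        return true;
    }
}
```

The flag is only a best-effort gate. Leadership can be lost while a task is already running, so work that must never be executed twice needs additional protection on the side that receives it.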
(3) The class that manages the ZooKeeper API
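import java.io.IOException;
import java.util.Random;
import lombok.Setter;
import lombok.extern.slf4j.Slf4j;
import org.apache.zookeeper.AsyncCallback;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;
@Slf4j
public class LeaderSelector {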
private final ZkConfig zkConfig;
private volatile ConnectionState state = ConnectionState.NONE;
private ZooKeeper zk;
@Setter
private LeaderSelectorListener listener;
private final Random random = new Random(System.currentTimeMillis());
private final String serverId = Integer.toHexString(random.nextInt());
enum ConnectionState {NONE, CONNECTED, DISCONNECTED, EXPIRED}
public LeaderSelector(ZkConfig zkConfig) {
this.zkConfig = zkConfig;
}
private final Watcher ZkSessionWatcher = new Watcher() {
@Override
public void process(WatchedEvent watchedEvent) {
if (watchedEvent.getType() == Event.EventType.None) {
switch (watchedEvent.getState()) {
case SyncConnected:
state = ConnectionState.CONNECTED;
handleConnectionState(state);
break;
case Disconnected:
state = ConnectionState.DISCONNECTED;
handleConnectionState(state);
break;
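case Expired:
state = ConnectionState.EXPIRED;
// notify the handler as well, otherwise its EXPIRED branch is never reached
handleConnectionState(state);
break;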
default:
break;
}
}
}
};
private void handleConnectionState(ConnectionState state) {
switch (state) {
case CONNECTED:
log.info("connection state: {}", state);
enroll();
break;
case DISCONNECTED:
log.info("connection state: {}", state);
handleSelectorState(false);
break;
case EXPIRED:
log.info("connection state: {}", state);
handleSelectorState(false);
// TODO: the session has expired, so a new ZooKeeper connection must be established
break;
default:
log.info("unknown connection state: {}", state);
break;
}
}
private AsyncCallback.StringCallback masterCreateCallback = new AsyncCallback.StringCallback() {
@Override
public void processResult(int rc, String path, Object ctx, String name) {
switch (KeeperException.Code.get(rc)) {
case CONNECTIONLOSS:
checkMaster();
break;
case OK:
handleSelectorState(true);
break;
case NODEEXISTS:
log.info("leader node exists, add watcher.");
handleSelectorState(false);
addMasterWatcher();
break;
default:
break;
}
}
};
private void addMasterWatcher() {
zk.exists(zkConfig.getZnodeName(), masterExistWatcher, masterExistCallback, null);
}
private Watcher masterExistWatcher = new Watcher() {
@Override
public void process(WatchedEvent watchedEvent) {
if (watchedEvent.getType() == Event.EventType.NodeDeleted) {
if (zkConfig.getZnodeName().equals(watchedEvent.getPath())) {
enroll();
}
}
}
};
private AsyncCallback.StatCallback masterExistCallback = new AsyncCallback.StatCallback() {
@Override
public void processResult(int rc, String path, Object ctx, Stat stat) {
switch (KeeperException.Code.get(rc)) {
case CONNECTIONLOSS:
addMasterWatcher();
break;
case OK:
break;
case NONODE:
enroll();
break;
default:
checkMaster();
break;
}
}
};
private AsyncCallback.DataCallback masterCheckCallback = new AsyncCallback.DataCallback() {
@Override
public void processResult(int rc, String path, Object ctx, byte[] data, Stat stat) {
switch (KeeperException.Code.get(rc)) {
case CONNECTIONLOSS:
checkMaster();
return;
case NONODE:
enroll();
return;
case OK:
// log the id stored in the znode (the current leader), not our own id
log.info("current leader node server id is [{}]", new String(data));
if (serverId.equals(new String(data))) {
handleSelectorState(true);
} else {
log.info("current leader node server id [{}] does not match my server id [{}]", new String(data), serverId);
handleSelectorState(false);
addMasterWatcher();
}
break;
default:
break;
}
}
};
private void checkMaster() {
zk.getData(zkConfig.getZnodeName(), false, masterCheckCallback, null);
}
private void handleSelectorState(boolean selected) {
if (listener != null) {
if (selected) {
listener.isLeader();
} else {
listener.notLeader();
}
}
}
private void enroll() {
log.info("enroll leader node by server id [{}]", serverId);
zk.create(zkConfig.getZnodeName(), serverId.getBytes(), ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL, masterCreateCallback, null);
}
public void start() {
if (state == ConnectionState.NONE) {
try {
log.info("start leader select");
zk = new ZooKeeper(zkConfig.getConnectString(), zkConfig.getSessionTimeout(), ZkSessionWatcher);
} catch (IOException e) {
log.error(e.getMessage(), e);
}
} else {
log.error("leader selector cannot start when state is [{}]", state);
}
}
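public void stop() {
state = ConnectionState.NONE;
if (zk != null) {
try {
zk.close();
// go through handleSelectorState so that a missing listener cannot cause a NullPointerException
handleSelectorState(false);
} catch (InterruptedException e) {
// restore the interrupt flag instead of swallowing the interruption
Thread.currentThread().interrupt();
log.error(e.getMessage(), e);
}
}
}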
}
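4. Use and test it in an application:
The test demo class is as follows:
import lombok.extern.slf4j.Slf4j;
import org.junit.Before;
import org.junit.Test;
@Slf4j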
public class ZooKeeperTest {
private ZkConfig zkConfig;
// volatile: written by the ZooKeeper event threads, read by the checker thread
public static volatile boolean server1Started = false;
public static volatile boolean server2Started = false;
@Before
public void init_zk_config() {
zkConfig = new ZkConfig("28.163.0.65:2181", 5000, "/stock_mot/myFirst");
}
@Test
public void testSelectMaster() throws InterruptedException {
LeaderSelector server1 = new LeaderSelector(zkConfig);
LeaderSelector server2 = new LeaderSelector(zkConfig);
server1.setListener(new LeaderSelectorListener() {
@Override
public void isLeader() {
log.info("server1 is the master");
server1Started = true;
}
@Override
public void notLeader() {
log.info("server1 not the master");
server1Started = false;
}
});
server2.setListener(new LeaderSelectorListener() {
@Override
public void isLeader() {
log.info("server2 is the master");
server2Started = true;
}
@Override
public void notLeader() {
log.info("server2 not the master");
server2Started = false;
}
});
server1.start();
Thread.sleep(500);
server2.start();
Thread.sleep(500);
Thread t = new Thread(new Runnable() {
@Override
public void run() {
while (server1Started || server2Started) {
if (server1Started) {
log.info("check server1 is the master");
}
if (server2Started) {
log.info("check server2 is the master");
}
if (server1Started && server2Started){
log.error("check failed: found duplicate masters");
}
try {
Thread.sleep(1000);
} catch (InterruptedException e) {
// restore the interrupt flag and stop checking
Thread.currentThread().interrupt();
return;
}
}
}
});
t.start();
Thread.sleep(5000);
server1.stop();
Thread.sleep(5000);
server2.stop();
}
}
Running testSelectMaster() produces the following output:
[2018-04-27 22:49:44 INFO ] [main] (org.apache.zookeeper.ZooKeeper:438) - Initiating client connection, connectString=28.163.0.65:2181 sessionTimeout=5000 watcher=com.ounersc.ic.stock.mot.masterselect.LeaderSelector$1@49e4cb85
[2018-04-27 22:49:44 INFO ] [main-SendThread(28.163.0.65:2181)] (org.apache.zookeeper.ClientCnxn:1032) - Opening socket connection to server 28.163.0.65/28.163.0.65:2181. Will not attempt to authenticate using SASL (unknown error)
[2018-04-27 22:49:45 INFO ] [main-SendThread(28.163.0.65:2181)] (org.apache.zookeeper.ClientCnxn:876) - Socket connection established to 28.163.0.65/28.163.0.65:2181, initiating session
[2018-04-27 22:49:45 INFO ] [main-SendThread(28.163.0.65:2181)] (org.apache.zookeeper.ClientCnxn:1299) - Session establishment complete on server 28.163.0.65/28.163.0.65:2181, sessionid = 0x161d689c49f018f, negotiated timeout = 5000
[2018-04-27 22:49:45 INFO ] [main-EventThread] (com.ounersc.ic.stock.mot.masterselect.LeaderSelector:58) - connection state: CONNECTED
[2018-04-27 22:49:45 INFO ] [main-EventThread] (com.ounersc.ic.stock.mot.masterselect.LeaderSelector:172) - enroll leader node by server id [b38c96a7]
[2018-04-27 22:49:45 INFO ] [main-EventThread] (com.ounersc.ic.stock.mot.calc.ZooKeeperTest:34) - server1 is the master
[2018-04-27 22:49:45 INFO ] [main] (com.ounersc.ic.stock.mot.masterselect.LeaderSelector:179) - start leader select
[2018-04-27 22:49:45 INFO ] [main] (org.apache.zookeeper.ZooKeeper:438) - Initiating client connection, connectString=28.163.0.65:2181 sessionTimeout=5000 watcher=com.ounersc.ic.stock.mot.masterselect.LeaderSelector$1@edf4efb
[2018-04-27 22:49:45 INFO ] [main-SendThread(28.163.0.65:2181)] (org.apache.zookeeper.ClientCnxn:1032) - Opening socket connection to server 28.163.0.65/28.163.0.65:2181. Will not attempt to authenticate using SASL (unknown error)
[2018-04-27 22:49:45 INFO ] [main-SendThread(28.163.0.65:2181)] (org.apache.zookeeper.ClientCnxn:876) - Socket connection established to 28.163.0.65/28.163.0.65:2181, initiating session
[2018-04-27 22:49:45 INFO ] [main-SendThread(28.163.0.65:2181)] (org.apache.zookeeper.ClientCnxn:1299) - Session establishment complete on server 28.163.0.65/28.163.0.65:2181, sessionid = 0x161d689c49f0190, negotiated timeout = 5000
[2018-04-27 22:49:45 INFO ] [main-EventThread] (com.ounersc.ic.stock.mot.masterselect.LeaderSelector:58) - connection state: CONNECTED
[2018-04-27 22:49:45 INFO ] [main-EventThread] (com.ounersc.ic.stock.mot.masterselect.LeaderSelector:172) - enroll leader node by server id [b380d8cd]
[2018-04-27 22:49:45 INFO ] [main-EventThread] (com.ounersc.ic.stock.mot.masterselect.LeaderSelector:87) - leader node exists, add watcher.
[2018-04-27 22:49:45 INFO ] [main-EventThread] (com.ounersc.ic.stock.mot.calc.ZooKeeperTest:54) - server2 not the master
[2018-04-27 22:49:45 INFO ] [Thread-0] (com.ounersc.ic.stock.mot.calc.ZooKeeperTest:69) - check server1 is the master
[2018-04-27 22:49:46 INFO ] [Thread-0] (com.ounersc.ic.stock.mot.calc.ZooKeeperTest:69) - check server1 is the master
[2018-04-27 22:49:47 INFO ] [Thread-0] (com.ounersc.ic.stock.mot.calc.ZooKeeperTest:69) - check server1 is the master
[2018-04-27 22:49:48 INFO ] [Thread-0] (com.ounersc.ic.stock.mot.calc.ZooKeeperTest:69) - check server1 is the master
[2018-04-27 22:49:49 INFO ] [Thread-0] (com.ounersc.ic.stock.mot.calc.ZooKeeperTest:69) - check server1 is the master
[2018-04-27 22:49:50 INFO ] [main] (org.apache.zookeeper.ZooKeeper:684) - Session: 0x161d689c49f018f closed
[2018-04-27 22:49:50 INFO ] [main] (com.ounersc.ic.stock.mot.calc.ZooKeeperTest:40) - server1 not the master
[2018-04-27 22:49:50 INFO ] [main-EventThread] (org.apache.zookeeper.ClientCnxn:519) - EventThread shut down for session: 0x161d689c49f018f
[2018-04-27 22:49:50 INFO ] [main-EventThread] (com.ounersc.ic.stock.mot.masterselect.LeaderSelector:172) - enroll leader node by server id [b380d8cd]
[2018-04-27 22:49:50 INFO ] [main-EventThread] (com.ounersc.ic.stock.mot.calc.ZooKeeperTest:48) - server2 is the master
[2018-04-27 22:49:50 INFO ] [Thread-0] (com.ounersc.ic.stock.mot.calc.ZooKeeperTest:73) - check server2 is the master
[2018-04-27 22:49:51 INFO ] [Thread-0] (com.ounersc.ic.stock.mot.calc.ZooKeeperTest:73) - check server2 is the master
[2018-04-27 22:49:52 INFO ] [Thread-0] (com.ounersc.ic.stock.mot.calc.ZooKeeperTest:73) - check server2 is the master
[2018-04-27 22:49:53 INFO ] [Thread-0] (com.ounersc.ic.stock.mot.calc.ZooKeeperTest:73) - check server2 is the master
[2018-04-27 22:49:54 INFO ] [Thread-0] (com.ounersc.ic.stock.mot.calc.ZooKeeperTest:73) - check server2 is the master
[2018-04-27 22:49:55 INFO ] [main] (org.apache.zookeeper.ZooKeeper:684) - Session: 0x161d689c49f0190 closed
[2018-04-27 22:49:55 INFO ] [main] (com.ounersc.ic.stock.mot.calc.ZooKeeperTest:54) - server2 not the master
[2018-04-27 22:49:55 INFO ] [main-EventThread] (org.apache.zookeeper.ClientCnxn:519) - EventThread shut down for session: 0x161d689c49f0190
Process finished with exit code 0
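server1 is started first. No master exists at that point, so server1 becomes the master.
server2 is started next. server1 is already the master, so server2 becomes a standby and watches the master node. Output: check server1 is the master
After running as the master for 5 seconds, server1 is stopped, and server2 then acquires the master role.
Output: check server2 is the master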
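The test relies on fixed Thread.sleep calls to give each election time to finish, which can be fragile on a slow network. One alternative, sketched here under the assumption that the test only needs to know when leadership is first granted, is a listener backed by a CountDownLatch. AwaitLeaderListener and awaitLeadership are hypothetical names, and the interface is repeated so that the snippet compiles on its own.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Repeated from the article so that this snippet compiles on its own.
interface LeaderSelectorListener {
    void isLeader();
    void notLeader();
}

/** Hypothetical test helper: lets a test block until leadership is first granted. */
final class AwaitLeaderListener implements LeaderSelectorListener {
    private final CountDownLatch elected = new CountDownLatch(1);

    @Override
    public void isLeader() {
        elected.countDown();
    }

    @Override
    public void notLeader() {
        // nothing to do: this helper only waits for the first election
    }

    /** Returns true if isLeader() was called within the timeout, false otherwise. */
    boolean awaitLeadership(long timeout, TimeUnit unit) {
        try {
            return elected.await(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

A test could pass such a listener to setListener, call start(), and then check awaitLeadership(5, TimeUnit.SECONDS) instead of sleeping for a fixed interval.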