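About the author: an operations engineer whose résumé does not list a single skill as "proficient". Follow 《运维小路》 for more; the series is updated on an irregular schedule.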
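The previous chapters introduced several commonly used proxy servers. This chapter covers ZooKeeper, a piece of middleware.
A basic rule in the distributed systems covered later in this series is to run an odd number of nodes. An election needs more than half of the nodes to take part: a 3-node cluster needs 2 nodes and a 5-node cluster needs 3. Only then can the cluster elect a Leader and serve requests.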
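The majority rule above is plain integer arithmetic. Here is a minimal sketch; the function names `quorum_size` and `tolerated_failures` are made up for illustration and are not part of ZooKeeper.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest number of voting nodes that is strictly more than half."""
    if ensemble_size < 1:
        raise ValueError("ensemble_size must be positive")
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many voting nodes can be down while a majority is still up."""
    return ensemble_size - quorum_size(ensemble_size)


for n in (3, 4, 5):
    print(n, quorum_size(n), tolerated_failures(n))
# prints:
# 3 2 1
# 4 3 1
# 5 3 2
```

A 4-node cluster tolerates only one failure, the same as a 3-node cluster, which is one reason odd sizes are the usual recommendation.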
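The clustered ZooKeeper we deployed earlier has two roles, Leader and Follower. How do the nodes elect their Leader?
First we need to distinguish an election with data from an election without data. When we configured the cluster, we gave every node a node id (myid), and this id matters most in the very first election.
The election process is the core mechanism behind ZooKeeper's high availability, and it relies mainly on the ZAB protocol (ZooKeeper Atomic Broadcast). The steps are as follows: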
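1. Node states and roles
- Looking: the election state, and also the initial state. Put simply, a ZooKeeper node enters this state on first start and after an unexpected restart.
- Leading: the state of the Leader node, which handles write requests and data synchronization.
- Following: the state of a Follower node, which syncs data from the Leader and serves read requests.
- Observer (optional): only syncs data and does not take part in elections. A node takes this role only when it is explicitly configured to; a later case study in this series will configure it.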
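2. Election triggers
- Cluster startup: every node starts in the Looking state, which triggers an election.
- Leader failure: once heartbeat detection times out, the Followers switch to the Looking state and start a new round of election.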
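3. Key election parameters
- myid (SID): the unique identifier of a node, set in the `myid` file under the data directory and matching the `server.N` entry in the configuration file.
- zxid: the ID of the latest transaction, a 64-bit value whose high 32 bits are the epoch (term number) and whose low 32 bits are a counter (for example, 0x100000001 is epoch 1, counter 1).
- epoch: the term number of the current Leader, incremented on every election.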
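To make the zxid layout concrete, the sketch below splits a zxid into its epoch and counter with bit operations. `split_zxid` and `make_zxid` are illustrative names, not ZooKeeper APIs.

```python
def split_zxid(zxid: int) -> tuple[int, int]:
    """Return (epoch, counter) for a 64-bit zxid."""
    epoch = zxid >> 32            # high 32 bits
    counter = zxid & 0xFFFFFFFF   # low 32 bits
    return epoch, counter


def make_zxid(epoch: int, counter: int) -> int:
    """Inverse of split_zxid."""
    return (epoch << 32) | counter


print(split_zxid(0x100000001))   # (1, 1)
print(hex(make_zxid(2, 0)))      # 0x200000000
```

Because the epoch sits in the high bits, any zxid from a newer epoch compares greater than every zxid from an older one.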
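4. Election rules (in priority order)
- Compare zxid: the node with the larger zxid wins (the newest data wins).
- Compare SID: when the zxids are equal, the node with the larger SID wins.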
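The two rules amount to comparing (zxid, SID) pairs in order. The sketch below is a simplified model, not ZooKeeper's code: `Vote` and `better_vote` are hypothetical names. As far as we know, the real FastLeaderElection also compares the peer epoch before the zxid; that step is left out here to mirror the two rules listed above.

```python
from typing import NamedTuple


class Vote(NamedTuple):
    zxid: int  # latest transaction ID seen by the proposed leader
    sid: int   # myid of the proposed leader


def better_vote(a: Vote, b: Vote) -> Vote:
    """Pick the preferred vote: larger zxid first, then larger sid."""
    if a.zxid != b.zxid:
        return a if a.zxid > b.zxid else b
    return a if a.sid > b.sid else b


# Newer data beats a larger sid.
print(better_vote(Vote(0x100000005, 1), Vote(0x100000003, 3)).sid)  # 1
# With equal zxid, the larger sid wins.
print(better_vote(Vote(0x0, 1), Vote(0x0, 2)).sid)                  # 2
```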
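5. Election process
No-data case: the cluster has just been configured and is starting for the first time. Assume the nodes are started in ascending myid order, beginning with the smallest.
Node 1 starts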
2025-04-19 10:26:53,060 [myid:] - INFO [QuorumPeer[myid=1](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled):o.a.z.s.q.QuorumPeer@1455] - LOOKING
2025-04-19 10:26:53,061 [myid:] - INFO [QuorumPeer[myid=1](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled):o.a.z.s.q.FastLeaderElection@946] -New election. My id = 1, proposed zxid=0x0
2025-04-19 10:26:53,071 [myid:] - INFO [ListenerHandler-/192.168.31.140:3888:o.a.z.s.q.QuorumCnxManager$Listener$ListenerHandler@1071] - 1 isaccepting connections now, my election bind port: /192.168.31.140:3888
2025-04-19 10:26:53,071 [myid:] - INFO [WorkerReceiver[myid=1]:o.a.z.s.q.FastLeaderElection$Messenger$WorkerReceiver@391] - Notification: my state:LOOKING; n.sid:1, n.state:LOOKING, n.leader:1, n.round:0x1, n.peerEpoch:0x0, n.zxid:0x0, message format version:0x2, n.config version:0x0
2025-04-19 10:26:53,078 [myid:] - WARN [QuorumConnectionThread-[myid=1]-1:o.a.z.s.q.QuorumCnxManager@401] - Cannot open channel to 2 at election address /192.168.31.141:3888
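Roughly, this says: an election has started, my myid is 1, my zxid is 0x0 (no data yet), and I vote for myself as Leader (n.leader:1). Because the other nodes have not started yet (hence the "Cannot open channel to 2" warning), the node remains in the election, which is the Looking state described earlier.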
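When reading many such lines, it can help to pull the `n.*` fields out of a Notification message. The sketch below is a small regex-based helper written for this article; `parse_notification` is a hypothetical name, and the sample string is the message part of the node 1 log line above.

```python
import re

# Matches "n.<name>:<value>" pairs such as "n.sid:1" or "n.zxid:0x0".
# "n.config version:0x0" is skipped because of the space in its name.
NOTIFICATION_FIELD = re.compile(r"n\.(\w+):(\w+)")


def parse_notification(message: str) -> dict[str, str]:
    """Return the n.* fields of a Notification message as strings."""
    return dict(NOTIFICATION_FIELD.findall(message))


sample = (
    "Notification: my state:LOOKING; n.sid:1, n.state:LOOKING, n.leader:1, "
    "n.round:0x1, n.peerEpoch:0x0, n.zxid:0x0, message format version:0x2, "
    "n.config version:0x0"
)
fields = parse_notification(sample)
print(fields["sid"], fields["leader"], fields["state"])  # 1 1 LOOKING
print(int(fields["zxid"], 16))                           # 0
```

Here `n.sid` is the sender of the vote and `n.leader` is the node it is voting for, so `n.sid:1, n.leader:1` is node 1 voting for itself.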
Node 2 starts
2025-04-19 10:33:21,444 [myid:] - INFO [QuorumPeer[myid=2](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled):o.a.z.s.q.QuorumPeer@1455] - LOOKING
2025-04-19 10:33:21,444 [myid:] - INFO [QuorumPeer[myid=2](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled):o.a.z.s.q.FastLeaderElection@946] -New election. My id = 2, proposed zxid=0x0
2025-04-19 10:33:21,454 [myid:] - INFO [ListenerHandler-/192.168.31.141:3888:o.a.z.s.q.QuorumCnxManager$Listener$ListenerHandler@1071] - 2 isaccepting connections now, my election bind port: /192.168.31.141:3888
2025-04-19 10:33:21,455 [myid:] - INFO [WorkerReceiver[myid=2]:o.a.z.s.q.FastLeaderElection$Messenger$WorkerReceiver@391] - Notification: my state:LOOKING; n.sid:2, n.state:LOOKING, n.leader:2, n.round:0x1, n.peerEpoch:0x0, n.zxid:0x0, message format version:0x2, n.config version:0x0
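These lines mean essentially the same as node 1's. Later log lines show node 2 being elected Leader: with 2 of the 3 nodes running, the cluster already meets the majority requirement, and because both nodes have zxid 0x0 and node 2's myid (2) is greater than node 1's, node 2 wins.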
2025-04-19 10:33:21,675 [myid:] - INFO [QuorumPeer[myid=2](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled):o.a.z.s.q.QuorumPeer@903] - Peer state changed: leading
2025-04-19 10:33:21,675 [myid:] - INFO [QuorumPeer[myid=2](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled):o.a.z.s.q.QuorumPeer@1549] - LEADING
At the same time, node 1's log shows that it received node 2's vote (the log carries no further detail), after which its state changes to Following.
2025-04-19 10:33:21,454 [myid:] - INFO [ListenerHandler-/192.168.31.140:3888:o.a.z.s.q.QuorumCnxManager$Listener$ListenerHandler@1076] - Received connection request from /192.168.31.141:37540
2025-04-19 10:33:21,466 [myid:] - INFO [WorkerReceiver[myid=1]:o.a.z.s.q.FastLeaderElection$Messenger$WorkerReceiver@391] - Notification: my state:LOOKING; n.sid:2, n.state:LOOKING, n.leader:2, n.round:0x1, n.peerEpoch:0x0, n.zxid:0x0, message format version:0x2, n.config version:0x0
2025-04-19 10:33:21,469 [myid:] - INFO [WorkerReceiver[myid=1]:o.a.z.s.q.FastLeaderElection$Messenger$WorkerReceiver@391] - Notification: my state:LOOKING; n.sid:2, n.state:LOOKING, n.leader:2, n.round:0x4, n.peerEpoch:0x0, n.zxid:0x0, message format version:0x2, n.config version:0x0
2025-04-19 10:33:21,473 [myid:] - INFO [WorkerReceiver[myid=1]:o.a.z.s.q.FastLeaderElection$Messenger$WorkerReceiver@391] - Notification: my state:LOOKING; n.sid:1, n.state:LOOKING, n.leader:2, n.round:0x4, n.peerEpoch:0x0, n.zxid:0x0, message format version:0x2, n.config version:0x0
2025-04-19 10:33:21,474 [myid:] - WARN [QuorumConnectionThread-[myid=1]-9:o.a.z.s.q.QuorumCnxManager@401] - Cannot open channel to 3 at election address /192.168.31.142:3888
# some lines omitted
2025-04-19 10:33:21,673 [myid:] - INFO [QuorumPeer[myid=1](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled):o.a.z.s.q.QuorumPeer@903] - Peer state changed: following
2025-04-19 10:33:21,674 [myid:] - INFO [QuorumPeer[myid=1](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled):o.a.z.s.q.QuorumPeer@1537] - FOLLOWING
Node 3 starts
It also enters the election state, but because the cluster has already elected a Leader, it simply becomes a Follower (Following).
2025-04-19 10:43:45,137 [myid:] - INFO [QuorumPeer[myid=3](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled):o.a.z.s.q.QuorumPeer@1455] - LOOKING
2025-04-19 10:43:45,137 [myid:] - INFO [QuorumPeer[myid=3](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled):o.a.z.s.q.FastLeaderElection@946] -New election. My id = 3, proposed zxid=0x0
2025-04-19 10:43:45,151 [myid:] - INFO [ListenerHandler-/192.168.31.142:3888:o.a.z.s.q.QuorumCnxManager$Listener$ListenerHandler@1071] - 3 isaccepting connections now, my election bind port: /192.168.31.142:3888
2025-04-19 10:43:45,152 [myid:] - INFO [WorkerReceiver[myid=3]:o.a.z.s.q.FastLeaderElection$Messenger$WorkerReceiver@391] - Notification: my state:LOOKING; n.sid:3, n.state:LOOKING, n.leader:3, n.round:0x1, n.peerEpoch:0x0, n.zxid:0x0, message format version:0x2, n.config version:0x0
2025-04-19 10:43:45,169 [myid:] - INFO [WorkerReceiver[myid=3]:o.a.z.s.q.FastLeaderElection$Messenger$WorkerReceiver@391] - Notification: my state:LOOKING; n.sid:2, n.state:LEADING, n.leader:2, n.round:0x4, n.peerEpoch:0x1, n.zxid:0x0, message format version:0x2, n.config version:0x0
2025-04-19 10:43:45,169 [myid:] - INFO [QuorumPeer[myid=3](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled):o.a.z.s.q.QuorumPeer@903] - Peer state changed: following
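The three startups can be replayed with a toy model. This is a deliberately simplified sketch and not the real protocol, which exchanges notifications over several rounds. It assumes every node has zxid 0x0, so the largest myid among the running nodes wins as soon as a majority is up. `staggered_start` is a hypothetical name.

```python
def staggered_start(start_order: list[int], ensemble_size: int) -> list[dict[int, str]]:
    """Return the state of every running node after each startup."""
    quorum = ensemble_size // 2 + 1
    running: list[int] = []
    leader = None
    history = []
    for sid in start_order:
        running.append(sid)
        # Elect only once, when a majority is up and no leader exists yet.
        if leader is None and len(running) >= quorum:
            leader = max(running)  # all zxids are equal, so largest sid wins
        states = {}
        for s in running:
            if leader is None:
                states[s] = "LOOKING"
            elif s == leader:
                states[s] = "LEADING"
            else:
                states[s] = "FOLLOWING"
        history.append(states)
    return history


for step in staggered_start([1, 2, 3], ensemble_size=3):
    print(step)
# prints:
# {1: 'LOOKING'}
# {1: 'FOLLOWING', 2: 'LEADING'}
# {1: 'FOLLOWING', 2: 'LEADING', 3: 'FOLLOWING'}
```

Node 3 has the largest myid, yet it does not become Leader, because the election had already finished before it started. This matches the logs above.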
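With-data case: the process is essentially the same as above, except that the deciding criterion becomes the zxid; the sid is compared only when the zxids are equal.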
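As a small worked example of the with-data case, suppose the Leader of a 3-node cluster fails and the two surviving nodes hold different zxids. The sketch below picks the winner by (zxid, sid); `elect` and the zxid values are made up for illustration.

```python
from typing import Optional


def elect(alive: dict[int, int], ensemble_size: int) -> Optional[int]:
    """alive maps sid -> last zxid of each running node.

    Returns the winning sid, or None when fewer than a majority are up.
    """
    if len(alive) < ensemble_size // 2 + 1:
        return None
    # Larger zxid first; sid only breaks ties.
    return max(alive, key=lambda sid: (alive[sid], sid))


# Node 1 has newer data, so it beats node 3 despite the smaller sid.
print(elect({1: 0x100000005, 3: 0x100000003}, ensemble_size=3))  # 1
# Same zxid on both: the larger sid wins.
print(elect({1: 0x100000005, 3: 0x100000005}, ensemble_size=3))  # 3
# A single node is not a majority of 3, so no Leader is elected.
print(elect({3: 0x100000005}, ensemble_size=3))                  # None
```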