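Shard leader election is implemented mainly by the LeaderElector class (LeaderElector.java).
The checkIfIamLeader method is where the election actually happens: it compares this node's own sequence-node name with the znodes created under the election path in ZooKeeper to decide whether it is the leader.
- If it is the leader, update handling is its job from now on, and it runs the series of operations that register it as leader.
- If it is not the leader but a replica, it registers a watcher on the znode of the candidate immediately ahead of it in line (for the candidate next in line, that is the leader's znode) and follows that znode's state.
- As soon as the watched znode disappears, the next round of leader election is triggered.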
/**
 * Check if the candidate with the given n_* sequence number is the leader.
 * If it is, set the leaderId on the leader zk node. If it is not, start
 * watching the candidate that is in line before this one - if it goes down, check
 * if this candidate is the leader again.
 *
 * @param replacement has someone else been the leader already?
 */
private void checkIfIamLeader(final ElectionContext context, boolean replacement) throws KeeperException,
    InterruptedException, IOException {
  context.checkIfIamLeaderFired();
  // get all other numbers...
  final String holdElectionPath = context.electionPath + ELECTION_NODE;
  List<String> seqs = zkClient.getChildren(holdElectionPath, null, true);
  sortSeqs(seqs);
  String leaderSeqNodeName = context.leaderSeqPath.substring(context.leaderSeqPath.lastIndexOf('/') + 1);
  if (!seqs.contains(leaderSeqNodeName)) {
    log.warn("Our node is no longer in line to be leader");
    return;
  }
  // If any double-registrations exist for me, remove all but this latest one!
  // TODO: can we even get into this state?
  String prefix = zkClient.getSolrZooKeeper().getSessionId() + "-" + context.id + "-";
  Iterator<String> it = seqs.iterator();
  while (it.hasNext()) {
    String elec = it.next();
    if (!elec.equals(leaderSeqNodeName) && elec.startsWith(prefix)) {
      try {
        String toDelete = holdElectionPath + "/" + elec;
        log.warn("Deleting duplicate registration: {}", toDelete);
        zkClient.delete(toDelete, -1, true);
      } catch (KeeperException.NoNodeException e) {
        // ignore
      }
      it.remove();
    }
  }
  if (leaderSeqNodeName.equals(seqs.get(0))) {
    // I am the leader
    try {
      runIamLeaderProcess(context, replacement);
    } catch (KeeperException.NodeExistsException e) {
      log.error("node exists", e);
      retryElection(context, false);
      return;
    }
  } else {
    // I am not the leader - watch the node below me
    String toWatch = seqs.get(0);
    for (String node : seqs) {
      if (leaderSeqNodeName.equals(node)) {
        break;
      }
      toWatch = node;
    }
    try {
      String watchedNode = holdElectionPath + "/" + toWatch;
      zkClient.getData(watchedNode, watcher = new ElectionWatcher(context.leaderSeqPath, watchedNode, getSeq(context.leaderSeqPath), context), null, true);
      log.debug("Watching path {} to know if I could be the leader", watchedNode);
    } catch (KeeperException.SessionExpiredException e) {
      throw e;
    } catch (KeeperException.NoNodeException e) {
      // the previous node disappeared, check if we are the leader again
      checkIfIamLeader(context, true);
    } catch (KeeperException e) {
      // we couldn't set our watch for some other reason, retry
      log.warn("Failed setting watch", e);
      checkIfIamLeader(context, true);
    }
  }
}
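The decision at the heart of checkIfIamLeader can be isolated from the ZooKeeper I/O. The following is only a minimal sketch, not Solr's code: the class ElectionDecisionSketch and the methods seqOf and nodeToWatch are hypothetical names, and the node names are assumed to end in n_ followed by digits, as the javadoc's n_* suggests. It sorts the candidates by sequence number and returns either null (this node is first in line, so it is the leader) or the name of the node directly ahead, which is the one to watch.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Simplified, ZooKeeper-free sketch of the decision made in checkIfIamLeader.
// Class and method names are hypothetical, not part of Solr.
class ElectionDecisionSketch {

  // Parse the digits after the last "n_" (assumed naming scheme).
  static int seqOf(String nodeName) {
    int idx = nodeName.lastIndexOf("n_");
    return Integer.parseInt(nodeName.substring(idx + 2));
  }

  // Returns null if myNode is first in line (the leader),
  // otherwise the name of the node immediately ahead of it.
  static String nodeToWatch(List<String> children, String myNode) {
    List<String> seqs = new ArrayList<>(children);
    seqs.sort(Comparator.comparingInt(ElectionDecisionSketch::seqOf));
    int pos = seqs.indexOf(myNode);
    if (pos < 0) {
      // The method above logs a warning and returns in this case;
      // the sketch fails loudly instead.
      throw new IllegalStateException("not in line to be leader: " + myNode);
    }
    return pos == 0 ? null : seqs.get(pos - 1);
  }

  public static void main(String[] args) {
    // Made-up child names, deliberately unsorted.
    List<String> children = List.of(
        "s3-core3-n_0000000002",
        "s1-core1-n_0000000000",
        "s2-core2-n_0000000001");
    for (String me : children) {
      String watch = nodeToWatch(children, me);
      System.out.println(me + (watch == null ? " -> leader" : " -> watches " + watch));
    }
  }
}
```

Watching only the predecessor means that a departing node wakes up a single candidate rather than every replica at once.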
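Leader re-election
- In a normally running SolrCloud a leader already exists: the node whose znode has the smallest sequence number, for example XXX_node1_0000001. Each replica that joined later holds a watcher; per the code above it is set on the znode immediately ahead of its own, which for the candidate next in line is the leader's znode. When the leader goes offline, that is, when its ZooKeeper session ends, its ephemeral znode (for example XXX_node1_0000001) is deleted.
- The watcher on that znode then fires and the watching replica reruns the leader check. The rule is unchanged: whoever holds the smallest-numbered znode currently registered under /collections/collectionTest/leader_elect/shard1/election is the leader, and everyone else is a replica.
- A node that finds it is still a replica goes on watching the znode ahead of it (the leader at this point is XXX_node1_0000002) and waits for the next departure to trigger another election.
- What if the leader that went offline comes back? It is registered in ZooKeeper as a brand-new Solr node, receives a sequence number larger than every existing znode, registers its own watcher, and waits for its turn to be elected.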
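The walkthrough above can be replayed with a small in-memory simulation. This is only a sketch under stated assumptions: it does not talk to ZooKeeper, a TreeMap stands in for the sorted election znodes, a plain counter stands in for ZooKeeper's sequential-node numbering, and the names ReelectionSimulation, join, leave and leader are all hypothetical.

```java
import java.util.TreeMap;

// In-memory simulation of the re-election walkthrough; no ZooKeeper involved.
// All names here are hypothetical.
class ReelectionSimulation {
  // sequence number -> node name, kept sorted by sequence number
  private final TreeMap<Integer, String> election = new TreeMap<>();
  // stands in for ZooKeeper's sequential-node counter
  private int nextSeq = 1;

  // A node joining always receives a number larger than any handed out before.
  int join(String node) {
    int seq = nextSeq++;
    election.put(seq, node);
    return seq;
  }

  // A node going offline loses its (ephemeral) entry.
  void leave(String node) {
    election.values().removeIf(node::equals);
  }

  // The leader is whoever holds the smallest sequence number.
  String leader() {
    return election.isEmpty() ? null : election.firstEntry().getValue();
  }

  public static void main(String[] args) {
    ReelectionSimulation sim = new ReelectionSimulation();
    sim.join("node1");
    sim.join("node2");
    sim.join("node3");
    System.out.println("initial leader: " + sim.leader());

    sim.leave("node1"); // the leader goes offline
    System.out.println("after node1 leaves: " + sim.leader());

    int seq = sim.join("node1"); // the old leader returns as a new candidate
    System.out.println("node1 rejoined with seq " + seq + ", leader: " + sim.leader());
  }
}
```

Because the returning node is given a fresh, larger number, it goes to the back of the line and the current leader is not displaced.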
This article explains the process well, but it is based on the fairly old Solr 4.x.
References
http://quentinxxz.iteye.com/blog/2149891
https://www.cnblogs.com/rcfeng/p/4082568.html
https://www.cnblogs.com/saratearing/p/5690476.html
https://blog.youkuaiyun.com/u011026968/article/details/50336709
https://blog.youkuaiyun.com/iteye_16982/article/details/82574099
Introduction to ZooKeeper
https://www.cnblogs.com/xinfang520/p/7717684.html
SolrCloud recovery principles and failure to elect a shard leader
https://www.sohu.com/a/130752460_505885