[Solr Internals] Leader Shard Election

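In SolrCloud, leader election is handled mainly by the LeaderElector.java class. When the leader goes offline, its znode in ZooKeeper is deleted, which triggers a new election among the replicas. The rule is that the node whose znode has the smallest sequence number becomes the new leader. A node that ends up as a replica registers a watcher on the znode directly ahead of its own and waits for the next election. When the former leader comes back online, it registers as a new node and waits its turn. This walkthrough is based on the older Solr 4.x line.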

Leader shard election is implemented mainly by the LeaderElector.java class.

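The checkIfIamLeader function is where the actual leader election starts: it compares the node's own nodeName with the names of the znodes created in ZooKeeper to decide whether it is the leader.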

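  1. If it is the leader, update handling is its job from now on, and it runs the series of operations that register it as leader (runIamLeaderProcess).
  2. If it is not the leader but a replica, it registers a watcher on the znode immediately ahead of its own in the sorted sequence (for the second in line, that is the leader's znode) and tracks that znode's state.
  3. As soon as the watched znode disappears, the next round of leader election is triggered.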
  /**
   * Check if the candidate with the given n_* sequence number is the leader.
   * If it is, set the leaderId on the leader zk node. If it is not, start
   * watching the candidate that is in line before this one - if it goes down, check
   * if this candidate is the leader again.
   *
   * @param replacement has someone else been the leader already?
   */
  private void checkIfIamLeader(final ElectionContext context, boolean replacement) throws KeeperException,
      InterruptedException, IOException {
    context.checkIfIamLeaderFired();
    // get all other numbers...
    final String holdElectionPath = context.electionPath + ELECTION_NODE;
    List<String> seqs = zkClient.getChildren(holdElectionPath, null, true);
    sortSeqs(seqs);

    String leaderSeqNodeName = context.leaderSeqPath.substring(context.leaderSeqPath.lastIndexOf('/') + 1);
    if (!seqs.contains(leaderSeqNodeName)) {
      log.warn("Our node is no longer in line to be leader");
      return;
    }

    // If any double-registrations exist for me, remove all but this latest one!
    // TODO: can we even get into this state?
    String prefix = zkClient.getSolrZooKeeper().getSessionId() + "-" + context.id + "-";
    Iterator<String> it = seqs.iterator();
    while (it.hasNext()) {
      String elec = it.next();
      if (!elec.equals(leaderSeqNodeName) && elec.startsWith(prefix)) {
        try {
          String toDelete = holdElectionPath + "/" + elec;
          log.warn("Deleting duplicate registration: {}", toDelete);
          zkClient.delete(toDelete, -1, true);
        } catch (KeeperException.NoNodeException e) {
          // ignore
        }
        it.remove();
      }
    }

    if (leaderSeqNodeName.equals(seqs.get(0))) {
      // I am the leader
      try {
        runIamLeaderProcess(context, replacement);
      } catch (KeeperException.NodeExistsException e) {
        log.error("node exists",e);
        retryElection(context, false);
        return;
      }
    } else {
      // I am not the leader - watch the node below me
      String toWatch = seqs.get(0);
      for (String node : seqs) {
        if (leaderSeqNodeName.equals(node)) {
          break;
        }
        toWatch = node;
      }
      try {
        String watchedNode = holdElectionPath + "/" + toWatch;
        zkClient.getData(watchedNode, watcher = new ElectionWatcher(context.leaderSeqPath, watchedNode, getSeq(context.leaderSeqPath), context), null, true);
        log.debug("Watching path {} to know if I could be the leader", watchedNode);
      } catch (KeeperException.SessionExpiredException e) {
        throw e;
      } catch (KeeperException.NoNodeException e) {
        // the previous node disappeared, check if we are the leader again
        checkIfIamLeader(context, true);
      } catch (KeeperException e) {
        // we couldn't set our watch for some other reason, retry
        log.warn("Failed setting watch", e);
        checkIfIamLeader(context, true);
      }
    }
  }
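
The core decision in checkIfIamLeader can be isolated from ZooKeeper and sketched in a few lines. The following is a minimal illustration, not Solr's implementation: the class ElectionSketch, the helpers seqOf and nodeToWatch, and the sample node names are all hypothetical, and the names only assume the "...-n_<sequence>" suffix that the code above relies on. Unlike the original, which logs a warning and returns when the node is no longer in line, this sketch throws.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of the "smallest sequence wins, others watch their predecessor" rule.
class ElectionSketch {

    /** Extracts the numeric suffix after the last "n_" marker, e.g. "...-n_0000000002" -> 2. */
    static int seqOf(String nodeName) {
        int idx = nodeName.lastIndexOf("n_");
        return Integer.parseInt(nodeName.substring(idx + 2));
    }

    /**
     * Returns null when `me` is first in line (it is the leader).
     * Otherwise returns the node directly ahead of `me`, which is the one to watch.
     */
    static String nodeToWatch(List<String> children, String me) {
        List<String> sorted = new ArrayList<>(children);
        sorted.sort(Comparator.comparingInt(ElectionSketch::seqOf));
        int pos = sorted.indexOf(me);
        if (pos < 0) {
            // Assumption: the real code logs and returns here instead of throwing.
            throw new IllegalStateException("not in line to be leader: " + me);
        }
        return pos == 0 ? null : sorted.get(pos - 1);
    }

    public static void main(String[] args) {
        // Hypothetical election children, deliberately unsorted.
        List<String> children = List.of(
                "s3-core3-n_0000000002",
                "s1-core1-n_0000000000",
                "s2-core2-n_0000000001");
        for (String child : children) {
            String watch = nodeToWatch(children, child);
            System.out.println(child + (watch == null ? " is the leader" : " watches " + watch));
        }
    }
}
```

Watching only the predecessor means that a single znode deletion wakes up a single candidate, rather than every replica at once.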

Leader Re-election

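  1. A healthy SolrCloud already has a leader, the node whose znode has the smallest sequence number (for example XXX_node1_0000001), and the replicas that joined afterwards have registered their watchers. When the leader goes offline, i.e. its ZooKeeper session ends, its znode in ZooKeeper (for example XXX_node1_0000001) is deleted.
  2. The watcher set on that znode then fires (in the code above each replica watches the znode directly ahead of its own, so it is the next in line that is notified), and the election check runs again. The rule is unchanged: a replica checks whether its own znode has the smallest sequence number among those registered under /collections/collectionTest/leader_elect/shard1/election. If so, it is the leader; otherwise it remains a replica.
  3. If it determines that it is a replica, it again registers a watcher on the znode ahead of it (the leader is now XXX_node1_0000002) and waits for that znode to go away, which triggers the leader election again.
  4. What happens if the former leader comes back online at this point? It is registered in ZooKeeper as a brand-new Solr node, receives a sequence number larger than those of the existing znodes, registers a watcher on the znode ahead of it, and waits for its turn.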
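
The four steps above can be replayed in memory to see how the sequence numbers move. This is only a sketch under stated assumptions: ReelectionSketch and its methods join, leave, leader and watched are hypothetical names, a TreeMap stands in for the election directory in ZooKeeper, and numbering starts at 0 purely for convenience.

```java
import java.util.TreeMap;

// Hypothetical in-memory stand-in for the .../election directory; no ZooKeeper involved.
class ReelectionSketch {
    // sequence number -> node name, kept sorted by sequence number
    private final TreeMap<Integer, String> line = new TreeMap<>();
    private int nextSeq = 0;

    /** Registers a node and returns the (always increasing) sequence number it was given. */
    int join(String node) {
        int seq = nextSeq++;
        line.put(seq, node);
        return seq;
    }

    /** Drops an entry, mimicking the deletion of a znode when its owner goes offline. */
    void leave(int seq) {
        line.remove(seq);
    }

    /** The leader is whoever holds the smallest remaining sequence number. */
    String leader() {
        return line.isEmpty() ? null : line.firstEntry().getValue();
    }

    /** Sequence number an entry would watch: the nearest smaller one, or null for the leader. */
    Integer watched(int seq) {
        return line.lowerKey(seq);
    }

    public static void main(String[] args) {
        ReelectionSketch shard = new ReelectionSketch();
        int n1 = shard.join("node1");
        int n2 = shard.join("node2");
        int n3 = shard.join("node3");
        System.out.println("leader: " + shard.leader());                  // node1 holds the smallest number

        shard.leave(n1);                                                  // step 1: leader goes offline
        System.out.println("leader after failure: " + shard.leader());    // step 2: node2 is now first
        System.out.println("node3 watches seq " + shard.watched(n3));     // step 3: node3 watches node2

        int rejoined = shard.join("node1");                               // step 4: old leader returns
        System.out.println("node1 rejoined with seq " + rejoined
                + ", watches seq " + shard.watched(rejoined));
    }
}
```

Because sequence numbers are never reused, a returning node always goes to the back of the line and cannot displace the current leader.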

The source article (see Reference) explains this well, but it is based on the fairly old Solr 4.x.

Reference
http://quentinxxz.iteye.com/blog/2149891
https://www.cnblogs.com/rcfeng/p/4082568.html
https://www.cnblogs.com/saratearing/p/5690476.html
https://blog.youkuaiyun.com/u011026968/article/details/50336709
https://blog.youkuaiyun.com/iteye_16982/article/details/82574099

Introduction to ZooKeeper
https://www.cnblogs.com/xinfang520/p/7717684.html

SolrCloud recovery internals and failure to elect a shard leader
https://www.sohu.com/a/130752460_505885
