[Code] SolrCloud / Solr 4.0 startup steps

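This article walks through the key steps of Solr 4.0 startup, including core container initialization, ZooKeeper integration, the leader election mechanism, and the local and distributed recovery flows.

This page shows the major steps in the Solr 4.0 startup process.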

SolrDispatchFilter.init(FilterConfig config) first initializes the CoreContainer:

public void init(FilterConfig config) throws ServletException
  {
    ...........
    CoreContainer.Initializer init = createInitializer();
    ...........
    this.cores = init.initialize();
    ..........
   }

CoreContainer.Initializer.initialize() then calls CoreContainer.load():

public CoreContainer initialize() throws IOException,
        ParserConfigurationException, SAXException {
      CoreContainer cores = null;
      String solrHome = SolrResourceLoader.locateSolrHome();
      File fconf = new File(solrHome, containerConfigFilename == null ? "solr.xml"
          : containerConfigFilename);
      cores = new CoreContainer(solrHome);
 
      if (fconf.exists()) {
        cores.load(solrHome, fconf);
      } else {
        log.info("no solr.xml file found - using default");
        cores.load(solrHome, new InputSource(new ByteArrayInputStream(DEF_SOLR_XML.getBytes("UTF-8"))));
        cores.configFile = fconf;
      }
 
      containerConfigFilename = cores.getConfigFile().getName();
 
      return cores;
    }
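The fallback in initialize() (use solr.xml if it exists, otherwise load a built-in default from memory) can be tried in isolation. The following is only a minimal sketch of that pattern, not Solr code: the class name ConfigFallbackSketch, the method openConfig and the DEFAULT_XML content are all assumptions made for illustration, since the article does not show the real DEF_SOLR_XML.

```java
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

// Hypothetical sketch, not Solr code: open a config file if it exists,
// otherwise fall back to a default document held in memory.
final class ConfigFallbackSketch {
    // Assumed placeholder; the real DEF_SOLR_XML content is not shown in the article.
    static final String DEFAULT_XML = "<solr><cores/></solr>";

    static InputStream openConfig(File home, String fileName) throws IOException {
        // Same defaulting rule as in initialize(): null means "solr.xml".
        File conf = new File(home, fileName == null ? "solr.xml" : fileName);
        if (conf.exists()) {
            return Files.newInputStream(conf.toPath());
        }
        // No file on disk: serve the built-in default as a byte stream.
        return new ByteArrayInputStream(DEFAULT_XML.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        // An empty temporary directory stands in for a Solr home without solr.xml.
        File home = Files.createTempDirectory("solr-home-sketch").toFile();
        try (InputStream in = openConfig(home, null)) {
            System.out.println(new String(in.readAllBytes(), StandardCharsets.UTF_8));
        }
    }
}
```

Returning a stream in both branches lets the caller parse the configuration the same way whether or not the file was present.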
  

CoreContainer.load(solrHome, fconf) calls CoreContainer.load(String dir, InputSource cfgis). This function is the most important part of Solr 4.0's startup. Many members of CoreContainer are initialized here, including the Overseer, ZkController, CoreAdminHandler and CollectionsHandler. Now let's go into this function:

..........
initZooKeeper(zkHost, zkClientTimeout); // this call initializes the ZkController
 
..........
coreAdminHandler = new CoreAdminHandler(this);
..........
 
..........
NodeList nodes = (NodeList)cfg.evaluate("solr/cores/core", XPathConstants.NODESET); // read the core config info from solr.xml
 
    for (int i=0; i<nodes.getLength(); i++) {
      Node node = nodes.item(i);
      .........
      .........
      CoreDescriptor p = new CoreDescriptor(this, name, DOMUtil.getAttr(node, "instanceDir", null));
      .........
      .........
 
      SolrCore core = create(p); // each core is created and initialized here, along with its main components
      register(name, core, false);
      .........
      .........
    }
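To see what the "solr/cores/core" lookup returns, here is a small self-contained sketch that uses only the JDK's XPath API. It is not the Solr implementation: the class CoreListSketch, the method listCores and the sample XML are assumptions for illustration, and Solr itself goes through its own config and DOMUtil helpers.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical sketch, not Solr code: list the <core> entries of a solr.xml-like document.
final class CoreListSketch {
    // Returns one "name -> instanceDir" string per <core> element under <solr><cores>.
    static List<String> listCores(String solrXml) throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        // Same expression as in CoreContainer.load(); it is evaluated from the document root.
        NodeList nodes = (NodeList) xpath.evaluate(
                "solr/cores/core",
                new InputSource(new StringReader(solrXml)),
                XPathConstants.NODESET);
        List<String> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            Element core = (Element) nodes.item(i);
            result.add(core.getAttribute("name") + " -> " + core.getAttribute("instanceDir"));
        }
        return result;
    }

    public static void main(String[] args) throws XPathExpressionException {
        // Sample input invented for this sketch.
        String xml = "<solr><cores>"
                + "<core name=\"collection1\" instanceDir=\"collection1\"/>"
                + "<core name=\"core2\" instanceDir=\"core2dir\"/>"
                + "</cores></solr>";
        for (String line : listCores(xml)) {
            System.out.println(line);
        }
    }
}
```

Each returned entry corresponds to one iteration of the loop above, where Solr builds a CoreDescriptor and then creates and registers the core.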

At this point the core has been created but not yet registered. CoreContainer.register(String name, SolrCore core, boolean returnPrevNotClosed), invoked above as register(name, core, false), registers the core with the ZkController and at the same time publishes the core's status to the Overseer. To do so it calls ZkController.register(String coreName, final CoreDescriptor desc, boolean recoverReloadedCores), which updates the core's cloud state, including joining the leader election line:

public String register(String coreName, final CoreDescriptor desc, boolean recoverReloadedCores) throws Exception {
........
........
joinElection(desc);
........
........
if (!core.isReloaded() && ulog != null) { // replay the local update log unless the core was reloaded
          Future<UpdateLog.RecoveryInfo> recoveryFuture = core.getUpdateHandler()
              .getUpdateLog().recoverFromLog();
          .......
}
..........
boolean didRecovery = checkRecovery(coreName, desc, recoverReloadedCores, isLeader, cloudDesc,
            collection, coreZkNodeName, shardId, leaderProps, core, cc);
        if (!didRecovery) {
          publish(desc, ZkStateReader.ACTIVE);
        }
..........
 
    zkStateReader.updateCloudState(true);
    return shardId;
  }

1. ZkController.joinElection(desc) decides whether this core is the leader. If it is, runIamLeaderProcess() is called; otherwise a watcher is set on the candidate directly ahead of it in the election line. ZkController.joinElection(desc) calls LeaderElector.joinElection(context) as follows:

public int joinElection(ElectionContext context) throws KeeperException, InterruptedException, IOException {
......
int seq = getSeq(leaderSeqPath);
checkIfIamLeader(seq, context, false);
.......
}

then LeaderElector.checkIfIamLeader(seq, context, false):

/**
   * Check if the candidate with the given n_* sequence number is the leader.
   * If it is, set the leaderId on the leader zk node. If it is not, start
   * watching the candidate that is in line before this one - if it goes down, check
   * if this candidate is the leader again.
   **/

private void checkIfIamLeader(final int seq, final ElectionContext context, boolean replacement) throws KeeperException,
      InterruptedException, IOException {
    // get all other numbers...
    final String holdElectionPath = context.electionPath + ELECTION_NODE;
    List<String> seqs = zkClient.getChildren(holdElectionPath, null, true);
 
    sortSeqs(seqs);
    List<Integer> intSeqs = getSeqs(seqs);
    if (seq <= intSeqs.get(0)) {
      runIamLeaderProcess(context, replacement);
    } else {
      // I am not the leader - watch the node below me
      int i = 1;
      for (; i < intSeqs.size(); i++) {
        int s = intSeqs.get(i);
        if (seq < s) {
          // we found who we come before - watch the guy in front
          break;
        }
      }
      int index = i - 2;
      if (index < 0) {
        log.warn("Our node is no longer in line to be leader");
        return;
      }
      try {
        zkClient.getData(holdElectionPath + "/" + seqs.get(index),
            new Watcher() {
 
              @Override
              public void process(WatchedEvent event) {
                // am I the next leader?
                try {
                  checkIfIamLeader(seq, context, true);
                } catch (InterruptedException e) {
                  // Restore the interrupted status
                  Thread.currentThread().interrupt();
                  log.warn("", e);
                } catch (IOException e) {
                  log.warn("", e);
                } catch (Exception e) {
                  log.warn("", e);
                }
              }
 
            }, null, true);
      } catch (KeeperException.SessionExpiredException e) {
        throw e;
      } catch (KeeperException e) {
        // we couldn't set our watch - the node before us may already be down?
        // we need to check if we are the leader again
        checkIfIamLeader(seq, context, true);
      }
    }
  }
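The index arithmetic in checkIfIamLeader() is easier to follow without ZooKeeper around it. Below is a minimal sketch of just the "am I the leader, and if not, whom do I watch" decision over a plain list of sequence numbers. The class ElectionLineSketch, the method leaderOrWatchTarget and the LEADER sentinel are assumptions made for this example and do not exist in Solr; sequence numbers are assumed to be non-negative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch, not Solr code: the ordering logic of checkIfIamLeader() on plain integers.
final class ElectionLineSketch {
    // Sentinel meaning "this candidate is the leader" (valid because sequence numbers are >= 0).
    static final int LEADER = -1;

    // Returns LEADER if seq is first in line, otherwise the sequence number to watch.
    // Throws if seq is no longer in line, which the original code only logs as a warning.
    static int leaderOrWatchTarget(int seq, List<Integer> allSeqs) {
        List<Integer> sorted = new ArrayList<>(allSeqs);
        Collections.sort(sorted);
        if (seq <= sorted.get(0)) {
            return LEADER;
        }
        // Find the first candidate that comes after us.
        int i = 1;
        for (; i < sorted.size(); i++) {
            if (seq < sorted.get(i)) {
                break;
            }
        }
        // i - 1 is our own position, so i - 2 is the candidate directly ahead of us.
        int index = i - 2;
        if (index < 0) {
            throw new IllegalStateException("Our node is no longer in line to be leader");
        }
        return sorted.get(index);
    }

    public static void main(String[] args) {
        List<Integer> line = Arrays.asList(3, 0, 5, 1);
        for (int seq : line) {
            int target = leaderOrWatchTarget(seq, line);
            System.out.println(seq + (target == LEADER ? " is the leader" : " watches " + target));
        }
    }
}
```

Because every candidate watches only the one directly ahead of it, the failure of a single node wakes up one watcher rather than all candidates.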

2. core.getUpdateHandler().getUpdateLog().recoverFromLog() obtains the UpdateLog from DirectUpdateHandler2 and calls its recoverFromLog() method. If any of the newest logs does not end with a commit, this call starts a new thread that replays the update log stored on the local machine; its main purpose is to recover the local transaction log. UpdateLog.recoverFromLog() is shown below:

public Future<RecoveryInfo> recoverFromLog() {
    recoveryInfo = new RecoveryInfo();
 
    List<TransactionLog> recoverLogs = new ArrayList<TransactionLog>(1);
    for (TransactionLog ll : newestLogsOnStartup) {
      if (!ll.try_incref()) continue;
 
      try {
        if (ll.endsWithCommit()) {
          ll.decref();
          continue;
        }
      } catch (IOException e) {
        log.error("Error inspecting tlog " + ll);
        ll.decref();
        continue;
      }
 
      recoverLogs.add(ll);
    }
 
    if (recoverLogs.isEmpty()) return null;
 
    ExecutorCompletionService<RecoveryInfo> cs = new ExecutorCompletionService<RecoveryInfo>(recoveryExecutor);
    LogReplayer replayer = new LogReplayer(recoverLogs, false);
 
    versionInfo.blockUpdates();
    try {
      state = State.REPLAYING;
    } finally {
      versionInfo.unblockUpdates();
    }
 
    // At this point, we are guaranteed that any new updates coming in will see the state as "replaying"
 
    return cs.submit(replayer, recoveryInfo);
  }
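The shape of recoverFromLog() (skip logs that already end with a commit, return null when nothing is left, otherwise hand a replayer to an executor and return a Future) can be sketched with the JDK alone. This is only an illustration under stated assumptions: LogReplaySketch, FakeLog and ReplayInfo are hypothetical stand-ins for UpdateLog, TransactionLog and RecoveryInfo, and the "replay" here merely counts logs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch, not Solr code: filter logs, then replay the rest on another thread.
final class LogReplaySketch {
    // Stand-in for a transaction log; only records whether it ends with a commit.
    static final class FakeLog {
        final String name;
        final boolean endsWithCommit;
        FakeLog(String name, boolean endsWithCommit) {
            this.name = name;
            this.endsWithCommit = endsWithCommit;
        }
    }

    // Stand-in for RecoveryInfo; filled in by the replay thread.
    static final class ReplayInfo {
        int replayed;
    }

    static Future<ReplayInfo> recover(List<FakeLog> newestLogs, ExecutorService executor) {
        List<FakeLog> pending = new ArrayList<>();
        for (FakeLog log : newestLogs) {
            if (!log.endsWithCommit) {
                pending.add(log); // an uncommitted tail means this log must be replayed
            }
        }
        if (pending.isEmpty()) {
            return null; // nothing to replay, mirroring the original method
        }
        ReplayInfo info = new ReplayInfo();
        Runnable replayer = () -> {
            for (FakeLog log : pending) {
                info.replayed++; // a real replayer would re-apply the updates in the log
            }
        };
        ExecutorCompletionService<ReplayInfo> cs = new ExecutorCompletionService<>(executor);
        // submit(Runnable, result) returns a Future that yields 'info' once the task finishes.
        return cs.submit(replayer, info);
    }

    public static void main(String[] args) throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            List<FakeLog> logs = Arrays.asList(new FakeLog("tlog.0", true), new FakeLog("tlog.1", false));
            Future<ReplayInfo> future = recover(logs, executor);
            System.out.println("replayed logs: " + future.get().replayed);
        } finally {
            executor.shutdown();
        }
    }
}
```

The caller has to handle the null return, which is why ZkController.register() only waits on the Future when there was something to replay.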

3. ZkController.checkRecovery(coreName, desc, recoverReloadedCores, isLeader, cloudDesc, collection, coreZkNodeName, shardId, leaderProps, core, cc) performs distributed recovery. It is skipped when this core is the leader; otherwise recovery runs. The function starts a new RecoveryStrategy thread, which does the actual work:

  1. On the first attempt, try to recover with PeerSync.sync(), which tries to recover from the leader's update log. If that fails, go to step 2.
  2. Do a full distributed recovery: RecoveryStrategy.replicate(String nodeName, SolrCore core, ZkNodeProps leaderprops, String baseUrl) calls ReplicationHandler.doFetch() to fetch the index files from the leader and recovers from those files.
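The decision order described in the two steps above (no recovery for a leader, peer sync on the first attempt, full replication as the fallback) can be written down as a tiny sketch. It is not the RecoveryStrategy code: the class RecoveryFallbackSketch, the Outcome enum and the two BooleanSupplier parameters are assumptions that stand in for PeerSync.sync() and the replication fetch.

```java
import java.util.function.BooleanSupplier;

// Hypothetical sketch, not Solr code: the order in which recovery options are tried.
final class RecoveryFallbackSketch {
    enum Outcome { SKIPPED_LEADER, PEER_SYNC, REPLICATION, FAILED }

    static Outcome recover(boolean isLeader, boolean firstAttempt,
                           BooleanSupplier peerSync, BooleanSupplier replicate) {
        if (isLeader) {
            return Outcome.SKIPPED_LEADER; // a leader has nobody to recover from
        }
        // Step 1: the cheap path, only tried on the first attempt.
        if (firstAttempt && peerSync.getAsBoolean()) {
            return Outcome.PEER_SYNC;
        }
        // Step 2: fall back to fetching the index files from the leader.
        return replicate.getAsBoolean() ? Outcome.REPLICATION : Outcome.FAILED;
    }

    public static void main(String[] args) {
        // Peer sync fails here, so the sketch falls through to replication.
        System.out.println(recover(false, true, () -> false, () -> true));
    }
}
```

Trying the update-log based sync first keeps the common case cheap, since copying whole index files from the leader is the more expensive option.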