This page walks through the major steps of the Solr 4.0 startup process.
SolrDispatchFilter.init(FilterConfig config) first initializes the CoreContainer:
public void init(FilterConfig config) throws ServletException {
  ...
  CoreContainer.Initializer init = createInitializer();
  ...
  this.cores = init.initialize();
  ...
}
Then CoreContainer.Initializer.initialize() calls CoreContainer.load():
public CoreContainer initialize() throws IOException, ParserConfigurationException, SAXException {
  CoreContainer cores = null;
  String solrHome = SolrResourceLoader.locateSolrHome();
  File fconf = new File(solrHome, containerConfigFilename == null ? "solr.xml"
      : containerConfigFilename);

  cores = new CoreContainer(solrHome);

  if (fconf.exists()) {
    cores.load(solrHome, fconf);
  } else {
    log.info("no solr.xml file found - using default");
    cores.load(solrHome, new InputSource(new ByteArrayInputStream(DEF_SOLR_XML.getBytes("UTF-8"))));
    cores.configFile = fconf;
  }
  containerConfigFilename = cores.getConfigFile().getName();

  return cores;
}
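DEF_SOLR_XML used above is a built-in fallback configuration for when no solr.xml exists on disk. A rough sketch of what it contains (the exact attributes in Solr 4.0's CoreContainer may differ slightly; the values here are illustrative, not authoritative):

// Hedged sketch of CoreContainer's built-in default solr.xml string.
private static final String DEF_SOLR_XML =
    "<?xml version=\"1.0\" encoding=\"UTF-8\" ?>\n" +
    "<solr persistent=\"false\">\n" +
    "  <cores adminPath=\"/admin/cores\" defaultCoreName=\"collection1\">\n" +
    "    <core name=\"collection1\" instanceDir=\"collection1\" />\n" +
    "  </cores>\n" +
    "</solr>";

The solr/cores/core XPath evaluated later in CoreContainer.load() reads exactly this <core> element structure.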
CoreContainer.load(solrHome, fconf) delegates to CoreContainer.load(String dir, InputSource cfgis). This method is the most important part of Solr 4.0's startup: many members of CoreContainer are initialized here, including the Overseer, ZkController, CoreAdminHandler, and CollectionsHandler. Now let's step into this method:
...
initZooKeeper(zkHost, zkClientTimeout); // this call initializes the zkController
...
coreAdminHandler = new CoreAdminHandler(this);
...
NodeList nodes = (NodeList) cfg.evaluate("solr/cores/core", XPathConstants.NODESET); // read the core config info from solr.xml
for (int i = 0; i < nodes.getLength(); i++) {
  Node node = nodes.item(i);
  ...
  CoreDescriptor p = new CoreDescriptor(this, name, DOMUtil.getAttr(node, "instanceDir", null));
  ...
  SolrCore core = create(p); // each core is created and initialized here; all the important pieces are built
  register(name, core, false);
  ...
}
At this point the core has been created but not yet registered. CoreContainer.register(String name, SolrCore core, boolean returnPrevNotClosed) registers the core with the ZkController; the register(name, core, false) call above does this job, and at the same time publishes the core's status to the Overseer. It then calls ZkController.register(String coreName, final CoreDescriptor desc, boolean recoverReloadedCores) to update the core's cloud state, including joining the leader-election line:
public String register(String coreName, final CoreDescriptor desc, boolean recoverReloadedCores) throws Exception {
  ...
  joinElection(desc);
  ...
  if (!core.isReloaded() && ulog != null) { // recover from the update log if the core is not being reloaded
    Future<UpdateLog.RecoveryInfo> recoveryFuture = core.getUpdateHandler()
        .getUpdateLog().recoverFromLog();
    ...
  }
  ...
  boolean didRecovery = checkRecovery(coreName, desc, recoverReloadedCores, isLeader, cloudDesc,
      collection, coreZkNodeName, shardId, leaderProps, core, cc);
  if (!didRecovery) {
    publish(desc, ZkStateReader.ACTIVE);
  }
  ...
  zkStateReader.updateCloudState(true);
  return shardId;
}
1. zkController.joinElection(desc) decides whether this core is the leader. If it is, runIamLeaderProcess() is called; otherwise a watcher is set on the core that is in line just before this one. ZkController.joinElection(desc) calls LeaderElector.joinElection(context) as follows:
public int joinElection(ElectionContext context) throws KeeperException, InterruptedException, IOException {
  ...
  int seq = getSeq(leaderSeqPath);
  checkIfIamLeader(seq, context, false);
  ...
}
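getSeq(leaderSeqPath) extracts the numeric suffix that ZooKeeper appended to this core's sequential election node. A hedged sketch of that kind of parsing (the real LeaderElector.getSeq() may differ in detail):

// Hypothetical helper: pull the sequence number out of an election node
// path such as ".../leader_elect/shard1/election/n_0000000003" -> 3.
static int seqFromPath(String leaderSeqPath) {
  String name = leaderSeqPath.substring(leaderSeqPath.lastIndexOf('/') + 1); // "n_0000000003"
  return Integer.parseInt(name.substring(name.lastIndexOf('_') + 1));
}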
Then it calls LeaderElector.checkIfIamLeader(seq, context, false):
/**
* Check if the candidate with the given n_* sequence number is the leader.
* If it is, set the leaderId on the leader zk node. If it is not, start
* watching the candidate that is in line before this one - if it goes down, check
* if this candidate is the leader again.
**/
private void checkIfIamLeader(final int seq, final ElectionContext context, boolean replacement)
    throws KeeperException, InterruptedException, IOException {
  // get all other numbers...
  final String holdElectionPath = context.electionPath + ELECTION_NODE;
  List<String> seqs = zkClient.getChildren(holdElectionPath, null, true);

  sortSeqs(seqs);
  List<Integer> intSeqs = getSeqs(seqs);
  if (seq <= intSeqs.get(0)) {
    runIamLeaderProcess(context, replacement);
  } else {
    // I am not the leader - watch the node below me
    int i = 1;
    for (; i < intSeqs.size(); i++) {
      int s = intSeqs.get(i);
      if (seq < s) {
        // we found who we come before - watch the guy in front
        break;
      }
    }
    int index = i - 2;
    if (index < 0) {
      log.warn("Our node is no longer in line to be leader");
      return;
    }
    try {
      zkClient.getData(holdElectionPath + "/" + seqs.get(index),
          new Watcher() {
            @Override
            public void process(WatchedEvent event) {
              // am I the next leader?
              try {
                checkIfIamLeader(seq, context, true);
              } catch (InterruptedException e) {
                // Restore the interrupted status
                Thread.currentThread().interrupt();
                log.warn("", e);
              } catch (IOException e) {
                log.warn("", e);
              } catch (Exception e) {
                log.warn("", e);
              }
            }
          }, null, true);
    } catch (KeeperException.SessionExpiredException e) {
      throw e;
    } catch (KeeperException e) {
      // we couldn't set our watch - the node before us may already be down?
      // we need to check if we are the leader again
      checkIfIamLeader(seq, context, true);
    }
  }
}
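This is the standard ZooKeeper leader-election recipe: every candidate creates an ephemeral sequential node, the smallest sequence number wins, and each loser watches only the node directly ahead of it, so a node going down wakes exactly one successor instead of the whole herd. Below is a minimal self-contained sketch of the same recipe against the raw ZooKeeper API; the names ELECTION_PATH and onBecomeLeader are hypothetical stand-ins for Solr's ElectionContext machinery, and reconnect/error handling is omitted.

import java.util.Collections;
import java.util.List;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

// Hedged sketch of the generic election recipe LeaderElector implements.
public class ElectionSketch implements Watcher {
  static final String ELECTION_PATH = "/election"; // hypothetical path
  final ZooKeeper zk;
  String myNode; // e.g. "/election/n_0000000007"

  ElectionSketch(ZooKeeper zk) { this.zk = zk; }

  void join() throws Exception {
    // ephemeral + sequential: ZooKeeper appends a monotonically increasing suffix
    myNode = zk.create(ELECTION_PATH + "/n_", new byte[0],
        ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
    check();
  }

  void check() throws Exception {
    List<String> children = zk.getChildren(ELECTION_PATH, false);
    Collections.sort(children); // fixed-width suffix: string sort == numeric sort
    String me = myNode.substring(myNode.lastIndexOf('/') + 1);
    int pos = children.indexOf(me);
    if (pos == 0) {
      onBecomeLeader(); // hypothetical callback, analogous to runIamLeaderProcess()
    } else {
      // watch only the node directly ahead of us to avoid a herd effect
      String predecessor = children.get(pos - 1);
      if (zk.exists(ELECTION_PATH + "/" + predecessor, this) == null) {
        check(); // predecessor vanished between getChildren and exists
      }
    }
  }

  @Override
  public void process(WatchedEvent event) {
    try { check(); } catch (Exception e) { /* log and retry in real code */ }
  }

  void onBecomeLeader() { System.out.println("I am the leader: " + myNode); }
}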
2. core.getUpdateHandler().getUpdateLog().recoverFromLog() gets the UpdateLog from DirectUpdateHandler2 and calls its recoverFromLog() method. This call starts a new thread that replays the update log on the local machine; in other words, recoverFromLog() primarily recovers the local transaction log. UpdateLog.recoverFromLog() is shown below:
public Future<RecoveryInfo> recoverFromLog() {
  recoveryInfo = new RecoveryInfo();

  List<TransactionLog> recoverLogs = new ArrayList<TransactionLog>(1);
  for (TransactionLog ll : newestLogsOnStartup) {
    if (!ll.try_incref()) continue;

    try {
      if (ll.endsWithCommit()) {
        ll.decref();
        continue;
      }
    } catch (IOException e) {
      log.error("Error inspecting tlog " + ll);
      ll.decref();
      continue;
    }

    recoverLogs.add(ll);
  }

  if (recoverLogs.isEmpty()) return null;

  ExecutorCompletionService<RecoveryInfo> cs = new ExecutorCompletionService<RecoveryInfo>(recoveryExecutor);
  LogReplayer replayer = new LogReplayer(recoverLogs, false);

  versionInfo.blockUpdates();
  try {
    state = State.REPLAYING;
  } finally {
    versionInfo.unblockUpdates();
  }

  // At this point, we are guaranteed that any new updates coming in will see the state as "replaying"

  return cs.submit(replayer, recoveryInfo);
}
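Note that recoverFromLog() returns a Future rather than blocking: the replay runs on recoveryExecutor's thread while startup continues, and only tlogs that do not end with a commit need replaying. A hedged usage sketch for a caller that wants to wait for the replay (whether and where Solr itself blocks on this future is version-specific):

// Hedged usage sketch: block until the local tlog replay finishes.
Future<UpdateLog.RecoveryInfo> recoveryFuture =
    core.getUpdateHandler().getUpdateLog().recoverFromLog();
if (recoveryFuture != null) {                         // null: no uncommitted tlog to replay
  UpdateLog.RecoveryInfo info = recoveryFuture.get(); // may block for a while
  log.info("tlog replay finished: " + info);
}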
3. ZkController.checkRecovery(coreName, desc, recoverReloadedCores, isLeader, cloudDesc, collection, coreZkNodeName, shardId, leaderProps, core, cc) performs distributed recovery. It is skipped if this core is the leader; otherwise recovery runs. The method starts a new thread, RecoveryStrategy, which does the actual work in two stages (see the sketch after this list):
- If this is the first attempt, try to recover via PeerSync.sync(), which replays the leader's recent updates from its update log. If that fails, fall through to the next step.
- Do distributed (replication) recovery: RecoveryStrategy.replicate(String nodeName, SolrCore core, ZkNodeProps leaderprops, String baseUrl) calls ReplicationHandler.doFetch() to fetch the index files from the leader and recovers from them.
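To summarize the control flow, here is a hedged, self-contained sketch of RecoveryStrategy's two-stage decision logic. Every method in it is a hypothetical stub standing in for the real calls (PeerSync.sync(), RecoveryStrategy.replicate(), ZkController.publish()); the real method also handles retries, waits, and cancellation.

// Hedged sketch of RecoveryStrategy's two-stage recovery decision.
class RecoverySketch {
  private boolean firstTime = true;

  void doRecovery() {
    boolean recovered = false;
    if (firstTime) {
      firstTime = false;
      recovered = tryPeerSync();    // cheap: replay recent updates from the leader's tlog
    }
    if (!recovered) {
      recovered = tryReplication(); // expensive: fetch full index files from the leader
    }
    if (recovered) {
      publishActive();              // publish ACTIVE state to the Overseer
    }
  }

  boolean tryPeerSync()    { return false; } // hypothetical stub for PeerSync.sync()
  boolean tryReplication() { return true;  } // hypothetical stub for replicate()/doFetch()
  void publishActive()     { }               // hypothetical stub for publish(desc, ACTIVE)
}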