This page walks through the major steps of the Solr 4.0 startup process.
SolrDispatchFilter.init(FilterConfig config) first initializes the CoreContainer:
public void init(FilterConfig config) throws ServletException {
  ...
  CoreContainer.Initializer init = createInitializer();
  ...
  this.cores = init.initialize();
  ...
}
Then CoreContainer.Initializer.initialize() calls CoreContainer.load():
public CoreContainer initialize() throws IOException, ParserConfigurationException, SAXException {
  CoreContainer cores = null;
  String solrHome = SolrResourceLoader.locateSolrHome();
  File fconf = new File(solrHome, containerConfigFilename == null ? "solr.xml"
      : containerConfigFilename);

  cores = new CoreContainer(solrHome);

  if (fconf.exists()) {
    cores.load(solrHome, fconf);
  } else {
    log.info("no solr.xml file found - using default");
    cores.load(solrHome, new InputSource(new ByteArrayInputStream(DEF_SOLR_XML.getBytes("UTF-8"))));
    cores.configFile = fconf;
  }
  containerConfigFilename = cores.getConfigFile().getName();

  return cores;
}
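DEF_SOLR_XML used above is a built-in fallback configuration for when no solr.xml exists on disk. A rough sketch of what it contains (the exact attributes in Solr 4.0's CoreContainer may differ slightly; the values here are illustrative, not authoritative):

// Hedged sketch of CoreContainer's built-in default solr.xml string.
private static final String DEF_SOLR_XML =
    "<?xml version=\"1.0\" encoding=\"UTF-8\" ?>\n" +
    "<solr persistent=\"false\">\n" +
    "  <cores adminPath=\"/admin/cores\" defaultCoreName=\"collection1\">\n" +
    "    <core name=\"collection1\" instanceDir=\"collection1\" />\n" +
    "  </cores>\n" +
    "</solr>";

The solr/cores/core XPath evaluated later in CoreContainer.load() reads exactly this <core> element structure.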
CoreContainer.load(solrHome, fconf) delegates to CoreContainer.load(String dir, InputSource cfgis). This method is the most important part of Solr 4.0's startup: many members of CoreContainer are initialized here, including the Overseer, ZkController, CoreAdminHandler, and CollectionsHandler. Now let's step into this method:
...
initZooKeeper(zkHost, zkClientTimeout); // this call initializes the zkController
...
coreAdminHandler = new CoreAdminHandler(this);
...
NodeList nodes = (NodeList) cfg.evaluate("solr/cores/core", XPathConstants.NODESET); // read the core config info from solr.xml
for (int i = 0; i < nodes.getLength(); i++) {
  Node node = nodes.item(i);
  ...
  CoreDescriptor p = new CoreDescriptor(this, name, DOMUtil.getAttr(node, "instanceDir", null));
  ...
  SolrCore core = create(p); // each core is created and initialized here; all the important pieces are built
  register(name, core, false);
  ...
}
At this point the core has been created but not yet registered. CoreContainer.register(String name, SolrCore core, boolean returnPrevNotClosed) registers the core with the ZkController; the register(name, core, false) call above does this job, and at the same time publishes the core's status to the Overseer. It then calls ZkController.register(String coreName, final CoreDescriptor desc, boolean recoverReloadedCores) to update the core's cloud state, including joining the leader-election line:
public String register(String coreName, final CoreDescriptor desc, boolean recoverReloadedCores) throws Exception {
  ...
  joinElection(desc);
  ...
  if (!core.isReloaded() && ulog != null) { // recover from the update log if the core is not being reloaded
    Future<UpdateLog.RecoveryInfo> recoveryFuture = core.getUpdateHandler()
        .getUpdateLog().recoverFromLog();
    ...
  }
  ...
  boolean didRecovery = checkRecovery(coreName, desc, recoverReloadedCores, isLeader, cloudDesc,
      collection, coreZkNodeName, shardId, leaderProps, core, cc);
  if (!didRecovery) {
    publish(desc, ZkStateReader.ACTIVE);
  }
  ...
  zkStateReader.updateCloudState(true);
  return shardId;
}
1. zkController.joinElection(desc) decides whether this core is the leader. If it is, runIamLeaderProcess() is called; otherwise a watcher is set on the core that is in line just before this one. ZkController.joinElection(desc) calls LeaderElector.joinElection(context) as follows:
public int joinElection(ElectionContext context) throws KeeperException, InterruptedException, IOException {
  ...
  int seq = getSeq(leaderSeqPath);
  checkIfIamLeader(seq, context, false);
  ...
}
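getSeq(leaderSeqPath) extracts the numeric suffix that ZooKeeper appended to this core's sequential election node. A hedged sketch of that kind of parsing (the real LeaderElector.getSeq() may differ in detail):

// Hypothetical helper: pull the sequence number out of an election node
// path such as ".../leader_elect/shard1/election/n_0000000003" -> 3.
static int seqFromPath(String leaderSeqPath) {
  String name = leaderSeqPath.substring(leaderSeqPath.lastIndexOf('/') + 1); // "n_0000000003"
  return Integer.parseInt(name.substring(name.lastIndexOf('_') + 1));
}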
Then it calls LeaderElector.checkIfIamLeader(seq, context, false):
/**
* Check if the candidate with the given n_* sequence number is the leader.
* If it is, set the leaderId on the leader zk node. If it is not, start
* watching the candidate that is in line before this one - if it goes down, check
* if this candidate is the leader again.
**/
private void checkIfIamLeader(final int seq, final ElectionContext context, boolean replacement)
    throws KeeperException, InterruptedException, IOException {
  // get all other numbers...
  final String holdElectionPath = context.electionPath + ELECTION_NODE;
  List<String> seqs = zkClient.getChildren(holdElectionPath, null, true);

  sortSeqs(seqs);
  List<Integer> intSeqs = getSeqs(seqs);
  if (seq <= intSeqs.get(0)) {
    runIamLeaderProcess(context, replacement);
  } else {
    // I am not the leader - watch the node below me
    int i = 1;
    for (; i < intSeqs.size(); i++) {
      int s = intSeqs.get(i);
      if (seq < s) {
        // we found who we come before - watch the guy in front
        break;
      }
    }
    int index = i - 2;
    if (index < 0) {
      log.warn("Our node is no longer in line to be leader");
      return;
    }
    try {
      zkClient.getData(holdElectionPath + "/" + seqs.get(index),
          new Watcher() {
            @Override
            public void process(WatchedEvent event) {
              // am I the next leader?
              try {
                checkIfIamLeader(seq, context, true);
              } catch (InterruptedException e) {
                // Restore the interrupted status
                Thread.currentThread().interrupt();
                log.warn("", e);
              } catch (IOException e) {
                log.warn("", e);
              } catch (Exception e) {
                log.warn("", e);
              }
            }
          }, null, true);
    } catch (KeeperException.SessionExpiredException e) {
      throw e;
    } catch (KeeperException e) {
      // we couldn't set our watch - the node before us may already be down?
      // we need to check if we are the leader again
      checkIfIamLeader(seq, context, true);
    }
  }
}
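This is the standard ZooKeeper leader-election recipe: every candidate creates an ephemeral sequential node, the smallest sequence number wins, and each loser watches only the node directly ahead of it, so a node going down wakes exactly one successor instead of the whole herd. Below is a minimal self-contained sketch of the same recipe against the raw ZooKeeper API; the names ELECTION_PATH and onBecomeLeader are hypothetical stand-ins for Solr's ElectionContext machinery, and reconnect/error handling is omitted.

import java.util.Collections;
import java.util.List;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

// Hedged sketch of the generic election recipe LeaderElector implements.
public class ElectionSketch implements Watcher {
  static final String ELECTION_PATH = "/election"; // hypothetical path
  final ZooKeeper zk;
  String myNode; // e.g. "/election/n_0000000007"

  ElectionSketch(ZooKeeper zk) { this.zk = zk; }

  void join() throws Exception {
    // ephemeral + sequential: ZooKeeper appends a monotonically increasing suffix
    myNode = zk.create(ELECTION_PATH + "/n_", new byte[0],
        ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
    check();
  }

  void check() throws Exception {
    List<String> children = zk.getChildren(ELECTION_PATH, false);
    Collections.sort(children); // fixed-width suffix: string sort == numeric sort
    String me = myNode.substring(myNode.lastIndexOf('/') + 1);
    int pos = children.indexOf(me);
    if (pos == 0) {
      onBecomeLeader(); // hypothetical callback, analogous to runIamLeaderProcess()
    } else {
      // watch only the node directly ahead of us to avoid a herd effect
      String predecessor = children.get(pos - 1);
      if (zk.exists(ELECTION_PATH + "/" + predecessor, this) == null) {
        check(); // predecessor vanished between getChildren and exists
      }
    }
  }

  @Override
  public void process(WatchedEvent event) {
    try { check(); } catch (Exception e) { /* log and retry in real code */ }
  }

  void onBecomeLeader() { System.out.println("I am the leader: " + myNode); }
}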
2. core.getUpdateHandler().getUpdateLog().recoverFromLog() gets the UpdateLog from DirectUpdateHandler2 and calls its recoverFromLog() method. This call starts a new thread that replays the update log on the local machine; in other words, recoverFromLog() primarily recovers the local transaction log. UpdateLog.recoverFromLog() is shown below:
public Future<RecoveryInfo> recoverFromLog() {
  recoveryInfo = new RecoveryInfo();

  List<TransactionLog> recoverLogs = new ArrayList<TransactionLog>(1);
  for (TransactionLog ll : newestLogsOnStartup) {
    if (!ll.try_incref()) continue;

    try {
      if (ll.endsWithCommit()) {
        ll.decref();
        continue;
      }
    } catch (IOException e) {
      log.error("Error inspecting tlog " + ll);
      ll.decref();
      continue;
    }

    recoverLogs.add(ll);
  }

  if (recoverLogs.isEmpty()) return null;

  ExecutorCompletionService<RecoveryInfo> cs = new ExecutorCompletionService<RecoveryInfo>(recoveryExecutor);
  LogReplayer replayer = new LogReplayer(recoverLogs, false);

  versionInfo.blockUpdates();
  try {
    state = State.REPLAYING;
  } finally {
    versionInfo.unblockUpdates();
  }

  // At this point, we are guaranteed that any new updates coming in will see the state as "replaying"

  return cs.submit(replayer, recoveryInfo);
}
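Note that recoverFromLog() returns a Future rather than blocking: the replay runs on recoveryExecutor's thread while startup continues, and only tlogs that do not end with a commit need replaying. A hedged usage sketch for a caller that wants to wait for the replay (whether and where Solr itself blocks on this future is version-specific):

// Hedged usage sketch: block until the local tlog replay finishes.
Future<UpdateLog.RecoveryInfo> recoveryFuture =
    core.getUpdateHandler().getUpdateLog().recoverFromLog();
if (recoveryFuture != null) {                         // null: no uncommitted tlog to replay
  UpdateLog.RecoveryInfo info = recoveryFuture.get(); // may block for a while
  log.info("tlog replay finished: " + info);
}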
3. ZkController.checkRecovery(coreName, desc, recoverReloadedCores, isLeader, cloudDesc, collection, coreZkNodeName, shardId, leaderProps, core, cc) performs distributed recovery. It is skipped if this core is the leader; otherwise recovery runs. The method starts a new thread, RecoveryStrategy, which does the actual work in two stages (see the sketch after this list):
- If this is the first attempt, try to recover via PeerSync.sync(), which replays the leader's recent updates from its update log. If that fails, fall through to the next step.
- Do distributed (replication) recovery: RecoveryStrategy.replicate(String nodeName, SolrCore core, ZkNodeProps leaderprops, String baseUrl) calls ReplicationHandler.doFetch() to fetch the index files from the leader and recovers from them.
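To summarize the control flow, here is a hedged, self-contained sketch of RecoveryStrategy's two-stage decision logic. Every method in it is a hypothetical stub standing in for the real calls (PeerSync.sync(), RecoveryStrategy.replicate(), ZkController.publish()); the real method also handles retries, waits, and cancellation.

// Hedged sketch of RecoveryStrategy's two-stage recovery decision.
class RecoverySketch {
  private boolean firstTime = true;

  void doRecovery() {
    boolean recovered = false;
    if (firstTime) {
      firstTime = false;
      recovered = tryPeerSync();    // cheap: replay recent updates from the leader's tlog
    }
    if (!recovered) {
      recovered = tryReplication(); // expensive: fetch full index files from the leader
    }
    if (recovered) {
      publishActive();              // publish ACTIVE state to the Overseer
    }
  }

  boolean tryPeerSync()    { return false; } // hypothetical stub for PeerSync.sync()
  boolean tryReplication() { return true;  } // hypothetical stub for replicate()/doFetch()
  void publishActive()     { }               // hypothetical stub for publish(desc, ACTIVE)
}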