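Electing the cluster state
Elasticsearch persists three kinds of data: state metadata, Lucene index files, and the translog (transaction log).
The metadata falls into three levels:
- Cluster-level metadata, represented by the MetaData structure: mainly clusterUUID, settings, templates, and so on
- Index-level metadata, represented by the IndexMetaData structure: mainly the number of shards, mappings (index field mappings), and so on
- Shard-level metadata, represented by ShardStateMetaData: mainly version, indexUUID, whether the shard is a primary, and so on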
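To make the three levels easier to picture, here is a minimal sketch of what each one carries. The class and field names (MetaLevels, ClusterMeta, IndexMeta, ShardMeta, belongsTo) are simplified stand-ins invented for illustration; they are not the real Elasticsearch classes, and only the fields listed above are modeled.

```java
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-ins; NOT the real Elasticsearch classes.
final class MetaLevels {

    // Cluster level (cf. MetaData): cluster UUID, settings, templates.
    static final class ClusterMeta {
        final String clusterUuid;
        final Map<String, String> settings;
        final List<String> templateNames;

        ClusterMeta(String clusterUuid, Map<String, String> settings, List<String> templateNames) {
            this.clusterUuid = clusterUuid;
            this.settings = settings;
            this.templateNames = templateNames;
        }
    }

    // Index level (cf. IndexMetaData): shard count and field mappings.
    static final class IndexMeta {
        final String indexUuid;
        final int numberOfShards;
        final Map<String, String> mappings; // field name -> field type

        IndexMeta(String indexUuid, int numberOfShards, Map<String, String> mappings) {
            this.indexUuid = indexUuid;
            this.numberOfShards = numberOfShards;
            this.mappings = mappings;
        }
    }

    // Shard level (cf. ShardStateMetaData): version, owning index UUID, primary flag.
    static final class ShardMeta {
        final long version;
        final String indexUuid;
        final boolean primary;

        ShardMeta(long version, String indexUuid, boolean primary) {
            this.version = version;
            this.indexUuid = indexUuid;
            this.primary = primary;
        }
    }

    // True when a shard-level record points at the given index.
    static boolean belongsTo(ShardMeta shard, IndexMeta index) {
        return shard.indexUuid.equals(index.indexUuid);
    }
}
```

The shard-level record refers to its index only by UUID, which is why the sketch matches the two levels through belongsTo rather than by nesting objects.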
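Each node may hold a different cluster state, so one copy of the metadata has to be chosen as the authoritative source. This is managed by GatewayService, which implements the ClusterStateListener interface: once a master has been elected, a cluster state task is published, which triggers the clusterChanged callback.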
// kick off state recovery
performStateRecovery(enforceRecoverAfterTime, reason);
Recovery of the cluster-level and index-level metadata is done in the gateway module.
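As a rough illustration of what "choosing an authoritative copy" can look like, the sketch below picks the copy with the highest version among those reported by the nodes. Both the selection rule (highest version wins) and the names StateElection, NodeMeta and elect are assumptions made for this example; the walkthrough so far has not shown the actual rule the gateway applies.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical sketch: elect the reported metadata copy with the highest version.
final class StateElection {

    // One node's reported metadata, reduced to an id and a version (assumed shape).
    static final class NodeMeta {
        final String nodeId;
        final long version;

        NodeMeta(String nodeId, long version) {
            this.nodeId = nodeId;
            this.version = version;
        }
    }

    // Returns the copy with the highest version, or empty if no node reported anything.
    // On a tie, the copy seen first is kept.
    static Optional<NodeMeta> elect(List<NodeMeta> reported) {
        NodeMeta best = null;
        for (NodeMeta candidate : reported) {
            if (best == null || candidate.version > best.version) {
                best = candidate;
            }
        }
        return Optional.ofNullable(best);
    }
}
```

With copies at versions 3, 7 and 5, elect returns the node holding version 7; with an empty list it returns Optional.empty().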
public void clusterChanged(final ClusterChangedEvent event) {
    if (lifecycle.stoppedOrClosed()) {
        return;
    }
    final ClusterState state = event.state();
    // only the elected master may run recovery
    if (state.nodes().isLocalNodeElectedMaster() == false) {
        // not our job to recover
        return;
    }
    // cluster-level and index-level state has already been recovered
    if (state.blocks().hasGlobalBlock(STATE_NOT_RECOVERED_BLOCK) == false) {
        // already recovered
        return;
    }
    // omitted here: checks on whether the conditions for starting recovery are met
    ......
    // recover the state
    performStateRecovery(enforceRecoverAfterTime, reason);
}
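The callback first makes sure the local node is the elected master, since only the master may run the state election. It then checks whether the STATE_NOT_RECOVERED_BLOCK global block is still present: if the block is gone, the state has already been recovered and the method returns; otherwise it goes on to run the state recovery task.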
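The order of these checks can be sketched as a small pure function. RecoveryGuard, Decision and decide are names made up for this example, and the recovery-condition checks that the original snippet elides are deliberately left out here as well.

```java
// Hypothetical sketch of the guard sequence in clusterChanged, reduced to booleans.
final class RecoveryGuard {

    enum Decision { SKIP_STOPPED, SKIP_NOT_MASTER, SKIP_ALREADY_RECOVERED, RECOVER }

    // Mirrors the order of the early returns: lifecycle, master check, global block check.
    static Decision decide(boolean stoppedOrClosed,
                           boolean localNodeIsElectedMaster,
                           boolean hasStateNotRecoveredBlock) {
        if (stoppedOrClosed) {
            return Decision.SKIP_STOPPED;
        }
        if (!localNodeIsElectedMaster) {
            return Decision.SKIP_NOT_MASTER; // not our job to recover
        }
        if (!hasStateNotRecoveredBlock) {
            return Decision.SKIP_ALREADY_RECOVERED; // block already removed
        }
        return Decision.RECOVER;
    }
}
```

Only the combination "running, elected master, block still present" reaches RECOVER; every other combination returns early.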
This eventually calls recoveryRunnable.run():
final Gateway gateway = new Gateway(settings, clusterService, listGatewayMetaState);
recoveryRunnable = () ->
    gateway.performStateRecovery(new GatewayRecoveryListener());
which runs the gateway's performStateRecovery method.
It first collects all master-eligible nodes:
// nodes that are eligible to become master
final String[] nodesIds = clusterService.state().nodes().getMasterNodes().keys().toArray(String.class);
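Conceptually, this step just filters the known nodes down to the master-eligible ones and keeps their ids. The sketch below shows that idea with plain collections; MasterNodes, Node and masterNodeIds are illustrative names, not Elasticsearch APIs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: collect the ids of master-eligible nodes.
final class MasterNodes {

    // Minimal node description (assumed shape for this example).
    static final class Node {
        final String id;
        final boolean masterEligible;

        Node(String id, boolean masterEligible) {
            this.id = id;
            this.masterEligible = masterEligible;
        }
    }

    // Keeps only master-eligible nodes and returns their ids in input order.
    static String[] masterNodeIds(List<Node> nodes) {
        List<String> ids = new ArrayList<>();
        for (Node node : nodes) {
            if (node.masterEligible) {
                ids.add(node.id);
            }
        }
        return ids.toArray(new String[0]);
    }
}
```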
It then fetches the metadata held by those master-eligible nodes:
// fetch the cluster-level metadata
final TransportNodesListGatewayMetaState.NodesGatewayMetaState nodesState = listGatewayMetaState.list(nodesIds, null).actionGet();
Let's look at the constructor of TransportNodesListGatewayMetaState:
public TransportNodesListGatewayMetaState(ThreadPool threadPool, ClusterService clusterService, TransportService transportService,
                                          ActionFilters actionFilters, GatewayMetaState metaState) {
    super(ACTION_NAME, threadPool, clusterService, transportService, actionFilters,
          Request::new, NodeRequest::new, ThreadPool.Names.GENERIC, NodeGatewayMetaState.class);
    this.metaState = metaState;
}
// register the request handler for this action
transportService.registerRequestHandler(actionName, executor, false, canTripCircuitBreaker, requestReader, new TransportHandler());
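The idea behind registering a handler is that incoming requests are routed by action name to whatever handler was registered for it. The sketch below models that with a plain map; HandlerRegistry, register and dispatch are hypothetical names, requests and responses are reduced to strings, and the action name used in the usage note is made up.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: route requests to handlers by action name.
final class HandlerRegistry {

    private final Map<String, Function<String, String>> handlers = new HashMap<>();

    // Registers a handler; a second registration for the same action is rejected.
    void register(String actionName, Function<String, String> handler) {
        if (handlers.putIfAbsent(actionName, handler) != null) {
            throw new IllegalArgumentException("handler already registered for " + actionName);
        }
    }

    // Looks up the handler for the action and applies it to the request.
    String dispatch(String actionName, String request) {
        Function<String, String> handler = handlers.get(actionName);
        if (handler == null) {
            throw new IllegalArgumentException("no handler for " + actionName);
        }
        return handler.apply(request);
    }
}
```

For example, after register("demo:meta_state", req -> "meta-of-" + req), a call to dispatch("demo:meta_state", "node-1") returns "meta-of-node-1".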
Back in the list method, the call to execute ends up in doExecute:
public ActionFuture<NodesGatewayMetaState> list(String[] nodesIds, @Nullable TimeValue timeout) {
    PlainActionFuture<NodesGatewayMetaState> future = PlainActionFuture.newFuture();
    execute(new Request(nodesIds).timeout(timeout), future);
    return future;
}
protected void doExecute(Task task, NodesRequest request, ActionListener<NodesResponse> listener) {
    // start the asynchronous action
    new AsyncAction(task, request, listener).start();
}
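The shape of list is a common one: create a future, hand it to an asynchronous call as the listener, and return it so the caller can block on the result (as actionGet() does earlier). Here is a minimal sketch of the same pattern using the JDK's CompletableFuture instead of PlainActionFuture; FutureList and its execute method are stand-ins written for this example, and the "response" is just the number of requested nodes.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Consumer;

// Hypothetical sketch of the "future as listener" pattern used by list().
final class FutureList {

    // Stand-in for an asynchronous execute(request, listener):
    // the listener is notified from another thread.
    static void execute(String[] nodeIds, Consumer<Integer> listener) {
        CompletableFuture.runAsync(() -> listener.accept(nodeIds.length));
    }

    // Same shape as list(): create the future, pass it as the listener, return it.
    static CompletableFuture<Integer> list(String[] nodeIds) {
        CompletableFuture<Integer> future = new CompletableFuture<>();
        execute(nodeIds, future::complete);
        return future;
    }
}
```

A caller that needs the result synchronously can write FutureList.list(ids).join(), which blocks until the listener has completed the future.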
start() then sends a request to every target node to fetch its metadata:
void start() {
    final DiscoveryNode[] nodes = request.concreteNodes();
    if (nodes.length == 0) {
        // no nodes to fetch data from
        // nothing to notify
        threadPool.generic().execute(() -> listener.onResponse(newResponse(request, responses)));
        return;
    }
    TransportRequestOptions.Builder builder = TransportRequestOptions.builder()