Troubleshooting an Elasticsearch cluster that is red although no primary shard is Unassigned

Symptom

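While the cluster was starting up, its status was red, yet no primary shard was in the Unassigned state.
This appears to contradict the official documentation, which says that red means at least one primary shard is Unassigned. Here no primary shard was Unassigned and the cluster was red anyway. The documentation reads: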

One or more primary shards are unassigned, so some data is unavailable.

Investigation

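TransportClusterHealthAction#clusterHealth returns the health of the cluster. ClusterStateHealth holds the health information: its constructor looks at the status of every index (red, yellow, green) and takes the worst one as the cluster status. Index health is kept in ClusterIndexHealth and is derived from the shard states. A shard is green when its primary and all of its replicas are active, yellow when the primary is active but some other copy is not, and otherwise gets whatever getInactivePrimaryHealth returns. A shard copy counts as active when it is in the started or relocating state: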

        int computeActiveShards = 0;
        int computeRelocatingShards = 0;
        int computeInitializingShards = 0;
        int computeUnassignedShards = 0;
        for (ShardRouting shardRouting : shardRoutingTable) {
            if (shardRouting.active()) {
                computeActiveShards++;
                if (shardRouting.relocating()) {
                    // the shard is relocating, the one it is relocating to will be in initializing state, so we don't count it
                    computeRelocatingShards++;
                }
            } else if (shardRouting.initializing()) {
                computeInitializingShards++;
            } else if (shardRouting.unassigned()) {
                computeUnassignedShards++;
            }
        }
        ClusterHealthStatus computeStatus;
        final ShardRouting primaryRouting = shardRoutingTable.primaryShard();
        if (primaryRouting.active()) {
            if (computeActiveShards == shardRoutingTable.size()) {
                computeStatus = ClusterHealthStatus.GREEN;
            } else {
                computeStatus = ClusterHealthStatus.YELLOW;
            }
        } else {
            computeStatus = getInactivePrimaryHealth(primaryRouting);
        }

    // ShardRouting#active(): a shard copy is active only when STARTED or RELOCATING
    public boolean active() {
        return started() || relocating();
    }

    // ShardRoutingState enum constants (excerpt)
    /**
     * The shard is not assigned to any node.
     */
    UNASSIGNED((byte) 1),
    /**
     * The shard is initializing (probably recovering from either a peer shard
     * or gateway).
     */
    INITIALIZING((byte) 2),
    /**
     * The shard is started.
     */
    STARTED((byte) 3),
    /**
     * The shard is in the process being relocated.
     */
    RELOCATING((byte) 4);
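
To make the rule in the excerpts above easier to try out, here is a minimal, self-contained sketch of the same decision. ShardHealthSketch, Copy, State and Status are hypothetical names invented for this illustration and are not Elasticsearch classes. The sketch also assumes that every inactive primary maps to RED, whereas the real code leaves that decision to getInactivePrimaryHealth, which is not shown in this post.

```java
import java.util.List;

// Hypothetical, simplified model for illustration; these are not the Elasticsearch classes.
final class ShardHealthSketch {
    enum State { UNASSIGNED, INITIALIZING, STARTED, RELOCATING }
    enum Status { GREEN, YELLOW, RED }

    // One copy (primary or replica) of a shard.
    static final class Copy {
        final boolean primary;
        final State state;

        Copy(boolean primary, State state) {
            this.primary = primary;
            this.state = state;
        }

        // Same rule as ShardRouting#active(): STARTED or RELOCATING.
        boolean active() {
            return state == State.STARTED || state == State.RELOCATING;
        }
    }

    // Status of one shard group (a primary plus its replicas).
    static Status shardStatus(List<Copy> copies) {
        boolean primaryActive = false;
        int activeCopies = 0;
        for (Copy copy : copies) {
            if (copy.active()) {
                activeCopies++;
                if (copy.primary) {
                    primaryActive = true;
                }
            }
        }
        if (!primaryActive) {
            // Assumption: every inactive primary maps to RED. The real code
            // delegates this decision to getInactivePrimaryHealth.
            return Status.RED;
        }
        return activeCopies == copies.size() ? Status.GREEN : Status.YELLOW;
    }

    public static void main(String[] args) {
        // The primary is still INITIALIZING, so it is not Unassigned,
        // but it is not active either.
        List<Copy> recovering = List.of(
            new Copy(true, State.INITIALIZING),
            new Copy(false, State.UNASSIGNED));
        System.out.println(shardStatus(recovering));
    }
}
```

The point of the sketch is that the check is "is the primary active", not "is the primary Unassigned", so an INITIALIZING primary falls into the inactive branch just like an Unassigned one.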

Conclusion

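The cluster can be yellow or better only when every primary shard is in the started or relocating state. In this case the cluster was red because a primary shard was in the INITIALIZING state: such a shard is not Unassigned, but it is not active either.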

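Note: while a shard on the source node is relocating, the same shard on the target node is INITIALIZING. A shard can be INITIALIZING because it is recovering from another node (relocation or replica copy), from a snapshot, or from local storage.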
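
The "take the worst status" aggregation described in the investigation can be sketched as follows. This is only an illustration under stated assumptions: WorstStatusSketch and its Status enum are hypothetical names, the enum order stands in for severity, and an empty input is assumed to be green.

```java
import java.util.List;

// Hypothetical sketch of worst-status aggregation; not the Elasticsearch implementation.
final class WorstStatusSketch {
    // Declared from best to worst so that ordinal() can serve as severity.
    enum Status { GREEN, YELLOW, RED }

    // Returns the worst status in the list (assumption: GREEN when the list is empty).
    static Status worst(List<Status> statuses) {
        Status result = Status.GREEN;
        for (Status status : statuses) {
            if (status.ordinal() > result.ordinal()) {
                result = status;
            }
        }
        return result;
    }

    // Index status is the worst shard status; cluster status is the worst index status.
    static Status clusterStatus(List<List<Status>> shardStatusesPerIndex) {
        Status cluster = Status.GREEN;
        for (List<Status> shardStatuses : shardStatusesPerIndex) {
            Status index = worst(shardStatuses);
            if (index.ordinal() > cluster.ordinal()) {
                cluster = index;
            }
        }
        return cluster;
    }

    public static void main(String[] args) {
        // Two healthy indices plus one index with a single red shard.
        List<List<Status>> indices = List.of(
            List.of(Status.GREEN, Status.GREEN),
            List.of(Status.GREEN, Status.YELLOW),
            List.of(Status.GREEN, Status.RED));
        System.out.println(clusterStatus(indices));
    }
}
```

Because the worst value wins at every level, a single primary that is still initializing is enough to turn the whole cluster red during startup, even when every other shard is healthy.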
