Unassigned shards in an Elasticsearch cluster

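Elasticsearch cluster: resolving unassigned shards
This article describes unassigned shards in an Elasticsearch cluster, in particular the case where unassigned replica shards put the cluster in Yellow status. It covers the impact of Yellow status and how to investigate the problem with the GET /_cat/shards API. A shard that stays unassigned for a long time may point to corrupted index files on a node and require manual intervention, such as deleting the corrupted files. It also recommends monitoring cluster health so that problems are found and resolved promptly.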

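The elected master node in Elasticsearch manages how shards are allocated across the data nodes. When there are enough data nodes, it automatically assigns the shards (both primary and replica) to them. In some special situations, however, shards can remain unassigned. If the unassigned shards are replica shards, the whole cluster is in Yellow status. As long as you have enough replica copies, Yellow does not affect the overall availability of the cluster (although search performance may drop), and in many cases the cluster recovers on its own without any manual intervention. For example, a data node may be restarted automatically while its operating system is being patched or maintained.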

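If a shard stays unassigned for a long time, however, it needs special attention and often manual intervention. One example is a corrupted index file on a node. Use the GET /_cat/shards API to find the unassigned shards and the node involved; the log file on that node will contain entries like the ones below. There can be several causes for this. When the affected copy is a replica, as in this log (marked [R]), deleting the corrupted files directly resolves the problem, because the replica can then be rebuilt from the primary.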


[2014-12-22 16:03:27,347][WARN ][index.engine.internal    ] [ES-10-data] [ppub-v5][22] failed engine [corrupted preexisting index]
[2014-12-22 16:03:27,347][WARN ][indices.cluster          ] [ES-10-data] [ppub-v5][22] failed to start shard
org.apache.lucene.index.CorruptIndexException: [ppub-v5][22] Corrupted index [corrupted_Ccq5Hd42SuGbQWk_aB9CFw] caused by: CorruptIndexException[codec footer mismatch: actual footer=817092787 vs expected footer=-1071082520 (resource: MMapIndexInput(path="D:\IndexStorage\Data\search-cluster\nodes\0\indices\ppub-v5\22\index\_48jet.fdt"))]
 at org.elasticsearch.index.store.Store.failIfCorrupted(Store.java:353)
 at org.elasticsearch.index.store.Store.failIfCorrupted(Store.java:338)
 at org.elasticsearch.indices.cluster.IndicesClusterStateService.applyInitializingShard(IndicesClusterStateService.java:727)
 at org.elasticsearch.indices.cluster.IndicesClusterStateService.applyNewOrUpdatedShards(IndicesClusterStateService.java:580)
 at org.elasticsearch.indices.cluster.IndicesClusterStateService.clusterChanged(IndicesClusterStateService.java:184)
 at org.elasticsearch.cluster.service.InternalClusterService$UpdateTask.run(InternalClusterService.java:444)
 at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:153)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
 at java.lang.Thread.run(Thread.java:745)
[2014-12-22 16:03:27,347][WARN ][cluster.action.shard     ] [ES-10-data] [ppub-v5][22] sending failed shard for [ppub-v5][22], node[pK7CwHokQKCSk8TEVlCsWA], [R], s[INITIALIZING], indexUUID [4QlLH670Q_Kd6rb1WXhgaQ], reason [Failed to start shard, message [CorruptIndexException[[ppub-v5][22] Corrupted index [corrupted_Ccq5Hd42SuGbQWk_aB9CFw] caused by: CorruptIndexException[codec footer mismatch: actual footer=817092787 vs expected footer=-1071082520 (resource: MMapIndexInput(path="D:\IndexStorage\Data\search-cluster\nodes\0\indices\ppub-v5\22\index\_48jet.fdt"))]]]]
[2014-12-22 16:03:27,347][WARN ][cluster.action.shard     ] [ES-10-data] [ppub-v5][22] sending failed shard for [ppub-v5][22], node[pK7CwHokQKCSk8TEVlCsWA], [R], s[INITIALIZING], indexUUID [4QlLH670Q_Kd6rb1WXhgaQ], reason [engine failure, message [corrupted preexisting index][CorruptIndexException[[ppub-v5][22] Corrupted index [corrupted_Ccq5Hd42SuGbQWk_aB9CFw] caused by: CorruptIndexException[codec footer mismatch: actual footer=817092787 vs expected footer=-1071082520 (resource: MMapIndexInput(path="D:\IndexStorage\Data\search-cluster\nodes\0\indices\ppub-v5\22\index\_48jet.fdt"))]]]]
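Finding the unassigned shards with GET /_cat/shards, as mentioned before the log excerpt, could be scripted roughly as follows. This is a minimal sketch, not a definitive tool: it assumes the cluster is reachable at http://localhost:9200 and that the columns are requested explicitly with the `h` parameter (`index,shard,prirep,state,node`). The function names are illustrative only.

```python
import urllib.request


def parse_cat_shards(text):
    """Parse the plain-text output of
    GET /_cat/shards?h=index,shard,prirep,state,node
    into a list of dicts. Unassigned shards have an empty node column."""
    shards = []
    for line in text.splitlines():
        fields = line.split()
        # Need at least index, shard, prirep, state; shard must be a number.
        if len(fields) < 4 or not fields[1].isdigit():
            continue
        shards.append({
            "index": fields[0],
            "shard": int(fields[1]),
            "prirep": fields[2],   # "p" = primary, "r" = replica
            "state": fields[3],    # e.g. STARTED, INITIALIZING, UNASSIGNED
            "node": " ".join(fields[4:]) or None,
        })
    return shards


def unassigned(shards):
    """Keep only the shards whose state is UNASSIGNED."""
    return [s for s in shards if s["state"] == "UNASSIGNED"]


def fetch_cat_shards(base_url="http://localhost:9200"):
    """Fetch the shard listing from the cluster (base_url is an assumption)."""
    url = base_url + "/_cat/shards?h=index,shard,prirep,state,node"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8")
```

Calling `unassigned(parse_cat_shards(fetch_cat_shards()))` would then return one entry per unassigned shard. An unassigned primary (`prirep == "p"`) is the more serious case, since it corresponds to Red rather than Yellow status.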


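A production Elasticsearch cluster normally has some monitoring in place that automatically watches the cluster status, for example by calling the GET /_cluster/health API. If the returned status is Red, someone needs to respond immediately; if it is Yellow and stays that way for a long time, someone also needs to look into it promptly. The Elasticsearch log files are the first place to look when analyzing and tracking down cluster problems. For slightly more complex problems you will need to check the log files on the master nodes, the client nodes, and the data nodes.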
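The monitoring rule described above (respond immediately on Red, investigate a long-lasting Yellow) might be sketched like this. It is only an illustration under stated assumptions: the cluster address, the 30-minute grace period for Yellow, and the names `decide_action` and `StatusTracker` are not from any Elasticsearch tooling. The only API it relies on is the `status` field in the JSON returned by GET /_cluster/health.

```python
import json
import urllib.request


def fetch_health(base_url="http://localhost:9200"):
    """Call GET /_cluster/health and return the decoded JSON body.
    base_url is an assumption; adjust it to your cluster."""
    with urllib.request.urlopen(base_url + "/_cluster/health", timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


class StatusTracker:
    """Remembers when the cluster entered its current status."""

    def __init__(self):
        self.status = None
        self.since = None

    def update(self, status, now):
        """Record the latest status; return seconds spent in that status."""
        if status != self.status:
            self.status = status
            self.since = now
        return now - self.since


def decide_action(status, seconds_in_status, yellow_grace_seconds=1800):
    """Map a health status to an action.
    The 1800-second grace period is an arbitrary example value."""
    if status == "red":
        return "page"            # needs an immediate response
    if status == "yellow":
        if seconds_in_status >= yellow_grace_seconds:
            return "investigate"  # Yellow for too long
        return "wait"            # may still recover on its own
    return "ok"
```

A polling loop would call `fetch_health()["status"]` at a fixed interval, pass it with `time.time()` to `StatusTracker.update`, and hand the result to `decide_action`. Keeping the decision logic separate from the HTTP call makes the thresholds easy to adjust and test.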



