Unassigned shards in an Elasticsearch cluster

Elasticsearch cluster: resolving unassigned shards

Licensed under CC 4.0 BY-SA

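This article covers unassigned shards in an Elasticsearch cluster, in particular the case where unassigned replica shards put the cluster into Yellow status. It discusses what Yellow status means in practice and how to inspect the problem with the GET /_cat/shards API. If a shard stays unassigned for a long time, the cause may be corrupted index files on a node, which requires manual intervention such as deleting the corrupted files. It also recommends monitoring cluster health so that problems are found and resolved promptly.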

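The elected master node in Elasticsearch manages how shards are allocated across the data nodes. When there are enough data nodes, it automatically allocates shards (both primary and replica) to them. In some situations, however, shards remain unassigned. If the unassigned shards are replica shards, the whole cluster is in Yellow status. As long as enough copies of each shard remain available, Yellow does not affect the overall availability of the cluster (although search performance may degrade), and it often clears on its own without any manual intervention, for example when a data node is restarted automatically for patching or system maintenance.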
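The relationship between shard state and cluster color described above can be sketched in a few lines of Python. This is a simplified model for illustration, not Elasticsearch's own code: the function name `cluster_status` and the `(prirep, state)` tuple format are assumptions, loosely modeled on the `prirep` and `state` columns of GET /_cat/shards.

```python
from typing import Iterable, Tuple

# Shard states treated as "active" in this simplified model (assumption:
# a relocating shard is still serving requests from its current node).
ACTIVE_STATES = {"STARTED", "RELOCATING"}


def cluster_status(shards: Iterable[Tuple[str, str]]) -> str:
    """Derive a cluster color from (prirep, state) pairs.

    prirep is "p" for a primary shard and "r" for a replica shard.
    - any inactive primary -> "red"
    - otherwise, any inactive replica -> "yellow"
    - otherwise -> "green"
    """
    status = "green"
    for prirep, state in shards:
        if state in ACTIVE_STATES:
            continue
        if prirep == "p":
            # A missing primary means some data is unavailable.
            return "red"
        # A missing replica only reduces redundancy.
        status = "yellow"
    return status


if __name__ == "__main__":
    # One replica is unassigned while all primaries are started.
    print(cluster_status([("p", "STARTED"), ("r", "UNASSIGNED")]))
```

The model makes the key point explicit: Yellow means every primary is still serving data and only redundancy is reduced, which is why it can often be left to recover on its own.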

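If a shard stays unassigned for a long time, however, it deserves attention and usually needs manual intervention. One example is corrupted index files on a node. Use the GET /_cat/shards API to find the unassigned shards and the node involved; that node's log file will contain entries like the ones below. Corruption like this can have several causes. In this log the failed copy is a replica ([R]), so deleting the corrupted files on that node resolves the problem, because the replica can then be rebuilt from the primary.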

[2014-12-22 16:03:27,347][WARN ][index.engine.internal    ] [ES-10-data] [ppub-v5][22] failed engine [corrupted preexisting index]
[2014-12-22 16:03:27,347][WARN ][indices.cluster          ] [ES-10-data] [ppub-v5][22] failed to start shard
org.apache.lucene.index.CorruptIndexException: [ppub-v5][22] Corrupted index [corrupted_Ccq5Hd42SuGbQWk_aB9CFw] caused by: CorruptIndexException[codec footer mismatch: actual footer=817092787 vs expected footer=-1071082520 (resource: MMapIndexInput(path="D:\IndexStorage\Data\search-cluster\nodes\0\indices\ppub-v5\22\index\_48jet.fdt"))]
at org.elasticsearch.index.store.Store.failIfCorrupted(Store.java:353)
at org.elasticsearch.index.store.Store.failIfCorrupted(Store.java:338)
at org.elasticsearch.indices.cluster.IndicesClusterStateService.applyInitializingShard(IndicesClusterStateService.java:727)
at org.elasticsearch.indices.cluster.IndicesClusterStateService.applyNewOrUpdatedShards(IndicesClusterStateService.java:580)
at org.elasticsearch.indices.cluster.IndicesClusterStateService.clusterChanged(IndicesClusterStateService.java:184)
at org.elasticsearch.cluster.service.InternalClusterService$UpdateTask.run(InternalClusterService.java:444)
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:153)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
[2014-12-22 16:03:27,347][WARN ][cluster.action.shard     ] [ES-10-data] [ppub-v5][22] sending failed shard for [ppub-v5][22], node[pK7CwHokQKCSk8TEVlCsWA], [R], s[INITIALIZING], indexUUID [4QlLH670Q_Kd6rb1WXhgaQ], reason [Failed to start shard, message [CorruptIndexException[[ppub-v5][22] Corrupted index [corrupted_Ccq5Hd42SuGbQWk_aB9CFw] caused by: CorruptIndexException[codec footer mismatch: actual footer=817092787 vs expected footer=-1071082520 (resource: MMapIndexInput(path="D:\IndexStorage\Data\search-cluster\nodes\0\indices\ppub-v5\22\index\_48jet.fdt"))]]]]
[2014-12-22 16:03:27,347][WARN ][cluster.action.shard     ] [ES-10-data] [ppub-v5][22] sending failed shard for [ppub-v5][22], node[pK7CwHokQKCSk8TEVlCsWA], [R], s[INITIALIZING], indexUUID [4QlLH670Q_Kd6rb1WXhgaQ], reason [engine failure, message [corrupted preexisting index][CorruptIndexException[[ppub-v5][22] Corrupted index [corrupted_Ccq5Hd42SuGbQWk_aB9CFw] caused by: CorruptIndexException[codec footer mismatch: actual footer=817092787 vs expected footer=-1071082520 (resource: MMapIndexInput(path="D:\IndexStorage\Data\search-cluster\nodes\0\indices\ppub-v5\22\index\_48jet.fdt"))]]]]
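The two diagnostic steps above (listing unassigned shards, then finding the corrupted file in the node log) could be scripted roughly as follows. This is a minimal sketch under stated assumptions: it assumes the default GET /_cat/shards column order (index, shard, prirep, state, ...), and that the log line has the same shape as the CorruptIndexException entry shown above. The function names and `SAMPLE_CAT_OUTPUT` are hypothetical, and the document counts, sizes, IP addresses and the `ES-11-data` node name in the sample are invented for illustration.

```python
import re
from typing import Dict, List, Optional, Tuple

# Illustrative _cat/shards output; only the index name and shard number
# come from the log above, the remaining values are made up.
SAMPLE_CAT_OUTPUT = """\
ppub-v5 21 p STARTED 1200 3.1mb 10.0.0.10 ES-10-data
ppub-v5 22 p STARTED 1180 3.0mb 10.0.0.11 ES-11-data
ppub-v5 22 r UNASSIGNED
"""

# The CorruptIndexException line from the node log above (raw string so
# that the Windows path backslashes are kept literally).
SAMPLE_LOG_LINE = r'org.apache.lucene.index.CorruptIndexException: [ppub-v5][22] Corrupted index [corrupted_Ccq5Hd42SuGbQWk_aB9CFw] caused by: CorruptIndexException[codec footer mismatch: actual footer=817092787 vs expected footer=-1071082520 (resource: MMapIndexInput(path="D:\IndexStorage\Data\search-cluster\nodes\0\indices\ppub-v5\22\index\_48jet.fdt"))]'

CORRUPTION_RE = re.compile(
    r'\[(?P<index>[^\[\]]+)\]\[(?P<shard>\d+)\] Corrupted index '
    r'\[(?P<marker>[^\]]+)\].*?path="(?P<path>[^"]+)"'
)


def find_unassigned(cat_output: str) -> List[Tuple[str, int, str]]:
    """Return (index, shard, prirep) for every UNASSIGNED row."""
    unassigned = []
    for line in cat_output.splitlines():
        fields = line.split()
        # Rows for unassigned shards have no docs/store/ip/node columns,
        # so only the first four fields are relied on here.
        if len(fields) >= 4 and fields[3] == "UNASSIGNED":
            unassigned.append((fields[0], int(fields[1]), fields[2]))
    return unassigned


def parse_corruption(log_line: str) -> Optional[Dict[str, object]]:
    """Extract index, shard, marker and file path from a corruption log line."""
    match = CORRUPTION_RE.search(log_line)
    if match is None:
        return None
    return {
        "index": match.group("index"),
        "shard": int(match.group("shard")),
        "marker": match.group("marker"),
        "path": match.group("path"),
    }


if __name__ == "__main__":
    print(find_unassigned(SAMPLE_CAT_OUTPUT))
    print(parse_corruption(SAMPLE_LOG_LINE))
```

Matching the index name and shard number from both sources confirms that the unassigned shard and the corrupted file refer to the same shard copy before anything is deleted.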

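Production Elasticsearch clusters usually have monitoring in place that automatically watches the cluster status, for example by calling the GET /_cluster/health API. If the returned status is Red, someone must respond immediately; if it is Yellow and stays that way for a long time, someone should also look into it promptly. The Elasticsearch log files are the first place to look when analyzing and troubleshooting cluster problems; for more complex issues you need to check the log files on the master node, the client nodes, and the data nodes.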
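The monitoring rule described above (Red needs an immediate response, Yellow only when it persists) might be sketched like this. It relies on the `status` field of the GET /_cluster/health response; the `http://localhost:9200` address, the 15-minute grace period, the alert level names and the function names are all assumptions chosen for illustration.

```python
import json
import urllib.request
from typing import Optional, Tuple

# Assumed address of a node that accepts HTTP requests.
HEALTH_URL = "http://localhost:9200/_cluster/health"


def fetch_health(url: str = HEALTH_URL) -> dict:
    """Call GET /_cluster/health and return the parsed JSON body."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.load(response)


def evaluate_health(
    health: dict,
    yellow_since: Optional[float],
    now: float,
    yellow_grace_seconds: float = 900.0,  # assumption: 15 minutes
) -> Tuple[str, Optional[float]]:
    """Map a health response to an alert level.

    Returns (level, yellow_since), where level is one of
    "respond_now", "investigate" or "ok". The caller passes the
    returned yellow_since back in on the next poll so that the
    duration of a Yellow period can be tracked.
    """
    status = health["status"]
    if status == "red":
        return "respond_now", None
    if status == "yellow":
        started = yellow_since if yellow_since is not None else now
        if now - started >= yellow_grace_seconds:
            return "investigate", started
        return "ok", started
    # Green: clear any Yellow timer.
    return "ok", None
```

A polling loop would call `fetch_health()` at a fixed interval, pass the result to `evaluate_health()` together with the current time (for example `time.time()`), and keep the returned `yellow_since` for the next iteration.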