Temporarily Disabling Automatic Shard Allocation When Restarting Elasticsearch

仙道Bob

Last modified 2023-05-12 19:12:10

License: CC 4.0 BY-SA

First published 2022-08-31 02:22:58

Article link: https://blog.youkuaiyun.com/jsbylibo/article/details/126616031

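Elasticsearch's high-availability and self-balancing mechanisms react to a node going down (or restarting) by automatically re-creating that node's shards on other nodes, which causes heavy I/O and network overhead.
When the departed node rejoins the cluster, Elasticsearch rebalances the shards and allocates shards to the returning node once more. Worse, when one node dies under heavy load, the remaining nodes take over copies of the data it used to hold, which raises their own load, and the whole cluster can collapse in a cascade.
If a production Elasticsearch cluster is overloaded and a single server becomes unstable, automatic rebalancing therefore deals cluster stability a second blow. In production it is advisable to turn automatic rebalancing off.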

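1. Shard allocation and rebalancing settings

1.1 Disable shard allocation (even a newly created index gets no shards allocated)

In production this setting alone is usually sufficient. After the nodes restart there will be a large number of unassigned shards; once allocation is turned back on they are assigned automatically. (You can also work in batches: disable allocation, restart some of the nodes, re-enable it, then repeat for the next batch.)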

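# Elasticsearch 6.0 and later reject request bodies without a Content-Type header
curl -XPUT -H 'Content-Type: application/json' 'http://192.168.1.213:9200/_cluster/settings' -d '{
  "transient" : {
    "cluster.routing.allocation.enable" : "none"
  }
}'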

Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-cluster.html

cluster.routing.allocation.enable

(Dynamic) Enable or disable allocation for specific kinds of shards:

all - (default) Allows shard allocation for all kinds of shards.
primaries - Allows shard allocation only for primary shards.
new_primaries - Allows shard allocation only for primary shards for new indices.
none - No shard allocations of any kind are allowed for any indices.

This setting does not affect the recovery of local primary shards when restarting a node. A restarted node that has a copy of an unassigned primary shard will recover that primary immediately, assuming that its allocation id matches one of the active allocation ids in the cluster state.

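1.2 Disable automatic rebalancing (shards are not rebalanced automatically when nodes are added or removed)

# The URL is quoted so the shell does not treat "?" as a glob character
curl -XPUT -H 'Content-Type: application/json' 'http://192.168.1.213:9200/_cluster/settings?pretty' -d '{
  "transient" : {
    "cluster.routing.rebalance.enable" : "none"
  }
}'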

1.3 Check that the settings were applied

curl 'http://192.168.1.213:9200/_cluster/settings?pretty'
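
For scripted checks it can help to extract a single value from that response. The following is only a minimal sketch: `setting_value` is a hypothetical helper name, and it assumes the response was requested with the `flat_settings=true` query parameter so that each setting appears as one dotted key with a string value.

```shell
# Hypothetical helper: read a flat-settings JSON response on stdin and print
# the value of one string-valued key (first matching line only).
setting_value() {
  # Escape the dots in the key so they match literally in the sed pattern.
  _key_re=$(printf '%s' "$1" | sed 's/\./\\./g')
  sed -n 's/.*"'"$_key_re"'"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}
```

It might be used as `curl -s 'http://192.168.1.213:9200/_cluster/settings?flat_settings=true' | setting_value cluster.routing.allocation.enable`. The pattern matching is deliberately simple; for anything beyond a quick check, a real JSON parser is the safer choice.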

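1.4 Raise the number of concurrent shard recoveries per node (with care: set too high, it can easily bring the cluster down)

"cluster.routing.allocation.node_concurrent_recoveries": 2
The default is 2; it can be raised moderately.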

cluster.routing.allocation.node_concurrent_recoveries

(Dynamic) A shortcut to set both cluster.routing.allocation.node_concurrent_incoming_recoveries and cluster.routing.allocation.node_concurrent_outgoing_recoveries. Defaults to 2.

1.5 Re-enable shard allocation and rebalancing (if they were disabled)

curl -XPUT -H 'Content-Type: application/json' 'http://192.168.1.213:9200/_cluster/settings' -d '{
  "transient" : {
    "cluster.routing.allocation.enable" : "all",
    "cluster.routing.rebalance.enable" : "all"
  }
}'
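
Taken together, 1.1 and 1.5 bracket a restart. The sketch below strings them into one cycle. The function names, the `ES_URL` variable and the "expected node count" argument are assumptions made for illustration; `wait_for_nodes`, `wait_for_status` and `timeout` are query parameters of the cluster health API. It also assumes the restart command passed in returns only after the old process has exited.

```shell
ES_URL="${ES_URL:-http://192.168.1.213:9200}"   # address reused from the examples above

# Send one transient cluster setting; the value must already be JSON-encoded.
put_transient() {
  curl -s -XPUT -H 'Content-Type: application/json' \
    "$ES_URL/_cluster/settings" \
    -d "{\"transient\":{\"$1\":$2}}"
}

# One restart cycle. First argument: number of nodes the cluster should have
# afterwards. Remaining arguments: the command that restarts the node(s).
restart_with_allocation_paused() {
  expected_nodes=$1
  shift
  put_transient cluster.routing.allocation.enable '"none"' || return 1
  "$@" || return 1
  # Wait (up to 5 minutes) for the restarted node to rejoin.
  curl -s "$ES_URL/_cluster/health?wait_for_nodes=$expected_nodes&timeout=5m" >/dev/null
  put_transient cluster.routing.allocation.enable '"all"' || return 1
  # Wait (up to 10 minutes) for green, then print the health response.
  curl -s "$ES_URL/_cluster/health?wait_for_status=green&timeout=10m"
}
```

A call could look like `restart_with_allocation_paused 3 ssh node1 'systemctl restart elasticsearch'`, where the host name and the systemctl command are placeholders. The sketch does not inspect the health responses, so a timed-out wait is not treated as a failure.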

2. Delaying replica reallocation

2.1 Delayed recovery: dynamic setting

PUT /_all/_settings
{
  "settings": {
    "index.unassigned.node_left.delayed_timeout": "5m"
  }
}

This delays the reallocation of shards left unassigned by a departed node by 5 minutes.

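2.2 Delayed recovery: static settings

Edit elasticsearch.yml as follows:

gateway.recover_after_nodes: 8
This prevents Elasticsearch from starting data recovery until at least eight nodes (data or master) are present in the cluster.
gateway.expected_nodes: 10
gateway.recover_after_time: 5m
Once that minimum is met, recovery starts after 5 minutes or as soon as 10 nodes have joined, whichever comes first. Note that the two node-count settings were deprecated in Elasticsearch 7.7 in favor of gateway.recover_after_data_nodes and gateway.expected_data_nodes.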
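
The interaction of the three values can be sketched as a small decision function. This is a simplified model for illustration, not Elasticsearch's actual logic; the function name is hypothetical, and the "minutes waited" argument is assumed to count from the moment the minimum node count was reached.

```shell
# Values from the elasticsearch.yml example above.
RECOVER_AFTER_NODES=8
EXPECTED_NODES=10
RECOVER_AFTER_MINUTES=5

# Arguments: nodes joined so far, minutes waited since the minimum was met.
# Returns 0 if recovery may start under this simplified model, 1 otherwise.
recovery_may_start() {
  joined=$1
  waited=$2
  [ "$joined" -lt "$RECOVER_AFTER_NODES" ] && return 1   # below the hard minimum
  [ "$joined" -ge "$EXPECTED_NODES" ] && return 0        # all expected nodes are present
  [ "$waited" -ge "$RECOVER_AFTER_MINUTES" ]             # otherwise the timer decides
}
```

For example, `recovery_may_start 9 5` succeeds because the minimum is met and the timer has run out, while `recovery_may_start 7 10` fails no matter how long the cluster has waited.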

3. Split brain

After an instance restarts, it may well fail to find the master and elect itself master. To prevent this, adjust elasticsearch.yml:

discovery.zen.minimum_master_nodes: 2

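This setting tells Elasticsearch not to elect a master unless enough master-eligible nodes are available; an election takes place only when that quorum is present. It applies to Elasticsearch 6.x and earlier; from 7.0 on the setting is ignored and the cluster manages its voting quorum itself.
It should always be set to (number of master-eligible nodes / 2) + 1, using integer division. For example:

With 10 master-eligible nodes, set it to 6.
With 3, set it to 2.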
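
The formula is easy to get wrong for odd counts, so here is the arithmetic as a one-line shell function (the function name is made up for this example). Shell arithmetic truncates the division, which is exactly the integer division the rule calls for.

```shell
# Quorum for N master-eligible nodes: integer division by two, plus one.
min_master_nodes() {
  echo $(( $1 / 2 + 1 ))
}

min_master_nodes 10   # prints 6
min_master_nodes 3    # prints 2
```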

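4. How long settings last

persistent: the setting survives a full cluster restart.
transient: the setting is lost once the whole cluster has been restarted.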

PUT /_cluster/settings
{
    "persistent" : {
        "discovery.zen.minimum_master_nodes" : 2
    }
}
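
When both scopes define the same key, the transient value takes precedence, and assigning null to a key removes that override. The helper below is a sketch of how a script might build either kind of request body; `settings_payload` is a hypothetical name, and the value argument is assumed to be JSON-encoded already (a quoted string, a number, or the bare word null).

```shell
# Hypothetical helper: print a _cluster/settings request body.
# Arguments: scope (persistent|transient), setting key, JSON-encoded value.
settings_payload() {
  case "$1" in
    persistent|transient) ;;
    *) echo "scope must be persistent or transient" >&2; return 1 ;;
  esac
  printf '{"%s":{"%s":%s}}\n' "$1" "$2" "$3"
}
```

For instance, `curl -XPUT -H 'Content-Type: application/json' 'http://192.168.1.213:9200/_cluster/settings' -d "$(settings_payload transient cluster.routing.allocation.enable null)"` would clear a transient override left behind by an earlier restart.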

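5. Restarting the cluster in bulk

Here ip stands for each address in the cluster's IP list.

# Stop the ES service.
# Outer double quotes keep awk's single quotes intact; \$2 is escaped so it reaches the remote awk unexpanded.
ssh ip -C "ps -ef | grep org.elasticsearch.bootstrap.Elasticsearch | grep -v grep | awk '{print \$2}' | xargs kill -9"

# Start the ES service
ssh ip -C 'su - es -c "cd /home/es/software/elasticsearch/bin;sh elasticsearch -d"'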
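
To apply one of these commands to the whole IP list, a loop along the following lines may be enough. It is a sketch under assumptions: the function name and the hosts file (one address per line) are hypothetical, and the loop simply stops at the first host where ssh fails.

```shell
# Hypothetical wrapper: run one remote command on every host listed in a
# file (one address per line), strictly one host at a time.
run_on_each_host() {
  hosts_file=$1
  shift
  # Read the list on fd 3 so ssh cannot swallow the remaining lines from stdin.
  while read -r host <&3 || [ -n "$host" ]; do
    [ -n "$host" ] || continue                # skip blank lines
    echo "== $host"
    ssh "$host" "$@" </dev/null || return 1   # stop at the first failure
  done 3<"$hosts_file"
}
```

Usage might look like `run_on_each_host hosts.txt 'su - es -c "cd /home/es/software/elasticsearch/bin;sh elasticsearch -d"'`. Reading the host list on a separate file descriptor matters here: without it, ssh would read the rest of the list as its own input and the loop would end after the first host.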
