Elastic Stack Notes: An In-Depth Guide to Production Deployment and Performance Tuning, from Cluster Configuration to Read/Write Optimization

Originally published 2025-12-06 22:45:00

License: CC 4.0 BY-SA

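Key Configuration for Production Deployment

OS-level and static Elasticsearch settings come first: when deploying an Elasticsearch cluster to production, system-level configuration is the first task.

The official documentation (Set up Elasticsearch) explicitly calls for the following: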

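1 ) Disable swap:

Reason: swap is disk-backed, so performance drops sharply once the node starts swapping.

Steps:

```
# Turn swap off now, then comment out the swap entries in /etc/fstab so it stays off after a reboot
sudo swapoff -a
sudo sed -i '/swap/s/^\(.*\)$/#\1/g' /etc/fstab
# In elasticsearch.yml, lock the process memory (the service user also needs an unlimited memlock limit)
bootstrap.memory_lock: true
```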

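2 ) Raise the file descriptor limit:

Elasticsearch keeps a large number of segment files open, and the default limit is too low.

Steps:

```
# Raise the soft and hard limits; they apply from the next login session
echo "* soft nofile 65536" | sudo tee -a /etc/security/limits.conf
echo "* hard nofile 65536" | sudo tee -a /etc/security/limits.conf
```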

3 ) Virtual memory tuning:

Elasticsearch maps index files with mmap, so vm.max_map_count needs to be raised:
```
sysctl -w vm.max_map_count=262144
```

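4 ) JVM memory tuning:

Keep the heap at or below 31GB so that compressed object pointers stay enabled.
Memory split: give the JVM heap about 50% of the machine's memory and leave the rest to the OS for the file system cache.
Sizing formula: node memory = max(30GB, total data volume / (node count × ratio coefficient))
- Ratio coefficient for search workloads: 16
- Ratio coefficient for logging workloads: 48-96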

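5 ) Thread limit:

Make sure the thread limit (nproc) configured under /etc/security/limits.d is at least 4096.

6 ) Network parameter tuning

Raise net.core.somaxconn and net.ipv4.tcp_max_syn_backlog so the node can handle more pending connections.
Secure network configuration:
- Set network.host explicitly to an internal IP; do not bind to 0.0.0.0
- Put a proxy in front of any public access; never expose an unauthenticated ES cluster directly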

Example settings (/etc/sysctl.conf):

vm.swappiness = 1 
vm.max_map_count = 262144
net.core.somaxconn = 1024
net.ipv4.tcp_max_syn_backlog = 4096

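Principles for static Elasticsearch configuration:

Keep it lean: elasticsearch.yml should hold only the parameters that must be static (such as cluster discovery and the network host); change dynamic settings through the API:

```
# Example configuration
cluster.name: production-cluster
node.name: node-1
network.host: 192.168.1.10  # use an internal IP, never 0.0.0.0
discovery.seed_hosts: ["node-1-ip", "node-2-ip"]  # addresses of the master-eligible nodes
cluster.initial_master_nodes: ["node-1", "node-2"]
```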

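JVM memory sizing:
- The ≤31GB rule: above this heap size the JVM can no longer use compressed object pointers.
- Reserve 50% of memory: the operating system needs it to cache index files (mmap depends on this).
- Rule-of-thumb ratios:
  - Search workloads: memory to data up to 1:16 (1GB of memory serves 16GB of data).
  - Logging workloads: memory to data up to 1:48 to 1:96.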
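The ratios above can be turned into a rough capacity estimate. The sketch below is illustrative only: the function names are hypothetical, and it assumes that "memory" in the ratio means the JVM heap and that a node runs with a 31GB heap.

```typescript
// Hypothetical helpers; the names and the 31GB default are assumptions for illustration.
type Workload = "search" | "logging";

// GB of data served per GB of memory, taken from the rule-of-thumb ratios above.
const DATA_PER_GB_OF_MEMORY: Record<Workload, { min: number; max: number }> = {
  search: { min: 16, max: 16 },
  logging: { min: 48, max: 96 },
};

interface CapacityEstimate {
  minDataGb: number;
  maxDataGb: number;
}

// Estimate how much data a single node can serve for a given amount of memory.
function estimateNodeCapacity(workload: Workload, memoryGb: number = 31): CapacityEstimate {
  if (memoryGb <= 0) throw new RangeError("memoryGb must be positive");
  const ratio = DATA_PER_GB_OF_MEMORY[workload];
  return { minDataGb: memoryGb * ratio.min, maxDataGb: memoryGb * ratio.max };
}

// Estimate the node count for a data volume, using the conservative (min) ratio.
function estimateNodeCount(workload: Workload, totalDataGb: number, memoryGb: number = 31): number {
  const perNodeGb = estimateNodeCapacity(workload, memoryGb).minDataGb;
  return Math.max(1, Math.ceil(totalDataGb / perNodeGb));
}
```

Under these assumptions a 31GB node serves roughly 496GB of search data, or between 1488GB and 2976GB of log data. Treat the result as a starting point for load testing, not as a guarantee.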

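Write Performance Optimization

Goal: raise EPS (Events Per Second) and reduce I/O overhead

Ground rules for refresh, translog, and shard design
EPS is the core write metric, and tuning targets three areas:

1 ) Client side:

Write concurrently from multiple threads and use the Bulk API (batched writes)
Aim for 10–20MB per request (too large causes GC pressure, too small lowers throughput)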
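One way to stay inside the 10–20MB window is to cut the input into batches by serialized size rather than by document count. The following is a minimal sketch: `chunkBySize` is a hypothetical helper, the 15MB default is just a value inside the suggested range, and the size estimate ignores the small action line that the Bulk API adds per document.

```typescript
// Hypothetical helper: split documents into batches whose JSON size stays under a byte budget.
function chunkBySize<T>(docs: T[], maxBytes: number = 15 * 1024 * 1024): T[][] {
  const encoder = new TextEncoder();
  const batches: T[][] = [];
  let current: T[] = [];
  let currentBytes = 0;

  for (const doc of docs) {
    // +1 accounts for the newline that terminates each NDJSON line.
    const size = encoder.encode(JSON.stringify(doc)).length + 1;

    // Close the current batch if adding this document would exceed the budget.
    if (current.length > 0 && currentBytes + size > maxBytes) {
      batches.push(current);
      current = [];
      currentBytes = 0;
    }
    current.push(doc);
    currentBytes += size;
  }

  if (current.length > 0) batches.push(current);
  return batches;
}
```

A document larger than the budget still ends up in a batch of its own, so nothing is dropped. Each returned batch can then be sent as one bulk request, with several batches in flight concurrently.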

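2 ) Elasticsearch server side:

Mechanism	Tuning
Refresh	Lengthen the interval (default 1s → 30s) or disable it (`index.refresh_interval=-1`) to create fewer segments
Translog	Fsync asynchronously (`index.translog.durability=async`) and lengthen the sync interval (`index.translog.sync_interval=30s`)
Flush	Raise the trigger threshold (`index.translog.flush_threshold_size=1gb`) to avoid frequent flushes to disk
Replicas	Turn replicas off during the load (`index.number_of_replicas=0`) and restore them dynamically afterwards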

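2.1 Refresh tuning:

By default a refresh runs every second and creates a new segment; refreshing that often lowers throughput.
Options:
- Lengthen the refresh interval: index.refresh_interval: 30s (example for logging).
- Disable automatic refresh (during bulk imports): index.refresh_interval: -1.

Adjust the index buffer: the default is 10% of the heap, and it can be raised as a static setting:

indices.memory.index_buffer_size: 20%  # static node setting, requires a restart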

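2.2 Asynchronous translog:

By default the translog is fsynced on every write request (request mode); switching to async improves performance:

index.translog.durability: async
index.translog.sync_interval: 120s  # fsync every 120 seconds
index.translog.flush_threshold_size: 1gb  # flush once the translog exceeds 1GB

Risk: in async mode a crash can lose up to 120 seconds of acknowledged writes (weigh this against your reliability requirements).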

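2.3 Shard and replica design:

Disable replicas temporarily: re-enable them once the load has finished.
Spread shards evenly: avoid hot spots on a single node by capping shards per node with index.routing.allocation.total_shards_per_node:
```
# Example: 5-node cluster, 10 primary shards + 1 replica = 20 shards in total
index.routing.allocation.total_shards_per_node: 5  # at most 5 shards of this index per node
```
- Formula: (primary shards × (1 + replica count)) / node count + 1 (the extra 1 leaves room for failover)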
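The per-node cap can be computed directly from the formula. The helper below is a hypothetical sketch; rounding the division up before adding the headroom is an assumption for cases where the shards do not divide evenly across nodes.

```typescript
// Hypothetical helper for index.routing.allocation.total_shards_per_node.
function totalShardsPerNode(
  primaries: number,
  replicas: number,
  nodes: number,
  headroom: number = 1, // spare slots so shards can move when a node fails
): number {
  if (nodes <= 0) throw new RangeError("nodes must be positive");
  const totalShards = primaries * (1 + replicas); // primaries plus all replica copies
  return Math.ceil(totalShards / nodes) + headroom;
}
```

For the 5-node example with 10 primaries and 1 replica this gives 20 / 5 + 1 = 5. Setting the cap with no headroom at all risks leaving shards unassigned after a node failure.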

3 ) Shard balancing:

Avoid single-node hot spots: index.routing.allocation.total_shards_per_node=5 (with N nodes, set it to (total shards / N) + 1)
Example configuration for logging:

PUT /logs-index  
{  
  "settings": {  
    "refresh_interval": "30s",  
    "number_of_replicas": 0,  
    "routing.allocation.total_shards_per_node": 3,  
    "translog": {  
      "durability": "async",  
      "sync_interval": "30s"  
    }  
  },  
  "mappings": {  
    "dynamic": false,  // ignore unmapped fields instead of indexing them
    "properties": { ... }  
  }  
}

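How Writes Work

The Elasticsearch write path involves three key operations:

Refresh: every second by default, turns the in-memory buffer into a searchable segment
Translog: a write-ahead log that protects data durability
Flush: persists segments to disk and truncates the translog

1 ) Tuning strategy in practice

Client side:

Write in batches through the bulk API (10-20MB per batch recommended)
Write concurrently from multiple threads
Write with zero replicas: raise the replica count after the load completes

Server side:

Setting	Default	Tuned value	Effect
`index.refresh_interval`	1s	30s-120s	Fewer segments created
`indices.memory.index_buffer_size`	10%	20-30%	Larger indexing buffer
`index.translog.durability`	request	async	Translog fsynced asynchronously
`index.translog.sync_interval`	5s	30s	Fewer disk syncs
`index.translog.flush_threshold_size`	512mb	1gb	Fewer flushes triggered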

2 ) Template for logging workloads:

PUT /logs-2023-11 
{
  "settings": {
    "index": {
      "number_of_shards": 10,
      "number_of_replicas": 0,
      "refresh_interval": "30s",
      "translog": {
        "durability": "async",
        "sync_interval": "30s",
        "flush_threshold_size": "1gb"
      },
      "routing": {
        "allocation": {
          "total_shards_per_node": 3
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "@timestamp": {"type": "date"},
      "message": {"type": "text"},
      "severity": {"type": "keyword"},
      "host": {
        "type": "object",
        "properties": {
          "name": {"type": "keyword"}
        }
      }
    }
  }
}

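Read Performance Optimization

Data modeling, query design, and shard sizing have to work together; read performance depends on four dimensions:

1 ) Model data to fit the business:

Align the data model with the business model
Avoid script-based computation: runtime scripts (such as Painless) cannot use the inverted index and perform very poorly
- Fix: precompute fields (for example, store price * discount as final_price)
Disable features you do not need (such as _source or doc_values)
- Trim field mappings: switch off unused features (for example doc_values: false)
Use nested types (nested) and parent-child documents (join) judiciously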

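2 ) Shard size and replicas:

Separate hot and cold data (hot-warm architecture)
Index lifecycle management (ILM)
Replica count: usually node count − 1 at most (enough for high availability; more replicas slow down writes)
Rule-of-thumb shard sizes:

Workload	Max size per shard
Search	15GB
Logging	50GB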

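3 ) Index configuration:

Shard count: determine it through load testing
Replica count: set it from read traffic (more is not always better); a starting point is replicas = ceil(read QPS / QPS a single shard can sustain)

# Tighten the index: strict mapping, best_compression codec, preloaded norms and doc values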
PUT /products 
{
  "settings": {
    "index": {
      "codec": "best_compression",
      "store": {
        "preload": ["nvd", "dvd"]
      }
    }
  },
  "mappings": {
    "dynamic": "strict",
    "_source": {"enabled": true},
    "properties": {
      "name": {"type": "text", "index": true},
      "price": {"type": "scaled_float", "scaling_factor": 100}
    }
  }
}

4 ) Query optimization:

Prefer filter context: results can be cached and scoring is skipped

{
  "query": {
    "bool": {
      "filter": [{"term": {"status": "active"}}]  // no scoring, cacheable
    }
  }
}
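The same split can be applied when building queries in application code: only the full-text part goes into query context, and every exact condition goes into filter context. The builder below is a minimal sketch with hypothetical names (`buildSearchQuery`, `textField`); it only constructs the request body and does not call Elasticsearch.

```typescript
type Clause = Record<string, unknown>;

interface BoolQuery {
  bool: { must?: Clause[]; filter?: Clause[] };
}

// Hypothetical builder: score only the free-text match, filter on exact values.
function buildSearchQuery(
  text: string | undefined,
  textField: string,
  exactFilters: Record<string, string | number | boolean>,
): BoolQuery {
  // Exact-value conditions become term clauses in filter context (unscored, cacheable).
  const filter: Clause[] = Object.keys(exactFilters).map((field) => ({
    term: { [field]: exactFilters[field] },
  }));

  const bool: BoolQuery["bool"] = {};
  if (text !== undefined && text.trim() !== "") {
    // The full-text clause stays in query context so hits are ranked by relevance.
    bool.must = [{ match: { [textField]: text } }];
  }
  if (filter.length > 0) {
    bool.filter = filter;
  }
  return { bool };
}
```

Calling `buildSearchQuery("phone", "name", { status: "active" })` yields a `bool` query with one `match` clause under `must` and one `term` clause under `filter`.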

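Query guidelines:
- Prefer filter context
- Avoid expensive operations such as scripts and regular expressions
- Avoid wildcard queries with a leading wildcard
- Use custom routing (routing) to target specific shards
- Use async search for complex, long-running queries
Diagnostic tools:
- Profile API: find the slow phase of a query
- Explain API: analyze how a score was computed

Diagnosing slow queries:

Use the Profile API to see where the time goes:

# Profile the query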
GET /products/_search
{
  "profile": true,
  "query": {...}
}
 
# Validate the query and explain how it is rewritten
GET /products/_validate/query?explain
{
  "query": {...}
}

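A Methodical Way to Choose the Shard Count

Shard count formula

total shards = max(ceil(total data volume / shard capacity), ceil(total throughput / per-shard throughput))

Capacity dimension:

Search workloads: shard capacity ≈ 15GB
Logging workloads: shard capacity ≈ 30-50GB

Throughput dimension:

Measure a single-shard baseline:

# Benchmark index: one primary shard, no replicas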
PUT /benchmark
{
  "settings": {
    "index.number_of_shards": 1,
    "index.number_of_replicas": 0
  }
}

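Use the esrally load-testing tool to measure single-shard TPS/QPS

Node balancing formula:

max shards per node = (primary shards × (replica count + 1)) / node count + headroom

# Example (5-node cluster, 10 primary shards + 1 replica)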
cluster.routing.allocation.total_shards_per_node: 5

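Validating the rule of thumb with load tests
Shard count = total data volume / capacity per shard, and the result should be confirmed by load testing:

1 ) Load-testing procedure:

Set up a single-node cluster and create an index with 1 primary shard and 0 replicas.
Use real data and increase concurrency and batch size step by step (10-20MB per batch recommended).
Recommended tool: Elasticsearch Rally (the official benchmarking framework).

2 ) Rule-of-thumb calculation:

Search workload: 500GB of data → shard count = 500GB / 15GB ≈ 34 (rounded up)
Logging workload: 2TB of data → shard count = 2048GB / 50GB ≈ 41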
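The capacity and throughput dimensions can be combined in a few lines. This is a sketch with hypothetical names; the throughput inputs are optional because they are only available once a single-shard benchmark has been run.

```typescript
interface ShardSizingInput {
  totalDataGb: number;
  shardCapacityGb: number; // e.g. 15 for search, 30-50 for logging
  totalThroughput?: number; // required EPS or QPS for the whole index
  perShardThroughput?: number; // measured on a single-shard benchmark
}

// Hypothetical helper: total shards = max(capacity-based count, throughput-based count).
function estimateShardCount(input: ShardSizingInput): number {
  if (input.shardCapacityGb <= 0) throw new RangeError("shardCapacityGb must be positive");
  const byCapacity = Math.ceil(input.totalDataGb / input.shardCapacityGb);

  // Only apply the throughput dimension when both measurements are present.
  const hasThroughput =
    input.totalThroughput !== undefined &&
    input.perShardThroughput !== undefined &&
    input.perShardThroughput > 0;
  const byThroughput = hasThroughput
    ? Math.ceil((input.totalThroughput as number) / (input.perShardThroughput as number))
    : 0;

  return Math.max(1, byCapacity, byThroughput);
}
```

With the figures above, `estimateShardCount({ totalDataGb: 500, shardCapacityGb: 15 })` returns 34 and `estimateShardCount({ totalDataGb: 2048, shardCapacityGb: 50 })` returns 41. When a throughput requirement is supplied, the larger of the two counts wins.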

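X-Pack Monitoring in Practice

Monitoring cluster health, node load, and index metrics end to end
X-Pack Monitoring is available for free under the Basic license. Since Elasticsearch 6.3, X-Pack ships with the default distribution, so the install step below only applies to versions 5.x through 6.2:

# Install the plugin (Elasticsearch and Kibana 5.x to 6.2 only)
bin/elasticsearch-plugin install x-pack
bin/kibana-plugin install x-pack

# Disable the security module (optional; only reasonable on an isolated test cluster)
echo "xpack.security.enabled: false" >> config/elasticsearch.yml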

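Core monitoring metrics:

Dashboard	Key metrics	What to do when abnormal
Cluster	Query EPS / Index EPS / Latency	EPS drops sharply → check thread pool rejections
Nodes	JVM Heap / CPU / Segment Count	Heap stays ≥75% → scale out or reduce memory use
Indices	Document Count / Indexing Rate	Too many segments → run a force merge
Advanced	GC Duration / Indexing Memory	GC pauses >1s → tune the JVM or look for a memory leak

Key diagnostic scenarios:

Thread pool rejections → add nodes or optimize the write path.
Segment explosion → merge (_forcemerge) or raise refresh_interval.
Disk I/O bottleneck → rebalance shards or move to SSDs

Key metrics to watch:

Cluster level:
- Search rate (QPS)
- Indexing rate (EPS)
- Latency
- Shard state distribution
- Cluster status (Green/Yellow/Red)

Node level:

Heap Memory: sustained high usage means the cluster needs more capacity
GC Duration: pauses over 1 second suggest the heap is under pressure
Thread Pool Rejections: requests are rejected when the queue is full (add nodes)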

{
  "jvm": {
    "mem": {"heap_used_percent": 65},
    "gc": {
      "collectors": {
        "young": {"collection_time_in_millis": 500},
        "old": {"collection_time_in_millis": 2000}
      }
    }
  },
  "thread_pool": {
    "write": {
      "threads": 16,
      "queue": 10,
      "rejected": 0
    }
  }
}
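The `rejected` value in node stats is a cumulative counter since the node started, so a per-minute rate has to be derived from two samples. The sketch below shows one way to do that; the `PoolSample` shape and the function name are hypothetical, and the reset handling assumes that a lower counter means the node restarted between samples.

```typescript
// Hypothetical sample shape: when the stats were read and the cumulative rejected count.
interface PoolSample {
  timestampMs: number;
  rejected: number;
}

// Rejections per minute between two samples of the same thread pool.
function rejectionsPerMinute(prev: PoolSample, curr: PoolSample): number {
  const elapsedMs = curr.timestampMs - prev.timestampMs;
  if (elapsedMs <= 0) throw new RangeError("samples must be in chronological order");

  // A counter that went down means the node restarted; count from zero in that case.
  const delta = curr.rejected >= prev.rejected ? curr.rejected - prev.rejected : curr.rejected;
  return (delta * 60000) / elapsedMs;
}
```

For example, 30 new rejections over two minutes gives a rate of 15 per minute.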

Index level:
- Document count
- Store size
- Disk usage: above 85% (the low watermark) no new shards are allocated to the node; at 95% (the flood stage) indices with shards on it are switched to read-only
- Segment count and segment memory: force merge (_forcemerge) when they grow too large

Recommended alert thresholds

Metric	Warning	Critical
JVM heap usage	>75%	>90%
CPU usage	>80%	>95%
Free disk space	<30%	<10%
Thread pool rejections	>10/min	>100/min
GC time share	>30%	>50%
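The threshold table maps naturally onto a small lookup. The sketch below is illustrative: the metric keys and the `classify` function are hypothetical names, and the disk row is interpreted as free space, which is why it alerts when the value falls below the limit.

```typescript
type Severity = "ok" | "warning" | "critical";
type MetricName =
  | "heapUsedPercent"
  | "cpuPercent"
  | "diskFreePercent"
  | "rejectionsPerMinute"
  | "gcTimePercent";

interface Threshold {
  warning: number;
  critical: number;
  direction: "above" | "below"; // which side of the limit is unhealthy
}

// Values copied from the alert threshold table.
const THRESHOLDS: Record<MetricName, Threshold> = {
  heapUsedPercent: { warning: 75, critical: 90, direction: "above" },
  cpuPercent: { warning: 80, critical: 95, direction: "above" },
  diskFreePercent: { warning: 30, critical: 10, direction: "below" },
  rejectionsPerMinute: { warning: 10, critical: 100, direction: "above" },
  gcTimePercent: { warning: 30, critical: 50, direction: "above" },
};

// Classify a metric reading; limits are exclusive, matching the ">" and "<" in the table.
function classify(metric: MetricName, value: number): Severity {
  const t = THRESHOLDS[metric];
  const breaches = (limit: number): boolean =>
    t.direction === "above" ? value > limit : value < limit;
  if (breaches(t.critical)) return "critical";
  if (breaches(t.warning)) return "warning";
  return "ok";
}
```

A heap reading of 80% classifies as a warning, and 8% free disk space as critical.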

Engineering Example 1

1 ) Option 1: Tuning for log collection

Elasticsearch configuration:

PUT /app-logs 
{
  "settings": {
    "index.refresh_interval": "60s",
    "index.translog.durability": "async",
    "index.translog.sync_interval": "60s",
    "index.number_of_replicas": 1,
    "index.routing.allocation.total_shards_per_node": 5,
    "analysis": {
      "analyzer": {
        "log_analyzer": {
          "type": "pattern",
          "pattern": "[\\W]+"
        }
      }
    }
  },
  "mappings": {
    "dynamic": false,
    "properties": {
      "@timestamp": {"type": "date"},
      "level": {"type": "keyword"},
      "message": {
        "type": "text",
        "analyzer": "log_analyzer",
        "fields": {"raw": {"type": "keyword"}}
      }
    }
  }
}

2 ) Option 2: Tuning for e-commerce search

Elasticsearch configuration:

PUT /ecommerce-products 
{
  "settings": {
    "index": {
      "number_of_shards": 15,
      "number_of_replicas": 2,
      "refresh_interval": "5s",
      "analysis": {
        "filter": {
          "autocomplete_filter": {
            "type": "edge_ngram",
            "min_gram": 2,
            "max_gram": 20
          }
        },
        "analyzer": {
          "autocomplete": {
            "type": "custom",
            "tokenizer": "standard",
            "filter": ["lowercase", "autocomplete_filter"]
          }
        }
      }
    }
  },
  "mappings": {
    "dynamic_templates": [...],
    "properties": {
      "name": {
        "type": "text",
        "analyzer": "autocomplete",
        "search_analyzer": "standard"
      },
      "attributes": {"type": "nested"}
    }
  }
}

3 ) Option 3: Time-series data

Elasticsearch configuration:

PUT /metrics-2023
{
  "settings": {
    "index": {
      "number_of_shards": 20,
      "number_of_replicas": 1,
      "refresh_interval": "30s",
      "codec": "best_compression",
      "routing": {
        "allocation": {
          "require": {
            "data_type": "hot"
          }
        }
      }
    }
  },
  "mappings": {
    "_source": {"enabled": false},
    "properties": {
      "@timestamp": {"type": "date"},
      "metric_name": {"type": "keyword"},
      "value": {
        "type": "float",
        "index": false,
        "doc_values": true
      }
    }
  }
}

Engineering Example 2

1 ) Core module

// elasticsearch.module.ts
import { Module } from '@nestjs/common';
import { ElasticsearchModule } from '@nestjs/elasticsearch';
import { ConfigModule, ConfigService } from '@nestjs/config';
 
@Module({
  imports: [
    ElasticsearchModule.registerAsync({
      imports: [ConfigModule], // needed unless ConfigModule is registered as global
      useFactory: async (config: ConfigService) => ({
        node: config.get('ES_NODE'),
        auth: {
          username: config.get('ES_USER'),
          password: config.get('ES_PASSWORD')
        },
        tls: {
          ca: config.get('ES_CA_CERT'),
          rejectUnauthorized: true // verify the server certificate against the CA
        },
        maxRetries: 5,
        requestTimeout: 60000,
        pingTimeout: 30000 
      }),
      inject: [ConfigService]
    })
  ],
  exports: [ElasticsearchModule]
})
export class SearchModule {}

2 ) Advanced search service

// product.service.ts 
import { Injectable } from '@nestjs/common';
import { ElasticsearchService } from '@nestjs/elasticsearch';
import { SearchResponse } from '@elastic/elasticsearch/lib/api/types';
 
@Injectable()
export class ProductService {
  constructor(private readonly es: ElasticsearchService) {}
 
  async searchProducts(query: string, filters: any): Promise<SearchResponse> {
    return this.es.search({
      index: 'ecommerce-products',
      size: 100,
      track_total_hits: true,
      query: {
        bool: {
          must: [{
            multi_match: {
              query,
              fields: ['name^3', 'description'],
              type: 'cross_fields'
            }
          }],
          filter: this.buildFilters(filters)
        }
      },
      aggs: this.buildAggregations(),
      highlight: {
        fields: {
          name: {},
          description: {}
        }
      }
    });
  }
 
  private buildFilters(filters) {
    const filterArray = [];
    
    if (filters.category) {
      filterArray.push({ term: { category: filters.category } });
    }
    
    if (filters.priceRange) {
      filterArray.push({
        range: {
          price: {
            gte: filters.priceRange.min,
            lte: filters.priceRange.max
          }
        }
      });
    }
    
    return filterArray;
  }
 
  private buildAggregations() {
    return {
      categories: {
        terms: { field: 'category.keyword', size: 10 }
      },
      price_stats: {
        stats: { field: 'price' }
      }
    };
  }
}

3 ) Cluster health monitoring

// cluster-monitor.service.ts
import { Injectable } from '@nestjs/common';
import { ElasticsearchService } from '@nestjs/elasticsearch';
 
@Injectable()
export class ClusterMonitorService {
  constructor(private readonly es: ElasticsearchService) {}
  
  async getClusterHealth() {
    return this.es.cluster.health();
  }
  
  async getNodeStats() {
    return this.es.nodes.stats({
      metric: [
        'os', 'jvm', 'fs', 'process', 'indices', 'thread_pool'
      ],
      human: true 
    });
  }
  
  async getIndexStats(indices: string[]) {
    return this.es.indices.stats({
      index: indices.join(','),
      metric: ['docs', 'store', 'indexing', 'search']
    });
  }
  
  async createIndexLifecyclePolicy() {
    return this.es.ilm.putLifecycle({
      policy: 'hot_warm_policy',
      body: {
        policy: {
          phases: {
            hot: {
              min_age: '0ms',
              actions: {
                rollover: {
                  max_size: '50gb',
                  max_age: '30d'
                },
                set_priority: {
                  priority: 100
                }
              }
            },
            warm: {
              min_age: '60d',
              actions: {
                set_priority: {
                  priority: 50
                },
                allocate: {
                  number_of_replicas: 1,
                  require: {
                    data_type: 'warm'
                  }
                }
              }
            }
          }
        }
      }
    });
  }
}

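Engineering Example 3

Three end-to-end pieces: write optimization, query wrapper, monitoring and alerting

1 ) Bulk writes with an asynchronous translog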

import { Controller, Post } from '@nestjs/common';
import { Client } from '@elastic/elasticsearch';
import { BulkOperationContainer } from '@elastic/elasticsearch/lib/api/types';
 
@Controller('data')
export class DataIngestController {
  private readonly esClient = new Client({ node: 'http://localhost:9200' });
 
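  @Post('bulk')
  async bulkInsert() {
    // Switch the translog to async before writing; in practice set this once at index creation
    await this.esClient.indices.putSettings({
      index: 'logs',
      body: {
        translog: { durability: 'async', sync_interval: '120s' }
      }
    });

    // A bulk body alternates action lines and document sources
    const operations: Array<BulkOperationContainer | Record<string, unknown>> = [];
    const dataset = [...]; // 10-20MB of data
 
    dataset.forEach(doc => {
      operations.push({ index: { _index: 'logs' } });
      operations.push(doc);
    });
 
    const result = await this.esClient.bulk({ body: operations });
    // A bulk call can succeed as a whole while individual items fail
    if (result.errors) {
      throw new Error('Some documents in the bulk request failed');
    }
  }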
}

2 ) Query optimization and caching

import { Controller, Get, Query } from '@nestjs/common';
import { Client } from '@elastic/elasticsearch';
 
@Controller('search')
export class SearchController {
  private readonly esClient = new Client({ node: 'http://localhost:9200' });
 
  @Get()
  async search(@Query('keyword') keyword: string) {
    const response = await this.esClient.search({
      index: 'products',
      body: {
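        query: {
          bool: {
            must: [  // full-text match stays in query context so hits are ranked by relevance
              { match: { name: keyword } }
            ],
            filter: [  // exact conditions go in filter context: no scoring, cacheable
              { term: { in_stock: true } }
            ]
          }
        },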
        size: 10 
      }
    });
    return response.hits.hits;
  }
}

3 ) X-Pack monitoring integration and alerting

import { Injectable } from '@nestjs/common';
import { Client } from '@elastic/elasticsearch';
 
@Injectable()
export class MonitoringService {
  private readonly esClient = new Client({ node: 'http://localhost:9200' });
 
  async checkClusterHealth() {
    const health = await this.esClient.cluster.health();
    if (health.status === 'red') {
      this.triggerAlert('Cluster is red!');
    }
  }
 
  async trackPerformance() {
    // Check every node rather than a single ID read from the environment
    const stats = await this.esClient.nodes.stats({ metric: 'jvm' });
    for (const [nodeId, node] of Object.entries(stats.nodes)) {
      const heapUsed = node.jvm?.mem?.heap_used_percent ?? 0;
      if (heapUsed > 85) {
        this.triggerAlert(`Heap usage on node ${nodeId} is ${heapUsed}%!`);
      }
    }
  }
 
  private triggerAlert(message: string) {
    // Hook up SMS/email alerting here (e.g. Twilio, Nodemailer)
  }
}

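Additional configuration:

Around Elasticsearch:
- Index templates: apply the sharding strategy automatically
- ILM policies: roll indices over automatically (logging workloads)
NestJS best practices:
- Inject ElasticsearchModule through dependency injection (requires @nestjs/elasticsearch)
- Manage environment variables with ConfigModule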

Engineering Example 4

1 ) Option 1: Direct connection with the native client (most flexible)

import { Module } from '@nestjs/common';  
import { ElasticsearchModule } from '@nestjs/elasticsearch';  
 
@Module({  
  imports: [  
    ElasticsearchModule.register({  
      node: 'http://192.168.1.10:9200',  
      maxRetries: 3,  
      requestTimeout: 30000,  
      sniffOnStart: true,  
    }),  
  ],  
})  
export class SearchModule {}  
 
// Service-layer usage example  
import { Injectable } from '@nestjs/common';  
import { ElasticsearchService } from '@nestjs/elasticsearch';  
 
@Injectable()  
export class SearchService {  
  constructor(private readonly esService: ElasticsearchService) {}  
 
  async indexDocument(index: string, body: any) {  
    return this.esService.index({ index, body });  
  }  
 
  async search(index: string, query: any) {  
    return this.esService.search({ index, body: { query } });  
  }  
}

2 ) Option 2: Alias switch (rebuild an index with zero downtime)

// Step 1: create the new index  
await esService.indices.create({ index: 'logs-2023-v2' });  
 
// Step 2: migrate the data (Reindex API)  
await esService.reindex({  
  body: {  
    source: { index: 'logs-2023' },  
    dest: { index: 'logs-2023-v2' }  
  },  
  wait_for_completion: true  
});  
 
// Step 3: switch the alias (both actions in one request are applied atomically)  
await esService.indices.updateAliases({  
  body: {  
    actions: [  
      { remove: { index: 'logs-2023', alias: 'current-logs' } },  
      { add: { index: 'logs-2023-v2', alias: 'current-logs' } }  
    ]  
  }  
});
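Because the remove and add actions must go out in a single request for the switch to be atomic, it can help to build the action list in one place. The helper below is a hypothetical sketch that only produces the request body for the update-aliases call; it does not talk to Elasticsearch.

```typescript
type AliasAction =
  | { remove: { index: string; alias: string } }
  | { add: { index: string; alias: string } };

// Hypothetical helper: build the body for one atomic alias switch.
function buildAliasSwap(alias: string, oldIndex: string, newIndex: string): { actions: AliasAction[] } {
  if (oldIndex === newIndex) {
    throw new Error("old and new index must differ");
  }
  return {
    actions: [
      { remove: { index: oldIndex, alias } }, // detach the alias from the old index
      { add: { index: newIndex, alias } }, // attach it to the rebuilt index
    ],
  };
}
```

Sending both actions together means readers of the alias never see a moment where it points at no index or at both indices.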

3 ) Option 3: Dynamic templates (map field types automatically)

PUT /logs-dynamic  
{  
  "mappings": {  
    "dynamic_templates": [  
      {  
        "strings_as_keyword": {  
          "match_mapping_type": "string",  
          "mapping": {  
            "type": "keyword",  
            "ignore_above": 256  
          }  
        }  
      },  
      {  
        "numbers_as_scaled_float": {  
          "match_mapping_type": "long",  
          "mapping": {  
            "type": "scaled_float",  
            "scaling_factor": 100  
          }  
        }  
      }  
    ]  
  }  
}

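Best Practices Around Elasticsearch

Hot-cold architecture:
- Hot nodes (SSD): indices with heavy reads and writes (node.attr.temperature=hot)
- Cold nodes (HDD): archived data (index.routing.allocation.require.temperature=cold)

Lifecycle management (ILM):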

PUT _ilm/policy/logs_policy  
{  
  "policy": {  
    "phases": {  
      "hot": {  
        "actions": { "rollover": { "max_size": "50gb" } }  
      },  
      "delete": {  
        "min_age": "365d",  
        "actions": { "delete": {} }  
      }  
    }  
  }  
}

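Security hardening:
- Enable TLS for transport: xpack.security.transport.ssl.enabled=true
- Role-based access control (RBAC): define least-privilege roles

Cluster Monitoring and Operations

Tiered monitoring metrics

Core metrics (must be monitored):

Node liveness
Cluster health (Green/Yellow/Red)
JVM heap usage
Disk usage
CPU load

Service-level metrics:

Indexing latency
Search latency
Rejected requests
Thread pool queue size

Capacity planning metrics:

Shard distribution balance
Segment memory usage
Disk growth rate
Document growth rate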

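Automation scripts

Shard balance check:

#!/bin/bash

# Approximate the imbalance as the spread (max - min) of disk usage percent across nodes
IMBALANCE=$(curl -s 'http://localhost:9200/_cat/allocation?h=disk.percent,node' |
  awk '$1 ~ /^[0-9]+$/ { v = $1 + 0; if (n == 0 || v < min) min = v; if (v > max) max = v; n++ }
       END { print max - min }')

if [ "$IMBALANCE" -gt 20 ]; then
  # Make sure rebalancing is enabled and set the balance weights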
  curl -XPUT 'http://localhost:9200/_cluster/settings' -H 'Content-Type: application/json' -d'
  {
    "transient": {
      "cluster.routing.rebalance.enable": "all",
      "cluster.routing.allocation.balance.shard": 0.45,
      "cluster.routing.allocation.balance.index": 0.55
    }
  }'
fi

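Cold data migration:

# Move indices older than 30 days to cold nodes (the pattern below matches all 2023 indices; narrow it to the ones that qualify)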
curl -XPUT "http://localhost:9200/*-2023.*/_settings" -H 'Content-Type: application/json' -d'
{
  "index.routing.allocation.require.data_type": "cold"
}'