Recently one index in our ES cluster kept reporting errors. The Logstash log showed the following:
[2019-04-03T09:54:15,328][INFO ][logstash.outputs.elasticsearch] retrying failed action with response code: 503 ({"type"=>"unavailable_shards_exception", "reason"=>"[gps_lte-mode-2019.04.03][1] primary shard is not active Timeout: [1m], request: [BulkShardRequest [[gps_lte-mode-2019.04.03][1]] containing [8] requests]"})
[2019-04-03T09:54:15,328][INFO ][logstash.outputs.elasticsearch] Retrying individual bulk actions that failed or were rejected by the previous bulk request. {:count=>25}
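Troubleshooting process and fix:
1. Check the cluster health status:
The cluster health status turned out to be red, which means at least one primary shard is not allocated.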
[root@netmgmt-prod-elk-03 ~]# curl '10.7.1.8:9200/_cluster/health?pretty'
{
  "cluster_name" : "es-e679l179",
  "status" : "red",
  "timed_out" : false,
  "number_of_nodes" : 6,
  "number_of_data_nodes" : 3,
  "active_primary_shards" : 5831,
  "active_shards" : 11422,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 250,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 97.8581220013708
}
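If this check needs to be scripted, the status field can be pulled out of the pretty-printed response without extra tools. The following is a minimal sketch, assuming the response has the same layout as the output above; the function name health_status is made up for illustration.

```shell
# Hypothetical helper (not part of Elasticsearch or curl):
# read the pretty-printed _cluster/health JSON on stdin and
# print the value of its "status" field (green, yellow or red).
health_status() {
  # Stop at the first line containing "status" and print only its value.
  sed -n '/"status"/{s/.*"status"[[:space:]]*:[[:space:]]*"\([a-z]*\)".*/\1/p;q;}'
}
```

It could be used as: curl -s '10.7.1.8:9200/_cluster/health?pretty' | health_status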
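2. Find the abnormal index
One index turned out to be in the red state. Deleting it resolves the problem, but all data in that index is lost.
[root@netmgmt-prod-elk-03 ~]# curl 'http://10.7.1.8:9200/_cat/indices' | grep red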
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
1 144k 1 2704 0 0 907 0 0:02:43 0:00:02 0:02:41 907
red open gps_lte-mode-2019.04.03 _N2IkwVeSxiP4s1gMyFQgw 5 1
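Note that grep red matches any line containing the text "red", including index names such as "credit-...". A slightly stricter filter compares only the health column. The following is a sketch, assuming the default _cat/indices column order seen above (health, status, index, ...); the function name red_indices is hypothetical.

```shell
# Hypothetical helper: read _cat/indices output on stdin and print
# the names of indices whose health column (field 1) is exactly "red".
# Assumes the default column order: health status index uuid pri rep ...
red_indices() {
  awk '$1 == "red" { print $3 }'
}
```

It could be used as: curl -s 'http://10.7.1.8:9200/_cat/indices' | red_indices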
Delete command:
curl -XDELETE 'http://10.7.1.8:9200/gps_lte-mode-2019.04.03'
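Because a delete cannot be undone, it can help to build the command first and review it before running it. The following is a small sketch under that assumption; es_delete_cmd is a made-up helper that only prints the command and refuses empty names, wildcards, comma-separated lists and _all.

```shell
# Hypothetical helper: print (do not run) the DELETE command for one index.
#   $1 = host:port of an ES node, $2 = exact index name
# Refuses names that could match more than one index.
es_delete_cmd() {
  case "$2" in
    ''|_all|*[*,]*)
      echo "refusing unsafe index name: '$2'" >&2
      return 1
      ;;
  esac
  printf "curl -XDELETE 'http://%s/%s'\n" "$1" "$2"
}
```

For example, es_delete_cmd 10.7.1.8:9200 gps_lte-mode-2019.04.03 prints the same command as above, which can then be checked and executed by hand.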