Fixing cluster_state:fail reported by Redis cluster info

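This post documents the troubleshooting of a Redis Cluster failure, covering the analysis of the error output and the fix. The symptom was a cluster in the fail state: some slots were unassigned, so the service was unavailable. Comparing the output of cluster info and cluster slots pinpointed the missing slots, and the cluster addslots command restored the cluster.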


1. Inspect the error messages:
1.1 Error message (1)

127.0.0.1:7000> get name
(error) CLUSTERDOWN The cluster is down
127.0.0.1:7000> cluster info
cluster_state:fail
cluster_slots_assigned:16380
cluster_slots_ok:16380
cluster_slots_pfail:0
cluster_slots_fail:0
cluster_known_nodes:6
cluster_size:3
cluster_current_epoch:8
cluster_my_epoch:1
cluster_stats_messages_sent:1007
cluster_stats_messages_received:1005
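
The check above can also be done programmatically. The following is a minimal sketch, assuming the `cluster info` output has already been captured as text; the helper names `parse_cluster_info` and `unassigned_slot_count` are hypothetical and not part of any Redis client library.

```python
TOTAL_SLOTS = 16384  # a Redis Cluster always has exactly 16384 hash slots


def parse_cluster_info(text):
    """Turn `cluster info` output into a dict of string keys and values."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep:  # skip blank or malformed lines
            info[key] = value
    return info


def unassigned_slot_count(info):
    """Number of slots that no node currently owns."""
    return TOTAL_SLOTS - int(info["cluster_slots_assigned"])


# Sample taken from the transcript above (shortened).
sample = """cluster_state:fail
cluster_slots_assigned:16380
cluster_slots_ok:16380
cluster_known_nodes:6
cluster_size:3"""

info = parse_cluster_info(sample)
print(info["cluster_state"], unassigned_slot_count(info))
```

For the sample above this reports the fail state together with 4 unassigned slots, which matches the manual reading in section 2.1.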
1.2 Error message (2)

127.0.0.1:7000> cluster slots
1) 1) (integer) 0
   2) (integer) 5460
   3) 1) "127.0.0.1"
      2) (integer) 7000
      3) "09d1ad8d8aa8acd0ca2b95206c58901e47318ec9"
   4) 1) "127.0.0.1"
      2) (integer) 7004
      3) "3b6910a3cf76025564d9f744f64ffa3a3b35fbc8"
2) 1) (integer) 10923
   2) (integer) 11991
   3) 1) "127.0.0.1"
      2) (integer) 7002
      3) "9b7110678b6eef4ae80c330eb0cfb51ffbc216ea"
   4) 1) "127.0.0.1"
      2) (integer) 7003
      3) "7d0da6cebfa834177a189b9f71b048e8aeb29c49"
3) 1) (integer) 11993
   2) (integer) 12381
   3) 1) "127.0.0.1"
      2) (integer) 7002
      3) "9b7110678b6eef4ae80c330eb0cfb51ffbc216ea"
   4) 1) "127.0.0.1"
      2) (integer) 7003
      3) "7d0da6cebfa834177a189b9f71b048e8aeb29c49"
4) 1) (integer) 12383
   2) (integer) 14040
   3) 1) "127.0.0.1"
      2) (integer) 7002
      3) "9b7110678b6eef4ae80c330eb0cfb51ffbc216ea"
   4) 1) "127.0.0.1"
      2) (integer) 7003
      3) "7d0da6cebfa834177a189b9f71b048e8aeb29c49"
5) 1) (integer) 14042
   2) (integer) 14385
   3) 1) "127.0.0.1"
      2) (integer) 7002
      3) "9b7110678b6eef4ae80c330eb0cfb51ffbc216ea"
   4) 1) "127.0.0.1"
      2) (integer) 7003
      3) "7d0da6cebfa834177a189b9f71b048e8aeb29c49"
6) 1) (integer) 14387
   2) (integer) 16383
   3) 1) "127.0.0.1"
      2) (integer) 7002
      3) "9b7110678b6eef4ae80c330eb0cfb51ffbc216ea"
   4) 1) "127.0.0.1"
      2) (integer) 7003
      3) "7d0da6cebfa834177a189b9f71b048e8aeb29c49"
7) 1) (integer) 5461
   2) (integer) 10922
   3) 1) "127.0.0.1"
      2) (integer) 7001
      3) "caa158fcb538991c73438ca9801ab6ab2510e85a"
   4) 1) "127.0.0.1"
      2) (integer) 7005
      3) "0fbf5cbddefe6ad2324a225d25447ff80b033b27"
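2. Analyze the error messages
 2.1 In error message (1), cluster_slots_assigned:16380 shows that 4 slots are missing. With the default setting cluster-require-full-coverage yes, the cluster is only reported as ok when all 16384 slots are assigned.
2.2 Tabulate error message (2): slot range and the ports of the nodes serving it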

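1)     0-5460     7000, 7004
2) 10923-11991    7002, 7003
3) 11993-12381    7002, 7003
4) 12383-14040    7002, 7003
5) 14042-14385    7002, 7003
6) 14387-16383    7002, 7003
7)  5461-10922    7001, 7005
The missing slots are 11992, 12382, 14041, and 14386: each is the single slot skipped between two consecutive ranges served by 7002 and 7003.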
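
Finding the gaps by eye gets error-prone with many ranges. As a minimal sketch, assuming the (start, end) pairs have been copied from the `cluster slots` output, the missing slots can be computed as follows; `missing_slots` is a hypothetical helper written for this illustration.

```python
TOTAL_SLOTS = 16384  # slots are numbered 0..16383


def missing_slots(ranges, total=TOTAL_SLOTS):
    """Return the sorted list of slots not covered by any inclusive (start, end) range."""
    covered = set()
    for start, end in ranges:
        covered.update(range(start, end + 1))  # end is inclusive
    return [slot for slot in range(total) if slot not in covered]


# Slot ranges from error message (2), in the order they were printed.
ranges = [
    (0, 5460),
    (10923, 11991),
    (11993, 12381),
    (12383, 14040),
    (14042, 14385),
    (14387, 16383),
    (5461, 10922),
]

missing = missing_slots(ranges)
print(missing)  # [11992, 12382, 14041, 14386]
```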

3. Solution:
 3.1 Assign the missing slots to the current node, that is, the node redis-cli is connected to (here 127.0.0.1:7000)

cluster addslots 11992 12382 14041 14386
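
When the list of missing slots comes from a script rather than from manual inspection, the same command can be assembled automatically. This is only a sketch: `addslots_argv` is a hypothetical helper, the host and port are the ones used in this post, and it only builds the argument list for redis-cli (using its `-h` and `-p` options) without running anything.

```python
def addslots_argv(host, port, slots):
    """Build a redis-cli argument list that assigns `slots` to the node at host:port."""
    if not slots:
        raise ValueError("no slots to assign")
    return [
        "redis-cli", "-h", host, "-p", str(port),
        "cluster", "addslots",
        *(str(slot) for slot in sorted(slots)),
    ]


argv = addslots_argv("127.0.0.1", 7000, [11992, 12382, 14041, 14386])
print(" ".join(argv))
```

The resulting list could be passed to `subprocess.run(argv, check=True)` on a host where redis-cli is installed. Note that the slots end up owned by whichever node the command is sent to, so the host and port determine the new owner.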
3.2 Result:

127.0.0.1:7000> cluster addslots 11992 12382 14041 14386
OK
127.0.0.1:7000> cluster info
cluster_state:ok
cluster_slots_assigned:16384
cluster_slots_ok:16384
cluster_slots_pfail:0
cluster_slots_fail:0
cluster_known_nodes:6
cluster_size:3
cluster_current_epoch:8
cluster_my_epoch:1
cluster_stats_messages_sent:42312
cluster_stats_messages_received:42312

 
