I. Environment setup
Prepare three machines:
192.168.122.134, 192.168.122.135, 192.168.122.136
Before configuring Sentinel mode, set up master-replica replication first. Reference: https://blog.youkuaiyun.com/github_26672553/article/details/69568259
II. Getting started
Configure the master, 192.168.122.134
Copy sentinel.conf into your conf directory; mine is /usr/local/redis/conf
cp sentinel.conf /usr/local/redis/conf/
vi sentinel.conf
# Add the following settings
port 26379
logfile /usr/local/redis/logs/sentinel.log
dir /usr/local/redis/data/
sentinel monitor mymaster 192.168.122.134 6379 2
sentinel down-after-milliseconds mymaster 30000
sentinel parallel-syncs mymaster 1
sentinel failover-timeout mymaster 180000
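Configuration notes:
1. port: the port this Sentinel process listens on
2. logfile: the log file
3. sentinel monitor mymaster 192.168.122.134 6379 2: tells Sentinel to monitor a master Redis instance named mymaster, whose IP address is this machine's address, 192.168.122.134, on port 6379. The trailing 2 is the quorum: at least 2 Sentinel processes must agree before the master is judged to have failed. As long as fewer Sentinels than that agree, automatic failover is not started
4. sentinel down-after-milliseconds mymaster 30000: the number of milliseconds after which Sentinel considers a Redis instance to be down. If the instance does not reply to PING within that time, or replies with an error, Sentinel marks it as subjectively down (SDOWN). A single Sentinel marking an instance SDOWN does not necessarily trigger automatic failover: only after enough Sentinels have marked the instance as subjectively down is it marked as objectively down (ODOWN), and only then does automatic failover run
5. sentinel parallel-syncs mymaster 1: the maximum number of replicas that may resynchronize with the new master at the same time during a failover. With many replicas, the smaller this number, the longer the resync takes overall and the longer the failover takes to complete
6. sentinel failover-timeout mymaster 180000: if the failover does not complete within this time (ms), it is considered failed
7. sentinel notification-script <master-name> <script-path>: the alert script that Sentinel calls when it detects a problem with the monitored Redis instance. This setting is optional but commonly used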
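Every per-master option has to use the same master name that the sentinel monitor line declares (mymaster here); as far as I know, Sentinel reports a configuration error at startup when an option names a master it has not been told to monitor. The function below is a minimal sketch of a pre-flight check for that. The function name check_master_names is made up for illustration, and it only looks at the four per-master options discussed above.

```shell
# check_master_names FILE
# Hypothetical helper: reports per-master options whose master name was
# never declared with "sentinel monitor <name> ...".
# Returns 0 when every name is declared, 1 otherwise.
check_master_names() {
    awk '
        # Remember each declared master name.
        $1 == "sentinel" && $2 == "monitor" { declared[$3] = 1; next }
        # Remember the master name used by each per-master option.
        $1 == "sentinel" && $2 ~ /^(down-after-milliseconds|parallel-syncs|failover-timeout|notification-script)$/ {
            used[$3] = NR
        }
        END {
            bad = 0
            for (name in used)
                if (!(name in declared)) {
                    printf "line %d: unknown master \"%s\"\n", used[name], name
                    bad = 1
                }
            exit bad
        }
    ' "$1"
}
```

Run it as check_master_names /usr/local/redis/conf/sentinel.conf before starting Sentinel. A file that declares mymaster but sets down-after-milliseconds for master001 would be reported and the function would return 1.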
III. Configure the replica servers
Use scp to copy sentinel.conf from 192.168.122.134 to 192.168.122.135 and 192.168.122.136
scp sentinel.conf root@192.168.122.135:/usr/local/redis/conf
scp sentinel.conf root@192.168.122.136:/usr/local/redis/conf
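The two scp commands can also be written as a loop. This is only a sketch: the function name push_sentinel_conf and the SCP override variable are my own additions, not part of any tool. One caveat worth hedging: to my knowledge a running Sentinel writes its own ID back into sentinel.conf, so it is safer to copy the file before Sentinel has been started on 192.168.122.134.

```shell
# push_sentinel_conf HOST...
# Hypothetical helper: copies ./sentinel.conf to the conf directory on each host.
# Set SCP=echo to print what would be copied instead of copying it.
push_sentinel_conf() {
    for host in "$@"; do
        ${SCP:-scp} sentinel.conf "root@${host}:/usr/local/redis/conf" || return 1
    done
}
```

Usage: push_sentinel_conf 192.168.122.135 192.168.122.136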
IV. Startup
First start the Redis service on all three machines
/usr/local/redis/bin/redis-server /usr/local/redis/conf/6379.conf
# Start Sentinel
/usr/local/redis/bin/redis-sentinel /usr/local/redis/conf/sentinel.conf
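Once Sentinel is running, one way to confirm that it has picked up the master is to ask it directly with SENTINEL get-master-addr-by-name. The wrapper below is a sketch: the function name current_master and the REDIS_CLI override are hypothetical, and the redis-cli path is assumed to sit next to redis-server in /usr/local/redis/bin.

```shell
# current_master NAME [SENTINEL_HOST]
# Hypothetical helper: asks the Sentinel on port 26379 which address it
# currently treats as the master and prints "ip port" on one line.
# The redis-cli location is an assumption; override it with REDIS_CLI.
current_master() {
    ${REDIS_CLI:-/usr/local/redis/bin/redis-cli} -h "${2:-127.0.0.1}" -p 26379 \
        SENTINEL get-master-addr-by-name "$1" | paste -sd ' ' -
}
```

Usage: current_master mymaster on any of the three machines, or current_master mymaster 192.168.122.135 to query a specific Sentinel.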
Open sentinel.log on 192.168.122.135
1253:X 30 Oct 18:01:17.808 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
1253:X 30 Oct 18:01:17.808 # Sentinel ID is 20078e7551cb39d9aed5345c5865f819f0d37f0b
1253:X 30 Oct 18:01:17.808 # +monitor master mymaster 192.168.122.134 6379 quorum 2
1253:X 30 Oct 18:01:17.809 * +slave slave 192.168.122.135:6379 192.168.122.135 6379 @ mymaster 192.168.122.134 6379
1253:X 30 Oct 18:01:17.811 * +slave slave 192.168.122.136:6379 192.168.122.136 6379 @ mymaster 192.168.122.134 6379
At this point 192.168.122.134 is the master, and 135 and 136 are replicas.
Now kill the Redis service on 192.168.122.134 by hand and look at the log again:
1253:X 30 Oct 18:03:29.819 # +sdown master mymaster 192.168.122.134 6379
1253:X 30 Oct 18:03:30.876 # +odown master mymaster 192.168.122.134 6379 #quorum 2/2
1253:X 30 Oct 18:03:30.876 # +try-failover master mymaster 192.168.122.134 6379
1253:X 30 Oct 18:03:30.877 # +vote-for-leader 20078e7551cb39d9aed5345c5865f819f0d37f0b 31
1253:X 30 Oct 18:03:30.881 # 9c567e460129490fac40b780be9b48735bfa2216 voted for 20078e7551cb39d9aed5345c5865f819f0d37f0b 31
1253:X 30 Oct 18:03:30.936 # +elected-leader master mymaster 192.168.122.134 6379
1253:X 30 Oct 18:03:30.936 # +failover-state-select-slave master mymaster 192.168.122.134 6379
1253:X 30 Oct 18:03:31.037 # +selected-slave slave 192.168.122.135:6379 192.168.122.135 6379 @ mymaster 192.168.122.134 6379
1253:X 30 Oct 18:03:31.037 * +failover-state-send-slaveof-noone slave 192.168.122.135:6379 192.168.122.135 6379 @ mymaster 192.168.122.134 6379
1253:X 30 Oct 18:03:31.138 * +failover-state-wait-promotion slave 192.168.122.135:6379 192.168.122.135 6379 @ mymaster 192.168.122.134 6379
1253:X 30 Oct 18:03:31.418 # +promoted-slave slave 192.168.122.135:6379 192.168.122.135 6379 @ mymaster 192.168.122.134 6379
1253:X 30 Oct 18:03:31.418 # +failover-state-reconf-slaves master mymaster 192.168.122.134 6379
1253:X 30 Oct 18:03:31.468 * +slave-reconf-sent slave 192.168.122.136:6379 192.168.122.136 6379 @ mymaster 192.168.122.134 6379
1253:X 30 Oct 18:03:31.975 # -odown master mymaster 192.168.122.134 6379
1253:X 30 Oct 18:03:32.437 * +slave-reconf-inprog slave 192.168.122.136:6379 192.168.122.136 6379 @ mymaster 192.168.122.134 6379
1253:X 30 Oct 18:03:32.437 * +slave-reconf-done slave 192.168.122.136:6379 192.168.122.136 6379 @ mymaster 192.168.122.134 6379
1253:X 30 Oct 18:03:32.493 # +failover-end master mymaster 192.168.122.134 6379
1253:X 30 Oct 18:03:32.493 # +switch-master mymaster 192.168.122.134 6379 192.168.122.135 6379
1253:X 30 Oct 18:03:32.493 * +slave slave 192.168.122.136:6379 192.168.122.136 6379 @ mymaster 192.168.122.135 6379
1253:X 30 Oct 18:03:32.493 * +slave slave 192.168.122.134:6379 192.168.122.134 6379 @ mymaster 192.168.122.135 6379
1253:X 30 Oct 18:04:02.527 # +sdown slave 192.168.122.134:6379 192.168.122.134 6379 @ mymaster 192.168.122.135 6379
1253:X 30 Oct 18:10:53.112 # -sdown slave 192.168.122.134:6379 192.168.122.134 6379 @ mymaster 192.168.122.135 6379
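192.168.122.135 has been elected as the new master.
After 192.168.122.134 is started again, the master is still 135 and 134 rejoins as a replica, which gives a simple form of high availability (HA) for Redis.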
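Sentinel has two different notions of unavailability: subjectively down (SDOWN) and objectively down (ODOWN).
SDOWN is the state of the master as a single Sentinel sees it from its own checks.
ODOWN requires a certain number of Sentinels to agree before a master is considered objectively down. Sentinels ask each other for their view of the master with the command SENTINEL is-master-down-by-addr.
+switch-master <master name> <oldip> <oldport> <newip> <newport>: the configuration has changed and the master's IP address and port are now different. This is the message most external users care about.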
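Because +switch-master carries the new address in a fixed position, it is easy to pull out of sentinel.log. The following is a small sketch, assuming the log line layout shown earlier; the function name last_switch_master is hypothetical.

```shell
# last_switch_master [LOGFILE]
# Hypothetical helper: prints "newip newport" taken from the most recent
# +switch-master line in the log (or on stdin), or nothing if there is none.
last_switch_master() {
    awk '
        {
            for (i = 1; i <= NF; i++)
                if ($i == "+switch-master") {
                    # Fields after the event: name oldip oldport newip newport
                    ip = $(i + 4)
                    port = $(i + 5)
                }
        }
        END { if (ip != "") print ip, port }
    ' "$@"
}
```

Run against the log shown earlier, last_switch_master /usr/local/redis/logs/sentinel.log prints 192.168.122.135 6379.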
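Before the leader triggers the failover, it first waits a few seconds (random, 0 to 5) so that the other Sentinel instances can prepare and adjust. If all is well, the leader then promotes one slave to master. That slave must be in good health (not in the SDOWN/ODOWN state) and have the lowest priority value (slave-priority in redis.conf, named replica-priority since Redis 5). Once the new master's identity is confirmed, the failover proceeds.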
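The selection rule described above can be sketched as a filter over a list of candidates. This is a deliberately simplified illustration, not Sentinel's actual code: the input format ("address priority state" per line) and the function name pick_replica are invented here, and the real selection, as far as I know, also takes things like replication offset into account. A priority of 0 is treated as "never promote", which matches how the priority setting is documented in redis.conf.

```shell
# pick_replica
# Hypothetical helper: reads "address priority state" lines on stdin and
# prints the healthy candidate with the lowest non-zero priority.
# Returns 1 when no candidate qualifies.
pick_replica() {
    awk '
        $3 == "ok" && ($2 + 0) > 0 && (best == "" || ($2 + 0) < bestprio) {
            best = $1
            bestprio = $2 + 0
        }
        END { if (best != "") print best; else exit 1 }
    '
}
```

For example, feeding it the two lines "192.168.122.135:6379 100 ok" and "192.168.122.136:6379 50 ok" prints 192.168.122.136:6379.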
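A) "+failover-triggered": the leader starts the failover, followed by "+failover-state-wait-start" and a wait of a few seconds. The log shown earlier records +try-failover and +elected-leader at this stage instead.
B) "+failover-state-select-slave": the leader starts looking for a suitable slave.
C) "+selected-slave": a suitable slave has been found.
D) "+failover-state-send-slaveof-noone": the leader sends "slaveof no one" to the selected slave to turn it into the master.
E) "+failover-state-wait-promotion": the leader waits for the promotion of the slave to be confirmed.
F) "+promoted-slave": the promotion is confirmed.
G) "+failover-state-reconf-slaves": the leader starts reconfiguring the remaining slaves.
H) "+slave-reconf-sent": a "slaveof" command has been sent to the given slave, telling it to follow the new master.
I) "+slave-reconf-inprog": the slave is running slaveof + SYNC; a slave runs slaveof once it has received the command sent in H).
J) "+slave-reconf-done": the slave has finished syncing, and the leader can move on to reconfigure the next slave, repeating from G).
K) "+failover-end": the failover is finished.
L) "+switch-master": after a successful failover, every Sentinel instance starts monitoring the new master.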
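To follow these stages in a real log, it helps to strip each line down to its timestamp and event name. A minimal sketch, assuming the log line layout shown earlier (process ID, day, month, time, marker, event); the function name sentinel_events is hypothetical.

```shell
# sentinel_events [LOGFILE]
# Hypothetical helper: prints "time event" for every line whose sixth field
# is a Sentinel event name (starting with + or -), skipping warnings and
# other informational lines.
sentinel_events() {
    awk '$6 ~ /^[+-][a-z]/ { print $4, $6 }' "$@"
}
```

Usage: sentinel_events /usr/local/redis/logs/sentinel.log lists the events in order, which makes it easy to compare a failover against steps A) to L).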
Detailed parameter reference: https://blog.youkuaiyun.com/wang258533488/article/details/79352378