十:故障转移流程中的状态转换
当哨兵针对某个主节点进行故障转移时,该主节点的故障转移状态master->failover_state,要依次经历下面六个状态:
SENTINEL_FAILOVER_STATE_WAIT_START
SENTINEL_FAILOVER_STATE_SELECT_SLAVE
SENTINEL_FAILOVER_STATE_SEND_SLAVEOF_NOONE
SENTINEL_FAILOVER_STATE_WAIT_PROMOTION
SENTINEL_FAILOVER_STATE_RECONF_SLAVES
SENTINEL_FAILOVER_STATE_UPDATE_CONFIG
在哨兵的“主函数”sentinelHandleRedisInstance中,通过sentinelFailoverStateMachine函数进行故障转移状态的转换。它的代码如下:
void sentinelFailoverStateMachine(sentinelRedisInstance *ri) {
redisAssert(ri->flags & SRI_MASTER);
if (!(ri->flags & SRI_FAILOVER_IN_PROGRESS)) return;
switch(ri->failover_state) {
case SENTINEL_FAILOVER_STATE_WAIT_START:
sentinelFailoverWaitStart(ri);
break;
case SENTINEL_FAILOVER_STATE_SELECT_SLAVE:
sentinelFailoverSelectSlave(ri);
break;
case SENTINEL_FAILOVER_STATE_SEND_SLAVEOF_NOONE:
sentinelFailoverSendSlaveOfNoOne(ri);
break;
case SENTINEL_FAILOVER_STATE_WAIT_PROMOTION:
sentinelFailoverWaitPromotion(ri);
break;
case SENTINEL_FAILOVER_STATE_RECONF_SLAVES:
sentinelFailoverReconfNextSlave(ri);
break;
}
}
下面分别讲解每个状态及其处理函数。
1:SENTINEL_FAILOVER_STATE_WAIT_START
上一章讲过,在哨兵的“主函数”sentinelHandleRedisInstance中,调用sentinelStartFailoverIfNeeded函数判断是否可以开始一次故障转移流程。当条件满足后,就会调用sentinelStartFailover函数,开始新一轮的故障转移流程。在该函数中,就会将该主节点的故障转移状态置为SENTINEL_FAILOVER_STATE_WAIT_START。
一旦哨兵开始一次故障转移流程时,该哨兵第一件事就是向其他所有哨兵发送”is-master-down-by-addr”命令进行拉票。然后就是调用sentinelFailoverWaitStart函数处理当前状态。
sentinelFailoverWaitStart函数的代码如下:
void sentinelFailoverWaitStart(sentinelRedisInstance *ri) {
char *leader;
int isleader;
/* Check if we are the leader for the failover epoch. */
leader = sentinelGetLeader(ri, ri->failover_epoch);
isleader = leader && strcasecmp(leader,server.runid) == 0;
sdsfree(leader);
/* If I'm not the leader, and it is not a forced failover via
* SENTINEL FAILOVER, then I can't continue with the failover. */
if (!isleader && !(ri->flags & SRI_FORCE_FAILOVER)) {
int election_timeout = SENTINEL_ELECTION_TIMEOUT;
/* The election timeout is the MIN between SENTINEL_ELECTION_TIMEOUT
* and the configured failover timeout. */
if (election_timeout > ri->failover_timeout)
election_timeout = ri->failover_timeout;
/* Abort the failover if I'm not the leader after some time. */
if (mstime() - ri->failover_start_time > election_timeout) {
sentinelEvent(REDIS_WARNING,"-failover-abort-not-elected",ri,"%@");
sentinelAbortFailover(ri);
}
return;
}
sentinelEvent(REDIS_WARNING,"+elected-leader",ri,"%@");
ri->failover_state = SENTINEL_FAILOVER_STATE_SELECT_SLAVE;
ri->failover_state_change_time = mstime();
sentinelEvent(REDIS_WARNING,"+failover-state-select-slave",ri,"%@");
}
当前哨兵,在调用sentinelStartFailover函数发起故障转移流程时,会将当前选举纪元sentinel.current_epoch记录到ri->failover_epoch中。因此,本函数首先根据ri->failover_epoch,调用函数sentinelGetLeader得到本界选举的结果leader。如果本界选举尚无人获得超过半数的选票,则leader为NULL;
如果当前哨兵还没有赢得选举,并且主节点标志位中没有设置SRI_FORCE_FAILOVER标记,说明当前哨兵还没有获得足够的选票,暂时不能继续进行接下来的故障转移流程,需要直接返回。
但是如果超过一定时间之后,当前哨兵还是没有赢得选举,则会终止当前的故障转移流程,因此如果当前距离开始故障转移的时间超过election_timeout,则调用函数sentinelAbortFailover,终止本次故障转移流程。
如果当前哨兵最终赢得了选举,则更新故障转移的状态,置ri->failover_state属性为下一个状态:SENTINEL_FAILOVER_STATE_SELECT_SLAVE,并更新ri->failover_state_change为当前时间;
2:SENTINEL_FAILOVER_STATE_SELECT_SLAVE
当故障转移状态转换为SENTINEL_FAILOVER_STATE_SELECT_SLAVE时,就需要在下线主节点的所有下属从节点中,按照一定的规则,选择一个从节点使其成为新的主节点。
该状态下的处理函数为sentinelFailoverSelectSlave,该函数的代码如下:
void sentinelFailoverSelectSlave(sentinelRedisInstance *ri) {
sentinelRedisInstance *slave = sentinelSelectSlave(ri);
/* We don't handle the timeout in this state as the function aborts
* the failover or go forward in the next state. */
if (slave == NULL) {
sentinelEvent(REDIS_WARNING,"-failover-abort-no-good-slave",ri,"%@");
sentinelAbortFailover(ri);
} else {
sentinelEvent(REDIS_WARNING,"+selected-slave",slave,"%@");
slave->flags |= SRI_PROMOTED;
ri->promoted_slave = slave;
ri->failover_state = SENTINEL_FAILOVER_STATE_SEND_SLAVEOF_NOONE;
ri->failover_state_change_time = mstime();
sentinelEvent(REDIS_NOTICE,"+failover-state-send-slaveof-noone",
slave, "%@");
}
}
该函数首先调用函数sentinelSelectSlave选择一个符合条件的从节点;
如果没有合适的从节点,则调用sentinelAbortFailover直接终止本次故障转移流程;
如果找到了合适的从节点slave,则首先将标记SRI_PROMOTED增加到该从节点的标志位中;并使主节点实例的ri->promoted_slave指针指向该从节点实例,并将故障转移状态置为SENTINEL_FAILOVER_STATE_SEND_SLAVEOF_NOONE;然后更新ri->failover_state_change_time为当前时间;
函数sentinelSelectSlave用于在下线主节点的所有从节点实例中,按照一定的规则选择一个从节点。该函数的代码如下:
sentinelRedisInstance *sentinelSelectSlave(sentinelRedisInstance *master) {
sentinelRedisInstance **instance =
zmalloc(sizeof(instance[0])*dictSize(master->slaves));
sentinelRedisInstance *selected = NULL;
int instances = 0;
dictIterator *di;
dictEntry *de;
mstime_t max_master_down_time = 0;
if (master->flags & SRI_S_DOWN)
max_master_down_time += mstime() - master->s_down_since_time;
max_master_down_time += master->down_after_period * 10;
di = dictGetIterator(master->slaves);
while((de = dictNext(di)) != NULL) {
sentinelRedisInstance *slave = dictGetVal(de);
mstime_t info_validity_time;
if (slave->flags & (SRI_S_DOWN|SRI_O_DOWN|SRI_DISCONNECTED)) continue;
if (mstime() - slave->last_avail_time > SENTINEL_PING_PERIOD*5) continue;
if (slave->slave_prior