mysql select switch_【MySQL】【高可用】从masterha_master_switch工具简单分析MHA的切换逻辑...

本文介绍了MHA工具中的masterha_master_switch用于在线切换主从的逻辑。首先检查配置,然后选择合适的从库作为新主,关闭原主的非事务表,解锁并将其设为从库,而选中的从库变为新主。整个过程包括配置检查、拒绝更新、锁定表、切换从库等步骤,确保数据一致性。

简介:masterha_master_switch作为一个切换工具被集成在MHA程序包中,

安装:编译安装MHA manager后会在/usr/local/bin/中生成二进制可执行程序masterha_master_switch。

使用:

$masterha_master_switch --help

Usage:

# For master failover

masterha_master_switch --master_state=dead

--global_conf=/etc/masterha_default.cnf

--conf=/usr/local/masterha/conf/app1.cnf --dead_master_host=host1

# For online master switch

masterha_master_switch --master_state=alive

--global_conf=/etc/masterha_default.cnf

--conf=/usr/local/masterha/conf/app1.cnf

See online reference

(http://code.google.com/p/mysql-master-ha/wiki/masterha_master_switch)

for details.

在这里,我习惯将一套主从的配置都放在app1.cnf中,并且更改为业务相关的名称,如mainBusiness.cnf

分析:

目标:获取masterha_master_switch的在线切换逻辑

环境:MHA manager 192.168.1.8

MHA node1+MySQL5.7+GTID 192.168.1.109+PORT3109 主

MHA node1+MySQL5.7+GTID 192.168.1.110+PORT3110 从

配置文件内容:

manager_workdir=/data/mha/mainBusiness #设置MHA的工作目录

manager_log=/data/mha/mainBusiness/manager.log #MHA manager的日志输出

remote_workdir=/data/mha/ #预设MHA node端的工作目录

master_binlog_dir= /data/mysql/3109/log/,/data/mysql/3110/log/ #预设MHA node端的binlog目录

#secondary_check_script= masterha_secondary_check -s 192.168.1.109 -s 192.168.1.110

secondary_check_script= masterha_secondary_check -s 192.168.1.109 -s 192.168.1.110 --user=root --master_host=192.168.1.109 --master_port=3109

ping_interval=1 #设置MHA manager的检测间隔(1秒)

[server1]

hostname=MySQL-Cent7-IP001109

ip=192.168.1.109

port=3109

ssh_user=root

ssh_port=22

candidate_master=1 #设置该节点是否可以提升为主,1为是,0否

check_repl_delay=0 #发生故障后是否检查本实例主从落后程度,0否,1是

[server2]

hostname=MySQL-Cent7-IP001110

ip=192.168.1.110

port=3110

ssh_user=root

ssh_port=22

candidate_master=1 #设置该节点是否可以提升为主,1为是,0否

check_repl_delay=0 #发生故障后是否检查本实例主从落后程度,0否,1是

在MHA manager端上执行:

$masterha_master_switch --master_state=alive --conf=/etc/mha/mainBusiness.cnf --orig_master_is_new_slave

#--master_state 指明在线切换

#--orig_master_is_new_slave 指定原先的主作为从库挂到新的主上

MHA manager端输出如下

#####################输出段1###########################

[info] MHA::MasterRotate version 0.57.

[info] Starting online master switch.. #开始在线切换

[info]

[info] * Phase 1: Configuration Check Phase.. #阶段1,检查配置

[info]

[warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.. #我这里没有使用全局参数文件,会有报错跳过,没关系

[info] Reading application default configuration from /etc/mha/mainBusiness.cnf.. #程序从mainBusiness文件中读取配置

[info] Reading server configuration from /etc/mha/mainBusiness.cnf..

[info] GTID failover mode = 1 #启用GTID故障转移模式

[info] Current Alive Master: MySQL-Cent7-IP001109(192.168.1.109:3109) #列出当前存活的主实例

[info] Alive Slaves: #列出当前存活的从实例

[info] MySQL-Cent7-IP001110(192.168.1.110:3110) Version=5.7.19-log (oldest major version between slaves) log-bin:enabled

[info] GTID ON #从采用了GTID模式

[info] Replicating from 192.168.1.109(192.168.1.109:3109)

[info] Primary candidate for the new Master (candidate_master is set)

It is better to execute FLUSH NO_WRITE_TO_BINLOG TABLES on the master before switching. Is it ok to execute on MySQL-Cent7-IP001109(192.168.1.109:3109)? (YES/no):

#处于事务一致性考虑,程序询问是否临时关闭非事务型的表

原master,3109端显示如下

#####################输出段1###########################

SET wait_timeout=86400; #设置连接超时时间,防止切换时翻车

SELECT @@global.server_id As Value;

SELECT VERSION() AS Value; #获取用于复制的server—id

SELECT @@global.gtid_mode As Value; #获取自身是否使用了GTID复制

SHOW GLOBAL VARIABLES LIKE 'log_bin'; #检查自身是否启用了binlog

SHOW MASTER STATUS; #获取自身的事务执行情况

SELECT @@global.datadir AS Value; #获取自身数据文件的存储位置

SELECT @@global.slave_parallel_workers AS Value; #检查是否采用了多线程复制

SHOW SLAVE STATUS; #获取自身作为从库时的事务执行情况

SELECT @@global.read_only As Value; #获取自身是否开启了只读

SELECT @@global.relay_log_purge As Value; #检查自身是否开启了relay log自动清除

原slave,3110端显示如下

SELECT @@global.server_id As Value;

SELECT VERSION() AS Value;

SELECT @@global.gtid_mode As Value;

SHOW GLOBAL VARIABLES LIKE 'log_bin';

SHOW MASTER STATUS;

SELECT @@global.datadir AS Value;

SELECT @@global.slave_parallel_workers AS Value;

SHOW SLAVE STATUS;

SELECT @@global.read_only As Value;

SELECT @@global.relay_log_purge As Value;

SELECT @@global.relay_log_info_repository AS Value; #差异处,获取自身relay信息保存形式(table)

SELECT Relay_log_name FROM mysql.slave_relay_log_info; #差异处,获取正在使用的relay文件名称

SELECT @@global.datadir AS Value;

SHOW SLAVE STATUS;

SELECT Repl_slave_priv AS Value FROM mysql.user WHERE user = 'repl'; #差异处,检查复制用户是否具有复制权限

第一部分小结:

读取配置文件,确认主从关系与复制方式;

根据主从关系复制方式,连接主库:设置必要参数,获取的复制详细信息/

连接从库:获取复制的详细信息,获取relay信息,获取repl账号并确认权限

接MHA manager端输出确认输入yes后

MHA manager端输出如下:

#####################输出段2###########################

[info] Executing FLUSH NO_WRITE_TO_BINLOG TABLES. This may take long time..

[info] ok. #注意,这里虽然显示关闭了非事务表,但是上面的抓包语句里面并没有出现相关语句

[info] Checking MHA is not monitoring or doing failover.. #检查MHA是否工作,切换时要求MHA manager停止运行

[info] Checking replication health on MySQL-Cent7-IP001110.. #检查主从健康程度

[info] ok.

[info] Searching new master from slaves.. #开始在从库中选取一个新主

[info] Candidate masters from the configuration file: #列出候选从库

[info] MySQL-Cent7-IP001109(192.168.1.109:3109) Version=5.7.19-log log-bin:enabled

[info] GTID ON #检查GTID开启情况

[info] MySQL-Cent7-IP001110(192.168.1.110:3110) Version=5.7.19-log (oldest major version between slaves) log-bin:enabled

[info] GTID ON #检查GTID开启情况

[info] Replicating from 192.168.1.109(192.168.1.109:3109)

[info] Primary candidate for the new Master (candidate_master is set)

[info] Non-candidate masters:

[info] Searching from candidate_master slaves which have received the latest relay log events..

[info] #在所有从库中选取relay log最新的一个作为新的主库

From:

MySQL-Cent7-IP001109(192.168.1.109:3109) (current master)

+--MySQL-Cent7-IP001110(192.168.1.110:3110)

To:

MySQL-Cent7-IP001110(192.168.1.110:3110) (new master)

+--MySQL-Cent7-IP001109(192.168.1.109:3109)

Starting master switch from MySQL-Cent7-IP001109(192.168.1.109:3109) to MySQL-Cent7-IP001110(192.168.1.110:3110)? (yes/NO):

#程序询问是否可以进行切换了

master,3109端显示如下

#####################输出段2###########################

USE `unknown_database`;

FLUSH NO_WRITE_TO_BINLOG TABLES; -- 刷新表缓存到硬盘,同时不记录这条到binlog

SELECT GET_LOCK('MHA_Master_High_Availability_Monitor', '0') AS Value; -- 加一个模拟锁,防止出现切换程序多开的问题,这样多开的程序以为获取不到同名锁就会失败退出,

#GET_LOCK(str,time) -- str为锁的名称,0表示持续锁,5.7.5之后可以存在多个名称不同的GET_LOCK

SHOW PROCESSLIST;

原slave,3110端显示如下

#####################输出段2###########################

USE `unknown_database`;

SELECT GET_LOCK('MHA_Master_High_Availability_Failover', '0') AS Value;

SHOW SLAVE STATUS;

SHOW SLAVE STATUS;

第二部分小结:

1.确认可以保存关闭表后,刷新表缓存到磁盘, 避免丢数据,加模拟锁防止切换程序多开造成切换异常

2.从候选从库中选出拥有最新数据的从库,并将其设为要切换到的主库

接MHA manager端输出确认可以切换后

MHA manager端输出如下:

#####################输出段3###########################

[info] Checking whether MySQL-Cent7-IP001110(192.168.1.110:3110) is ok for the new master..

[info] ok. #检查上一步中被选定的新主库的是否真正可以成为主

[info] MySQL-Cent7-IP001109(192.168.1.109:3109): SHOW SLAVE STATUS returned empty result. To check replication filtering rules, temporarily executing CHANGE MASTER to a dummy host. #没有检查到原来主(现在被作为从)作为从库的残留信息,不管3721先将其挂到一个空主上

[info] MySQL-Cent7-IP001109(192.168.1.109:3109): Resetting slave pointing to the dummy host.

[info] ** Phase 1: Configuration Check Phase completed.

[info] # 配置检查阶段结束

[info] * Phase 2: Rejecting updates Phase..

[info] #

master_ip_online_change_script is not defined. If you do not disable writes on the current master manually, applications keep writing on the current master. Is it ok to proceed? (yes/NO):

#切换程序检测到主机虚拟IP地址切换的地址没有定义,如果只切换主从身份的话,应用还会写到原来的主上,需要设置只读。询问是否真的要切换,这里我们只是做切换实验,先观察下两个实例的输出,然后直接切换即可。

原master,3109端显示如下

USE `unknown_database`;

FLUSH NO_WRITE_TO_BINLOG TABLES; -- 刷新表缓存到磁盘, 避免同步丢数据的问题

SELECT GET_LOCK('MHA_Master_High_Availability_Monitor', '0') AS Value;

SHOW PROCESSLIST;

SHOW SLAVE STATUS;

CHANGE MASTER TO MASTER_HOST='dummy_host'; #将原主作为从切换到一个莫须有的主机上

SHOW SLAVE STATUS;

RESET SLAVE /*!50516 ALL */; #尝试清除自己之前残存的slave属性的信息,若本机不为主时,需要!

SELECT RELEASE_LOCK('MHA_Master_High_Availability_Monitor') As Value;

原slave,3110端显示如下

USE `unknown_database`;

SELECT GET_LOCK('MHA_Master_High_Availability_Failover', '0') AS Value;

SHOW SLAVE STATUS;

SHOW SLAVE STATUS;

SHOW PROCESSLIST;

第三部分总结:

1.检查选定的从库的详细信息,确认真的是否可以作为新的主库

2.处理从库之间的数据比对,若只有一个主和从,则将原主库先挂到一个莫须有的空主上

若有两个以上的从库,则需清理从库原来的slave记录

接MHA manager端输出确认,强制切换后:

MHA manager端输出如下:

#####################输出段4###########################

[info] Locking all tables on the orig master to reject updates from everybody (including root):

[info] Executing FLUSH TABLES WITH READ LOCK..

[info] ok. #所有的表都禁止写操作

[info] Orig master binlog:pos is 3109binlog.000070:536.

[info] Waiting to execute all relay logs on MySQL-Cent7-IP001110(192.168.1.110:3110)..

[info] master_pos_wait(3109binlog.000070:536) completed on MySQL-Cent7-IP001110(192.168.1.110:3110). Executed 0 events.

[info] done.#将从库已经获得,但还未来得及执行的事务应用到自身,和其他没有跟上的从库

[info] Getting new master`s binlog name and position..

[info] 3110binlog.000049:536

[info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='MySQL-Cent7-IP001110 or 192.168.1.110', MASTER_PORT=3110, MASTER_AUTO_POSITION=1, MASTER_USER='repl', MASTER_PASSWORD='xxx';

[info] Setting read_only=0 on MySQL-Cent7-IP001110(192.168.1.110:3110)..

[info] ok. #即将所有其他从库都用change master的语句切换到新的主,即原来的从110上

[info]

[info] * Switching slaves in parallel..

[info] #即将并行切换所有从库

[info] Unlocking all tables on the orig master:

[info] Executing UNLOCK TABLES..

[info] ok. #释放原主库的表锁

[info] Starting orig master as a new slave..

[info] Resetting slave MySQL-Cent7-IP001109(192.168.1.109:3109) and starting replication from the new master MySQL-Cent7-IP001110(192.168.1.110:3110)..

[info] Executed CHANGE MASTER.

[info] Slave started.

[info] All new slave servers switched successfully.

[info]

[info] * Phase 5: New master cleanup phase..

[info] #即将清理新从库的主从信息,切换到主库

[info] MySQL-Cent7-IP001110: Resetting slave info succeeded.

[info] Switching master to MySQL-Cent7-IP001110(192.168.1.110:3110) completed successfully.

#切换完成

新slave,3109端显示如下:

#####################输出段4###########################

USE ``;

SELECT CONNECTION_ID() AS Value;

SET wait_timeout=86400;

SET GLOBAL read_only=1; #将自己设为只读

SHOW MASTER STATUS;

UNLOCK TABLES; #释放表锁

CHANGE MASTER TO MASTER_HOST='192.168.1.110', MASTER_PORT=3110, MASTER_USER='repl', MASTER_PASSWORD='123456', MASTER_AUTO_POSITION=1; #把自己指向MHA选定的新主

START SLAVE;

SHOW SLAVE STATUS; #差异处,启动slave身份

SELECT RELEASE_LOCK('MHA_Master_High_Availability_Failover') As Value; #释放锁止同应用的并发锁

新master,3110端显示如下:

#####################输出段4###########################

SHOW SLAVE STATUS;

SELECT MASTER_POS_WAIT('3109binlog.000070','536',0) AS Result; #检查自己是否执行完差异事务

STOP SLAVE SQL_THREAD; #停止从库SQL线程

SHOW SLAVE STATUS;

SHOW MASTER STATUS;

SELECT @@global.read_only As Value;

SELECT @@global.read_only As Value;

SET GLOBAL read_only=0; #取消只读状态

USE ``;

SELECT UNIX_TIMESTAMP();

SELECT @@GLOBAL.SERVER_ID;

SET @master_heartbeat_period= 30000001024;

SET @master_binlog_checksum= @@global.binlog_checksum;

SELECT @master_binlog_checksum;

SELECT @@GLOBAL.GTID_MODE;

SELECT @@GLOBAL.SERVER_UUID;

SET @slave_uuid= '28ea40ab-9bbd-11e7-8cd1-000c29c31069'; #写入从库的UUID

STOP SLAVE; #停止自身作为从库的身份

SHOW SLAVE STATUS;

RESET SLAVE /*!50516 ALL */; #清除自身作为从库的所有记录

SHOW SLAVE STATUS;

SELECT RELEASE_LOCK('MHA_Master_High_Availability_Failover') As Value; #释放锁止同应用的并发锁

第四部分总结:

1.检查选定的新主实例relay日志是否存在未执行完的原主事务,并应用到新主

原主库:锁止,记录主从信息,释放锁,将自己指向新的主,启动从库身份

新主库: 自己执行完未来得及执行完的事务后停止自己从库身份,取消只读,清除自身作为从的记录

最后释放切换程序的并发锁

全文总结:

1.读取配置文件,确认主从关系与复制方式,根据主从关系复制方式,连接主库:设置必要参数,获取的复制详细信息; 连接从库:获取复制的详细信息,获取relay信息,获取repl账号并确认权限。

2.确认可以保存关闭非事务表后,关闭非事务表,加模拟锁防止切换程序多开造成切换异常,从候选从库中选出拥有最新数据的从库,并将其设为要切换到的主库。

3.检查选定的从库的详细信息,确认真的是否可以作为新的主库,处理从库之间的数据比对,若只有一个主和从,则将原主库先挂到一个莫须有的空主上;若有两个以上的从库,则需清理从库原来的slave记录。

4.检查选定的新主实例relay日志是否存在未执行完的原主事务,并应用到新主,应用完后清理自己的从库信息,取消只读,转为主库;锁止原主库,记录主从信息,切换到新主库。

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值