一、背景
今天讲下工作中使用较多的MHA
首先需要确认一点的是,MHA实现的是Master的高可用。
贴一张MHA作者在 MySQL Conference and Expo 2011 上分享的图片
二、复制结构与基础配置
1、MySQL复制关系
Manager Node 192.168.237.11(管理节点)
Data Node_1 192.168.237.12(数据节点,MySQL主库)
Data Node_2 192.168.237.13(数据节点,MySQL备库,候选主库)
Data Node_3 192.168.237.14(数据节点,MySQL备库)
2、节点基础配置
(1)配置数据节点的iptables
配置数据节点的iptables,确保管理节点能访问到数据节点的MySQL端口
(2)打通SSH互信
正确配置管理节点和数据节点的域名解析,很重要!
管理节点及数据节点间打通SSH互信:
# ssh-keygen -t rsa
# ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.237.11
# ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.237.12
# ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.237.13
# ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.237.14
(3)MySQL基础配置
所有数据节点配置相同的复制过滤规则(MHA在启动的时候会检查过滤规则,不同则启动失败)
开启候选主库的log-bin。管理节点会检测候选主库是否有配置log-bin,若没有该配置项,MHA将不会进行failover
三、MHA安装
1、安装包下载
RHEL7:
https://github.com/yoshinorim/mha4mysql-manager/releases
https://github.com/yoshinorim/mha4mysql-node/releases
RHEL6:
https://github.com/yoshinorim/mha4mysql-manager/wiki/Downloads
2、配置CPAN源及EPEL源(可选)
(1)配置CPAN源(阿里源为例)
cpan[1]> o conf urllist push http://mirrors.aliyun.com/CPAN/
Please use 'o conf commit' to make the config permanent!
cpan[2]> o conf commit
commit: wrote '/usr/share/perl5/CPAN/Config.pm'
cpan[3]> o conf urllist
urllist
0 [http://mirrors.aliyun.com/CPAN/]
Type 'o conf' to view all configuration items
(2)配置EPEL源
RHEL6:
# yum -y install https://dl.fedoraproject.org/pub/epel/epel-release-latest-6.noarch.rpm
RHEL7:
# yum -y install https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
# yum clean all
# yum makecache
3、数据节点部署
(1)更新libmysqlclient.so
# rpm -qa |grep mysql-libs
mysql-libs-5.1.73-8.el6_8.x86_64
# rpm -e --nodeps mysql-libs-5.1.73-8.el6_8.x86_64
找一台MySQL拷贝libmysqlclient.so运行库文件
# cp /usr/local/mysql3306/lib/libmysqlclient.so.20 /usr/lib64/
# cp /usr/local/mysql3306/lib/libmysqlclient.so /usr/lib64/
(2)更新DBD-MySQL
# rpm -qa |grep perl-DBD-MySQL
perl-DBD-MySQL-4.013-3.el6.x86_64
# rpm -e --nodeps perl-DBD-MySQL-4.013-3.el6.x86_64
# yum install perl-CPAN
# ln -s /usr/local/mysql3306/bin/mysql_config /usr/local/bin/
cpan[1]> install DBD-MySQL
cpan[2]> install DBI
cpan[2]> install ExtUtils MakeMaker
(3)安装mha4mysql-node
# tar zxvf mha4mysql-node-0.56.tar.gz
# cd mha4mysql-node-0.56
# perl Makefile.PL
# make && make install
Tips:常见安装报错(一般是由于yum方式安装perl模块较旧导致)
Tip1、编译时出现Can't locate ExtUtils/MakeMaker.pm in @INC
https://github.com/Perl-Toolchain-Gang/ExtUtils-MakeMaker/releases
# tar zxvf ExtUtils-MakeMaker-7.18.tar.gz
# cd ExtUtils-MakeMaker-7.18
# perl Makefile.PL
# make && make install
Tip2、编译时出现Can't locate CPAN.pm in @INC
https://cpan.metacpan.org/authors/id/A/AN/ANDK/CPAN-2.26.tar.gz
# tar zxvf CPAN-2.26.tar.gz
# cd CPAN-2.26
# perl Makefile.PL
# make && make install
(2)设置relay log清除方式
mysql > set global relay_log_purge=0;
my.cnf中设置 relay_log_purge=0
关闭该参数的原因是因为在默认情况下,从库上的relay log在SQL线程执行完后会自动被删掉。但是在failover过程中从库需要利用候选主库上的中继日志来补数据,所以MySQL默认打开的自动清除中继日志的功能需要关闭。
但是为了不撑爆磁盘就需要定期清除旧的relay log。清除relay log需要考虑复制延时的问题。在ext3文件系统下,删除大文件建议采用硬链接的方法。
关于relay log的清除,可参考以下文章:
http://daisywei.blog.51cto.com/7837970/1881154
4、管理节点部署
(1)同数据节点一样,也需要安装 mha4mysql-node
(2)安装依赖包
# yum install gcc
cpan[1]> install Config::Tiny
cpan[2]> install Log::Dispatch
cpan[3]> install Parallel::ForkManager
cpan[4]> install Time::HiRes
(3)安装mha4mysql-manager
# tar zxvf mha4mysql-manager-0.56.tar.gz
# cd mha4mysql-manager-0.56
# perl Makefile.PL
# make && make install
(4)修改配置文件
建立mha配置文件及脚本目录
# mkdir -p /etc/masterha/
复制源码包内的配置文件、脚本(app1.cnf、masterha_default.cnf 可以合并为一个文件)
# cp ./mha4mysql-manager-0.56/samples/conf/masterha_default.cnf /etc/
# cp ./mha4mysql-manager-0.56/samples/conf/app1.cnf /etc/materha/
# cp ./mha4mysql-manager-0.56/samples/scripts/* /etc/masterha/
# vim /etc/masterha_default.cnf
[server default]
#数据库账户(需要select、insert、super、reload权限,以及对mysql.user的更改权限)
user=mha
password=123456
#SSH账户
ssh_user=root
#复制账户
repl_user=replication_user
repl_password=123456
ping_interval=1
# master_ip_failover_script= /etc/masterha/master_ip_failover
# master_ip_online_change_script= /etc/masterha/master_ip_online_change
# shutdown_script= /etc/masterha/power_manager
# report_script= /etc/masterha/send_report
# vim /etc/masterha/app1.cnf
[server default]
manager_workdir=/var/log/masterha/app1
manager_log=/var/log/masterha/app1/manager.log
remote_workdir=/var/log/
[server1]
hostname=237_12
master_binlog_dir=/data/mysql3306
candidate_master=1
check_repl_delay=0 #防止master故障时,切换时slave有延迟,卡在那里切不过去
[server2]
hostname=237_13
master_binlog_dir=/data/mysql3306
candidate_master=1
check_repl_delay=0
[server3]
hostname=237_14
master_binlog_dir=/data/mysql3306
no_master=1
(5)环境检测
step1
# masterha_check_ssh --conf=/etc/masterha/app1.cnf
提示:All SSH connection tests passed successfully 表示节点间SSH连同
step2
# masterha_check_repl --conf=/etc/masterha/app1.cnf
提示:MySQL Replication Health is OK.
(6)启动MHA Manager
# nohup masterha_manager --conf=/etc/masterha/app1.cnf &
观察日志情况
# tail -f /tmp/mha_manager.log
Thu Aug 10 21:49:09 2017 - [info] Reading default configuration from /etc/masterha_default.cnf..
Thu Aug 10 21:49:09 2017 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Thu Aug 10 21:49:09 2017 - [info] Reading server configuration from /etc/masterha/app1.cnf..
查看MHA状态:
# masterha_check_status --conf=/etc/masterha/app1.cnf
app1 (pid:1716) is running(0:PING_OK), master:237_12
四、故障演练
1、failover故障切换
例:模拟主库(192.168.237.12)MySQL实例挂掉
观察备库237.14的主从复制关系:主库已经切换为237.13
观察此时新的主库的binlog位置点:
查看管理节点的日志记录(位置由配置文件中参数:manager_log决定)
# cat /var/log/masterha/app1/manager.log
From:
237_12(192.168.237.12:3306) (current master)
+--237_13(192.168.237.13:3306)
+--237_14(192.168.237.14:3306)
To:
237_13(192.168.237.13:3306) (new master)
+--237_14(192.168.237.14:3306)
[root@237_11 script]# masterha_check_status --conf=/etc/masterha/app1.cnf
app1 is stopped(2:NOT_RUNNING).
需要注意的是,无论宕机导致的master切换、还是没有特别设置地手动切换master(不含orig_master_is_new_slave选项),原来的master都不在MHA架构内了,即使重新启动也不会加入,必须手动加入。配置文件中仍然是以挂掉的master为主库,并没有提升candidated master为主库,此时MHA将不再具备高可用的特性,需尽快更新配置文件,并在管理节点运行masterha_check_ssh确认无误后重新启用MHA manager。
此时,若假设原主库(237.12)上的MySQL已成功修复故障,正常启动后,可以以备库的身份加入MHA架构内:
(1)复制采用GTID方式
直接change master ... MASTER_AUTO_POSITION=1;
(2)binlog+position方式
首先,在MHA管理节点的日志中找到,复制需要的主库binlog位置点:
建立到新主库237.13的复制关系
Tips:该步骤很有可能会报1236的错误,并且Slave_IO_Running状态为 No
解决办法如下(从库切换到新的binlog,并重新指向之前卡在的主库binlog位置点):
mysql> stop slave;
Query OK, 0 rows affected (0.01 sec)
mysql>
mysql> flush logs;
Query OK, 0 rows affected (0.01 sec)
mysql>
mysql> show master status \G
*************************** 1. row ***************************
File: mysql-bin.000035
Position: 107
Binlog_Do_DB:
Binlog_Ignore_DB:
1 row in set (0.00 sec)
mysql> change master to master_log_file='mysql-bin.000018',master_log_pos=107;
Query OK, 0 rows affected (0.01 sec)
mysql> start slave;
Query OK, 0 rows affected (0.00 sec)
mysql> show slave status \G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.237.13
Master_User: replication_user
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000018
Read_Master_Log_Pos: 107
Relay_Log_File: 237_12-relay-bin.000002
Relay_Log_Pos: 253
Relay_Master_Log_File: mysql-bin.000018
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 107
Relay_Log_Space: 410
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 23713
1 row in set (0.00 sec)
当建立新主从关系后,可以开始修改MHA配置文件(app1.cnf)中的server1、server2的hostname,重新启用新的高可用。启动MHA前需做如下检查:
step1、删除前一次failover标记文件:
# rm -rf app1.failover.complete
step2、检查配置文件正确性:
# masterha_check_ssh --conf=/etc/masterha/app1.cnf
step3、再次启动MHA:
# nohup masterha_manager --conf=/etc/masterha/app1.cnf &
确认mha状态:
# masterha_check_status --conf=/etc/masterha/app1.cnf
app1 (pid:1977) is running(0:PING_OK), master:node2
2、在线手动切换
(online master switch理论写操作的阻塞时间在0.5秒-2秒)
功能:用于调整无法在线变更的参数(例如innodb_buffer_pool_size等)。通过在从库配置文件中调整参数并重启生效后,执行online master switch将从库提升为主库,原主库同样修改参数后重启,并以从库的身份重新加入MHA架构,从而曲线实现了在线变更参数的需求而无需中断业务。
切换前需要先停止MHA manger,否则会有如下报错。
以下为成功切换过程:
[root@237_11 app1]# masterha_stop --conf=/etc/masterha/app1.cnf
Stopped app1 successfully.
[1]+ Exit 1 nohup masterha_manager --conf=/etc/masterha/app1.cnf < /dev/null > /tmp/mha_manager.log 2>&1
[root@237_11 app1]#
[root@237_11 app1]# masterha_master_switch --conf=/etc/masterha/app1.cnf --master_state=alive --new_master_host=237_12 --orig_master_is_new_slave --running_updates_limit=10000 --interactive=0
Sun Aug 13 21:46:10 2017 - [info] MHA::MasterRotate version 0.56.
Sun Aug 13 21:46:10 2017 - [info] Starting online master switch..
Sun Aug 13 21:46:10 2017 - [info]
Sun Aug 13 21:46:10 2017 - [info] * Phase 1: Configuration Check Phase..
Sun Aug 13 21:46:10 2017 - [info]
Sun Aug 13 21:46:10 2017 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sun Aug 13 21:46:10 2017 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Sun Aug 13 21:46:10 2017 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Sun Aug 13 21:46:10 2017 - [info] GTID failover mode = 0
Sun Aug 13 21:46:10 2017 - [info] Current Alive Master: 237_13(192.168.237.13:3306)
Sun Aug 13 21:46:10 2017 - [info] Alive Slaves:
Sun Aug 13 21:46:10 2017 - [info] 237_12(192.168.237.12:3306) Version=5.5.54-log (oldest major version between slaves) log-bin:enabled
Sun Aug 13 21:46:10 2017 - [info] Replicating from 192.168.237.13(192.168.237.13:3306)
Sun Aug 13 21:46:10 2017 - [info] Primary candidate for the new Master (candidate_master is set)
Sun Aug 13 21:46:10 2017 - [info] 237_14(192.168.237.14:3306) Version=5.5.54-log (oldest major version between slaves) log-bin:enabled
Sun Aug 13 21:46:10 2017 - [info] Replicating from 192.168.237.13(192.168.237.13:3306)
Sun Aug 13 21:46:10 2017 - [info] Not candidate for the new Master (no_master is set)
Sun Aug 13 21:46:10 2017 - [info] Executing FLUSH NO_WRITE_TO_BINLOG TABLES. This may take long time..
Sun Aug 13 21:46:10 2017 - [info] ok.
Sun Aug 13 21:46:10 2017 - [info] Checking MHA is not monitoring or doing failover..
Sun Aug 13 21:46:10 2017 - [info] Checking replication health on 237_12..
Sun Aug 13 21:46:10 2017 - [info] ok.
Sun Aug 13 21:46:10 2017 - [info] Checking replication health on 237_14..
Sun Aug 13 21:46:10 2017 - [info] ok.
Sun Aug 13 21:46:10 2017 - [info] 237_12 can be new master.
Sun Aug 13 21:46:10 2017 - [info]
From:
237_13(192.168.237.13:3306) (current master)
+--237_12(192.168.237.12:3306)
+--237_14(192.168.237.14:3306)
To:
237_12(192.168.237.12:3306) (new master)
+--237_14(192.168.237.14:3306)
+--237_13(192.168.237.13:3306)
Sun Aug 13 21:46:10 2017 - [info] Checking whether 237_12(192.168.237.12:3306) is ok for the new master..
Sun Aug 13 21:46:10 2017 - [info] ok.
Sun Aug 13 21:46:10 2017 - [info] 237_13(192.168.237.13:3306): SHOW SLAVE STATUS returned empty result. To check replication filtering rules, temporarily executing CHANGE MASTER to a dummy host.
Sun Aug 13 21:46:10 2017 - [info] 237_13(192.168.237.13:3306): Resetting slave pointing to the dummy host.
Sun Aug 13 21:46:10 2017 - [info] ** Phase 1: Configuration Check Phase completed.
Sun Aug 13 21:46:10 2017 - [info]
Sun Aug 13 21:46:10 2017 - [info] * Phase 2: Rejecting updates Phase..
Sun Aug 13 21:46:10 2017 - [info]
Sun Aug 13 21:46:10 2017 - [warning] master_ip_online_change_script is not defined. Skipping disabling writes on the current master.
Sun Aug 13 21:46:10 2017 - [info] Locking all tables on the orig master to reject updates from everybody (including root):
Sun Aug 13 21:46:10 2017 - [info] Executing FLUSH TABLES WITH READ LOCK..
Sun Aug 13 21:46:10 2017 - [info] ok.
Sun Aug 13 21:46:10 2017 - [info] Orig master binlog:pos is mysql-bin.000018:107.
Sun Aug 13 21:46:10 2017 - [info] Waiting to execute all relay logs on 237_12(192.168.237.12:3306)..
Sun Aug 13 21:46:10 2017 - [info] master_pos_wait(mysql-bin.000018:107) completed on 237_12(192.168.237.12:3306). Executed 0 events.
Sun Aug 13 21:46:10 2017 - [info] done.
Sun Aug 13 21:46:10 2017 - [info] Getting new master's binlog name and position..
Sun Aug 13 21:46:10 2017 - [info] mysql-bin.000035:107
Sun Aug 13 21:46:10 2017 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='237_12 or 192.168.237.12', MASTER_PORT=3306, MASTER_LOG_FILE='mysql-bin.000035', MASTER_LOG_POS=107, MASTER_USER='replication_user', MASTER_PASSWORD='xxx';
Sun Aug 13 21:46:10 2017 - [info]
Sun Aug 13 21:46:10 2017 - [info] * Switching slaves in parallel..
Sun Aug 13 21:46:10 2017 - [info]
Sun Aug 13 21:46:10 2017 - [info] -- Slave switch on host 237_14(192.168.237.14:3306) started, pid: 8397
Sun Aug 13 21:46:10 2017 - [info]
Sun Aug 13 21:46:11 2017 - [info] Log messages from 237_14 ...
Sun Aug 13 21:46:11 2017 - [info]
Sun Aug 13 21:46:10 2017 - [info] Waiting to execute all relay logs on 237_14(192.168.237.14:3306)..
Sun Aug 13 21:46:10 2017 - [info] master_pos_wait(mysql-bin.000018:107) completed on 237_14(192.168.237.14:3306). Executed 0 events.
Sun Aug 13 21:46:10 2017 - [info] done.
Sun Aug 13 21:46:10 2017 - [info] Resetting slave 237_14(192.168.237.14:3306) and starting replication from the new master 237_12(192.168.237.12:3306)..
Sun Aug 13 21:46:11 2017 - [info] Executed CHANGE MASTER.
Sun Aug 13 21:46:11 2017 - [info] Slave started.
Sun Aug 13 21:46:11 2017 - [info] End of log messages from 237_14 ...
Sun Aug 13 21:46:11 2017 - [info]
Sun Aug 13 21:46:11 2017 - [info] -- Slave switch on host 237_14(192.168.237.14:3306) succeeded.
Sun Aug 13 21:46:11 2017 - [info] Unlocking all tables on the orig master:
Sun Aug 13 21:46:11 2017 - [info] Executing UNLOCK TABLES..
Sun Aug 13 21:46:11 2017 - [info] ok.
Sun Aug 13 21:46:11 2017 - [info] Starting orig master as a new slave..
Sun Aug 13 21:46:11 2017 - [info] Resetting slave 237_13(192.168.237.13:3306) and starting replication from the new master 237_12(192.168.237.12:3306)..
Sun Aug 13 21:46:11 2017 - [info] Executed CHANGE MASTER.
Sun Aug 13 21:46:11 2017 - [info] Slave started.
Sun Aug 13 21:46:11 2017 - [info] All new slave servers switched successfully.
Sun Aug 13 21:46:11 2017 - [info]
Sun Aug 13 21:46:11 2017 - [info] * Phase 5: New master cleanup phase..
Sun Aug 13 21:46:11 2017 - [info]
Sun Aug 13 21:46:11 2017 - [info] 237_12: Resetting slave info succeeded.
Sun Aug 13 21:46:11 2017 - [info] Switching master to 237_12(192.168.237.12:3306) completed successfully.
五、数据库无感访问
1、VIP方式
通过修改脚本:master_ip_failover实现,VIP:192.168.237.50
前提要求:
(1)相应设备不做mac地址绑定;
(2)首次bond vip,需要手工在主库上执行(后续漂移利用master_ip_failover实现)
init_vip.sh脚本如下
#!/bin/bash
key="1"
vip="192.168.47.50/24"
/sbin/ifconfig eth0:$key $vip
(3)candidate_master获取到VIP后, 需要如下执行:
# arping -I eth0 -c 3 -s $VIP $GATEWAY >/dev/null
作用是刷新ARP映射表,避免了VIP实际已漂移到从库的情况下,由于缓存的问题导致应用仍然无法访问的情况(已合并到下述脚本中执行)
缺陷:适用于单机单实例
master_ip_failover脚本如下:
#!/usr/bin/env perl
# Copyright (C) 2011 DeNA Co.,Ltd.
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc.,
# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
## Note: This is a sample script and is not complete. Modify the script based on your environment.
use strict;
use warnings FATAL => 'all';
use Getopt::Long;
use MHA::DBHelper;
#自定义该组机器的vip
my $vip = "192.168.237.50";
my $gateway = "192.168.237.2";
my $key = "1";
my (
$command, $ssh_user, $orig_master_host,
$orig_master_ip, $orig_master_port, $new_master_host,
$new_master_ip, $new_master_port, $new_master_user,
$new_master_password
);
GetOptions(
'command=s' => \$command,
'ssh_user=s' => \$ssh_user,
'orig_master_host=s' => \$orig_master_host,
'orig_master_ip=s' => \$orig_master_ip,
'orig_master_port=i' => \$orig_master_port,
'new_master_host=s' => \$new_master_host,
'new_master_ip=s' => \$new_master_ip,
'new_master_port=i' => \$new_master_port,
'new_master_user=s' => \$new_master_user,
'new_master_password=s' => \$new_master_password,
);
my $ssh_start_vip = "/sbin/ifconfig eth0:$key $vip";
my $ssh_stop_vip = "/sbin/ifconfig eth0:$key down";
exit &main();
sub main {
if ( $command eq "stop" || $command eq "stopssh" ) {
# $orig_master_host, $orig_master_ip, $orig_master_port are passed.
# If you manage master ip address at global catalog database,
# invalidate orig_master_ip here.
# 此处也可以添加dns、zookeeper、consul等剔除dead master信息
my $exit_code = 1;
eval {
# updating global catalog, etc
&stop_vip();
$exit_code = 0;
};
if ($@) {
warn "Got Error: $@\n";
exit $exit_code;
}
exit $exit_code;
}
elsif ( $command eq "start" ) {
# all arguments are passed.
# If you manage master ip address at global catalog database,
# activate new_master_ip here.
# You can also grant write access (create user, set read_only=0, etc) here.
my $exit_code = 10;
eval {
my $new_master_handler = new MHA::DBHelper();
# args: hostname, port, user, password, raise_error_or_not
$new_master_handler->connect( $new_master_ip, $new_master_port,
$new_master_user, $new_master_password, 1 );
## Set read_only=0 on the new master
$new_master_handler->disable_log_bin_local();
print "Set read_only=0 on the new master.\n";
$new_master_handler->disable_read_only();
## Creating an app user on the new master
print "Creating app user on the new master..\n";
#FIXME_xxx_create_user( $new_master_handler->{dbh} );
$new_master_handler->enable_log_bin_local();
$new_master_handler->disconnect();
## Update master ip on the catalog database, etc
&start_vip();
$exit_code = 0;
};
if ($@) {
warn $@;
# If you want to continue failover, exit 10.
exit $exit_code;
}
exit $exit_code;
}
elsif ( $command eq "status" ) {
# do nothing
exit 0;
}
else {
&usage();
exit 1;
}
}
#ssh -o ConnectTimeout=15 -o ConnectionAttempts=3,设置SSH等待超时时间15S,重试3次。防止SSH连不上,MHA超长等待。
#arping -I eth0 -c 3 -s $VIP $GATEWAY > /dev/null,刷新arp映射表
sub start_vip() {
`ssh -o ConnectTimeout=15 -o ConnectionAttempts=3 $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
`ssh -o ConnectTimeout=15 -o ConnectionAttempts=3 $ssh_user\@$new_master_host \" arping -I eth0 -c 3 -s $vip $gateway > /dev/null \"`;
}
sub stop_vip() {
`ssh -o ConnectTimeout=15 -o ConnectionAttempts=3 $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}
sub usage {
print
"Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}
2、Consul
利用服务发现替代,适用于单机多实例
3、ProxySQL
利用ProxySQL进行路由规则的切换,详见我的另外一篇博文的第九节:
六、MHA存活监控
manager本身是一个单点,存在风险。
1、利用masterha_master_monitor,不再赘述
send_report #因故障切换后发送报警的脚本,不是必须,可自行编写简单的shell完成
2、masterha_check_status --conf=/etc/masterha/app1.cnf
参考文档
https://severalnines.com/blog/mysql-replication-failover-maxscale-vs-mha-part-1
MHA参数详解:http://wubx.net/mha-parameters/
MHA在线切换过程:https://www.bbsmax.com/A/amd0ml0Xzg/
MHA 清理relay log(purge_relay_logs):https://blog.youkuaiyun.com/leshami/article/details/45688503
安装报错排查:https://blog.51cto.com/arthur376/1812640
虚拟IP(VIP)在高可用上的原理:https://gitos.org/2019/08/19/vip-ha.html