MySQL Master高可用---MHA的安装与配置说明

最新推荐文章于 2024-10-20 22:41:46 发布

亮总说数

最新推荐文章于 2024-10-20 22:41:46 发布

阅读量950

点赞数

分类专栏： MHA 文章标签： mysql

本文链接：https://blog.youkuaiyun.com/leonpenn/article/details/76805024

版权

MHA 专栏收录该内容

1 篇文章

订阅专栏

一、背景

今天讲下工作中使用较多的MHA

首先需要确认一点的是，MHA实现的是Master的高可用。

贴一张MHA作者在 MySQL Conference and Expo 2011 上分享的图片

二、复制结构与基础配置

1、MySQL复制关系

Manager Node 192.168.237.11（管理节点）

Data Node_1 192.168.237.12（数据节点，MySQL主库）

Data Node_2 192.168.237.13（数据节点，MySQL备库，候选主库）

Data Node_3 192.168.237.14（数据节点，MySQL备库）

2、节点基础配置

（1）配置数据节点的iptables

配置数据节点的iptables，确保管理节点能访问到数据节点的MySQL端口

（2）打通SSH互信

正确配置管理节点和数据节点的域名解析，很重要！

管理节点及数据节点间打通SSH互信：

# ssh-keygen -t rsa
# ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.237.11
# ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.237.12
# ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.237.13
# ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.237.14

（3）MySQL基础配置

所有数据节点配置相同的复制过滤规则（MHA在启动的时候会检查过滤规则，不同则启动失败）

开启候选主库的log-bin。管理节点会检测候选主库是否有配置log-bin，若没有该配置项，MHA将不会进行failover

三、MHA安装

1、安装包下载

RHEL7：

https://github.com/yoshinorim/mha4mysql-manager/releases

https://github.com/yoshinorim/mha4mysql-node/releases

RHEL6：

https://github.com/yoshinorim/mha4mysql-manager/wiki/Downloads

2、配置CPAN源及EPEL源（可选）

（1）配置CPAN源（阿里源为例）

cpan[1]> o conf urllist push http://mirrors.aliyun.com/CPAN/
Please use 'o conf commit' to make the config permanent!
cpan[2]> o conf commit
commit: wrote '/usr/share/perl5/CPAN/Config.pm'
cpan[3]> o conf urllist
urllist
0 [http://mirrors.aliyun.com/CPAN/]
Type 'o conf' to view all configuration items

（2）配置EPEL源

RHEL6：

# yum -y install https://dl.fedoraproject.org/pub/epel/epel-release-latest-6.noarch.rpm

RHEL7：

# yum -y install https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm

# yum clean all

# yum makecache

3、数据节点部署

（1）更新libmysqlclient.so

# rpm -qa |grep mysql-libs
mysql-libs-5.1.73-8.el6_8.x86_64
# rpm -e --nodeps mysql-libs-5.1.73-8.el6_8.x86_64

找一台MySQL拷贝libmysqlclient.so运行库文件
# cp /usr/local/mysql3306/lib/libmysqlclient.so.20 /usr/lib64/
# cp /usr/local/mysql3306/lib/libmysqlclient.so /usr/lib64/

（2）更新DBD-MySQL

# rpm -qa |grep perl-DBD-MySQL
perl-DBD-MySQL-4.013-3.el6.x86_64
# rpm -e --nodeps perl-DBD-MySQL-4.013-3.el6.x86_64

# yum install perl-CPAN
# ln -s /usr/local/mysql3306/bin/mysql_config /usr/local/bin/
cpan[1]> install DBD-MySQL

cpan[2]> install DBI

cpan[2]> install ExtUtils MakeMaker

（3）安装mha4mysql-node

# tar zxvf mha4mysql-node-0.56.tar.gz
# cd mha4mysql-node-0.56
# perl Makefile.PL
# make && make install

Tips：常见安装报错（一般是由于yum方式安装perl模块较旧导致）

Tip1、编译时出现Can't locate ExtUtils/MakeMaker.pm in @INC

https://github.com/Perl-Toolchain-Gang/ExtUtils-MakeMaker/releases
# tar zxvf ExtUtils-MakeMaker-7.18.tar.gz
# cd ExtUtils-MakeMaker-7.18
# perl Makefile.PL

# make && make install

Tip2、编译时出现Can't locate CPAN.pm in @INC

https://cpan.metacpan.org/authors/id/A/AN/ANDK/CPAN-2.26.tar.gz
# tar zxvf CPAN-2.26.tar.gz
# cd CPAN-2.26
# perl Makefile.PL

# make && make install

（2）设置relay log清除方式
mysql > set global relay_log_purge=0;
my.cnf中设置 relay_log_purge=0
关闭该参数的原因是因为在默认情况下，从库上的relay log在SQL线程执行完后会自动被删掉。但是在failover过程中从库需要利用候选主库上的中继日志来补数据，所以MySQL默认打开的自动清除中继日志的功能需要关闭。

但是为了不撑爆磁盘就需要定期清除旧的relay log。清除relay log需要考虑复制延时的问题。在ext3文件系统下，删除大文件建议采用硬链接的方法。
关于relay log的清除，可参考以下文章：

http://daisywei.blog.51cto.com/7837970/1881154

4、管理节点部署

（1）同数据节点一样，也需要安装 mha4mysql-node

（2）安装依赖包
# yum install gcc

cpan[1]> install Config::Tiny
cpan[2]> install Log::Dispatch
cpan[3]> install Parallel::ForkManager
cpan[4]> install Time::HiRes

（3）安装mha4mysql-manager
# tar zxvf mha4mysql-manager-0.56.tar.gz
# cd mha4mysql-manager-0.56
# perl Makefile.PL
# make && make install
（4）修改配置文件

建立mha配置文件及脚本目录

# mkdir -p /etc/masterha/

复制源码包内的配置文件、脚本（app1.cnf、masterha_default.cnf 可以合并为一个文件）

# cp ./mha4mysql-manager-0.56/samples/conf/masterha_default.cnf /etc/

# cp ./mha4mysql-manager-0.56/samples/conf/app1.cnf /etc/materha/

# cp ./mha4mysql-manager-0.56/samples/scripts/* /etc/masterha/

# vim /etc/masterha_default.cnf

[server default]
#数据库账户（需要select、insert、super、reload权限，以及对mysql.user的更改权限）
user=mha
password=123456
#SSH账户
ssh_user=root
#复制账户
repl_user=replication_user
repl_password=123456
ping_interval=1
# master_ip_failover_script= /etc/masterha/master_ip_failover
# master_ip_online_change_script= /etc/masterha/master_ip_online_change
# shutdown_script= /etc/masterha/power_manager
# report_script= /etc/masterha/send_report

# vim /etc/masterha/app1.cnf

[server default]
manager_workdir=/var/log/masterha/app1
manager_log=/var/log/masterha/app1/manager.log
remote_workdir=/var/log/

[server1]
hostname=237_12
master_binlog_dir=/data/mysql3306
candidate_master=1
check_repl_delay=0 #防止master故障时，切换时slave有延迟，卡在那里切不过去

[server2]
hostname=237_13
master_binlog_dir=/data/mysql3306
candidate_master=1
check_repl_delay=0

[server3]
hostname=237_14
master_binlog_dir=/data/mysql3306
no_master=1

（5）环境检测
step1

# masterha_check_ssh --conf=/etc/masterha/app1.cnf

提示：All SSH connection tests passed successfully 表示节点间SSH连同

step2

# masterha_check_repl --conf=/etc/masterha/app1.cnf

提示：MySQL Replication Health is OK.

（6）启动MHA Manager

# nohup masterha_manager --conf=/etc/masterha/app1.cnf &

观察日志情况

# tail -f /tmp/mha_manager.log
Thu Aug 10 21:49:09 2017 - [info] Reading default configuration from /etc/masterha_default.cnf..
Thu Aug 10 21:49:09 2017 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Thu Aug 10 21:49:09 2017 - [info] Reading server configuration from /etc/masterha/app1.cnf..

查看MHA状态：

# masterha_check_status --conf=/etc/masterha/app1.cnf
app1 (pid:1716) is running(0:PING_OK), master:237_12

四、故障演练

1、failover故障切换

例：模拟主库（192.168.237.12）MySQL实例挂掉

观察备库237.14的主从复制关系：主库已经切换为237.13

观察此时新的主库的binlog位置点：

查看管理节点的日志记录（位置由配置文件中参数：manager_log决定）

# cat /var/log/masterha/app1/manager.log

From:
237_12(192.168.237.12:3306) (current master)
 +--237_13(192.168.237.13:3306)
 +--237_14(192.168.237.14:3306)

To:
237_13(192.168.237.13:3306) (new master)
 +--237_14(192.168.237.14:3306)

[root@237_11 script]# masterha_check_status --conf=/etc/masterha/app1.cnf
app1 is stopped(2:NOT_RUNNING).

需要注意的是，无论宕机导致的master切换、还是没有特别设置地手动切换master（不含orig_master_is_new_slave选项），原来的master都不在MHA架构内了，即使重新启动也不会加入，必须手动加入。配置文件中仍然是以挂掉的master为主库，并没有提升candidated master为主库，此时MHA将不再具备高可用的特性，需尽快更新配置文件，并在管理节点运行masterha_check_ssh确认无误后重新启用MHA manager。

此时，若假设原主库（237.12）上的MySQL已成功修复故障，正常启动后，可以以备库的身份加入MHA架构内：

（1）复制采用GTID方式

直接change master ... MASTER_AUTO_POSITION=1;

（2）binlog+position方式

首先，在MHA管理节点的日志中找到，复制需要的主库binlog位置点：

建立到新主库237.13的复制关系

Tips：该步骤很有可能会报1236的错误，并且Slave_IO_Running状态为 No

解决办法如下（从库切换到新的binlog，并重新指向之前卡在的主库binlog位置点）：

mysql> stop slave;
Query OK, 0 rows affected (0.01 sec)
mysql> 
mysql> flush logs;
Query OK, 0 rows affected (0.01 sec)
mysql> 
mysql> show master status \G
*************************** 1. row ***************************
            File: mysql-bin.000035
        Position: 107
    Binlog_Do_DB: 
Binlog_Ignore_DB: 
1 row in set (0.00 sec)

mysql> change master to master_log_file='mysql-bin.000018',master_log_pos=107;
Query OK, 0 rows affected (0.01 sec)

mysql> start slave;
Query OK, 0 rows affected (0.00 sec)

mysql> show slave status \G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.237.13
                  Master_User: replication_user
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000018
          Read_Master_Log_Pos: 107
               Relay_Log_File: 237_12-relay-bin.000002
                Relay_Log_Pos: 253
        Relay_Master_Log_File: mysql-bin.000018
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 107
              Relay_Log_Space: 410
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 0
               Last_SQL_Error: 
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 23713
1 row in set (0.00 sec)

当建立新主从关系后，可以开始修改MHA配置文件（app1.cnf）中的server1、server2的hostname，重新启用新的高可用。启动MHA前需做如下检查：

step1、删除前一次failover标记文件：

# rm -rf app1.failover.complete

step2、检查配置文件正确性：

# masterha_check_ssh --conf=/etc/masterha/app1.cnf

step3、再次启动MHA：
# nohup masterha_manager --conf=/etc/masterha/app1.cnf &

确认mha状态：
# masterha_check_status --conf=/etc/masterha/app1.cnf
app1 (pid:1977) is running(0:PING_OK), master:node2

2、在线手动切换

（online master switch理论写操作的阻塞时间在0.5秒-2秒）

功能：用于调整无法在线变更的参数（例如innodb_buffer_pool_size等）。通过在从库配置文件中调整参数并重启生效后，执行online master switch将从库提升为主库，原主库同样修改参数后重启，并以从库的身份重新加入MHA架构，从而曲线实现了在线变更参数的需求而无需中断业务。

切换前需要先停止MHA manger，否则会有如下报错。

以下为成功切换过程：

[root@237_11 app1]# masterha_stop --conf=/etc/masterha/app1.cnf
Stopped app1 successfully.
[1]+  Exit 1                  nohup masterha_manager --conf=/etc/masterha/app1.cnf < /dev/null > /tmp/mha_manager.log 2>&1
[root@237_11 app1]# 
[root@237_11 app1]# masterha_master_switch --conf=/etc/masterha/app1.cnf --master_state=alive --new_master_host=237_12 --orig_master_is_new_slave --running_updates_limit=10000 --interactive=0
Sun Aug 13 21:46:10 2017 - [info] MHA::MasterRotate version 0.56.
Sun Aug 13 21:46:10 2017 - [info] Starting online master switch..
Sun Aug 13 21:46:10 2017 - [info] 
Sun Aug 13 21:46:10 2017 - [info] * Phase 1: Configuration Check Phase..
Sun Aug 13 21:46:10 2017 - [info] 
Sun Aug 13 21:46:10 2017 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sun Aug 13 21:46:10 2017 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Sun Aug 13 21:46:10 2017 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Sun Aug 13 21:46:10 2017 - [info] GTID failover mode = 0
Sun Aug 13 21:46:10 2017 - [info] Current Alive Master: 237_13(192.168.237.13:3306)
Sun Aug 13 21:46:10 2017 - [info] Alive Slaves:
Sun Aug 13 21:46:10 2017 - [info]   237_12(192.168.237.12:3306)  Version=5.5.54-log (oldest major version between slaves) log-bin:enabled
Sun Aug 13 21:46:10 2017 - [info]     Replicating from 192.168.237.13(192.168.237.13:3306)
Sun Aug 13 21:46:10 2017 - [info]     Primary candidate for the new Master (candidate_master is set)
Sun Aug 13 21:46:10 2017 - [info]   237_14(192.168.237.14:3306)  Version=5.5.54-log (oldest major version between slaves) log-bin:enabled
Sun Aug 13 21:46:10 2017 - [info]     Replicating from 192.168.237.13(192.168.237.13:3306)
Sun Aug 13 21:46:10 2017 - [info]     Not candidate for the new Master (no_master is set)
Sun Aug 13 21:46:10 2017 - [info] Executing FLUSH NO_WRITE_TO_BINLOG TABLES. This may take long time..
Sun Aug 13 21:46:10 2017 - [info]  ok.
Sun Aug 13 21:46:10 2017 - [info] Checking MHA is not monitoring or doing failover..
Sun Aug 13 21:46:10 2017 - [info] Checking replication health on 237_12..
Sun Aug 13 21:46:10 2017 - [info]  ok.
Sun Aug 13 21:46:10 2017 - [info] Checking replication health on 237_14..
Sun Aug 13 21:46:10 2017 - [info]  ok.
Sun Aug 13 21:46:10 2017 - [info] 237_12 can be new master.
Sun Aug 13 21:46:10 2017 - [info] 
From:
237_13(192.168.237.13:3306) (current master)
 +--237_12(192.168.237.12:3306)
 +--237_14(192.168.237.14:3306)

To:
237_12(192.168.237.12:3306) (new master)
 +--237_14(192.168.237.14:3306)
 +--237_13(192.168.237.13:3306)
Sun Aug 13 21:46:10 2017 - [info] Checking whether 237_12(192.168.237.12:3306) is ok for the new master..
Sun Aug 13 21:46:10 2017 - [info]  ok.
Sun Aug 13 21:46:10 2017 - [info] 237_13(192.168.237.13:3306): SHOW SLAVE STATUS returned empty result. To check replication filtering rules, temporarily executing CHANGE MASTER to a dummy host.
Sun Aug 13 21:46:10 2017 - [info] 237_13(192.168.237.13:3306): Resetting slave pointing to the dummy host.
Sun Aug 13 21:46:10 2017 - [info] ** Phase 1: Configuration Check Phase completed.
Sun Aug 13 21:46:10 2017 - [info] 
Sun Aug 13 21:46:10 2017 - [info] * Phase 2: Rejecting updates Phase..
Sun Aug 13 21:46:10 2017 - [info] 
Sun Aug 13 21:46:10 2017 - [warning] master_ip_online_change_script is not defined. Skipping disabling writes on the current master.
Sun Aug 13 21:46:10 2017 - [info] Locking all tables on the orig master to reject updates from everybody (including root):
Sun Aug 13 21:46:10 2017 - [info] Executing FLUSH TABLES WITH READ LOCK..
Sun Aug 13 21:46:10 2017 - [info]  ok.
Sun Aug 13 21:46:10 2017 - [info] Orig master binlog:pos is mysql-bin.000018:107.
Sun Aug 13 21:46:10 2017 - [info]  Waiting to execute all relay logs on 237_12(192.168.237.12:3306)..
Sun Aug 13 21:46:10 2017 - [info]  master_pos_wait(mysql-bin.000018:107) completed on 237_12(192.168.237.12:3306). Executed 0 events.
Sun Aug 13 21:46:10 2017 - [info]   done.
Sun Aug 13 21:46:10 2017 - [info] Getting new master's binlog name and position..
Sun Aug 13 21:46:10 2017 - [info]  mysql-bin.000035:107
Sun Aug 13 21:46:10 2017 - [info]  All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='237_12 or 192.168.237.12', MASTER_PORT=3306, MASTER_LOG_FILE='mysql-bin.000035', MASTER_LOG_POS=107, MASTER_USER='replication_user', MASTER_PASSWORD='xxx';
Sun Aug 13 21:46:10 2017 - [info] 
Sun Aug 13 21:46:10 2017 - [info] * Switching slaves in parallel..
Sun Aug 13 21:46:10 2017 - [info] 
Sun Aug 13 21:46:10 2017 - [info] -- Slave switch on host 237_14(192.168.237.14:3306) started, pid: 8397
Sun Aug 13 21:46:10 2017 - [info] 
Sun Aug 13 21:46:11 2017 - [info] Log messages from 237_14 ...
Sun Aug 13 21:46:11 2017 - [info] 
Sun Aug 13 21:46:10 2017 - [info]  Waiting to execute all relay logs on 237_14(192.168.237.14:3306)..
Sun Aug 13 21:46:10 2017 - [info]  master_pos_wait(mysql-bin.000018:107) completed on 237_14(192.168.237.14:3306). Executed 0 events.
Sun Aug 13 21:46:10 2017 - [info]   done.
Sun Aug 13 21:46:10 2017 - [info]  Resetting slave 237_14(192.168.237.14:3306) and starting replication from the new master 237_12(192.168.237.12:3306)..
Sun Aug 13 21:46:11 2017 - [info]  Executed CHANGE MASTER.
Sun Aug 13 21:46:11 2017 - [info]  Slave started.
Sun Aug 13 21:46:11 2017 - [info] End of log messages from 237_14 ...
Sun Aug 13 21:46:11 2017 - [info] 
Sun Aug 13 21:46:11 2017 - [info] -- Slave switch on host 237_14(192.168.237.14:3306) succeeded.
Sun Aug 13 21:46:11 2017 - [info] Unlocking all tables on the orig master:
Sun Aug 13 21:46:11 2017 - [info] Executing UNLOCK TABLES..
Sun Aug 13 21:46:11 2017 - [info]  ok.
Sun Aug 13 21:46:11 2017 - [info] Starting orig master as a new slave..
Sun Aug 13 21:46:11 2017 - [info]  Resetting slave 237_13(192.168.237.13:3306) and starting replication from the new master 237_12(192.168.237.12:3306)..
Sun Aug 13 21:46:11 2017 - [info]  Executed CHANGE MASTER.
Sun Aug 13 21:46:11 2017 - [info]  Slave started.
Sun Aug 13 21:46:11 2017 - [info] All new slave servers switched successfully.
Sun Aug 13 21:46:11 2017 - [info] 
Sun Aug 13 21:46:11 2017 - [info] * Phase 5: New master cleanup phase..
Sun Aug 13 21:46:11 2017 - [info] 
Sun Aug 13 21:46:11 2017 - [info]  237_12: Resetting slave info succeeded.
Sun Aug 13 21:46:11 2017 - [info] Switching master to 237_12(192.168.237.12:3306) completed successfully.

五、数据库无感访问

1、VIP方式

通过修改脚本：master_ip_failover实现，VIP：192.168.237.50

前提要求：

（1）相应设备不做mac地址绑定；

（2）首次bond vip，需要手工在主库上执行（后续漂移利用master_ip_failover实现）

init_vip.sh脚本如下

#!/bin/bash
key="1"
vip="192.168.47.50/24"
/sbin/ifconfig eth0:$key $vip

（3）candidate_master获取到VIP后，需要如下执行：

# arping -I eth0 -c 3 -s $VIP $GATEWAY >/dev/null

作用是刷新ARP映射表，避免了VIP实际已漂移到从库的情况下，由于缓存的问题导致应用仍然无法访问的情况（已合并到下述脚本中执行）

缺陷：适用于单机单实例

master_ip_failover脚本如下：

#!/usr/bin/env perl

#  Copyright (C) 2011 DeNA Co.,Ltd.
#
#  This program is free software; you can redistribute it and/or modify
#  it under the terms of the GNU General Public License as published by
#  the Free Software Foundation; either version 2 of the License, or
#  (at your option) any later version.
#
#  This program is distributed in the hope that it will be useful,
#  but WITHOUT ANY WARRANTY; without even the implied warranty of
#  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#  GNU General Public License for more details.
#
#  You should have received a copy of the GNU General Public License
#   along with this program; if not, write to the Free Software
#  Foundation, Inc.,
#  51 Franklin Street, Fifth Floor, Boston, MA  02110-1301  USA

## Note: This is a sample script and is not complete. Modify the script based on your environment.

use strict;
use warnings FATAL => 'all';

use Getopt::Long;
use MHA::DBHelper;

#自定义该组机器的vip
my $vip = "192.168.237.50";
my $gateway = "192.168.237.2";
my $key = "1";

my (
  $command,        $ssh_user,         $orig_master_host,
  $orig_master_ip, $orig_master_port, $new_master_host,
  $new_master_ip,  $new_master_port,  $new_master_user,
  $new_master_password
);
GetOptions(
  'command=s'             => \$command,
  'ssh_user=s'            => \$ssh_user,
  'orig_master_host=s'    => \$orig_master_host,
  'orig_master_ip=s'      => \$orig_master_ip,
  'orig_master_port=i'    => \$orig_master_port,
  'new_master_host=s'     => \$new_master_host,
  'new_master_ip=s'       => \$new_master_ip,
  'new_master_port=i'     => \$new_master_port,
  'new_master_user=s'     => \$new_master_user,
  'new_master_password=s' => \$new_master_password,
);

my $ssh_start_vip = "/sbin/ifconfig eth0:$key $vip";
my $ssh_stop_vip = "/sbin/ifconfig eth0:$key down";

exit &main();

sub main {
  if ( $command eq "stop" || $command eq "stopssh" ) {

    # $orig_master_host, $orig_master_ip, $orig_master_port are passed.
    # If you manage master ip address at global catalog database,
    # invalidate orig_master_ip here.
    # 此处也可以添加dns、zookeeper、consul等剔除dead master信息
    my $exit_code = 1;
    eval {

      # updating global catalog, etc
      &stop_vip();
      $exit_code = 0;
    };
    if ($@) {
      warn "Got Error: $@\n";
      exit $exit_code;
    }
    exit $exit_code;
  }
  elsif ( $command eq "start" ) {

    # all arguments are passed.
    # If you manage master ip address at global catalog database,
    # activate new_master_ip here.
    # You can also grant write access (create user, set read_only=0, etc) here.
    my $exit_code = 10;
    eval {
      my $new_master_handler = new MHA::DBHelper();

      # args: hostname, port, user, password, raise_error_or_not
      $new_master_handler->connect( $new_master_ip, $new_master_port,
        $new_master_user, $new_master_password, 1 );

      ## Set read_only=0 on the new master
      $new_master_handler->disable_log_bin_local();
      print "Set read_only=0 on the new master.\n";
      $new_master_handler->disable_read_only();

      ## Creating an app user on the new master
      print "Creating app user on the new master..\n";
      #FIXME_xxx_create_user( $new_master_handler->{dbh} );
      $new_master_handler->enable_log_bin_local();
      $new_master_handler->disconnect();

      ## Update master ip on the catalog database, etc
      &start_vip();
      $exit_code = 0;
    };
    if ($@) {
      warn $@;

      # If you want to continue failover, exit 10.
      exit $exit_code;
    }
    exit $exit_code;
  }
  elsif ( $command eq "status" ) {

    # do nothing
    exit 0;
  }
  else {
    &usage();
    exit 1;
  }
}

#ssh -o ConnectTimeout=15 -o ConnectionAttempts=3，设置SSH等待超时时间15S，重试3次。防止SSH连不上，MHA超长等待。
#arping -I eth0 -c 3 -s $VIP $GATEWAY > /dev/null，刷新arp映射表
sub start_vip() {
    `ssh -o ConnectTimeout=15 -o ConnectionAttempts=3 $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
    `ssh -o ConnectTimeout=15 -o ConnectionAttempts=3 $ssh_user\@$new_master_host \" arping -I eth0 -c 3 -s $vip $gateway > /dev/null \"`;
}
sub stop_vip() {
    `ssh -o ConnectTimeout=15 -o ConnectionAttempts=3 $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}


sub usage {
  print
"Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}