MHA Overview
- MHA is a MySQL failover solution, written in Perl by a Japanese MySQL expert, for keeping databases highly available. It can fail over a failed MySQL master within 0-30 seconds, and during the failover it resolves data-consistency problems between the slaves while recovering as much data as possible after the failure. MHA has two roles: node and manager. Running MHA requires at least two database servers (one master plus standbys); this walkthrough uses three machines for a better demonstration: one acting as master, one as the master's standby, and one as a slave. Note that once the master goes down, the standby takes over as master; even if the original master comes back online it will not become master again, because that would break the consistency of the data.
- MHA monitors the master of the replication topology and automatically fails over as soon as it detects a master failure. Even if some slaves have not received the latest relay log, MHA identifies the differential relay log events on the most up-to-date slave and applies them to the other slaves, so all slaves end up consistent. Failover typically completes within seconds: detecting the master failure takes 9-12 seconds, shutting down the failed master to avoid split-brain takes 7-10 seconds, and applying the differential relay logs to the new master takes a few more seconds, so the whole process finishes within 0-30 seconds.
Environment
mha-master: 4 cores, 2 GB RAM, Red Hat 6.5, IP: 192.168.86.148, hostname: mha-master, MySQL 5.6.36
mha-manager: 4 cores, 2 GB RAM, Red Hat 6.5, IP: 192.168.86.149, hostname: mha-manager, MySQL 5.6.36
mha-bak: 4 cores, 2 GB RAM, Red Hat 6.5, IP: 192.168.86.150, hostname: mha-bak, MySQL 5.6.36
Setup Steps
I. Set up replication (one master, two slaves)
Master: mha-master
Slaves: mha-manager, mha-bak
Note: create a replication user on mha-master and grant it replication privileges:
mysql> grant replication slave,replication client on *.* to 'repl'@'192.168.86.%' identified by '123456';
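The grant above only creates the replication account. For completeness, here is a minimal sketch of pointing each slave (mha-manager and mha-bak) at the master; the binlog file name and position below are placeholders, so read the real values from SHOW MASTER STATUS on mha-master first:

```sql
-- Run on each slave. MASTER_LOG_FILE/MASTER_LOG_POS are placeholders:
-- take the actual values from SHOW MASTER STATUS on mha-master.
CHANGE MASTER TO
  MASTER_HOST='192.168.86.148', MASTER_PORT=3306,
  MASTER_USER='repl', MASTER_PASSWORD='123456',
  MASTER_LOG_FILE='mysql-server.000001', MASTER_LOG_POS=4;
START SLAVE;
-- Verify: Slave_IO_Running and Slave_SQL_Running should both show Yes
SHOW SLAVE STATUS\G
```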
II. Install MHA and configure the base environment
Steps 1-2 must be performed on all three machines.
1. On every node, install the perl module that MHA node depends on (perl-DBD-MySQL)
$ yum install perl-DBD-MySQL -y
2. On every node, install the MHA node package
$ rpm -ivh mha4mysql-node-0.54-0.el6.noarch.rpm
Preparing... ########################################### [100%]
1:mha4mysql-node ########################################### [100%]
After installation, check that the following script files were created under /usr/bin:
$ ll /usr/bin/ | grep -E "logs|mysqlbinlog"
-rwxr-xr-x. 1 root root 15977 Dec 1 2012 apply_diff_relay_logs
-rwxr-xr-x. 1 root root 4807 Dec 1 2012 filter_mysqlbinlog
-rwxr-xr-x. 1 root root 7401 Dec 1 2012 purge_relay_logs
-rwxr-xr-x. 1 root root 7263 Dec 1 2012 save_binary_logs
3. On the manager node (mha-manager), install MHA manager and its dependencies
$ ls
perl-Config-Tiny-2.12-7.1.el6.noarch.rpm
perl-Email-Date-Format-1.002-5.el6.noarch.rpm
perl-Log-Dispatch-2.27-1.el6.noarch.rpm
perl-Mail-Sender-0.8.16-3.el6.noarch.rpm
perl-Mail-Sendmail-0.79-12.el6.noarch.rpm
perl-MailTools-2.04-4.el6.noarch.rpm
perl-MIME-Lite-3.027-2.el6.noarch.rpm
perl-MIME-Lite-HTML-1.23-2.el6.noarch.rpm
perl-MIME-Types-1.28-2.el6.noarch.rpm
perl-Parallel-ForkManager-0.7.9-1.el6.noarch.rpm
perl-Params-Validate-0.92-3.el6.x86_64.rpm
perl-TimeDate-1.16-13.el6.noarch.rpm
$ yum localinstall perl-* -y
$ rpm -ivh mha4mysql-manager-0.55-0.el6.noarch.rpm
Preparing... ########################################### [100%]
1:mha4mysql-manager ########################################### [100%]
After installation, the following scripts are created:
$ ls /usr/bin/ | grep masterha
masterha_check_repl
masterha_check_ssh
masterha_check_status
masterha_conf_host
masterha_manager
masterha_master_monitor
masterha_master_switch
masterha_secondary_check
masterha_stop
4. Configure hostnames (on all three machines)
$ vim /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.86.148 mha-master
192.168.86.149 mha-manager
192.168.86.150 mha-bak
5. Set up passwordless SSH authentication
Method 1:
On mha-master:
① Generate an RSA key pair
$ ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Created directory '/root/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
5a:26:2d:d2:63:48:52:d1:0f:54:b0:fc:8c:6e:93:9e root@mha-master
The key's randomart image is:
+--[ RSA 2048]----+
| o+oo. |
| . .o. |
| . . oo |
| o o =. |
| o B S |
| + O |
| * |
| o o |
| E |
+-----------------+
② Append the generated public key to the authorized_keys file
$ cd /root/.ssh/
$ cat id_rsa.pub >> authorized_keys
③ Restrict the permissions of the .ssh directory and the authorized_keys file
$ chmod 700 /root/.ssh/
$ chmod 600 /root/.ssh/authorized_keys
④ Copy the public key to mha-manager and mha-bak
$ ssh-copy-id -i id_rsa.pub root@192.168.86.149
The authenticity of host '192.168.86.149 (192.168.86.149)' can't be established.
RSA key fingerprint is 22:00:37:a2:be:c4:98:12:72:57:d4:74:bb:07:02:50.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '192.168.86.149' (RSA) to the list of known hosts.
root@192.168.86.149's password:
Now try logging into the machine, with "ssh 'root@192.168.86.149'", and check in:

  .ssh/authorized_keys

to make sure we haven't added extra keys that you weren't expecting.
$ ssh-copy-id -i id_rsa.pub root@192.168.86.150
The authenticity of host '192.168.86.150 (192.168.86.150)' can't be established.
RSA key fingerprint is 22:00:37:a2:be:c4:98:12:72:57:d4:74:bb:07:02:50.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '192.168.86.150' (RSA) to the list of known hosts.
root@192.168.86.150's password:
Now try logging into the machine, with "ssh 'root@192.168.86.150'", and check in:

  .ssh/authorized_keys

to make sure we haven't added extra keys that you weren't expecting.
Repeat steps ①-④ on mha-manager and mha-bak until the authorized_keys file on each of the three hosts contains the keys of all three hosts.
Method 2:
Run the following script on mha-master, entering each password when prompted:
#!/bin/bash
node=( 192.168.86.148 192.168.86.149 192.168.86.150) # hostname for each node
username=root # username to be interconnected
homename=$username # home dir, i.e. home/zhangyang
if [ "$username" = "root" ];
then
    homename=root
else
    homename=home/$username
fi

for((i=0; i<${#node[*]}; i++))
do
    ssh $username@${node[i]} 'ssh-keygen -t rsa; chmod 755 ~/.ssh'
done
#cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

echo "batch authorized_keys created..."
echo "start scp..."

#scp node003:/$homename/.ssh/authorized_keys /$homename/.ssh/node003.key
for((i=0; i<${#node[*]}; i++))
do
    scp ${node[i]}:/$homename/.ssh/id_rsa.pub /$homename/.ssh/${node[i]}.key
    echo "scp from ${node[i]} finished..."
done

echo "append key to authorized_keys..."
for((i=0; i<${#node[*]}; i++))
do
    cat /$homename/.ssh/${node[i]}.key >> /$homename/.ssh/authorized_keys
    echo "append ${node[i]}.key finished..."
done

echo "append all key finished..."
loop=${#node[*]}
let subloop=loop-1
echo "starting scp complete authorized_keys to ${node[0]}~${node[subloop]}"
for((i=0; i<${#node[*]}; i++))
do
    scp /$homename/.ssh/authorized_keys ${node[i]}:/$homename/.ssh/authorized_keys
    echo "scp to ${node[i]} finished..."
done
echo "scp all nodes finished..."

# delete intermediate files
rm -rf /$homename/.ssh/*.key
echo "all configuration finished..."
6. Verify that the three hosts can SSH into each other without a password
On mha-master:
$ ssh mha-manager
$ ssh mha-bak
On mha-manager:
$ ssh mha-master
$ ssh mha-bak
On mha-bak:
$ ssh mha-master
$ ssh mha-manager
7. Create the monitoring user on mha-master so that both slaves can connect
mysql> grant all privileges on *.* to 'mha_monitor'@'192.168.86.%' identified by '123456';
Query OK, 0 rows affected, 1 warning (0.21 sec)
8. Test the login from the other two machines
$ mysql -umha_monitor -p123456 -h192.168.86.148
III. Configure MHA
Run the following on mha-manager.
1. Create the MHA working directory
$ mkdir -p /etc/masterha
2. Create the configuration file /etc/masterha/app1.cnf with the following content
[server default]
manager_log=/var/log/masterha/app1/manager.log
manager_workdir=/var/log/masterha/app1
master_binlog_dir=/data
remote_workdir=/tmp
ping_interval=1
repl_user=repl
repl_password=123456
ssh_user=root
user=mha_monitor
password=123456

[server1]
hostname=192.168.86.148
port=3306

[server2]
hostname=192.168.86.149
port=3306

[server3]
candidate_master=1
check_repl_delay=0
hostname=192.168.86.150
port=3306
3. Set the relay log purge mode on each slave node
mysql> set global relay_log_purge=0; (run on both slaves)
Query OK, 0 rows affected (0.00 sec)
Notes:
- During a switchover, MHA relies on the relay logs to recover the other slaves, so automatic relay log purging must be turned OFF and the relay logs purged manually instead. By default, a slave deletes its relay logs automatically once the SQL thread has executed them, but in an MHA setup those relay logs may still be needed to recover other slaves, so automatic deletion has to be disabled. Scheduled purging must also take replication lag into account: on an ext3 filesystem, deleting a large file takes long enough to cause serious replication delay. To avoid that delay, first create a hard link to the relay log, because on Linux removing a large file is fast as long as another hard link still references its data. (The same hard-link approach is commonly used when dropping large tables in MySQL.)
- The MHA node package includes the purge_relay_logs tool, which creates hard links for the relay logs, executes SET GLOBAL relay_log_purge=1, waits a few seconds for the SQL thread to switch to a new relay log, then executes SET GLOBAL relay_log_purge=0.
- The following script purges the relay logs:
$ vim /root/purge_relay_log.sh
#!/bin/bash
user=root
passwd=123456
port=3306
log_dir='/var/log/masterha/app1/log'
work_dir='/var/log/masterha/app1'
purge='/usr/local/bin/purge_relay_logs'
if [ ! -d $log_dir ]
then
    mkdir $log_dir -p
fi
$purge --user=$user --password=$passwd --disable_relay_log_purge --port=$port --workdir=$work_dir >> $log_dir/purge_relay_logs.log 2>&1
Add the script to crontab for periodic execution:
$ crontab -l
20 4 * * * /bin/bash /root/purge_relay_log.sh
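The hard-link trick described in the notes above can be sketched as follows; a small dummy file stands in for a real relay log here, and the paths are illustrative only:

```shell
# Dummy file standing in for a large relay log (illustrative path)
dd if=/dev/zero of=/tmp/relay-bin.000001 bs=1024 count=1 2>/dev/null
# Create a hard link first, so removing the original name is a cheap unlink
ln /tmp/relay-bin.000001 /tmp/relay-bin.000001.hardlink
rm /tmp/relay-bin.000001
# The data blocks are only freed when the last link is removed; that slow
# step can happen later, outside the replication-sensitive window
rm /tmp/relay-bin.000001.hardlink
```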
4. Configure the VIP (Virtual IP)
$ ifconfig eth0:1 192.168.86.155/24
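A quick sanity check, assuming the eth0:1 alias name used above, that the VIP is actually configured and reachable:

```
$ ifconfig eth0:1
$ ping -c 1 192.168.86.155
```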
5. Create the two IP failover scripts
The failover script is shown below; remember to change the VIP in the script (one place).
$ vim /usr/bin/master_ip_failover
#!/usr/bin/env perl
use strict;
use warnings FATAL => 'all';

use Getopt::Long;

my (
    $command, $ssh_user, $orig_master_host, $orig_master_ip,
    $orig_master_port, $new_master_host, $new_master_ip, $new_master_port
);

my $vip = '192.168.86.155/24'; # Virtual IP
my $key = "1";
my $ssh_start_vip = "/sbin/ifconfig eth0:$key $vip";
my $ssh_stop_vip = "/sbin/ifconfig eth0:$key down";
my $exit_code = 0;

GetOptions(
    'command=s' => \$command,
    'ssh_user=s' => \$ssh_user,
    'orig_master_host=s' => \$orig_master_host,
    'orig_master_ip=s' => \$orig_master_ip,
    'orig_master_port=i' => \$orig_master_port,
    'new_master_host=s' => \$new_master_host,
    'new_master_ip=s' => \$new_master_ip,
    'new_master_port=i' => \$new_master_port,
);

exit &main();

sub main {

    #print "\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n";

    if ( $command eq "stop" || $command eq "stopssh" ) {

        # $orig_master_host, $orig_master_ip, $orig_master_port are passed.
        # If you manage master ip address at global catalog database,
        # invalidate orig_master_ip here.
        my $exit_code = 1;
        eval {
            print "\n\n\n***************************************************************\n";
            print "Disabling the VIP - $vip on old master: $orig_master_host\n";
            print "***************************************************************\n\n\n\n";
            &stop_vip();
            $exit_code = 0;
        };
        if ($@) {
            warn "Got Error: $@\n";
            exit $exit_code;
        }
        exit $exit_code;
    }
    elsif ( $command eq "start" ) {

        # all arguments are passed.
        # If you manage master ip address at global catalog database,
        # activate new_master_ip here.
        # You can also grant write access (create user, set read_only=0, etc) here.
        my $exit_code = 10;
        eval {
            print "\n\n\n***************************************************************\n";
            print "Enabling the VIP - $vip on new master: $new_master_host \n";
            print "***************************************************************\n\n\n\n";
            &start_vip();
            $exit_code = 0;
        };
        if ($@) {
            warn $@;
            exit $exit_code;
        }
        exit $exit_code;
    }
    elsif ( $command eq "status" ) {
        print "Checking the Status of the script.. OK \n";
        `ssh $ssh_user\@$orig_master_host \" $ssh_start_vip \"`;
        exit 0;
    }
    else {
        &usage();
        exit 1;
    }
}

# A simple system call that enable the VIP on the new master
sub start_vip() {
    `ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
}

# A simple system call that disable the VIP on the old_master
sub stop_vip() {
    `ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}

sub usage {
    print
      "Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}
The online change script is shown below; remember to change the VIP in the script (one place).
$ vim /usr/bin/master_ip_online_change
#!/bin/bash
source /root/.bash_profile
vip=`echo '192.168.86.155/24'` # Virtual IP
key=`echo '1'`

command=`echo "$1" | awk -F = '{print $2}'`
orig_master_host=`echo "$2" | awk -F = '{print $2}'`
new_master_host=`echo "$7" | awk -F = '{print $2}'`
orig_master_ssh_user=`echo "${12}" | awk -F = '{print $2}'`
new_master_ssh_user=`echo "${13}" | awk -F = '{print $2}'`

stop_vip=`echo "ssh root@$orig_master_host /sbin/ifconfig eth0:$key down"`
start_vip=`echo "ssh root@$new_master_host /sbin/ifconfig eth0:$key $vip"`

if [ $command = 'stop' ]
then
    echo -e "\n\n\n***************************************************************\n"
    echo -e "Disabling the VIP - $vip on old master: $orig_master_host\n"
    $stop_vip
    if [ $? -eq 0 ]
    then
        echo "Disabled the VIP successfully"
    else
        echo "Disabling the VIP failed"
    fi
    echo -e "***************************************************************\n\n\n\n"
fi
if [ $command = 'start' -o $command = 'status' ]
then
    echo -e "\n\n\n***************************************************************\n"
    echo -e "Enabling the VIP - $vip on new master: $new_master_host \n"
    $start_vip
    if [ $? -eq 0 ]
    then
        echo "Enabled the VIP successfully"
    else
        echo "Enabling the VIP failed"
    fi
    echo -e "***************************************************************\n\n\n\n"
fi
Make both scripts executable:
$ chmod +x /usr/bin/master_ip_failover /usr/bin/master_ip_online_change
IV. Common MHA operations
1. Run the SSH check
$ masterha_check_ssh --conf=/etc/masterha/app1.cnf
Fri Sep 1 01:31:11 2017 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Fri Sep 1 01:31:11 2017 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Fri Sep 1 01:31:11 2017 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Fri Sep 1 01:31:11 2017 - [info] Starting SSH connection tests..
Fri Sep 1 01:31:12 2017 - [debug]
Fri Sep 1 01:31:11 2017 - [debug] Connecting via SSH from root@192.168.86.148(192.168.86.148:22) to root@192.168.86.149(192.168.86.149:22)..
Fri Sep 1 01:31:11 2017 - [debug] ok.
Fri Sep 1 01:31:11 2017 - [debug] Connecting via SSH from root@192.168.86.148(192.168.86.148:22) to root@192.168.86.150(192.168.86.150:22)..
Fri Sep 1 01:31:11 2017 - [debug] ok.
Fri Sep 1 01:31:12 2017 - [debug]
Fri Sep 1 01:31:11 2017 - [debug] Connecting via SSH from root@192.168.86.149(192.168.86.149:22) to root@192.168.86.148(192.168.86.148:22)..
Fri Sep 1 01:31:11 2017 - [debug] ok.
Fri Sep 1 01:31:11 2017 - [debug] Connecting via SSH from root@192.168.86.149(192.168.86.149:22) to root@192.168.86.150(192.168.86.150:22)..
Fri Sep 1 01:31:12 2017 - [debug] ok.
Fri Sep 1 01:31:12 2017 - [debug]
Fri Sep 1 01:31:12 2017 - [debug] Connecting via SSH from root@192.168.86.150(192.168.86.150:22) to root@192.168.86.148(192.168.86.148:22)..
Fri Sep 1 01:31:12 2017 - [debug] ok.
Fri Sep 1 01:31:12 2017 - [debug] Connecting via SSH from root@192.168.86.150(192.168.86.150:22) to root@192.168.86.149(192.168.86.149:22)..
Fri Sep 1 01:31:12 2017 - [debug] ok.
Fri Sep 1 01:31:12 2017 - [info] All SSH connection tests passed successfully.
The SSH check passes when "All SSH connection tests passed successfully" appears.
2. Run the replication check
$ masterha_check_repl --conf=/etc/masterha/app1.cnf
Fri Sep 1 01:36:29 2017 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Fri Sep 1 01:36:29 2017 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Fri Sep 1 01:36:29 2017 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Fri Sep 1 01:36:29 2017 - [info] MHA::MasterMonitor version 0.56.
Fri Sep 1 01:36:29 2017 - [info] GTID failover mode = 0
Fri Sep 1 01:36:29 2017 - [info] Dead Servers:
Fri Sep 1 01:36:29 2017 - [info] Alive Servers:
Fri Sep 1 01:36:29 2017 - [info] 192.168.86.148(192.168.86.148:3306)
Fri Sep 1 01:36:29 2017 - [info] 192.168.86.149(192.168.86.149:3306)
Fri Sep 1 01:36:29 2017 - [info] 192.168.86.150(192.168.86.150:3306)
Fri Sep 1 01:36:29 2017 - [info] Alive Slaves:
Fri Sep 1 01:36:29 2017 - [info] 192.168.86.149(192.168.86.149:3306) Version=5.6.20-log (oldest major version between slaves) log-bin:enabled
Fri Sep 1 01:36:29 2017 - [info] Replicating from 192.168.86.148(192.168.86.148:3306)
Fri Sep 1 01:36:29 2017 - [info] 192.168.86.150(192.168.86.150:3306) Version=5.6.20-log (oldest major version between slaves) log-bin:enabled
Fri Sep 1 01:36:29 2017 - [info] Replicating from 192.168.86.148(192.168.86.148:3306)
Fri Sep 1 01:36:29 2017 - [info] Primary candidate for the new Master (candidate_master is set)
Fri Sep 1 01:36:29 2017 - [info] Current Alive Master: 192.168.86.148(192.168.86.148:3306)
Fri Sep 1 01:36:29 2017 - [info] Checking slave configurations..
Fri Sep 1 01:36:29 2017 - [info] read_only=1 is not set on slave 192.168.86.150(192.168.86.150:3306).
Fri Sep 1 01:36:29 2017 - [info] Checking replication filtering settings..
Fri Sep 1 01:36:29 2017 - [info] binlog_do_db= , binlog_ignore_db=
Fri Sep 1 01:36:29 2017 - [info] Replication filtering check ok.
Fri Sep 1 01:36:29 2017 - [info] GTID (with auto-pos) is not supported
Fri Sep 1 01:36:29 2017 - [info] Starting SSH connection tests..
Fri Sep 1 01:36:31 2017 - [info] All SSH connection tests passed successfully.
Fri Sep 1 01:36:31 2017 - [info] Checking MHA Node version..
Fri Sep 1 01:36:31 2017 - [info] Version check ok.
Fri Sep 1 01:36:31 2017 - [info] Checking SSH publickey authentication settings on the current master..
Fri Sep 1 01:36:31 2017 - [info] HealthCheck: SSH to 192.168.86.148 is reachable.
Fri Sep 1 01:36:32 2017 - [info] Master MHA Node version is 0.56.
Fri Sep 1 01:36:32 2017 - [info] Checking recovery script configurations on 192.168.86.148(192.168.86.148:3306)..
Fri Sep 1 01:36:32 2017 - [info] Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/ --output_file=/tmp/save_binary_logs_test --manager_version=0.56 --start_file=mysql-server.000007
Fri Sep 1 01:36:32 2017 - [info] Connecting to root@192.168.86.148(192.168.86.148:22)..
Creating /tmp if not exists.. ok.
Checking output directory is accessible or not..
ok.
Binlog found at /data/, up to mysql-server.000007
Fri Sep 1 01:36:32 2017 - [info] Binlog setting check done.
Fri Sep 1 01:36:32 2017 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Fri Sep 1 01:36:32 2017 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='monitor' --slave_host=192.168.86.149 --slave_ip=192.168.86.149 --slave_port=3306 --workdir=/tmp --target_version=5.6.20-log --manager_version=0.56 --relay_log_info=/data/relay-log.info --relay_dir=/data/ --slave_pass=xxx
Fri Sep 1 01:36:32 2017 - [info] Connecting to root@192.168.86.149(192.168.86.149:22)..
Checking slave recovery environment settings..
Opening /data/relay-log.info ... ok.
Relay log found at /data, up to mysqld-relay-bin.000003
Temporary relay log file is /data/mysqld-relay-bin.000003
Testing mysql connection and privileges..Warning: Using a password on the command line interface can be insecure.
done.
Testing mysqlbinlog output.. done.
Cleaning up test file(s).. done.
Fri Sep 1 01:36:32 2017 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='monitor' --slave_host=192.168.86.150 --slave_ip=192.168.86.150 --slave_port=3306 --workdir=/tmp --target_version=5.6.20-log --manager_version=0.56 --relay_log_info=/data/relay-log.info --relay_dir=/data/ --slave_pass=xxx
Fri Sep 1 01:36:32 2017 - [info] Connecting to root@192.168.86.150(192.168.86.150:22)..
Checking slave recovery environment settings..
Opening /data/relay-log.info ... ok.
Relay log found at /data, up to mysqld-relay-bin.000003
Temporary relay log file is /data/mysqld-relay-bin.000003
Testing mysql connection and privileges..Warning: Using a password on the command line interface can be insecure.
done.
Testing mysqlbinlog output.. done.
Cleaning up test file(s).. done.
Fri Sep 1 01:36:33 2017 - [info] Slaves settings check done.
Fri Sep 1 01:36:33 2017 - [info]
192.168.86.148(192.168.86.148:3306) (current master)
+--192.168.86.149(192.168.86.149:3306)
+--192.168.86.150(192.168.86.150:3306)

Fri Sep 1 01:36:33 2017 - [info] Checking replication health on 192.168.86.149..
Fri Sep 1 01:36:33 2017 - [info] ok.
Fri Sep 1 01:36:33 2017 - [info] Checking replication health on 192.168.86.150..
Fri Sep 1 01:36:33 2017 - [info] ok.
Fri Sep 1 01:36:33 2017 - [info] Checking master_ip_failover_script status:
Fri Sep 1 01:36:33 2017 - [info] /usr/bin/master_ip_failover --command=status --ssh_user=root --orig_master_host=192.168.86.148 --orig_master_ip=192.168.86.148 --orig_master_port=3306
Checking the Status of the script.. OK
Fri Sep 1 01:36:33 2017 - [info] OK.
Fri Sep 1 01:36:33 2017 - [warning] shutdown_script is not defined.
Fri Sep 1 01:36:33 2017 - [info] Got exit code 0 (Not master dead).

MySQL Replication Health is OK.
The replication check passes when "MySQL Replication Health is OK" appears.
3. Check the status of the manager node
The check can return one of three states:
① The manager service is not running (NOT_RUNNING):
$ masterha_check_status --conf=/etc/masterha/app1.cnf
app1 is stopped(2:NOT_RUNNING).
② The manager service is running; the current master's IP address is shown:
$ masterha_check_status --conf=/etc/masterha/app1.cnf
app1 (pid:15469) is running(0:PING_OK), master:192.168.86.148
③ The manager service is still initializing:
$ masterha_check_status --conf=/etc/masterha/app1.cnf
app1 monitoring program is now on initialization phase(10:INITIALIZING_MONITOR). Wait for a while and try checking again.
4. Start the MHA manager service
$ nohup masterha_manager --conf=/etc/masterha/app1.cnf --remove_dead_master --ignore_last_failover </dev/null > /masterha/app1/manager.log 2>&1 &
5. Stop the MHA manager service
$ masterha_stop --conf=/etc/masterha/app1.cnf
6. Add a host entry to the configuration file
$ masterha_conf_host --command=add --conf=/etc/masterha/app1.cnf --hostname=192.168.68.102 --block=server1
7. Perform an online master switchover
$ masterha_master_switch --conf=/etc/masterha/app1.cnf --master_state=alive --new_master_host=192.168.86.151 --orig_master_is_new_slave --running_updates_limit=10000
8. Repair the original master with the help of the MHA log
On the mha-manager monitoring host, look up the change master information recorded at switchover time; the CHANGE MASTER statement in the log can be used to repair the original master:
$ cat /var/log/masterha/app1/manager.log |grep -i "All other slaves should start"
Fri Dec 30 18:09:35 2016 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='192.168.86.150', MASTER_PORT=3306, MASTER_LOG_FILE='mha-server.000001', MASTER_LOG_POS=154, MASTER_USER='repl', MASTER_PASSWORD='xxx';
V. Usage scenarios
1. Automatic failover (MHA manager must be running)
a. Start the manager service
$ nohup masterha_manager --conf=/etc/masterha/app1.cnf --remove_dead_master --ignore_last_failover </dev/null > /masterha/app1/manager.log 2>&1 &
$ masterha_check_status --conf=/etc/masterha/app1.cnf
app1 (pid:15469) is running(0:PING_OK), master:192.168.86.148
b. Simulate a master failure by killing the MySQL process, triggering an automatic failover
On mha-master:
$ pkill -9 mysqld
c. Read the MHA switchover log on the manager to follow the whole switchover
$ cat /masterha/app1/manager.log
The MHA switchover consists of the following steps:
- 1. Configuration check phase: the configuration of the whole cluster is validated.
- 2. Dead master handling: the VIP is removed from the dead master and the host is shut down (the shutdown part is not implemented in this setup and needs further work).
- 3. The relay log difference between the dead master and the most up-to-date slave is copied into the MHA manager's working directory.
- 4. The slave containing the latest updates is identified.
- 5. The binary log events (binlog events) saved from the master are applied.
- 6. One slave is promoted to the new master.
- 7. The remaining slaves are pointed at the new master for replication.
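The steps above can be followed in the log by filtering for its phase markers (assuming the log path used above; the exact phase wording varies between MHA versions):

```
$ grep -i "phase" /masterha/app1/manager.log
```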
d. Restart the MHA manager service and check the cluster's current master (the manager service stops itself after an automatic switchover)
$ nohup masterha_manager --conf=/etc/masterha/app1.cnf --remove_dead_master --ignore_last_failover </dev/null > /masterha/app1/manager.log 2>&1 &
$ masterha_check_status --conf=/etc/masterha/app1.cnf
app1 (pid:15469) is running(0:PING_OK), master:192.168.86.150
2. Manual failover (MHA manager not running)
When the master fails, the failover can also be performed manually.
a. Run the manual failover command
$ masterha_master_switch --master_state=dead --conf=/etc/masterha/app1.cnf --dead_master_host=192.168.86.148 --dead_master_port=3306 --new_master_host=192.168.86.150 --new_master_port=3306 --ignore_last_failover
b. Follow the prompts to complete the switchover
3. Repair the failed master (or switch the master back)
After an automatic switchover the original master is usually discarded. Once the original master host has been repaired, and provided its data is intact, you may want to bring it back as a slave of the new master. The MHA log written at switchover time can be used to accomplish this.
a. On the mha-manager monitoring host, find the change master information in the switchover log
The CHANGE MASTER statement in the log lets the original master rejoin the cluster:
$ cat /var/log/masterha/app1/manager.log |grep -i "All other slaves should start"
Fri Dec 30 18:09:35 2016 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='192.168.86.150', MASTER_PORT=3306, MASTER_LOG_FILE='mha-server.000001', MASTER_LOG_POS=154, MASTER_USER='repl', MASTER_PASSWORD='xxx';
b. Rejoin the original master to the cluster
mysql> CHANGE MASTER TO MASTER_HOST='192.168.86.150', MASTER_PORT=3306, MASTER_LOG_FILE='mha-server.000001', MASTER_LOG_POS=154, MASTER_USER='repl', MASTER_PASSWORD='123456';
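CHANGE MASTER only records the replication coordinates; replication still has to be started and verified on the repaired original master. A minimal follow-up:

```sql
START SLAVE;
-- Slave_IO_Running: Yes and Slave_SQL_Running: Yes indicate the old master
-- is now replicating from the new master 192.168.86.150
SHOW SLAVE STATUS\G
```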
c. Perform an online switchover on mha-manager
Make the original master the primary again:
$ masterha_master_switch --conf=/etc/masterha/app1.cnf --master_state=alive --new_master_host=192.168.86.148 --orig_master_is_new_slave --running_updates_limit=10000