MySQL的MMM高可用测试(转自老金)

一、双主零从

==============================================================

主机:

功能IP主机名
monitor10.1.30.216mysql-3
DBdb6Server-id=610.1.30.226net-1
db7Server-id=710.1.30.230net-2

虚拟ip:

IP010.1.30.219(Write)
IP110.1.30.221(Read)
IP210.1.30.243(Read)

——————–[1] mmm_common.conf ——————–

active_master_role          writer 

 

 

<host default>

cluster_interface       eth0            #网卡,如果配置错误mmm会自动修正

 

pid_path                /var/run/mmm_agentd.pid

bin_path                /usr/lib/mysql-mmm/

 

replication_user        repl

replication_password    repl

 

agent_user              mmm_agent

agent_password          mmm_agent

</host>

 

<host db6>

ip                      10.1.30.226

mysql_port              3306

mode                    master

peer                    db7

</host>

 

<host db7>

ip                      10.1.30.230

mysql_port              3306

mode                    master

peer                    db6

</host>

 

<role writer>

hosts                   db6, db7             #能够作为writer的host

ips                     10.1.30.219             #writer的虚拟IP,应用的写请求直接访问这里的ip即可

 

mode                    exclusive              #只有两种模式:exclusive是排他,在这种模式下任何时候只能一个host拥有该角色,balanced模式下可以多个host同时拥有此角色。一般writer是ex,reader是ba

</role>

 

<role reader>

hosts                   db6, db7                #能够作为reader的host

ips                     10.1.30.221, 10.1.30.243 #reader的虚拟IP,应用的读请求直接访问这里的ip即可

mode                    balanced

</role>

 

——————–[2] mmm_mon.conf ———————–

include mmm_common.conf 

 

 

<monitor>

ip                                                     10.1.30.216

pid_path                              /var/run/mmm_mond.pid

bin_path                              /usr/lib/mysql-mmm/

status_path                         /var/lib/misc/mmm_mond.status

ping_ips                               10.1.30.214, 10.1.30.215, 10.1.30.226, 10.1.30.230

</monitor>

 

<host default>

monitor_user                     mmm_mon

monitor_password            mmm_mon

</host>

 

debug 0

 

——————–[4] mmm_agent.conf  ——————–

include mmm_common.conf 

 

this db6

 

 

查看状态:

[root@mysql-3 ~]# mmm_control show 

 

db6(10.1.30.226) master/ONLINE. Roles: reader(10.1.30.221), writer(10.1.30.219)

db7(10.1.30.230) master/ONLINE. Roles: reader(10.1.30.243)

 

[root@mysql-3 ~]# mmm_control checks

db7  ping         [last change: 2010/09/19 17:19:09]  OK

db7  mysql        [last change: 2010/09/19 17:19:09]  OK

db7  rep_threads  [last change: 2010/09/19 17:19:09]  OK

db7  rep_backlog  [last change: 2010/09/19 17:19:09]  OK: Backlog is null

db6  ping         [last change: 2010/09/19 17:19:09]  OK

db6  mysql        [last change: 2010/09/19 17:19:09]  OK

db6  rep_threads  [last change: 2010/09/19 17:24:57]  OK

db6  rep_backlog  [last change: 2010/09/19 17:19:09]  OK: Backlog is null

 

[root@mysql-3 ~]# mmm_control mode

ACTIVE

 

 

 

模拟宕机切换:

随便找一个客户端,执行写操作:

@client 

 

[root@mysql-1 ~]# more /root/script/inserting-into-db.sh

#!/bin/bash

while true;

do

mysql -udba -pdba -h10.1.30.219 -e “insert into jcr.t1 values(null);”

sleep 1 ;

done;

[root@mysql-1 ~]# ./inserting-into-db.sh &

 

 

在两个数据库主机,通过脚本监测产生的binlog日志:

脚本内容

[root@net-1 binlog]# more mysqlbinlog_tail.sh 

 

binlogidx=$1

 

while true;

binlog=`tail -1 $binlogidx`

do

mysqlbinlog $binlog|tail -20

sleep 2;

done;

 

监测效果:

[root@net-1 binlog]# ./mysqlbinlog_tail.sh mysql-bin.index 

 

# at 478124

#100920 10:57:13 server id 6 end_log_pos 478152 Intvar

SET INSERT_ID=5166/*!*/;

# at 478152

#100920 10:57:13 server id 6 end_log_pos 478242 Query       thread_id=43343     exec_time=1         error_code=0

SET TIMESTAMP=1284951433/*!*/;

insert into jcr.t1 values(null)

/*!*/;

DELIMITER ;

# End of log file

ROLLBACK /* added by mysqlbinlog */;

/*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;

……

可以看到两个db中的binlog显示的server id都是6,也就是说当前情况下db6是作为写库。

 

停止db6

[root@net-1 ~]# /etc/init.d/mysql stop 

 

Shutting down MySQL.                                       [  OK  ]

 

依次检查mmm各日志:

/var/log/mysql-mmm/mmm_mond.log

2010/09/20 09:49:38  WARN Check ‘rep_threads’ on ‘db6′ is in unknown state! Message: UNKNOWN:Connect error (host = 10.1.30.226:3306, user = mmm_mon)! Lost connection to MySQL server at ‘reading initial communication packet’, system error: 111 

 

2010/09/20 09:49:38  WARN Check ‘rep_backlog’ on ‘db6′ is in unknown state! Message: UNKNOWN:Connect error (host = 10.1.30.226:3306, user = mmm_mon)! Lost connection to MySQL server at ‘reading initial communication packet’, system error: 111

2010/09/20 09:49:48 ERROR Check ‘rep_threads’ on ‘db7′ has failed for 10 seconds! Message: ERROR: Replication is broken

2010/09/20 09:49:50 FATAL State of host ‘db7′ changed from ONLINE to REPLICATION_FAIL

2010/09/20 09:49:50  INFO Removing all roles from host ‘db7′:

2010/09/20 09:49:50  INFO     Removed role ‘reader(10.1.30.243)’ from host ‘db7′

2010/09/20 09:49:50  INFO Orphaned role ‘reader(10.1.30.243)’ has been assigned to ‘db6′

2010/09/20 09:49:52 ERROR Check ‘mysql’ on ‘db6′ has failed for 10 seconds! Message: ERROR: Connect error (host = 10.1.30.226:3306, user = mmm_mon)! Lost connection to MySQL server at ‘reading initial communication packet’, system error: 111

2010/09/20 09:49:53 FATAL State of host ‘db6′ changed from ONLINE to HARD_OFFLINE (ping: OK, mysql: not OK)

2010/09/20 09:49:53  INFO Removing all roles from host ‘db6′:

2010/09/20 09:49:53  INFO     Removed role ‘reader(10.1.30.243)’ from host ‘db6′

2010/09/20 09:49:53  INFO     Removed role ‘reader(10.1.30.221)’ from host ‘db6′

2010/09/20 09:49:53  INFO    Removed role ‘writer(10.1.30.219)’ from host ‘db6′

2010/09/20 09:49:56 FATAL State of host ‘db7′ changed from REPLICATION_FAIL to ONLINE

2010/09/20 09:49:56  INFO Orphaned role ‘writer(10.1.30.219)’ has been assigned to ‘db7′

2010/09/20 09:49:56  INFO Orphaned role ‘reader(10.1.30.243)’ has been assigned to ‘db7′

2010/09/20 09:49:56  INFO Orphaned role ‘reader(10.1.30.221)’ has been assigned to ‘db7′

 

先熟悉一下db6和db7二者目前的角色:

db6(10.1.30.226)master/ ONLINE.Roles:reader(10.1.30.221), writer(10.1.30.219)
db7(10.1.30.230)master/ONLINE.Roles: 

 

reader(10.1.30.243)

 

 

从日志可以看出,db6停止之后,mmm提示connect error,然后db7的io复制线程失败,检查项rep_threads在10秒后出现报警replication is broken,由于当前的写库是db6,于是mmm认为db7上的数据已经不能和db6保持一致了,故把db7的读角色(reader)迁移到db6上。变成了:

db6(10.1.30.226) master/ONLINE. Roles: reader(10.1.30.221), writer(10.1.30.219), reader(10.1.30.243)

db7(10.1.30.230) master/ REPLICATION_FAIL.

db6(10.1.30.226)master/ ONLINE.Roles: 

 

reader(10.1.30.221),

writer(10.1.30.219),

reader(10.1.30.243)

db7(10.1.30.230)master/ REPLICATION_FAIL. 

 

 

但是,这时mmm的”mysql”检查项在10秒后出现报警,认为db6已经彻底失败,因此又把db6设置状态为hard_offline,把db7从replication_fail状态切换到online状态(因为db7的mysql至少还活着)同时把上面的所有角色切换到db7上。状态最终变为:

db6(10.1.30.226)master/ HARD_OFFLINE. 
db7(10.1.30.230)master/ONLINE.Roles: 

 

reader(10.1.30.221),

writer(10.1.30.219),

reader(10.1.30.243)

疑问:mmm为什么会多此一举,先把因db6宕机引起复制失败的db7的reader角色交给写库db6,然后再把db6的所有角色交给正常的db7?mmm在处理复制失败的时候能不能直接判断写库的mysql状态,如果失败,那就直接由db7接管db6,而不需要再来回切换角色了。

 

 

db6:  /var/log/mysql-mmm/mmm_agentd.log

2010/09/20 09:48:28 FATAL Couldn’t allow writes: ERROR: Can’t connect to MySQL (host = 10.1.30.226:3306, user = mmm_agent)! Lost connection to MySQL server at ‘reading initial communication packet’, system error: 111 

 

2010/09/20 09:48:30 FATAL Couldn’t allow writes: ERROR: Can’t connect to MySQL (host = 10.1.30.226:3306, user = mmm_agent)! Lost connection to MySQL server at ‘reading initial communication packet’, system error: 111

2010/09/20 09:48:33 FATAL Couldn’t allow writes: ERROR: Can’t connect to MySQL (host = 10.1.30.226:3306, user = mmm_agent)! Lost connection to MySQL server at ‘reading initial communication packet’, system error: 111

2010/09/20 09:48:36 FATAL Couldn’t allow writes: ERROR: Can’t connect to MySQL (host = 10.1.30.226:3306, user = mmm_agent)! Lost connection to MySQL server at ‘reading initial communication packet’, system error: 111

2010/09/20 09:48:39  INFO We have some new roles added or old rules deleted!

2010/09/20 09:48:39  INFO Added:   reader(10.1.30.243)

2010/09/20 09:48:40 FATAL Couldn’t allow writes: ERROR: Can’t connect to MySQL (host = 10.1.30.226:3306, user = mmm_agent)! Lost connection to MySQL server at ‘reading initial communication packet’, system error: 111

2010/09/20 09:48:42  INFO We have some new roles added or old rules deleted!

2010/09/20 09:48:42  INFO Deleted: reader(10.1.30.221), reader(10.1.30.243), writer(10.1.30.219)

2010/09/20 09:48:43 FATAL Couldn’t deny writes: ERROR: Can’t connect to MySQL (host = 10.1.30.226:3306, user = mmm_agent)! Lost connection to MySQL server at ‘reading initial communication packet’, system error: 111

 

从db6的agent日志也验证了刚才的分析:先增加reader(243),然后移除所有角色。

 

db7:  /var/log/mysql-mmm/mmm_agentd.log

2010/09/20 09:48:39  INFO We have some new roles added or old rules deleted! 

 

2010/09/20 09:48:39  INFO Deleted: reader(10.1.30.243)

2010/09/20 09:48:45  INFO We have some new roles added or old rules deleted!

2010/09/20 09:48:45  INFO Added:   reader(10.1.30.221), reader(10.1.30.243), writer(10.1.30.219)

 

db7的日志可以看出,db7先删除reader(243),然后mmm把所有角色都添加到了db7下面。

 

 

db6恢复正常

[root@net-1 ~]# /etc/init.d/mysql start 

 

Starting MySQL.                                            [  OK  ]

 

/var/log/mysql-mmm/mmm_mond.log

2010/09/20 09:50:43  INFO Check ‘rep_threads’ on ‘db6′ is ok! 

 

2010/09/20 09:50:43  INFO Check ‘rep_backlog’ on ‘db6′ is ok!

2010/09/20 09:50:47  INFO Check ‘mysql’ on ‘db6′ is ok!

2010/09/20 09:50:48 FATAL State of host ‘db6′ changed from HARD_OFFLINE to AWAITING_RECOVERY

2010/09/20 09:50:50 FATAL State of host ‘db6′ changed from AWAITING_RECOVERY to ONLINE because it was down for only 58 seconds

2010/09/20 09:50:50  INFO Moving role ‘reader(10.1.30.243)’ from host ‘db7′ to host ‘db6′

2010/09/20 09:51:38  INFO Check ‘rep_threads’ on ‘db7′ is ok!

 

db6:  /var/log/mysql-mmm/mmm_agentd.log

2010/09/20 09:49:39  INFO We have some new roles added or old rules deleted! 

 

2010/09/20 09:49:39  INFO Added:   reader(10.1.30.243)

db7:  /var/log/mysql-mmm/mmm_agentd.log

2010/09/20 09:49:39  INFO We have some new roles added or old rules deleted! 

 

2010/09/20 09:49:39  INFO Deleted: reader(10.1.30.243)

Mmm的处理步骤大致是:

db6的“mysql”check恢复正常,然后把db6切换到awaiting_recovery状态。然后mmm判断db6的宕机时间在正常范围内,不属于flapping,因此自动切换为online状态。

把db7中的一个reader角色迁移到db6上。

目前写库是db7。(从binlog中就可以看出server id=7)

注:可以在exclusive 的<role writer>中设置prefer=db6,这样在db6恢复正常之后,就可以再次被切换为写库了。

 

从客户端的监测发现,其中写操作大概停了17秒。

 

 

其他架构:

二、双主多从

 

其中的纯数字是ip的D段值(10.1.30.*)

 

配置:

/etc/mysql-mmm/mmm_common.conf

……<host default>略…… 

 

<host db1>

ip                      10.1.30.214

mysql_port              3306

mode                    slave

</host>

 

<host db6>

ip                      10.1.30.226

mysql_port              3306

mode                    master

peer                    db7

</host>

 

<host db7>

ip                      10.1.30.230

mysql_port              3306

mode                    master

peer                    db6

</host>

 

 

<role writer>

hosts                   db6, db7

ips                     10.1.30.219

mode                    exclusive

prefer                  db6

</role>

 

<role reader>

hosts                   db1, db7

ips                     10.1.30.221, 10.1.30.243

mode                    balanced

</role>

 

目前各自分配的角色:

db2(10.1.30.214)slave/ONLINE.Roles: 

 

reader(10.1.30.243)

db6(10.1.30.226)master/ONLINE.Roles: 

 

writer(10.1.30.219)

db7(10.1.30.230)master/ONLINE.Roles: 

 

reader(10.1.30.221)

# Role writer is assigned to it’s preferred host db6.

 

 

测试1:db7挂

 

mmm把reader(10.1.30.221) 切换到db2

2010/09/20 17:38:27  WARN Check ‘rep_backlog’ on ‘db7′ is in unknown state! Message: UNKNOWN: Connect error (host = 10.1.30.230:3306, user = mmm_mon)! Lost connection to MySQL server at ‘reading initial communication packet’, system error: 111 

 

2010/09/20 17:38:27  WARN Check ‘rep_threads’ on ‘db7′ is in unknown state! Message: UNKNOWN: Connect error (host = 10.1.30.230:3306, user = mmm_mon)! Lost connection to MySQL server at ‘reading initial communication packet’, system error: 111

2010/09/20 17:38:37 ERROR Check ‘mysql’ on ‘db7′ has failed for 10 seconds! Message: ERROR: Connect error (host = 10.1.30.230:3306, user = mmm_mon)! Lost connection to MySQL server at ‘reading initial communication packet’, system error: 111

2010/09/20 17:38:37 ERROR Check ‘rep_threads’ on ‘db6′ has failed for 10 seconds! Message: ERROR: Replication is broken

2010/09/20 17:38:38 FATAL State of host ‘db7′ changed from ONLINE to HARD_OFFLINE (ping: OK, mysql: not OK)

2010/09/20 17:38:38  INFO Removing all roles from host ‘db7′:

2010/09/20 17:38:38  INFO     Removed role ‘reader(10.1.30.221)’ from host ‘db7′

2010/09/20 17:38:38  INFO Orphaned role ‘reader(10.1.30.221)’ has been assigned to ‘db1′

 

db1(10.1.30.214)slave/ONLINE.Roles: 

 

reader(10.1.30.221),

reader(10.1.30.243)

db6(10.1.30.226)master/ONLINE.Roles: 

 

writer(10.1.30.219)

db7(10.1.30.230)master/HARD_OFFLINE.Roles:

 

恢复正常之后:

2010/09/20 17:42:17  INFO Check ‘mysql’ on ‘db7′ is ok! 

 

2010/09/20 17:42:17  INFO Check ‘rep_backlog’ on ‘db7′ is ok!

2010/09/20 17:42:17  INFO Check ‘rep_threads’ on ‘db7′ is ok!

2010/09/20 17:42:19 FATAL State of host ‘db7′ changed from HARD_OFFLINE to AWAITING_RECOVERY

2010/09/20 17:42:28  INFO Check ‘rep_threads’ on ‘db6′ is ok!

 

db1(10.1.30.214)slave/ONLINE.Roles: 

 

reader(10.1.30.221),

reader(10.1.30.243)

db6(10.1.30.226)master/ONLINE.Roles: 

 

writer(10.1.30.219)

db7(10.1.30.230)master/AWAITING_RECOVERY.Roles:

 

mmm_control set_online db7

db1(10.1.30.214)slave/ONLINE.Roles: 

 

reader(10.1.30.221)

db6(10.1.30.226)master/ONLINE.Roles: 

 

writer(10.1.30.219)

db7(10.1.30.230)master/ONLINE.Roles: 

 

reader(10.1.30.243)

 

 

测试2:db6挂

mmm把db6拿掉,把写操作转给db7,同时把db1状态改为replication_fail,把db1的读操作切到db7。

这时,mmm检查发现db1的复制又恢复了正常(master已经成功切换到db7),于是又恢复了读访问。

2010/09/20 17:52:44  WARN Check ‘rep_backlog’ on ‘db6′ is in unknown state! Message: UNKNOWN: Connect error (host = 10.1.30.226:3306, user = mmm_mon)! Lost connection to MySQL server at ‘reading initial communication packet’, system error: 111 

 

2010/09/20 17:52:44  WARN Check ‘rep_threads’ on ‘db6′ is in unknown state! Message: UNKNOWN: Connect error (host = 10.1.30.226:3306, user = mmm_mon)! Lost connection to MySQL server at ‘reading initial communication packet’, system error: 111

2010/09/20 17:52:54 ERROR Check ‘mysql’ on ‘db6′ has failed for 10 seconds! Message: ERROR: Connect error (host = 10.1.30.226:3306, user = mmm_mon)! Lost connection to MySQL server at ‘reading initial communication packet’, system error: 111

2010/09/20 17:52:54 ERROR Check ‘rep_threads’ on ‘db7′ has failed for 10 seconds! Message: ERROR: Replication is broken

2010/09/20 17:52:54 ERROR Check ‘rep_threads’ on ‘db1′ has failed for 10 seconds! Message: ERROR: Replication is broken

2010/09/20 17:52:55 FATAL State of host ‘db6′ changed from ONLINE to HARD_OFFLINE (ping: OK, mysql: not OK)

2010/09/20 17:52:55  INFO Removing all roles from host ‘db6′:

2010/09/20 17:52:55  INFO     Removed role ‘writer(10.1.30.219)’ from host ‘db6′

2010/09/20 17:52:55  INFO Orphaned role ‘writer(10.1.30.219)’ has been assigned to ‘db7′

2010/09/20 17:52:58 FATAL State of host ‘db1′ changed from ONLINE to REPLICATION_FAIL

2010/09/20 17:52:58  INFO Removing all roles from host ‘db1′:

2010/09/20 17:52:58  INFO     Removed role ‘reader(10.1.30.221)’ from host ‘db1′

2010/09/20 17:52:58  INFO Orphaned role ‘reader(10.1.30.221)’ has been assigned to ‘db7′

2010/09/20 17:52:59  INFO Check ‘rep_threads’ on ‘db1′ is ok!

2010/09/20 17:53:01 FATAL State of host ‘db1′ changed from REPLICATION_FAIL to ONLINE

2010/09/20 17:53:01  INFO Moving role ‘reader(10.1.30.243)’ from host ‘db7′ to host ‘db1′

 

db1(10.1.30.214)slave/ONLINE.Roles: 

 

reader(10.1.30.243)

db6(10.1.30.226)master/ HARD_OFFLINE.Roles:
db7(10.1.30.230)master/ONLINE.Roles: 

 

reader(10.1.30.221),

writer(10.1.30.219)

 

恢复正常

2010/09/20 18:02:10  INFO Check ‘mysql’ on ‘db6′ is ok! 

 

2010/09/20 18:02:10  INFO Check ‘rep_backlog’ on ‘db6′ is ok!

2010/09/20 18:02:10  INFO Check ‘rep_threads’ on ‘db6′ is ok!

2010/09/20 18:02:13 FATAL State of host ‘db6′ changed from HARD_OFFLINE to AWAITING_RECOVERY

2010/09/20 18:02:45  INFO Check ‘rep_threads’ on ‘db7′ is ok!

 

db1(10.1.30.214)slave/ONLINE.Roles: 

 

reader(10.1.30.243)

db6(10.1.30.226)master/AWAITING_RECOVERY.Roles:
db7(10.1.30.230)master/ONLINE.Roles: 

 

reader(10.1.30.221),

writer(10.1.30.219)

 

mmm_control set_online db6

2010/09/20 18:05:28 FATAL Admin changed state of ‘db6′ from AWAITING_RECOVERY to ONLINE 

 

2010/09/20 18:05:28  INFO Moving role ‘writer(10.1.30.219)’ from host ‘db7′ to preferred host ‘db6′

 

db1(10.1.30.214)slave/ONLINE.Roles: 

 

reader(10.1.30.243)

db6(10.1.30.226)master/ONLINE.Roles: 

 

writer(10.1.30.219)

db7(10.1.30.230)master/ONLINE.Roles: 

 

reader(10.1.30.221)

 

在整个过程中,db1对应的master从db6切换到db7,然后再由db7切换到db6,而db6失败然后重新打开,在这期间每一秒都有数据写入,最终检查数据一致性状况良好。

[root@mysql-3 ~]# ./check_repl.sh 

 

+———-+

| count(*) |

+———-+

|    30598 |

+———-+

+———-+

| count(*) |

+———-+

|    30598 |

+———-+

+———-+

| count(*) |

+———-+

|    30598 |

+———-+

 

 

 

 

总结

MMM对MySQL复制提供了一种高可用的解决方案。

它包含主机监控、mysql服务监控、复制线程监控等多方位监控手段,在MySQL多主环境中,使用它可以在任何一个节点(master或slave)出现故障时能够及时作出判断和调整。它使用虚拟ip技术使数据库架构内的角色切换对上层的应用程序透明。

 

原文链接:http://www.mysqlops.com/2011/06/11/mysql-mmm-architecture.html

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值