MySQL Master-Master Replication Manager(MySQL主主复制管理器),用来监控和故障切换,管理mysql Master-Master复制的配置 (同一时间只有一个节点是可写的),附带的工具套件可以实现多个slaves的read负载均衡,因此你可以使用这个工具移除一组服务器中复制延迟较高的服务器的虚拟IP,它还可以实现数据备份、节点之间重新同步功能等。
MySQL本身没有提供replication failover的解决方案,通过MMM方案能实现服务器的故障转移,从而实现mysql的高可用。
MMM项目来自 https://launchpad.net/mysql-mmm
官方网站为:http://mysql-mmm.org
MMM主要的功能通过下面三个脚本来实现:
- mmm_mond: 负责所有的监控工作的监控守护进程,决定节点的活动
- mmm_agentd:运行在mysql服务器上的代理守护进程,通过远程服务提供给监控节点
- mmm_control:通过命令行管理mmm_mond进程
二、配置安装MMM
1. 安装介绍
function | ip | hostname | server id | role |
---|---|---|---|---|
master1 | 192.168.90.128 | db1 | 13 | writer |
master2 | 192.168.90.129 | db2 | 23 | reader |
slave1 | 192.168.90.130 | db3 | 33 | reader |
monitoring | 192.168.90.130 | mon | - | mmm_mon |
vip | role | description |
---|---|---|
192.168.90.228 | writer | 应用程序连接此IP写入数据 |
192.168.90.229 | reader | 应用程序连接此IP读取数据 |
192.168.90.230 | reader | 应用程序连接此IP读取数据 |
192.168.90.231 | reader | 应用程序连接此IP读取数据 |
2. 搭建主主复制
3. 创建监控用户
[db1 db2 db3]
mysql> GRANT REPLICATION CLIENT ON *.* TO 'mmm_monitor'@'%' IDENTIFIED BY 'mypassword';
mysql> GRANT SUPER, REPLICATION CLIENT, PROCESS ON *.* TO 'mmm_agent'@'%' IDENTIFIED BY 'mypassword';
mysql> flush privileges;
4. 配置参数文件
shell> tar zxvf mysql-mmm-2.2.1.tar.gz
shell> cd mysql-mmm-2.2.1
shell> make install
依赖perl模块:
--Algorithm::Diff
--Class:Singleton
--DBI and DBD::mysql
--File::Basename
--File::stat
--File::Temp
--Log::Dispatch
--Log::Log4perl
--Mail::Send
--Net::Ping
--Net::ARP
--Proc::Daemon
--Thread::Queue
--Time::HiRes
配置agent参数文件
[db1]
shell> vi /etc/mysql-mmm/mmm_agent.conf
=================================================================================
active_master_role writer
<host default>
cluster_interface eth0
pid_path /var/run/mmm_agentd.pid
bin_path /usr/lib/mysql-mmm/
replication_user replication
replication_password mypassword
agent_user mmm_agent
agent_password mypassword
</host>
<host db1>
ip 192.168.90.128
mode master
peer db2
</host>
<host db2>
ip 192.168.90.129
mode master
peer db1
</host>
<host db3>
ip 192.168.90.130
mode slave
</host>
<role writer>
hosts db1, db2
ips 192.168.90.228
mode exclusive
</role>
<role reader>
hosts db1, db2, db3
ips 192.168.90.229, 192.168.90.230, 192.168.90.231
mode balanced
</role>
=================================================================================
将/etc/mysql-mmm/mmm_agent.conf配置文件复制到其他三台主机同样位置
shell> more /etc/mysql-mmm/mmm_agent.conf
include mmm_common.conf
this db1
[db2]
shell> more /etc/mysql-mmm/mmm_agent.conf
include mmm_common.conf
this db2
[db3]
shell> more /etc/mysql-mmm/mmm_agent.conf
include mmm_common.conf
this db3
配置monitor参数文件
[mon]
shell> vi /etc/mysql-mmm/mmm_mon.conf
=================================================================================
include mmm_common.conf
<monitor>
ip 192.168.90.130
pid_path /var/run/mmm_mond.pid
bin_path /usr/lib/mysql-mmm/
status_path /var/lib/misc/mmm_mond.status
ping_ips 192.168.90.128, 192.168.90.129, 192.168.90.130
</monitor>
<host default>
monitor_user mmm_monitor
monitor_password mypassword
</host>
debug 0
=================================================================================
5. 启动mmm_agent
shell> /etc/init.d/mysql-mmm-agent start
Daemon bin: '/usr/sbin/mmm_agentd'
Daemon pid: '/var/run/mmm_agentd.pid'
Starting MMM Agent daemon... Ok
shell> ps -ef |grep mmm_agent
root 4046 1 0 20:07 ? 00:00:00 mmm_agentd
root 4047 4046 0 20:07 ? 00:00:00 mmm_agentd
root 4051 4014 0 20:08 pts/2 00:00:00 grep mmm_agent
shell> netstat -tulnp | grep mmm
tcp 0 0 192.168.90.128:9989 0.0.0.0:* LISTEN 4047/mmm_agentd
shell> /etc/init.d/mysql-mmm-agent status
Daemon bin: '/usr/sbin/mmm_agentd'
Daemon pid: '/var/run/mmm_agentd.pid'
Checking MMM Agent process: running.
shell> /etc/init.d/mysql-mmm-agent --help
Daemon bin: '/usr/sbin/mmm_agentd'
Daemon pid: '/var/run/mmm_agentd.pid'
Usage: /etc/init.d/mysql-mmm-agent {start|stop|restart|status}
6. 启动mmm_monitor
shell> /etc/init.d/mysql-mmm-monitor start
Daemon bin: '/usr/sbin/mmm_mond'
Daemon pid: '/var/run/mmm_mond.pid'
Starting MMM Monitor daemon: Ok
shell> ps -ef |grep mmm
root 4150 1 0 21:13 ? 00:00:00 mmm_mond
root 4151 4150 5 21:13 ? 00:00:00 mmm_mond
root 4161 4151 3 21:13 ? 00:00:00 perl /usr/lib/mysql-mmm//monitor/checker ping_ip
root 4164 4151 4 21:13 ? 00:00:00 perl /usr/lib/mysql-mmm//monitor/checker mysql
root 4166 4151 3 21:13 ? 00:00:00 perl /usr/lib/mysql-mmm//monitor/checker ping
root 4168 4151 4 21:13 ? 00:00:00 perl /usr/lib/mysql-mmm//monitor/checker rep_backlog
root 4171 4151 3 21:13 ? 00:00:00 perl /usr/lib/mysql-mmm//monitor/checker rep_threads
root 4175 1823 0 21:13 pts/1 00:00:00 grep mmm
shell> /etc/init.d/mysql-mmm-monitor stop
Daemon bin: '/usr/sbin/mmm_mond'
Daemon pid: '/var/run/mmm_mond.pid'
Shutting down MMM Monitor daemon: ... Ok
7. 管理mmm_monitor
shell> mmm_control mode
ACTIVE
shell> mmm_control show
db1(192.168.90.128) master/AWAITING_RECOVERY. Roles:
db2(192.168.90.129) master/AWAITING_RECOVERY. Roles:
db3(192.168.90.130) slave/AWAITING_RECOVERY. Roles:
shell> mmm_control set_online db1
OK: State of 'db1' changed to ONLINE. Now you can wait some time and check its new roles!
shell> mmm_control set_online db2
OK: State of 'db2' changed to ONLINE. Now you can wait some time and check its new roles!
shell> mmm_control set_online db3
OK: State of 'db3' changed to ONLINE. Now you can wait some time and check its new roles!
shell> mmm_control show
db1(192.168.90.128) master/ONLINE. Roles: reader(192.168.90.230), writer(192.168.90.228)
db2(192.168.90.129) master/ONLINE. Roles: reader(192.168.90.231)
db3(192.168.90.130) slave/ONLINE. Roles: reader(192.168.90.229)
shell> mmm_control checks all
db2 ping [last change: 2015/01/27 21:20:06] OK
db2 mysql [last change: 2015/01/27 21:24:30] OK
db2 rep_threads [last change: 2015/01/27 21:24:30] OK
db2 rep_backlog [last change: 2015/01/27 21:24:30] OK: Backlog is null
db3 ping [last change: 2015/01/27 21:20:06] OK
db3 mysql [last change: 2015/01/27 21:24:30] OK
db3 rep_threads [last change: 2015/01/27 21:24:30] OK
db3 rep_backlog [last change: 2015/01/27 21:24:30] OK: Backlog is null
db1 ping [last change: 2015/01/27 21:20:06] OK
db1 mysql [last change: 2015/01/27 21:20:06] OK
db1 rep_threads [last change: 2015/01/27 21:20:06] OK
db1 rep_backlog [last change: 2015/01/27 21:20:06] OK: Backlog is null
8. 测试验证
一般读写验证
shell> mmm_control show
db1(192.168.90.128) master/ONLINE. Roles: reader(192.168.90.229), writer(192.168.90.228)
db2(192.168.90.129) master/ONLINE. Roles: reader(192.168.90.231)
db3(192.168.90.130) slave/ONLINE. Roles: reader(192.168.90.230)
shell> mysql -h 192.168.90.229 -u root -p
mysql> select @@server_id;
+-------------+
| @@server_id |
+-------------+
| 23 |
+-------------+
1 row in set (0.00 sec)
mysql> create table mm (id int primary key auto_increment,name varchar(10));
Query OK, 0 rows affected (0.07 sec)
mysql> insert into db_test.mm values (null,@@server_id);
Query OK, 1 row affected (0.05 sec)
mysql> select * from mm;
+----+------+
| id | name |
+----+------+
| 1 | 23 |
+----+------+
1 row in set (0.01 sec)
查看db1、db2、db3
mysql> select * from mm;
+----+------+
| id | name |
+----+------+
| 1 | 23 |
+----+------+
1 row in set (0.01 sec)
写入故障切换
[db1]
shell> mysqladmin -u root -p shutdown
shell> mmm_control show
db1(192.168.90.128) master/HARD_OFFLINE. Roles:
db2(192.168.90.129) master/ONLINE. Roles: reader(192.168.90.231), writer(192.168.90.228)
db3(192.168.90.130) slave/ONLINE. Roles: reader(192.168.90.229), reader(192.168.90.230)
超过60秒之后重新启动db1的mysql服务
shell> mmm_control show
db1(192.168.90.128) master/AWAITING_RECOVERY. Roles:
db2(192.168.90.129) master/ONLINE. Roles: reader(192.168.90.231), writer(192.168.90.228)
db3(192.168.90.130) slave/ONLINE. Roles: reader(192.168.90.229), reader(192.168.90.230)
shell> mmm_control set_online db1
OK: State of 'db1' changed to ONLINE. Now you can wait some time and check its new roles!
shell> mmm_control show
db1(192.168.90.128) master/ONLINE. Roles: reader(192.168.90.229)
db2(192.168.90.129) master/ONLINE. Roles: reader(192.168.90.231), writer(192.168.90.228)
db3(192.168.90.130) slave/ONLINE. Roles: reader(192.168.90.230)
shell> mmm_control set_offline db2
OK: State of 'db2' changed to ADMIN_OFFLINE. Now you can wait some time and check all roles!
shell> mmm_control show
db1(192.168.90.128) master/ONLINE. Roles: reader(192.168.90.229), writer(192.168.90.228)
db2(192.168.90.129) master/ADMIN_OFFLINE. Roles:
db3(192.168.90.130) slave/ONLINE. Roles: reader(192.168.90.230), reader(192.168.90.231)
三、mmm_mond 介绍
STATES:
- ONLINE 主机正在正常运行
- ADMIN_OFFINE 手动被设置成离线状态
- HARD_OFFLINE 自动被设置成离线状态(ping检测或者是mysql检测失败)
- AWAITING_RECOVERY 主机等待恢复
- REPLICATION_DELAY 复制延迟大(rep_backlog检测失败)
- REPLICATION_FAIL 复制线程没有运行(rep_threads检测失败)
- 只有当主机ONLINE时才能获得角色(reader or writer),当一个主机从ONLINE转换到其它状态,它所获得的角色都会被移除。
- 主机在REPLICATION_DELAY或REPLICATION_FAIL状态时,如果所有的问题都解决后就会被再次切换到ONLINE状态,除非主机Flapping
- 主机在HARD_OFFLINE状态,如果所有的问题都解决后会转换到AWAITING_RECOVERY状态,如果它故障时间小于60s,并且它没有重启或者auto_set_online > 0,那么它将会被自动切换到ONLINE状态,除非主机Flapping
- 如果活动的主库出现复制延迟或复制失败将不被视为问题,所以活动的主库不会被置为REPLICATION_DELAY或REPLICATION_FAIL状态
- 在节点被转换到ONLINE状态60s内,如果出现复制失败或复制延迟将会被忽略(时间通过master-connect-retry来控制)
- 如果rep_backlog和rep_threads都检测失败,将会切换到REPLICATION_FAIL状态
ROLES:
- exclusive roles
互斥角色只有一个ip,并且同一时间只能分配给一个主机,你可以指定一个优先(preferred)主机,如果这个主机是ONLINE状态,那么角色就会被切换到这个主机。 - balanced roles
负载均衡角色可以有多个IP,这些IP被均衡的分配给多个主机,所以没有一个主机可以比其他主机多出两个角色。可以通过给可写的active_master_role设置mode=slave把角色切换给其他master
STATUS FILE
保存host的角色和状态信息的文件”status_path”
CHECKS
mmm_mond会对每个主机执行4项检查,并决定检查是否成功:
ping 主机可以ping
mysql mysql server在线
rep_threads 复制线程在运行
rep_backlog 复制积压的日志不大
NETWORK CHECK
mmm_mond可以检查到没有功能的网络连接,通过定期ping配置的ips,至少有一个IP是可以达到的,network 就被认为可用。
如果在网络不可用的情况下,mmm_mond会做以下操作:
- 忽略失败检测
- 不改变主机状态
- 不给agents发送任何信息
如果网络不可用的情况下启动MMM,mmm_mond将推迟到网络可用的时候再启动。
FLAPPING
mmm_mond 支持“flapping”(抖动)检测,即如果主机频繁从ONLINE切换到HARD_OFFLINE|REPLICATION_FAIL|REPLICATION_DELAY状态并切换回ONLINE状态(由于auto_set_online或者由于down的时间小于60s),将会导致角色的切换非常频繁。
为了避免这种情况mmm_mond内建了flap检测,可通过配置文件配置。如果主机在flap_duration时间内宕掉了flap_count次,就认为主机flapping将不会自动被设置为ONLINE,它将一直处于AWAITING_RECOVERY状态除非手动设置online(mmm_control set online host)。如果auto_set_online>0,处于flapping的主机在flap_duration时间后将自动被设置为ONLINE状态
MODES
- ACTIVE MODE
Monitor将会自动的把角色从失败的主机上移除,并切换到其他主机上。 - MANUAL MODE
Moniter会自动把负载均衡的角色分配给对应主机,但是不会自动的把角色从失败的主机上移除。可以通过move_role来手工移除。 - WAIT MODE
类似manual模式,但是当两个master都是online状态或者超过了wait_for_other_master的时间,将被改变为ACTIVE模式。 - PASSIVE MODE
在被动模式下,monitor不改变roles、更新状态文件也不会发送任何信息给agents。在被动模式下你可以使用set_ip来改变roles,但是这些改变在monitor切换到ACTIVE或者MANUAL模式(set_active or set_manual)前是不会生效,在启动的roles发生冲突将会进入被动模式
STARTUP
- 初始化网络检查
- 如果网络阻塞,那么Startup将被延迟到网络畅通的时候再运行
- 初始化主机检查
- 读取状态信息:
status file、agents(mysql节点的信息)、hosts(节点的系统信息)
尝试计算出集群的状态信息
ROLE TRANSITION
-
STANDARD ROLE
虚拟IP从原来的主机上移除
虚拟IP被配置到新主机上
新主机发送arp包,通知其他主机它的新IP -
ACTIVE MASTER ROLE
writer role从原来的主机上移除:
1、 mysql设置为read_only模式
2、 删除活动的连接
3、 移除虚拟IP
slaves 被通知:
1、 完成原来主机上的复制工作
2、 切换master到新的主机上
添加writer role到新主机上:
1、 mysql设置为可写模式
2、 配置虚拟IP
KILL HOST FUNCTIONALITY
当主机出现故障,代理无法访问时,监控器执行”kill host”。
四、mmm_control简介
可以通过mmm_control命令控制monitor守护进程,如果你有多个MMM集群,需要指定集群名,如果只有单个MMM集群,那么就可以忽略集群名。
shell> mmm_control help
shell> mmm_control ping #ping monitor的进程,确定进程是否存在
shell> mmm_control show #查看集群状态
shell> mmm_control CHECKS [HOST|ALL [CHECK|ALL]] #查看指定node或者所有主机(all)的指定状态或者所有状态
shell> mmm_control SET_ONLINE HOST #从AWAITING_RECOVERY或ADMIN_OFFLINE恢复到ONLINE
shell> mmm_control SET_OFFLINE HOST #将node手动的置为ADMIN_OFFLINE状态
shell> mmm_control MODE #打印出当前monitor的运行模式
shell> mmm_control set_active #切换monitor进程的为ACTIVE模式
shell> mmm_control set_manual #切换monitor进程的为MANUAL模式
shell> mmm_control set_passive #切换monitor进程的为PASSIVE模式
shell> mmm_control MOVE_ROLE role host #在集群nodes之间切换独占角色,在passive模式下不可用
shell> mmm_control MOVE_ROLE --force role host #可以将active_master_role切换给状态为REPLICATION_FAIL或
REPLICATION_DELAY的主机上,在passive模式下不可用
shell> mmm_control set_ip ip host #set_ip可以用来在被动模式下操纵角色,ACTIVE或者MANUAL模式后生效
参考:
http://mysql-mmm.org/mmm2:guide
http://badb0y.blog.51cto.com/196368/415806
http://linuxguest.blog.51cto.com/195664/578311
http://linuxguest.blog.51cto.com/195664/578307
转载:http://blog.youkuaiyun.com/banpingzijy/article/details/44280963