基于双vip的gtid半同步主从复制的MySQL高可用集群
一、项目准备
1、构建网络拓扑图
2、服务器准备
机器名称 | IP地址 | 用途 |
---|---|---|
mysql-master | 192.168.145.10 | 主服务器 |
slave | 192.168.145.11 | 第一台从服务器 |
delay-backup | 192.168.145.138 | 延迟备份服务器(也可充当异地备份服务器) |
slave3 | 192.168.145.136 | 第二台从服务器 |
ansible | 192.168.145.137 | 自动化运维,管理集群服务器 |
MySQL-router1 | 192.168.145.140 | 用于读写 |
MySQL-router2 | 192.168.145.141 | 用于只读 |
test | 192.168.145.143 | 压力测试 |
其中master和前面slave、delay-backup、slave3是MySQL服务器,用于搭建主从复制集群,数据从master或者slave导出,通过rsync工具备份到其他服务器。
MySQL-router服务器上安装keepalived软件,用于实现高可用(HA)读写分离。
ansible中控服务器,用于对整个MySQL集群进行批量操控。
test服务器用于对集群进行压力测试。
3、项目环境
项目环境:8台服务器(2G,2核),centos7.9 mysql5.7.38 mysqlrouter8.0.21 keepalived2.0.10 ansible
二、项目步骤
1、安装好4台MySQL服务器系统和MySQL软件,配置半同步和gtid功能
(1)安装半同步插件(master和每一台slave服务器都需要安装)
mysql>INSTALL PLUGIN rpl_semi_sync_slave SONAME 'semisync_slave.so';
(2)修改master配置文件进行永久配置:
[root@master ~]# vim /etc/my.cnf
...
[mysqld]
..
#log_bin二进制日志
log_bin
server_id = 1
#半同步配置
rpl_semi_sync_master_enabled=1
rpl_semi_sync_master_timeout=1000 # 1 second
#GTID
gtid-mode=ON
enforce-gtid-consistency=ON
....
..
(3)修改slave配置文件进行永久配置:
[mysqld]
#log bin 二进制日志
log_bin
server_id = 4
expire_logs_days = 15
#开启gtid功能
gtid-mode=ON
enforce-gtid-consistency=ON
log_slave_updates=ON
#开启半同步,需要提前安装半同步的插件
rpl_semi_sync_slave_enabled=1
(4)重启mysqld服务
service mysqld restart
(5)重置slave的mysql环境
mysql>reset master
mysql>stop slave
mysql>reset slave all
(6)在master上面新建一个授权用户,给slave来复制二进制日志
mysql>grant replication slave on *.* to 'zhangwz'@'192.168.145.%' identified by 'mysql123456';
(7)导出master上的基础数据
[root@master ~]# mysqldump -uroot -p'mysql123456' --all-databases >all_db.SQL
mysqldump: [Warning] Using a password on the command line interface can be insecure.
[root@master ~]#
2、安装并配置ansible服务器,并建立免密通道,实现批量操作
(1)安装epel源以及ansible
[root@ansible ~]# yum install epel-release -y
[root@ansible ~]# yum install ansible -y
(2)配置hosts文件
[root@ansible ~]# vim /etc/ansible/hosts
....
#db是mysql集群服务器里面所有的服务器ip
[db]
192.168.145.10
192.168.145.11
192.168.145.136
192.168.145.138
#所有的从服务器ip
[dbslaves]
192.168.145.11
192.168.145.138
192.168.145.136
(3)建立免密通道(一路回车即可)
[root@ansible ~]# ssh-keygen -t rsa
(4)给每一台服务器节点都连接好ansible
[root@ansible ~]# ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.145.10
[root@ansible ~]# ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.145.11
[root@ansible ~]# ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.145.136
[root@ansible ~]# ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.145.138
(5)测试免密通道是否建立成功
[root@ansible ~]# ssh 'root@192.168.145.10'
Last login: Mon May 15 23:18:45 2023 from 192.168.145.1
##此时的主机名已经变成master,说明建立成功
[root@master ~]#
(6)使用ansible在所有的slave上导入基础数据,部署mysql集群
[root@ansible ~]# scp root@192.168.145.10:/root/all_db.SQL .
all_db.SQL 100% 888KB 29.2MB/s 00:00
[root@ansible ~]# ls
all_db.SQL anaconda-ks.cfg
[root@ansible ~]# ansible -m copy -a "src=/root/all_db.SQL dest=/root" dbslaves
三台slave服务器(slave、slave3、delay-back)均需完成数据导入:
[root@slave ~]# mysql -uroot -p'mysql123456' <all_db.SQL
mysql: [Warning] Using a password on the command line interface can be insecure.
注意:如果出现slave导入错误的话,提前关闭master和slave的gtid功能(注释gtid代码),重新启动mysqld进程,再重新在master上导出数据包,mysql,再重新利用ansible传送到slave上。记得保证每台从服务器都是安装好插件的,不然从服务器无法正常启动mysql
3、开启半同步服务和gtid功能
(1)清空master数据
mysql>reset master;
(2)开启gtid功能,开启主从复制功能
将配置文件my.cnf里面的gtid配置启用,登录slave服务器数据库,输入以下授权命令(授权master服务器),然后start slave(delay-backup暂时不用配置)
mysql> CHANGE MASTER TO MASTER_HOST='192.168.145.10' ,
MASTER_USER='zhangwz',
MASTER_PASSWORD='mysql123456',
MASTER_PORT=3306,
master_auto_position=1;
mysql> start slave;
(3)查看启动结果
mysql>show slave status\G;
注意:记得一定要清空slave环境,slave上master和本机的环境都必须是干净的,不然服务无法成功启动,会出现以下错误:
因为slave上面的信息可能比master 还要新,Gtid编号比master更大,导致io线程无法正常启动,导致出错。清空所有的信息,再重新授权即可正常启动
mysql> reset master;
mysql> stop slave ;
mysql> reset slave all;
mysql> CHANGE MASTER TO MASTER_HOST='192.168.145.10' ,
MASTER_USER='zhangwz',
MASTER_PASSWORD='mysql123456',
MASTER_PORT=3306,
master_auto_position=1;
mysql>show slave status\G;
(4)在master和slave上查看是否激活了半同步功能
mysql>select plugin_name,plugin_status from information_schema.plugins where plugin_name like '%semi%';
+----------------------+---------------+
| plugin_name | plugin_status |
+----------------------+---------------+
| rpl_semi_sync_master | ACTIVE |
| rpl_semi_sync_slave | ACTIVE |
+----------------------+---------------+
1 row in set (0.01 sec)
4、验证数据库集群是否成功创建
在master主服务器上面进行增删改查操作,如果从服务器上出现一样的结果,证明半同步复制成功启用;查看gtid编号,如果都一样,证明gtid服务也成功开启。
5、开启delay-backup延迟备份
根据拓扑图,delay-backup服务器的备份来源与slave1(slave)服务器,所以在配置的时候只需将ip地址设置为slave的ip,同时加上延迟备份时间配置命令,其他方法步骤沿上即可。
mysql>stop slave;
Query OK, 0 rows affected (0.00 sec)
mysql>reset slave all;
Query OK, 0 rows affected (0.00 sec)
mysql>reset master;
Query OK, 0 rows affected (0.00 sec)
mysql>CHANGE MASTER TO MASTER_HOST='192.168.145.11' ,
MASTER_USER='zhangwz',
MASTER_PASSWORD='mysql123456',
MASTER_PORT=3306,
master_auto_position=1;
mysql>CHANGE MASTER TO MASTER_DELAY = 10; ##延迟10s
mysql>start slave;
Query OK, 0 rows affected (0.00 sec)
6、设置master定时自动备份
(1)在master创建和ansible的免密通道(前面已经设置了ansible的,现在是建立双向免密通道,方便同步数据)
[root@master ~]# ssh-keygen -t rsa
[root@master ~]# ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.145.137
(2)编写计划任务脚本(脚本路径:/backup/backup_alldb.sh)
#!/bin/bash
mkdir -p /backup
mysqldump -uroot -p'mysql123456' --all-databases --triggers --routines --events >/backup/$(date +%Y%m%d%H%M%S)_all_db.SQL
scp /backup/$(date +%Y%m%d%H%M%S)_all_db.SQL 192.168.145.137:/backup
(3)创建计划任务
[root@master backup]# crontab -e
no crontab for root - using an empty one
crontab: installing new crontab
[root@master backup]# crontab -l
30 2 * * * bash /backup/backup_alldb.sh
7、读写分离部署
读写分离是将数据库的增删改查等操作进行分离,读操作往slave上进行,读写操作都可以在master上进行,本质上起到了负载均衡的作用。
目的:解决大并发的场景下,提升整个mysql集群的处理能力,避免资源的闲置,提高数据库党的响应能力,提高用户使用的满意度。让整个数据库的访问过程非常顺畅,不卡顿。
好处:DBA负责数据库的业务,开发人员不需要了解,直接访问读写分离的服务器就可以了。
(1)在官网上下载安装mysqlrouter软件
https://dev.mysql.com/get/Downloads/MySQL-Router/mysql-router-community-8.0.21-1.el7.x86_64.rpm
(2)使用xftp将安装包导入linux系统并安装(提前准备好两台虚拟机router1和router2,两台都需要安装mysqlrouter)
rpm -ivh mysql-router-community-8.0.21-1.el7.x86_64.rpm
(3)修改配置文件
mysqlrouter1
[DEFAULT]
logging_folder = /var/log/mysqlrouter
runtime_folder = /var/run/mysqlrouter
config_folder = /etc/mysqlrouter
[logger]
level = INFO
# If no plugin is configured which starts a service, keepalive
# will make sure MySQL Router will not immediately exit. It is
# safe to remove once Router is configured.
[keepalive]
interval = 60
#read
[routing:slaves]
#绑定本机的ip和端口
#bind_address = 192.168.145.140:7001
#在keepalived的时候使用0.0.0.0
bind_address = 0.0.0.0:7001
#slave
destinations = 192.168.145.11:3306,192.168.145.136:3306
mode = read-only
connect_timeout = 1
#write
[routing:masters]
#bind_address = 192.168.145.140:7002
bind_address = 0.0.0.0:7002
#master
destinations = 192.168.145.10:3306
mode = read-write
connect_timeout = 1
mysqlrouter2
[DEFAULT]
logging_folder = /var/log/mysqlrouter
runtime_folder = /var/run/mysqlrouter
config_folder = /etc/mysqlrouter
[logger]
level = INFO
# If no plugin is configured which starts a service, keepalive
# will make sure MySQL Router will not immediately exit. It is
# safe to remove once Router is configured.
[keepalive]
interval = 60
#read
[routing:slaves]
#绑定本机的ip和端口
bind_address = 0.0.0.0:7001
#slave
destinations = 192.168.145.11:3306,192.168.145.136:3306
mode = read-only
connect_timeout = 1
#write
[routing:masters]
bind_address = 0.0.0.0:7002
#master
destinations = 192.168.145.10:3306
mode = read-write
connect_timeout = 1
注意:mysql-router的bind_address由于要vip的访问,所以可以都设置为0.0.0.0
(4)启动MySQL router服务,监听了7001和7002端口
[root@mysql-router-1 mysqlrouter]# service mysqlrouter start
[root@mysql-router-1 mysqlrouter]# netstat -anplut|grep mysql
tcp 0 0 192.168.145.140:7001 0.0.0.0:* LISTEN 2003/mysqlrouter
tcp 0 0 192.168.145.140:7002 0.0.0.0:* LISTEN 2003/mysqlrouter
(5)在master上创建两个测试账号,一个是读,一个是写
mysql>grant all on *.* to 'sc-write'@'%' identified by 'mysql123456';
mysql>grant select on *.* to 'sc-read'@'%' identified by 'mysql123456';
(6)在客户端测试读写分离的效果(用在集群外的机器测,此处使用的是MysqlYog)
sc-write可以进行读写
sc-read只能读
关闭mysql-router后,客户端会直接连接不上
(7)读写分离的关键点:
关键在于用户的权限,让不同的用户连接不同的端口,最后仍然要到后端的mysql服务器里面去验证是否有读写的权限;
mysqlrouter只是做了读写的分流,让应用程序去连接不同的端口 ;是一个分流工具
主要还是用户权限的控制,有写权限的用户走读的通道也可以写,读的用户走写的通道只能读
8、搭建keepalived,实现双vip功能
首先介绍一下keepalived的2大核心功能:
- 负载均衡(loadbalance):ipvs——>lvs软件在linux内核里面已经安装,不需要单独安装
- 高可用(high availability):是指不会有单点故障,一台服务器坏了还有另外的能顶替,不影响工作,有备份。
- vrrp协议的工作原理:
vrrp协议(虚拟路由冗余协议),一组路由器协同工作,担任不同的角色,有master角色,也有backup角色;master角色的路由器(的接口)承担实际的数据流量转发任务;backup路由器侦听Master路由器的状态,并在Master路由器发生故障时,接替其工作,从而保证业务流量的平滑切换。
1.所有的路由器或者服务器发送vrrp宣告报文,进行选举,必须是相同vrid和认证密码的,优先级高的服务器或者路由器会被选举为master,其他的机器都是backup
2.master定时(Advertisement Interval)发送VRRP通告报文,以便向Backup路由器告 知自己的存活情况。 默认是间隔1秒
3.接收Master设备发送的VRRP通告报文,判断Master设备的状态是否正常。 如果超过1秒没有收到vrrp报文,就认为master挂了,开始重新选举新的master,vip会漂移到新的master上
vip是在一个VRRP 组内的多个路由器接口共用一个虚拟IP地址,该地址被作为局域网内所有主机的缺省网关地址
(1)在mysql-router1和router2上安装keepalived软件
[root@mysql-router-1 mysqlrouter]# yum install keepalived -y
(2)修改配置文件
[root@mysql-router-1 mysqlrouter]# cd /etc/keepalived/
[root@mysql-router-1 keepalived]# ls
keepalived.conf
[root@mysql-router-1 keepalived]# vim keepalived.conf
#####router1
! Configuration File for keepalived
global_defs {
notification_email {
acassen@firewall.loc
failover@firewall.loc
sysadmin@firewall.loc
}
notification_email_from Alexandre.Cassen@firewall.loc
smtp_server 192.168.200.1
smtp_connect_timeout 30
router_id LVS_DEVEL
vrrp_skip_check_adv_addr
#vrrp strict
vrrp_garp_interval 0
vrrp_gna_interval 0
}
#当vrrp协议出现主从切换的时候,执行的脚本,配置名字叫check_run
vrrp_script check_run {
script "/mail/sendmail.sh"
interval 3
}
vrrp_instance VI_1 {
state MASTER
interface ens33
virtual_router_id 80
priority 200
advert_int 1
authentication {
auth_type PASS
auth_pass 1111
}
track_script {
send_mail
}
virtual_ipaddress {
192.168.145.188
}
}
#####router2
! Configuration File for keepalived
global_defs {
notification_email {
acassen@firewall.loc
failover@firewall.loc
sysadmin@firewall.loc
}
notification_email_from Alexandre.Cassen@firewall.loc
smtp_server 192.168.200.1
smtp_connect_timeout 30
router_id LVS_DEVEL
vrrp_skip_check_adv_addr
#vrrp strict
vrrp_garp_interval 0
vrrp_gna_interval 0
}
vrrp_script send_mail {
script "/mail/sendmail.sh"
interval 3
}
nstance VI_1 {
state BACKUP
interface ens33
virtual_router_id 80
priority 100
advert_int 1
authentication {
auth_type PASS
auth_pass 1111
}
track_script{
send_mail
}
virtual_ipaddress {
192.168.145.188
}
}
(3)刷新keepalived服务并关闭防火墙
[root@mysql-router-1 keepalived]# systemctl disable firewalld
[root@mysql-router-1 keepalived]# service keepalived start
(4)使用ip add
命令检查vip是否正常出现,即master机器上会有vip,backup机器正常情况下不会出现,只在master出现问题后vip才会漂移到backup上
如果两台机器上都出现了vip,表明发生了脑裂现象
:
出现的原因可能是网络不能正常通信,vrrp通告报文不能正常发送,两边互相以为挂机了,所以两边都会出现vip。
脑裂现象没有特别大的危害,反而起到了负载均衡的作用,但是vip选择的随机性比较大,不可控。
(5)验证vip漂移现象
关闭master的keepalived服务
[root@mysql-router-1 keepalived]# service keepalived stop
使用ip add
查看backup服务器上的vip
(6)实现双vip功能
在配置文件里写入两个vrrp实例,两个vip,让两个实例互为主备即可
###router1
vrrp_instance VI_1 {
state MASTER
interface ens33
virtual_router_id 80
priority 200
advert_int 1
authentication {
auth_type PASS
auth_pass 1111
}
track_script {
send_mail
}
virtual_ipaddress {
192.168.145.188
}
}
vrrp_instance VI_2 {
state backup
interface ens33
virtual_router_id 81
priority 100
advert_int 1
authentication {
auth_type PASS
auth_pass 1111
}
track_script {
send_mail
}
virtual_ipaddress {
192.168.145.189
}
}
###router2
vrrp_instance VI_1 {
state backup
interface ens33
virtual_router_id 80
priority 100
advert_int 1
authentication {
auth_type PASS
auth_pass 1111
}
track_script {
send_mail
}
notify_backup /mail/backup.sh
notify_master /mail/master.sh
notify_stop /mail/stop.sh
virtual_ipaddress {
192.168.145.188
}
}
vrrp_instance VI_2 {
state master
interface ens33
virtual_router_id 81
priority 200
advert_int 1
authentication {
auth_type PASS
auth_pass 1111
}
track_script {
send_mail
}
notify_backup /mail/backup.sh
notify_master /mail/master.sh
notify_stop /mail/stop.sh
virtual_ipaddress {
192.168.145.189
}
}
9、压力测试
测试工具:sysbench
(1)新开一台用于测试的虚拟机,下载MySQL repo文件,解决yum源依赖问题
https://dev.mysql.com/downloads/file/?id=510575
[root@client ~]# rpm -ivh mysql80-community-release-el7-6.noarch.rpm
[root@client ~]# yum install epel-release -y
[root@client ~]# yum install sysbench
(2)使用vip以及读写端口进入slave机器mysql,并提前创建好sysbench默认的sbtest数据库
[root@ln-slave ~]# mysql -uwrite -p'mysql123456' -h 192.168.145.188 -P 7002
–tables=10表示创建10个测试表,–table_size=10000表示每个表中插入1W行数据,prepare表示这是准备数的过程。
sysbench --mysql-host=192.168.145.188 --mysql-port=7002 --mysql-user=sc-write --mysql-password='mysql123456' /usr/share/sysbench/oltp_common.lua --tables=10 --table_size=10000 prepare
测试结果:
[ 5s ] thds: 4 tps: 158.12 qps: 3169.72 (r/w/o: 2220.02/632.66/317.03) lat (ms,95%): 31.94 err/s: 0.00 reconn/s: 0.00
[ 10s ] thds: 4 tps: 149.04 qps: 2986.42 (r/w/o: 2089.98/598.36/298.08) lat (ms,95%): 34.33 err/s: 0.00 reconn/s: 0.00
[ 15s ] thds: 4 tps: 135.78 qps: 2714.06 (r/w/o: 1900.56/541.93/271.57) lat (ms,95%): 48.34 err/s: 0.00 reconn/s: 0.00
[ 20s ] thds: 4 tps: 152.03 qps: 3037.53 (r/w/o: 2126.57/606.90/304.05) lat (ms,95%): 34.95 err/s: 0.00 reconn/s: 0.00