流复制根据数据同步方式分为异步流复制和同步流复制
主机环境配置信息
主机 | 主机名 | IP地址 | Postgresql版本 | 数据目录 |
---|---|---|---|---|
主节点 | postgresql2 | 192.168.0.154 | 11.3 | /db/pgdata |
备节点 | postgresql3 | 192.168.0.155 | 11.3 | /db/pgdata |
一、异步流复制是什么?
异步流复制是指主库上提交事务时不需要等待备库接收WAL日志流并写入到备库WAL日志文件时便返回成功
异步流复制部署主要有两种方式,一是拷贝数据文件方式,二是通过pg_basebackup命令行工具
二、以拷贝数据文件方式部署异步流复制
1.主备机参数配置
1. 配置前,查看数据库集簇状态
[postgres@postgresql3 ~]$ pg_controldata
pg_control version number: 1100
Catalog version number: 201809051
Database system identifier: 6692713076617317907
Database cluster state: in production #生产模式
2. 修改配置文件postgresql.conf,主备配置一致
wal_level = replica # minimal, replica, or logical
archive_mode = on # enables archiving; off, on, or always
# (change requires restart)
archive_command = '/usr/bin/lz4 -q -z %p /db/archive_wals/%f.lz4'
max_wal_senders = 10 # max number of walsender processes
# (change requires restart)
wal_keep_segments = 512 # in logfile segments; 0 disables
hot_standby = on # "off" disallows queries during recovery
-
wal_level
wal_level参数控制WAL日志信息的输出级别,有三种模式:minimal,replica,logical
replica:会记录支持WAL归档、复制和备库中启用只读查询等操作所需的WAL信息
此参数默认值是replica
调整此参数需要重启数据库生效
开启流复制至少需要设置此参数为replica -
archive_mode
控制是否启用归档
on表示启用归档并使用archive_command参数配置命令将WAL日志归档到归档存储上
调整此参数需要重启数据库生效
通常设置为on -
archive_command
设置WAL归档命令
可以将WAL归档到备机目录,也可以归档到远程其他主机 -
max_wal_sender
控制主库上的最大WAL发送进程数,通过pg_basebackup命令在主机上做基准备份时也会消耗WAL进程,此参数不能比max_connections参数值高,默认值为10,一个流复制备库通常只需要消耗流复制主库一个WAL发送进程 -
wal_keep_segmengts
设置主库pg_wal目录保留的最小WAL日志文件数,以便备库落后主库时可以通过主库保留的WAL进行追回,这个参数设置得越大,理论上备库在异常断开时追平主库的几率越大
3. 配置pg_hba.conf文件
主库备库pg_hba.conf配置完全一致(主备库会发生切换)
host replication repuser 192.168.0.154/32 md5
host replication repuser 192.168.0.155/32 md5
4. 主备机重启数据库
2.主库创建流复制用户
postgres=# create user repuser
postgres-# replication
postgres-# login
postgres-# connection limit 5
postgres-# encrypted password 're12a345';
CREATE ROLE
3.主库发起在线备份
1. 主库发起在线备份
postgres=# select pg_start_backup('test_bk1');
pg_start_backup
-----------------
1/86000060
(1 row)
2. 打包pgdata目录
[postgres@postgresql2 db]$ tar czvf pgdata.tar.gz pgdata
[postgres@postgresql2 db]$ ll
total 6.8M
drwxrwxr-x 2 postgres postgres 42 Dec 21 15:26 archive_wals
drwx------ 20 postgres postgres 4.0K Dec 21 15:26 pgdata
-rw-rw-r-- 1 postgres postgres 6.8M Dec 21 15:28 pgdata.tar.gz
drwxrwxr-x 3 postgres postgres 21 Jun 13 2019 pgtbs
3. 将数据文件拷贝到备节点
[postgres@postgresql2 db]$ scp pgdata.tar.gz postgres@192.168.0.155:/db/
4. 如果备机已安装数据库,将备机停止,改数据文件夹名称
[postgres@postgresql3 db]$ mv pgdata pgdata.old
[postgres@postgresql3 db]$ ll
total 6.8M
drwxrwxr-x 2 postgres postgres 6 Dec 21 10:06 archive_wals
drwx------ 20 postgres postgres 4.0K Dec 21 15:18 pgdata.old
-rw-rw-r-- 1 postgres postgres 6.8M Dec 21 15:32 pgdata.tar.gz
drwxrwxr-x 3 postgres postgres 21 Jun 14 2019 pgtbs
5. 备机解压pgdata
[postgres@postgresql3 db]$ tar -zxvf pgdata.tar.gz
6. 主机停止在线备份
postgres=# select pg_stop_backup();
NOTICE: pg_stop_backup complete, all required WAL segments have been archived
pg_stop_backup
----------------
1/86000168
(1 row)
4.备机配置recovery.conf
1. 拷贝模板到$PGDATA下
[postgres@postgresql3 db]$ cp $PGHOME/share/recovery.conf.sample $PGDATA/recovery.conf
2. 修改recovery.conf
[postgres@postgresql3 pgdata]$ vi recovery.conf
recovery_target_timeline = 'latest'
standby_mode = on
primary_conninfo = 'host=192.168.0.154 port=5432 user=repuser'
-
recovery_target_timeline
设置恢复的时间线
latest恢复到最近的时间线 -
standby_mode
是否启用数据库为备库
on:备库会不停地从主库上获取WAL日志流,直到获取主库上最新的WAL日志流 -
primary_conninfo
设置主库的连接信息
3. 配置.pgpass隐藏文件
[postgres@postgresql3 ~]$ touch .pgpass
[postgres@postgresql3 ~]$ chmod 0600 .pgpass
[postgres@postgresql3 ~]$ vi .pgpass
192.168.0.154:5432:replication:repuser:re12a345
192.168.0.155:5432:replication:repuser:re12a345
4. 启动备库
5.测试同步情况
在主库上创建表t3,并插入数据
postgres=# \c testdb
You are now connected to database "testdb" as user "postgres".
testdb=# \d
List of relations
Schema | Name | Type | Owner
--------+----------+-------+----------
public | blog | table | postgres
public | persons | table | postgres
public | students | table | postgres
public | t | table | postgres
public | t1 | table | postgres
public | t2 | table | postgres
(6 rows)
testdb=# create table t3 (id int4);
CREATE TABLE
testdb=# insert into t3 values (1);
INSERT 0 1
testdb=# select * from t3;
id
----
1
(1 row)
在备库上查询
postgres=# \c testdb
You are now connected to database "testdb" as user "postgres".
testdb=# \d
List of relations
Schema | Name | Type | Owner
--------+----------+-------+----------
public | blog | table | postgres
public | persons | table | postgres
public | students | table | postgres
public | t | table | postgres
public | t1 | table | postgres
public | t2 | table | postgres
public | t3 | table | postgres
(7 rows)
testdb=# select * from t3;
id
----
1
(1 row)
已经同步成功
问题处理
当启动备库时,无法启动
[postgres@postgresql3 pgdata.old]$ pg_ctl start
pg_ctl: another server might be running; trying to start server anyway
waiting for server to start....2020-12-21 16:57:10.659 CST [18587] LOG: listening on IPv4 address "0.0.0.0", port 5432
2020-12-21 16:57:10.659 CST [18587] LOG: listening on IPv6 address "::", port 5432
2020-12-21 16:57:10.661 CST [18587] LOG: listening on Unix socket "/tmp/.s.PGSQL.5432"
2020-12-21 16:57:10.689 CST [18587] LOG: redirecting log output to logging collector process
2020-12-21 16:57:10.689 CST [18587] HINT: Future log output will appear in directory "/var/log/pg_log".
- 查看启动日志
[postgres@postgresql3 pg_log]$ tail -f postgresql-2020-12-21_165710.log
TCP/IP connections on port 5432?
2020-12-21 16:57:15.882 CST [18595] FATAL: could not connect to the primary server: could not connect to server: No route to host
Is the server running on host "192.168.0.154" and accepting
TCP/IP connections on port 5432?
2020-12-21 16:57:20.884 CST [18598] FATAL: could not connect to the primary server: could not connect to server: No route to host
Is the server running on host "192.168.0.154" and accepting
TCP/IP connections on port 5432?
2020-12-21 16:57:25.887 CST [18599] FATAL: could not connect to the primary server: could not connect to server: No route to host
Is the server running on host "192.168.0.154" and accepting
TCP/IP connections on port 5432?
2020-12-21 16:57:30.896 CST [18601] FATAL: could not connect to the primary server: could not connect to server: No route to host
Is the server running on host "192.168.0.154" and accepting
TCP/IP connections on port 5432?
2020-12-21 16:57:35.902 CST [18610] FATAL: could not connect to the primary server: could not connect to server: No route to host
Is the server running on host "192.168.0.154" and accepting
TCP/IP connections on port 5432?
2020-12-21 16:57:38.929 CST [18587] LOG: received fast shutdown request
2020-12-21 16:57:43.936 CST [18590] LOG: shutting down
2020-12-21 16:57:43.941 CST [18587] LOG: database system is shut down
2. 测试主备机连通情况
主备机可以ping通,telnet不通
[root@postgresql3 ~]# ping 192.168.0.154
PING 192.168.0.154 (192.168.0.154) 56(84) bytes of data.
64 bytes from 192.168.0.154: icmp_seq=1 ttl=64 time=0.301 ms
64 bytes from 192.168.0.154: icmp_seq=2 ttl=64 time=0.771 ms
^C
--- 192.168.0.154 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1000ms
rtt min/avg/max/mdev = 0.301/0.536/0.771/0.235 ms
[root@postgresql3 ~]# telnet 192.168.0.154
Trying 192.168.0.154...
telnet: connect to address 192.168.0.154: No route to host
- 主备机关闭防火墙
[root@postgresql3 ~]# systemctl stop firewalld.service
[root@postgresql3 ~]# telnet 92.168.0.154
Trying 92.168.0.154...
telnet: connect to address 92.168.0.154: No route to host
[root@postgresql3 ~]# systemctl status firewalld
- 启库成功
[postgres@postgresql3 ~]$ pg_ctl start
waiting for server to start....2020-12-21 17:01:37.907 CST [18811] LOG: listening on IPv4 address "0.0.0.0", port 5432
2020-12-21 17:01:37.907 CST [18811] LOG: listening on IPv6 address "::", port 5432
2020-12-21 17:01:37.909 CST [18811] LOG: listening on Unix socket "/tmp/.s.PGSQL.5432"
2020-12-21 17:01:37.919 CST [18811] LOG: redirecting log output to logging collector process
2020-12-21 17:01:37.919 CST [18811] HINT: Future log output will appear in directory "/var/log/pg_log".
done
server started
三、以pg_basebackup方式部署异步流复制
1. 备节点备份recovery.conf文件
[postgres@postgresql3 pgdata]$ cp recovery.conf /home/postgres/
2. 之前已经配置过备库,需要将备节点关闭并将数据文件删除
[postgres@postgresql3 pgdata]$ pg_ctl stop -m fast
waiting for server to shut down.... done
server stopped
[postgres@postgresql3 pgdata]$ rm -rf /db/pgdata
3. 通过Postgresql提供的内置命令行工具pg_basebackup对主库发起一个在线基准备份
此工具是对数据库实例级别进行物理备份,将1)pg_start_backup 2)拷贝主节点数据文件和表空间文件到备节点 3)pg_stop_backup合成一步完成
[postgres@postgresql3 ~]$ pg_basebackup -D /db/pgdata -Fp -Xs -v -P -h 192.168.0.154 -p 5432 -U repuser
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 1/8A000028 on timeline 5
pg_basebackup: starting background WAL receiver
pg_basebackup: created temporary replication slot "pg_basebackup_11524"
66069/66069 kB (100%), 3/3 tablespaces
pg_basebackup: write-ahead log end point: 1/8A0000F8
pg_basebackup: waiting for background process to finish streaming ...
pg_basebackup: base backup completed
-
-D表示指定备节点用来接收主库数据的目标路径
-
-F指定生成的备份数据格式,p(plain):生成的备份数据和主库上的数据文件布局一样(类似操作系统命令将数据文件和表空间文件完全拷贝到备节点)
-
-X设置在备份的过程中产生的WAL日志包含在备份中的方式,s(stream)方式除了启动一个基准备份WAL发送进程外还会额外启动一个WAL发送进程用于发送主库产生的WAL增量日志流
-
-v启用verbose模式,命令执行过程中打印出各阶段的日志
-
-P显示数据文件、表空间文件传输百分比
4. 将recovery.conf拷贝回$PGDATA
[postgres@postgresql3 ~]$ cp recovery.conf /db/pgdata
5. 启动备库
[postgres@postgresql3 ~]$ pg_ctl start
waiting for server to start....2020-12-22 15:27:49.960 CST [20415] LOG: listening on IPv4 address "0.0.0.0", port 5432
2020-12-22 15:27:49.960 CST [20415] LOG: listening on IPv6 address "::", port 5432
2020-12-22 15:27:49.965 CST [20415] LOG: listening on Unix socket "/tmp/.s.PGSQL.5432"
2020-12-22 15:27:50.004 CST [20415] LOG: redirecting log output to logging collector process
2020-12-22 15:27:50.004 CST [20415] HINT: Future log output will appear in directory "/var/log/pg_log".
done
server started
6. 查看流复制同步方式
pg_stat_replication显示主库上WAL发送进程信息,主库上有多少个WAL发送进程,此视图就对应多少条记录
postgres=# select * from pg_stat_replication ;
-[ RECORD 1 ]----+-----------------------------
pid | 11626
usesysid | 33274
usename | repuser
application_name | walreceiver
client_addr | 192.168.0.155
client_hostname |
client_port | 48694
backend_start | 2020-12-22 15:27:50.14125+08
backend_xmin |
state | streaming
sent_lsn | 1/8B002E90
write_lsn | 1/8B002E90
flush_lsn | 1/8B002E90
replay_lsn | 1/8B002E90
write_lag |
flush_lag |
replay_lag |
sync_priority | 0
sync_state | async
async:表示备库为异步同步方式
四、同步流复制是什么?
在主库上提交事务时需等待备库接收并写入WAL日志,当主库至少收到一个备库发回的确认信息时便返回成功,确保了至少一个备库收到了主库发送的WAL日志,一方面保障了数据的完整性,一方面增加了事务的响应时间。
五、配置同步流复制
- 配置备库recovery.conf文件
[postgres@postgresql3 ~]$ cd $PGDATA
[postgres@postgresql3 pgdata]$ vi recovery.conf
primary_conninfo = 'host=192.168.0.154 port=5432 user=repuser application_name=node3'
添加了application_name选项,指定备节点别名为node3
- 修改主库postgresql.conf文件
[postgres@postgresql2 ~]$ cd $PGDATA
[postgres@postgresql2 pgdata]$ vi postgresql.conf
synchronous_commit = on # synchronization level;
synchronous_standby_names = 'node3'
synchronous_commit:设置为on表示主库提交事务时,需等待备库接收主库发送的WAL日志流并写入WAL文件,之后才向客户端返回成功(本地WAL已落盘,备库WAL已落盘,两份持久化WAL,但备库没有重做)
synchronous_standby_names:配置同步复制的备库列表,可配置多个备库,必须和同步备库recovery.conf文件的primary_conninfo参数的application_name值一致。
- 主库无需重启,重新加载即可
[postgres@postgresql2 pgdata]$ pg_ctl reload
server signaled
- 备库修改了recovery.conf需重启数据库
[postgres@postgresql3 pgdata]$ pg_ctl restart -m fast
waiting for server to shut down.... done
server stopped
waiting for server to start....2020-12-22 17:03:37.134 CST [21860] LOG: listening on IPv4 address "0.0.0.0", port 5432
2020-12-22 17:03:37.134 CST [21860] LOG: listening on IPv6 address "::", port 5432
2020-12-22 17:03:37.139 CST [21860] LOG: listening on Unix socket "/tmp/.s.PGSQL.5432"
2020-12-22 17:03:37.157 CST [21860] LOG: redirecting log output to logging collector process
2020-12-22 17:03:37.157 CST [21860] HINT: Future log output will appear in directory "/var/log/pg_log".
done
server started
- 主库上查看复制状态
postgres=# select * from pg_stat_replication;
-[ RECORD 1 ]----+------------------------------
pid | 12817
usesysid | 33274
usename | repuser
application_name | node3
client_addr | 192.168.0.155
client_hostname |
client_port | 39582
backend_start | 2020-12-22 17:03:37.219941+08
backend_xmin |
state | streaming
sent_lsn | 1/8B002F70
write_lsn | 1/8B002F70
flush_lsn | 1/8B002F70
replay_lsn | 1/8B002F70
write_lag |
flush_lag |
replay_lag |
sync_priority | 1
sync_state | sync
sync:同步复制方式
参考书籍:《PostgreSQL 实战》