一、背景
原生 PG 流复制不支持自动升降级,也就是异步升级为同步、同步降级为异步
我们知道,同步模式下的流复制,主提交事务时需要等待 sync 从反馈 WAL 复制位点(也就是先持久化主,然后同步给 sync 从,并等待 sync 从的 WAL 复制位点的 ack),一旦 sync 从挂掉,主的事务提交等待是无限期的,所以主从两节点的同步流复制,无法兼顾可用性(RTO)
如何兼顾可用性?
需要借助 pg_stat_activity 和 pg_stat_replication 这两个视图,详见下
二、实操
1、环境
# select version();
version
---------------------------------------------------------------------------------------------------------
PostgreSQL 14.2 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39), 64-bit
(1 row)
# show wal_level;
wal_level
-----------
logical
(1 row)
wangqds=# show max_wal_senders;
max_wal_senders
-----------------
8
(1 row)
# show synchronous_standby_names;
synchronous_standby_names
---------------------------
FIRST 1 (walrecv1, walrecv2)
(1 row)
2、降级验证
前提:walrecv1、2 均未启动 + 设定 RTO 阈值为 3min
- 窗口一
# select * from pg_stat_activity;
datid | datname | pid | leader_pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | xact_start | query_start | state_change | wait_event_type
backend_type
-------+---------+-------+------------+----------+---------+------------------+-------------+-----------------+-------------+-------------------------------+-------------------------------+-------------------------------+-------------------------------+-----------------
---------------------------
| | 60518 | | | | | | | | 2025-08-14 12:41:45.979258+08 | | | | Activity
tovacuum launcher
| | 60520 | | 10 | mam | | | | | 2025-08-14 12:41:45.979439+08 | | | | Activity
gical replication launcher
16405 | mampgds | 28401 | | 16402 | mampgds | psql | 127.0.0.1 | | 37852 | 2025-09-26 16:05:23.938437+08 | 2025-09-26 16:08:17.062043+08 | 2025-09-26 16:08:17.062043+08 | 2025-09-26 16:08:17.062048+08 |
ient backend
| | 60516 | | | | | | | | 2025-08-14 12:41:45.97907+08 | | | | Activity
ckground writer
| | 60515 | | | | | | | | 2025-08-14 12:41:45.978989+08 | | | | Activity
eckpointer
| | 60517 | | | | | | | | 2025-08-14 12:41:45.979114+08 | | | | Activity
lwriter
(6 rows)
# create table test_0926(col int);
卡住ing
WARNING: canceling wait for synchronous replication due to user request
DETAIL: The transaction has already committed locally, but might not have been replicated to the standby.
CREATE TABLE
# show synchronous_commit;
synchronous_commit
--------------------
local
(1 row)
# create table test_0926_2(col int);
CREATE TABLE
- 窗口二
# select * from pg_stat_activity;
datid | datname | pid | leader_pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | xact_start | query_start | state_change | wait_event_type
| wait_event | state | backend_xid | backend_xmin | query_id | query | backend_type
-------+---------+-------+------------+----------+---------+------------------+-------------+-----------------+-------------+-------------------------------+-------------------------------+-------------------------------+-------------------------------+-----------------
+---------------------+--------+-------------+--------------+----------+----------------------------------+------------------------------
| | 60518 | | | | | | | | 2025-08-14 12:41:45.979258+08 | | | | Activity
| AutoVacuumMain | | | | | | autovacuum launcher
| | 60520 | | 10 | mam | | | | | 2025-08-14 12:41:45.979439+08 | | | | Activity
| LogicalLauncherMain | | | | | | logical replication launcher
16405 | mampgds | 28401 | | 16402 | mampgds | psql | 127.0.0.1 | | 37852 | 2025-09-26 16:05:23.938437+08 | 2025-09-26 16:11:06.609226+08 | 2025-09-26 16:11:06.609226+08 | 2025-09-26 16:11:06.609231+08 |
| | active | | 5072634 | | select * from pg_stat_activity; | client backend
16405 | mampgds | 28637 | | 16402 | mampgds | psql | 127.0.0.1 | | 50576 | 2025-09-26 16:09:24.564688+08 | 2025-09-26 16:09:27.957584+08 | 2025-09-26 16:09:27.957584+08 | 2025-09-26 16:09:27.957589+08 | IPC
| SyncRep | active | 5072634 | | | create table test_0926(col int); | client backend
| | 60516 | | | | | | | | 2025-08-14 12:41:45.97907+08 | | | | Activity
| BgWriterHibernate | | | | | | background writer
| | 60515 | | | | | | | | 2025-08-14 12:41:45.978989+08 | | | | Activity
| CheckpointerMain | | | | | | checkpointer
| | 60517 | | | | | | | | 2025-08-14 12:41:45.979114+08 | | | | Activity
| WalWriterMain | | | | | | walwriter
(7 rows)
对 pg_stat_activity 视图进行监测,如果返回结果超过设定的 RTO 阈值,同步降级为异步
注意:此处还应该结合 pg_stat_replication 视图判断 sync 从是否挂掉,然后再考虑是否降级,当然我们省略了这一判断,因为大前提就是 walrecv1、2 均未启动
# select max(now()-query_start) from pg_stat_activity where wait_event='SyncRep';
max
-----------------
00:03:11.451059
(1 row)
修改参数
# alter system set synchronous_commit=local;
ALTER SYSTEM
生效参数,此处不需要重启数据库
# select pg_reload_conf();
pg_reload_conf
----------------
t
(1 row)
清空当前等待队列
# select pg_cancel_backend(28637) from pg_stat_activity where wait_event='SyncRep';
pg_cancel_backend
-------------------
t
(1 row)
窗口一事务提交等待结束
三、如何异步升级为同步?
对 pg_stat_replication 视图进行如下监测,如果有结果返回的话,可继续后续的异步升级同步操作
# select * from pg_stat_replication where sync_state='sync' and state='streaming' and application_name in ('walrecv1', 'walrecv2');
返回结果示例:
pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | backend_xmin | state | sent_lsn | write_lsn | flush_l
sn | replay_lsn | write_lag | flush_lag | replay_lag | sync_priority | sync_state | reply_time
-------+----------+---------+------------------+---------------+-----------------+-------------+-------------------------------+--------------+-----------+------------+------------+--------
----+------------+-----------+-----------+------------+---------------+------------+-------------------------------
62347 | 16402 | mampgds | walrecv2 | 192.168.5.102 | | 44394 | 2025-09-28 10:36:40.114775+08 | | streaming | A/C8089ED8 | A/C8089ED8 | A/C8089
ED8 | A/C8089ED8 | | | | 2 | sync | 2025-09-28 10:37:01.432262+08
(1 row)
修改参数
# alter system set synchronous_commit=remote_write;
生效参数,此处不需要重启数据库
# select pg_reload_conf();
698

被折叠的 条评论
为什么被折叠?



