PostgreSQL High Availability with repmgr

This article walks through setting up PostgreSQL high availability with repmgr, covering database initialization, configuration management, switchover, and failover.


Author: XuYuchong

Software versions:

PostgreSQL 13.4

repmgr 5.3

Deployment plan:

node_id | node_name | port | data_directory | role
--------+-----------+------+----------------+--------
1       | node1     | 5432 | /data/n001     | primary
2       | node2     | 5433 | /data/n002     | standby
3       | node3     | 5434 | /data/n003     | standby

01. Initialize the database, adjust its parameters, and start it

# Initialize

initdb -D /data/n001

The files belonging to this database system will be owned by user "postgres".

This user must also own the server process.

The database cluster will be initialized with locale "en_US.utf8".

The default database encoding has accordingly been set to "UTF8".

The default text search configuration will be set to "english".

Data page checksums are disabled.

creating directory /data/n001 ... ok

creating subdirectories ... ok

selecting dynamic shared memory implementation ... posix

selecting default max_connections ... 100

selecting default shared_buffers ... 128MB

selecting default time zone ... Asia/Shanghai

creating configuration files ... ok

running bootstrap script ... ok

performing post-bootstrap initialization ... ok

syncing data to disk ... ok

initdb: warning: enabling "trust" authentication for local connections

You can change this by editing pg_hba.conf or using the option -A, or

--auth-local and --auth-host, the next time you run initdb.

Success. You can now start the database server using:

    pg_ctl -D /data/n001 -l logfile start

# Parameter changes

wal_keep_size = 10GB

max_wal_size = 10GB

min_wal_size = 8000MB

max_wal_senders = 10

max_replication_slots = 10

hot_standby = on

archive_mode = on

archive_command = '/bin/true'

shared_preload_libraries = 'repmgr'

wal_log_hints = on

show wal_keep_size;

show max_wal_size;

show min_wal_size;

show max_wal_senders;

show max_replication_slots;

show hot_standby;

show archive_mode;

show archive_command;

show shared_preload_libraries;

show wal_log_hints;
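Rather than editing postgresql.conf by hand, the settings above can be appended in one step. A minimal sketch; for safety it demonstrates against a scratch directory, so substitute your real data directory (e.g. /data/n001) in practice:

```shell
# Sketch: append the replication-related settings above to postgresql.conf.
# Demo uses a scratch directory; point PGDATA at your real data directory instead.
PGDATA=$(mktemp -d)
touch "$PGDATA/postgresql.conf"

cat >> "$PGDATA/postgresql.conf" <<'EOF'
wal_keep_size = 10GB
max_wal_size = 10GB
min_wal_size = 8000MB
max_wal_senders = 10
max_replication_slots = 10
hot_standby = on
archive_mode = on
archive_command = '/bin/true'
shared_preload_libraries = 'repmgr'
wal_log_hints = on
EOF

# Confirm the two settings repmgr specifically depends on
grep -E 'wal_log_hints|shared_preload_libraries' "$PGDATA/postgresql.conf"
```

Restart (not just reload) is required afterwards, since shared_preload_libraries and wal_log_hints only take effect at server start.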

# Start the database

-bash-4.2$ pg_ctl -D /data/n001/ start

waiting for server to start....2021-10-28 10:58:44.526 CST [31597] LOG:  starting PostgreSQL 13.4 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39), 64-bit

2021-10-28 10:58:44.526 CST [31597] LOG:  listening on IPv6 address "::1", port 5432

2021-10-28 10:58:44.526 CST [31597] LOG:  listening on IPv4 address "127.0.0.1", port 5432

2021-10-28 10:58:44.547 CST [31597] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"

2021-10-28 10:58:44.571 CST [31604] LOG:  database system was shut down at 2021-10-28 10:52:48 CST

2021-10-28 10:58:44.602 CST [31597] LOG:  database system is ready to accept connections

done

server started

02. Create the repmgr database and user

# Create a dedicated PostgreSQL superuser account and a database for the repmgr metadata.

createuser -s repmgr

createdb repmgr -O repmgr

03. Edit the repmgr configuration files

::::::::::::::

node1.conf

::::::::::::::

node_id=1

node_name='node1'

conninfo='host=127.0.0.1 user=repmgr dbname=repmgr connect_timeout=2'

data_directory='/data/n001'

pg_bindir='/usr/local/postgresql/bin'

::::::::::::::

node2.conf

::::::::::::::

node_id=2

node_name='node2'

conninfo='host=127.0.0.1 port=5433 user=repmgr dbname=repmgr connect_timeout=2'

data_directory='/data/n002'

pg_bindir='/usr/local/postgresql/bin'

::::::::::::::

node3.conf

::::::::::::::

node_id=3

node_name='node3'

conninfo='host=127.0.0.1 port=5434 user=repmgr dbname=repmgr connect_timeout=2'

data_directory='/data/n003'

pg_bindir='/usr/local/postgresql/bin'
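The three files differ only in node ID, port, and data directory, so they can be generated in a loop. A sketch; it writes into a scratch directory for the demo (the article keeps them under /data), and states port=5432 explicitly for node1 where the file above relies on the default:

```shell
# Sketch: generate node1.conf .. node3.conf as shown above.
# CONF_DIR is a scratch directory for the demo; the article uses /data.
CONF_DIR=$(mktemp -d)

for i in 1 2 3; do
  port=$((5431 + i))    # node1 -> 5432, node2 -> 5433, node3 -> 5434
  cat > "$CONF_DIR/node$i.conf" <<EOF
node_id=$i
node_name='node$i'
conninfo='host=127.0.0.1 port=$port user=repmgr dbname=repmgr connect_timeout=2'
data_directory='/data/n00$i'
pg_bindir='/usr/local/postgresql/bin'
EOF
done

cat "$CONF_DIR/node2.conf"
```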

04. Register node1 as the primary

-bash-4.2$ repmgr -f /data/node1.conf  primary register

INFO: connecting to primary database...

NOTICE: attempting to install extension "repmgr"

NOTICE: "repmgr" extension successfully installed

NOTICE: primary node record (ID: 1) registered

-bash-4.2$ repmgr -f /data/node1.conf  cluster show

ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string

----+-------+---------+-----------+----------+----------+----------+----------+------------------------------------------------------------

1  | node1 | primary | * running |          | default  | 100      | 1        | host=127.0.0.1 user=repmgr dbname=repmgr connect_timeout=2

05. Clone the node2 and node3 standbys

Clone commands:

repmgr -h 127.0.0.1 -U repmgr -d repmgr -f /data/node2.conf standby clone --dry-run

repmgr -h 127.0.0.1 -U repmgr -d repmgr -f /data/node2.conf standby clone

repmgr -h 127.0.0.1 -U repmgr -d repmgr -f /data/node3.conf standby clone --dry-run

repmgr -h 127.0.0.1 -U repmgr -d repmgr -f /data/node3.conf standby clone

Clone log:

-bash-4.2$ repmgr -h 127.0.0.1 -U repmgr -d repmgr -f /data/node2.conf standby clone --dry-run

NOTICE: destination directory "/data/n002" provided

INFO: connecting to source node

DETAIL: connection string is: host=127.0.0.1 user=repmgr dbname=repmgr

DETAIL: current installation size is 31 MB

INFO: "repmgr" extension is installed in database "repmgr"

INFO: replication slot usage not requested;  no replication slot will be set up for this standby

INFO: parameter "max_wal_senders" set to 10

NOTICE: checking for available walsenders on the source node (2 required)

INFO: sufficient walsenders available on the source node

DETAIL: 2 required, 10 available

NOTICE: checking replication connections can be made to the source server (2 required)

INFO: required number of replication connections could be made to the source server

DETAIL: 2 replication connections required

NOTICE: standby will attach to upstream node 1

HINT: consider using the -c/--fast-checkpoint option

INFO: would execute:

  /usr/local/postgresql/bin/pg_basebackup -l "repmgr base backup"  -D /data/n002 -h 127.0.0.1 -p 5432 -U repmgr -X stream

INFO: all prerequisites for "standby clone" are met

-bash-4.2$ repmgr -h 127.0.0.1 -U repmgr -d repmgr -f /data/node2.conf standby clone

NOTICE: destination directory "/data/n002" provided

INFO: connecting to source node

DETAIL: connection string is: host=127.0.0.1 user=repmgr dbname=repmgr

DETAIL: current installation size is 31 MB

INFO: replication slot usage not requested;  no replication slot will be set up for this standby

NOTICE: checking for available walsenders on the source node (2 required)

NOTICE: checking replication connections can be made to the source server (2 required)

INFO: creating directory "/data/n002"...

NOTICE: starting backup (using pg_basebackup)...

HINT: this may take some time; consider using the -c/--fast-checkpoint option

INFO: executing:

  /usr/local/postgresql/bin/pg_basebackup -l "repmgr base backup"  -D /data/n002 -h 127.0.0.1 -p 5432 -U repmgr -X stream

NOTICE: standby clone (using pg_basebackup) complete

NOTICE: you can now start your PostgreSQL server

HINT: for example: pg_ctl -D /data/n002 start

HINT: after starting the server, you need to register this standby with "repmgr standby register"

Change the ports for node2 and node3:

sed -i "s/#port = 5432/port = 5433/g" /data/n002/postgresql.conf

sed -i "s/#port = 5432/port = 5434/g" /data/n003/postgresql.conf

-bash-4.2$ cat /data/n002/postgresql.conf |grep port

port = 5433                             # (change requires restart)

#ssl_passphrase_command_supports_reload = off

                                        # supported by the operating system:

                                        # supported by the operating system:

                                        # supported by the operating system:

                                        #   %r = remote host and port

-bash-4.2$ cat /data/n003/postgresql.conf |grep port

port = 5434                             # (change requires restart)

#ssl_passphrase_command_supports_reload = off

                                        # supported by the operating system:

                                        # supported by the operating system:

                                        # supported by the operating system:

                                        #   %r = remote host and port

Start the databases and register them:

pg_ctl -D /data/n002 start

pg_ctl -D /data/n003 start

repmgr -f /data/node2.conf standby register

repmgr -f /data/node3.conf standby register

-bash-4.2$ pg_ctl -D /data/n002 start

waiting for server to start....2021-10-28 11:13:27.211 CST [5683] LOG:  starting PostgreSQL 13.4 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39), 64-bit

2021-10-28 11:13:27.212 CST [5683] LOG:  listening on IPv6 address "::1", port 5433

2021-10-28 11:13:27.212 CST [5683] LOG:  listening on IPv4 address "127.0.0.1", port 5433

2021-10-28 11:13:27.238 CST [5683] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5433"

2021-10-28 11:13:27.271 CST [5684] LOG:  database system was interrupted; last known up at 2021-10-28 11:09:14 CST

..2021-10-28 11:13:29.396 CST [5684] LOG:  entering standby mode

2021-10-28 11:13:29.418 CST [5684] LOG:  redo starts at 0/2000028

2021-10-28 11:13:29.440 CST [5684] LOG:  consistent recovery state reached at 0/2000138

2021-10-28 11:13:29.440 CST [5683] LOG:  database system is ready to accept read only connections

2021-10-28 11:13:29.482 CST [5718] LOG:  started streaming WAL from primary at 0/3000000 on timeline 1

done

server started

-bash-4.2$ pg_ctl -D /data/n003 start

waiting for server to start....2021-10-28 11:13:33.327 CST [5734] LOG:  starting PostgreSQL 13.4 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39), 64-bit

2021-10-28 11:13:33.328 CST [5734] LOG:  listening on IPv6 address "::1", port 5434

2021-10-28 11:13:33.328 CST [5734] LOG:  listening on IPv4 address "127.0.0.1", port 5434

2021-10-28 11:13:33.350 CST [5734] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5434"

2021-10-28 11:13:33.397 CST [5736] LOG:  database system was interrupted; last known up at 2021-10-28 11:11:11 CST

.2021-10-28 11:13:35.226 CST [5736] LOG:  entering standby mode

.2021-10-28 11:13:35.258 CST [5736] LOG:  redo starts at 0/4000028

2021-10-28 11:13:35.271 CST [5736] LOG:  consistent recovery state reached at 0/4000100

2021-10-28 11:13:35.271 CST [5734] LOG:  database system is ready to accept read only connections

2021-10-28 11:13:35.285 CST [5764] LOG:  started streaming WAL from primary at 0/5000000 on timeline 1

done

server started

-bash-4.2$ repmgr -f /data/node2.conf standby register

INFO: connecting to local node "node2" (ID: 2)

INFO: connecting to primary database

WARNING: --upstream-node-id not supplied, assuming upstream node is primary (node ID: 1)

INFO: standby registration complete

NOTICE: standby node "node2" (ID: 2) successfully registered

-bash-4.2$ repmgr -f /data/node3.conf standby register

INFO: connecting to local node "node3" (ID: 3)

INFO: connecting to primary database

WARNING: --upstream-node-id not supplied, assuming upstream node is primary (node ID: 1)

INFO: standby registration complete

NOTICE: standby node "node3" (ID: 3) successfully registered

-bash-4.2$ repmgr -f /data/node3.conf cluster show

ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string

----+-------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------------------

1  | node1 | primary | * running |          | default  | 100      | 1        | host=127.0.0.1 user=repmgr dbname=repmgr connect_timeout=2

2  | node2 | standby |   running | node1    | default  | 100      | 1        | host=127.0.0.1 port=5433 user=repmgr dbname=repmgr connect_timeout=2

3  | node3 | standby |   running | node1    | default  | 100      | 1        | host=127.0.0.1 port=5434 user=repmgr dbname=repmgr connect_timeout=2

06. Switchover with repmgr

# Switchover requires passwordless SSH access (key-based equivalence) between the nodes.

# Run on the standby: switch node2 to primary

# Check:

repmgr standby switchover -f /data/node2.conf --siblings-follow --dry-run

# Perform the switchover:

repmgr standby switchover -f /data/node2.conf

# Point the remaining standby at the new primary:

repmgr  -f /data/node3.conf standby follow
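The check-then-switch pattern above can be wrapped in a small helper that refuses to proceed unless the dry run passes. A sketch; `switchover_to` is a hypothetical name of ours, not something repmgr provides:

```shell
# Sketch: only run the real switchover if the --dry-run check succeeds.
# switchover_to is a hypothetical helper; repmgr itself ships no such wrapper.
switchover_to() {
  conf=$1
  if repmgr standby switchover -f "$conf" --siblings-follow --dry-run; then
    repmgr standby switchover -f "$conf" --siblings-follow
  else
    echo "dry run failed; switchover aborted" >&2
    return 1
  fi
}

# Usage: switchover_to /data/node2.conf
```

Using --siblings-follow here also avoids the separate `standby follow` step, since the other standbys are repointed as part of the switchover.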

# Switch the primary to node2:

-bash-4.2$ repmgr standby switchover -f /data/node2.conf --siblings-follow --dry-run

NOTICE: checking switchover on node "node2" (ID: 2) in --dry-run mode

INFO: SSH connection to host "127.0.0.1" succeeded

INFO: able to execute "repmgr" on remote host "127.0.0.1"

INFO: all sibling nodes are reachable via SSH

INFO: 2 walsenders required, 10 available

INFO: demotion candidate is able to make replication connection to promotion candidate

INFO: 0 pending archive files

INFO: replication lag on this standby is 0 seconds

NOTICE: attempting to pause repmgrd on 3 nodes

NOTICE: local node "node2" (ID: 2) would be promoted to primary; current primary "node1" (ID: 1) would be demoted to standby

INFO: following shutdown command would be run on node "node1":

  "/usr/local/postgresql/bin/pg_ctl  -D '/data/n001' -W -m fast stop"

INFO: parameter "shutdown_check_timeout" is set to 60 seconds

INFO: prerequisites for executing STANDBY SWITCHOVER are met

-bash-4.2$ repmgr standby switchover -f /data/node2.conf

NOTICE: executing switchover on node "node2" (ID: 2)

WARNING: 1 sibling nodes found, but option "--siblings-follow" not specified

DETAIL: these nodes will remain attached to the current primary:

  node3 (node ID: 3)

NOTICE: attempting to pause repmgrd on 3 nodes

NOTICE: local node "node2" (ID: 2) will be promoted to primary; current primary "node1" (ID: 1) will be demoted to standby

NOTICE: stopping current primary node "node1" (ID: 1)

NOTICE: issuing CHECKPOINT on node "node1" (ID: 1)

DETAIL: executing server command "/usr/local/postgresql/bin/pg_ctl  -D '/data/n001' -W -m fast stop"

INFO: checking for primary shutdown; 1 of 60 attempts ("shutdown_check_timeout")

INFO: checking for primary shutdown; 2 of 60 attempts ("shutdown_check_timeout")

INFO: checking for primary shutdown; 3 of 60 attempts ("shutdown_check_timeout")

NOTICE: current primary has been cleanly shut down at location 0/6000028

NOTICE: promoting standby to primary

DETAIL: promoting server "node2" (ID: 2) using pg_promote()

NOTICE: waiting up to 60 seconds (parameter "promote_check_timeout") for promotion to complete

NOTICE: STANDBY PROMOTE successful

DETAIL: server "node2" (ID: 2) was successfully promoted to primary

NOTICE: node "node2" (ID: 2) promoted to primary, node "node1" (ID: 1) demoted to standby

NOTICE: switchover was successful

DETAIL: node "node2" is now primary and node "node1" is attached as standby

NOTICE: STANDBY SWITCHOVER has completed successfully


# node2 is now the primary:

-bash-4.2$ repmgr  -f /data/node2.conf cluster show

ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string

----+-------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------------------

1  | node1 | standby |   running | node2    | default  | 100      | 1        | host=127.0.0.1 user=repmgr dbname=repmgr connect_timeout=2

2  | node2 | primary | * running |          | default  | 100      | 2        | host=127.0.0.1 port=5433 user=repmgr dbname=repmgr connect_timeout=2

3  | node3 | standby |   running | node1    | default  | 100      | 1        | host=127.0.0.1 port=5434 user=repmgr dbname=repmgr connect_timeout=2

# Repoint node3 to replicate from the new primary, node2:

-bash-4.2$ cat n003/postgresql.auto.conf

# Do not edit this file manually!

# It will be overwritten by the ALTER SYSTEM command.

primary_conninfo = 'host=127.0.0.1 user=repmgr application_name=node3 connect_timeout=2'

-bash-4.2$ repmgr  -f /data/node3.conf standby follow

NOTICE: attempting to find and follow current primary

INFO: timelines are same, this server is not ahead

DETAIL: local node lsn is 0/6000A28, follow target lsn is 0/6000A28

NOTICE: setting node 3's upstream to node 2

WARNING: node "node3" not found in "pg_stat_replication"

NOTICE: STANDBY FOLLOW successful

DETAIL: standby attached to upstream node "node2" (ID: 2)

-bash-4.2$ repmgr  -f /data/node2.conf cluster show

ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string

----+-------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------------------

1  | node1 | standby |   running | node2    | default  | 100      | 1        | host=127.0.0.1 user=repmgr dbname=repmgr connect_timeout=2

2  | node2 | primary | * running |          | default  | 100      | 2        | host=127.0.0.1 port=5433 user=repmgr dbname=repmgr connect_timeout=2

3  | node3 | standby |   running | node2    | default  | 100      | 2        | host=127.0.0.1 port=5434 user=repmgr dbname=repmgr connect_timeout=2

# repmgr has rewritten primary_conninfo to point at the new primary database.

-bash-4.2$ cat n003/postgresql.auto.conf

# Do not edit this file manually!

# It will be overwritten by the ALTER SYSTEM command.

primary_conninfo = 'user=repmgr connect_timeout=2 host=127.0.0.1 port=5433 application_name=node3'
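To confirm which primary a standby is following, you can pull the port out of primary_conninfo in postgresql.auto.conf. A sketch, run here against a scratch copy of the file shown above; in practice point it at e.g. /data/n003/postgresql.auto.conf:

```shell
# Sketch: extract the upstream port from a standby's postgresql.auto.conf.
# Demo works on a scratch copy of the file shown above.
AUTOCONF=$(mktemp)
cat > "$AUTOCONF" <<'EOF'
# Do not edit this file manually!
# It will be overwritten by the ALTER SYSTEM command.
primary_conninfo = 'user=repmgr connect_timeout=2 host=127.0.0.1 port=5433 application_name=node3'
EOF

upstream_port=$(sed -n "s/^primary_conninfo.*[ ']port=\([0-9][0-9]*\).*/\1/p" "$AUTOCONF")
echo "standby follows the primary on port ${upstream_port:-5432}"   # 5432 is the libpq default
```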

# Switch the primary to node3:

# Check:

repmgr standby switchover -f /data/node3.conf --siblings-follow --dry-run

# Perform the switchover:

repmgr standby switchover -f /data/node3.conf

-bash-4.2$ repmgr  -f /data/node2.conf cluster show

ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string

----+-------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------------------

1  | node1 | standby |   running | node2    | default  | 100      | 2        | host=127.0.0.1 user=repmgr dbname=repmgr connect_timeout=2

2  | node2 | primary | * running |          | default  | 100      | 2        | host=127.0.0.1 port=5433 user=repmgr dbname=repmgr connect_timeout=2

3  | node3 | standby |   running | node2    | default  | 100      | 2        | host=127.0.0.1 port=5434 user=repmgr dbname=repmgr connect_timeout=2

-bash-4.2$ repmgr standby switchover -f /data/node3.conf --siblings-follow --dry-run

NOTICE: checking switchover on node "node3" (ID: 3) in --dry-run mode

INFO: SSH connection to host "127.0.0.1" succeeded

INFO: able to execute "repmgr" on remote host "127.0.0.1"

INFO: all sibling nodes are reachable via SSH

INFO: 2 walsenders required, 10 available

INFO: demotion candidate is able to make replication connection to promotion candidate

INFO: 0 pending archive files

INFO: replication lag on this standby is 0 seconds

NOTICE: attempting to pause repmgrd on 3 nodes

NOTICE: local node "node3" (ID: 3) would be promoted to primary; current primary "node2" (ID: 2) would be demoted to standby

INFO: following shutdown command would be run on node "node2":

  "/usr/local/postgresql/bin/pg_ctl  -D '/data/n002' -W -m fast stop"

INFO: parameter "shutdown_check_timeout" is set to 60 seconds

INFO: prerequisites for executing STANDBY SWITCHOVER are met

-bash-4.2$ repmgr standby switchover -f /data/node3.conf

NOTICE: executing switchover on node "node3" (ID: 3)

WARNING: 1 sibling nodes found, but option "--siblings-follow" not specified

DETAIL: these nodes will remain attached to the current primary:

  node1 (node ID: 1)

NOTICE: attempting to pause repmgrd on 3 nodes

NOTICE: local node "node3" (ID: 3) will be promoted to primary; current primary "node2" (ID: 2) will be demoted to standby

NOTICE: stopping current primary node "node2" (ID: 2)

NOTICE: issuing CHECKPOINT on node "node2" (ID: 2)

DETAIL: executing server command "/usr/local/postgresql/bin/pg_ctl  -D '/data/n002' -W -m fast stop"

INFO: checking for primary shutdown; 1 of 60 attempts ("shutdown_check_timeout")

INFO: checking for primary shutdown; 2 of 60 attempts ("shutdown_check_timeout")

INFO: checking for primary shutdown; 3 of 60 attempts ("shutdown_check_timeout")

NOTICE: current primary has been cleanly shut down at location 0/7000028

NOTICE: promoting standby to primary

DETAIL: promoting server "node3" (ID: 3) using pg_promote()

NOTICE: waiting up to 60 seconds (parameter "promote_check_timeout") for promotion to complete

NOTICE: STANDBY PROMOTE successful

DETAIL: server "node3" (ID: 3) was successfully promoted to primary

NOTICE: node "node3" (ID: 3) promoted to primary, node "node2" (ID: 2) demoted to standby

NOTICE: switchover was successful

DETAIL: node "node3" is now primary and node "node2" is attached as standby

NOTICE: STANDBY SWITCHOVER has completed successfully

-bash-4.2$ repmgr  -f /data/node2.conf cluster show

ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string

----+-------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------------------

1  | node1 | standby |   running | node2    | default  | 100      | 2        | host=127.0.0.1 user=repmgr dbname=repmgr connect_timeout=2

2  | node2 | standby |   running | node3    | default  | 100      | 2        | host=127.0.0.1 port=5433 user=repmgr dbname=repmgr connect_timeout=2

3  | node3 | primary | * running |          | default  | 100      | 3        | host=127.0.0.1 port=5434 user=repmgr dbname=repmgr connect_timeout=2

# Switch the primary to node1:

# A standby whose registered upstream is not the current primary cannot be switched over directly; it must first follow the current primary:

repmgr  -f /data/node1.conf standby follow

#check:

repmgr standby switchover -f /data/node1.conf --siblings-follow --dry-run

#执行切换:

repmgr standby switchover -f /data/node1.conf

-bash-4.2$ repmgr  -f /data/node2.conf cluster show

ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string

----+-------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------------------

1  | node1 | standby |   running | node2    | default  | 100      | 2        | host=127.0.0.1 user=repmgr dbname=repmgr connect_timeout=2

2  | node2 | standby |   running | node3    | default  | 100      | 2        | host=127.0.0.1 port=5433 user=repmgr dbname=repmgr connect_timeout=2

3  | node3 | primary | * running |          | default  | 100      | 3        | host=127.0.0.1 port=5434 user=repmgr dbname=repmgr connect_timeout=2

# node1 is a cascaded standby, so it cannot be promoted to primary directly.

-bash-4.2$ repmgr standby switchover -f /data/node1.conf --siblings-follow --dry-run

NOTICE: checking switchover on node "node1" (ID: 1) in --dry-run mode

ERROR: local node "node1" (ID: 1) is not a downstream of demotion candidate primary "node3" (ID: 3)

DETAIL: registered upstream node ID is 2

HINT: execute "repmgr standby register --force" to update the local node's metadata

# Once node1 is reattached as a standby of node3, the switchover can proceed normally.

-bash-4.2$ repmgr  -f /data/node1.conf standby follow

NOTICE: attempting to find and follow current primary

INFO: timelines are same, this server is not ahead

DETAIL: local node lsn is 0/7000E30, follow target lsn is 0/7000E30

NOTICE: setting node 1's upstream to node 3

WARNING: node "node1" not found in "pg_stat_replication"

NOTICE: STANDBY FOLLOW successful

DETAIL: standby attached to upstream node "node3" (ID: 3)

-bash-4.2$ repmgr  -f /data/node2.conf cluster show

ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                                               

----+-------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------------------

1  | node1 | standby |   running | node3    | default  | 100      | 3        | host=127.0.0.1 user=repmgr dbname=repmgr connect_timeout=2      

2  | node2 | standby |   running | node3    | default  | 100      | 3        | host=127.0.0.1 port=5433 user=repmgr dbname=repmgr connect_timeout=2

3  | node3 | primary | * running |          | default  | 100      | 3        | host=127.0.0.1 port=5434 user=repmgr dbname=repmgr connect_timeout=2


# --siblings-follow repoints the remaining standbys at the new primary

-bash-4.2$ repmgr standby switchover -f /data/node1.conf --siblings-follow

NOTICE: executing switchover on node "node1" (ID: 1)

NOTICE: attempting to pause repmgrd on 3 nodes

NOTICE: local node "node1" (ID: 1) will be promoted to primary; current primary "node3" (ID: 3) will be demoted to standby

NOTICE: stopping current primary node "node3" (ID: 3)

NOTICE: issuing CHECKPOINT on node "node3" (ID: 3)

DETAIL: executing server command "/usr/local/postgresql/bin/pg_ctl  -D '/data/n003' -W -m fast stop"

INFO: checking for primary shutdown; 1 of 60 attempts ("shutdown_check_timeout")

INFO: checking for primary shutdown; 2 of 60 attempts ("shutdown_check_timeout")

INFO: checking for primary shutdown; 3 of 60 attempts ("shutdown_check_timeout")

NOTICE: current primary has been cleanly shut down at location 0/8000028

NOTICE: promoting standby to primary

DETAIL: promoting server "node1" (ID: 1) using pg_promote()

NOTICE: waiting up to 60 seconds (parameter "promote_check_timeout") for promotion to complete

NOTICE: STANDBY PROMOTE successful

DETAIL: server "node1" (ID: 1) was successfully promoted to primary

NOTICE: node "node1" (ID: 1) promoted to primary, node "node3" (ID: 3) demoted to standby

NOTICE: executing STANDBY FOLLOW on 1 of 1 siblings

INFO: STANDBY FOLLOW successfully executed on all reachable sibling nodes

NOTICE: switchover was successful

DETAIL: node "node1" is now primary and node "node3" is attached as standby

NOTICE: STANDBY SWITCHOVER has completed successfully

-bash-4.2$ repmgr  -f /data/node2.conf cluster show

ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                                               

----+-------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------------------

1  | node1 | primary | * running |          | default  | 100      | 4        | host=127.0.0.1 user=repmgr dbname=repmgr connect_timeout=2      

2  | node2 | standby |   running | node1    | default  | 100      | 3        | host=127.0.0.1 port=5433 user=repmgr dbname=repmgr connect_timeout=2

3  | node3 | standby |   running | node1    | default  | 100      | 3        | host=127.0.0.1 port=5434 user=repmgr dbname=repmgr connect_timeout=2

07. Failover (promoting a standby server with repmgr)

# Check cluster status

repmgr  -f /data/node1.conf cluster show

-bash-4.2$ repmgr  -f /data/node1.conf cluster show

ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                                               

----+-------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------------------

1  | node1 | primary | * running |          | default  | 100      | 4        | host=127.0.0.1 user=repmgr dbname=repmgr connect_timeout=2      

2  | node2 | standby |   running | node1    | default  | 100      | 3        | host=127.0.0.1 port=5433 user=repmgr dbname=repmgr connect_timeout=2

3  | node3 | standby |   running | node1    | default  | 100      | 3        | host=127.0.0.1 port=5434 user=repmgr dbname=repmgr connect_timeout=2

# Stop the primary database (simulating a primary failure)

-bash-4.2$ pg_ctl -D /data/n001/ stop

waiting for server to shut down..... done

server stopped

-bash-4.2$ repmgr  -f /data/node2.conf cluster show

ID | Name  | Role    | Status        | Upstream | Location | Priority | Timeline | Connection string                                           

----+-------+---------+---------------+----------+----------+----------+----------+----------------------------------------------------------------------

1  | node1 | primary | ? unreachable | ?        | default  | 100      |          | host=127.0.0.1 user=repmgr dbname=repmgr connect_timeout=2  

2  | node2 | standby |   running     | ? node1  | default  | 100      | 4        | host=127.0.0.1 port=5433 user=repmgr dbname=repmgr connect_timeout=2

3  | node3 | standby |   running     | ? node1  | default  | 100      | 4        | host=127.0.0.1 port=5434 user=repmgr dbname=repmgr connect_timeout=2

WARNING: following issues were detected

  - unable to connect to node "node1" (ID: 1)

  - node "node1" (ID: 1) is registered as an active primary but is unreachable

  - unable to connect to node "node2" (ID: 2)'s upstream node "node1" (ID: 1)

  - unable to determine if node "node2" (ID: 2) is attached to its upstream node "node1" (ID: 1)

  - unable to connect to node "node3" (ID: 3)'s upstream node "node1" (ID: 1)

  - unable to determine if node "node3" (ID: 3) is attached to its upstream node "node1" (ID: 1)

HINT: execute with --verbose option to see connection error messages
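The Status column gives a scriptable hook for monitoring: a primary reported as "? unreachable" or "- failed" means trouble. A sketch that checks a copy of the captured lines above; in a real check, feed in the output of the live `repmgr -f /data/node2.conf cluster show` instead:

```shell
# Sketch: flag a down primary from `repmgr cluster show` output.
# The demo tests against the captured rows above; pipe in live output in practice.
status=$(cat <<'EOF'
 1  | node1 | primary | ? unreachable | ?        | default  | 100 |   | host=127.0.0.1 user=repmgr dbname=repmgr connect_timeout=2
 2  | node2 | standby |   running     | ? node1  | default  | 100 | 4 | host=127.0.0.1 port=5433 user=repmgr dbname=repmgr connect_timeout=2
EOF
)

if echo "$status" | grep -Eq 'primary \| (\? unreachable|- failed)'; then
  echo "ALERT: primary is down"
else
  echo "primary OK"
fi
```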

# Promote node2 to primary

-bash-4.2$ repmgr  -f /data/node2.conf standby promote

WARNING: 1 sibling nodes found, but option "--siblings-follow" not specified

DETAIL: these nodes will remain attached to the current primary:

  node3 (node ID: 3)

NOTICE: promoting standby to primary

DETAIL: promoting server "node2" (ID: 2) using pg_promote()

NOTICE: waiting up to 60 seconds (parameter "promote_check_timeout") for promotion to complete

NOTICE: STANDBY PROMOTE successful

DETAIL: server "node2" (ID: 2) was successfully promoted to primary

-bash-4.2$ repmgr  -f /data/node2.conf cluster show

ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                                               

----+-------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------------------

1  | node1 | primary | - failed  | ?        | default  | 100      |          | host=127.0.0.1 user=repmgr dbname=repmgr connect_timeout=2      

2  | node2 | primary | * running |          | default  | 100      | 5        | host=127.0.0.1 port=5433 user=repmgr dbname=repmgr connect_timeout=2

3  | node3 | standby |   running | ? node1  | default  | 100      | 4        | host=127.0.0.1 port=5434 user=repmgr dbname=repmgr connect_timeout=2

WARNING: following issues were detected

  - unable to connect to node "node1" (ID: 1)

  - unable to connect to node "node3" (ID: 3)'s upstream node "node1" (ID: 1)

  - unable to determine if node "node3" (ID: 3) is attached to its upstream node "node1" (ID: 1)

HINT: execute with --verbose option to see connection error messages

# Repoint node3 to follow the new primary node2

-bash-4.2$ repmgr  -f /data/node3.conf standby follow

NOTICE: attempting to find and follow current primary

INFO: local node 3 can attach to follow target node 2

DETAIL: local node's recovery point: 0/90000A0; follow target node's fork point: 0/90000A0

NOTICE: setting node 3's upstream to node 2

WARNING: node "node3" attached in state "startup"

NOTICE: STANDBY FOLLOW successful

DETAIL: standby attached to upstream node "node2" (ID: 2)

-bash-4.2$ repmgr  -f /data/node2.conf cluster show

ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                                               

----+-------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------------------

1  | node1 | primary | - failed  | ?        | default  | 100      |          | host=127.0.0.1 user=repmgr dbname=repmgr connect_timeout=2      

2  | node2 | primary | * running |          | default  | 100      | 5        | host=127.0.0.1 port=5433 user=repmgr dbname=repmgr connect_timeout=2

3  | node3 | standby |   running | node2    | default  | 100      | 4        | host=127.0.0.1 port=5434 user=repmgr dbname=repmgr connect_timeout=2

WARNING: following issues were detected

  - unable to connect to node "node1" (ID: 1)

HINT: execute with --verbose option to see connection error messages
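Before actually repointing a standby, repmgr can preview the operation. A sketch (same paths as above); with --dry-run, repmgr reports whether the local node can attach to the follow target without making any changes:

```bash
# Check that node3 is able to follow the current primary,
# without modifying the node.
repmgr -f /data/node3.conf standby follow --dry-run
```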

# Repair node1 and rejoin it to the cluster

# Note: pg_rewind requires that wal_log_hints is enabled (or that data checksums were enabled at initdb time).

repmgr node rejoin -f /etc/repmgr.conf -d 'host=node1 dbname=repmgr user=repmgr' --force-rewind --verbose --dry-run

-bash-4.2$ repmgr -f /data/node1.conf node rejoin -d 'host=127.0.0.1 port=5433 user=repmgr dbname=repmgr'

NOTICE: rejoin target is node "node2" (ID: 2)

INFO: local node 1 can attach to rejoin target node 2

DETAIL: local node's recovery point: 0/9000028; rejoin target node's fork point: 0/90000A0

NOTICE: setting node 1's upstream to node 2

WARNING: unable to ping "host=127.0.0.1 user=repmgr dbname=repmgr connect_timeout=2"

DETAIL: PQping() returned "PQPING_NO_RESPONSE"

NOTICE: starting server using "/usr/local/postgresql/bin/pg_ctl  -w -D '/data/n001' start"

NOTICE: NODE REJOIN successful

DETAIL: node 1 is now attached to node 2

-bash-4.2$ repmgr  -f /data/node2.conf cluster show

ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                                               

----+-------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------------------

1  | node1 | standby |   running | node2    | default  | 100      | 4        | host=127.0.0.1 user=repmgr dbname=repmgr connect_timeout=2      

2  | node2 | primary | * running |          | default  | 100      | 5        | host=127.0.0.1 port=5433 user=repmgr dbname=repmgr connect_timeout=2

3  | node3 | standby |   running | node2    | default  | 100      | 5        | host=127.0.0.1 port=5434 user=repmgr dbname=repmgr connect_timeout=2
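As a final check, replication can also be confirmed from the new primary itself: pg_stat_replication should show one walsender row per attached standby (here node1 and node3). A sketch against node2's port as used in this setup:

```bash
# On the new primary (node2, port 5433): list attached standbys
# and their streaming state.
psql -p 5433 -d repmgr -c "SELECT application_name, state, sync_state FROM pg_stat_replication;"
```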

*Reprinting is prohibited; sharing is allowed (please credit the source when sharing)

