GaussDB Instance Restart and Primary/Standby Switchover

GottdesKrieges

License: CC 4.0 BY-SA

Column: GaussDB Basics | Tags: gaussdb, database

Original link: https://blog.youkuaiyun.com/Sebastien23/article/details/144776217

Database Restart
Instance Switchover
Resetting Instance State
Simulating Instance Restarts

Database Restart

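A GaussDB cluster can be in one of the following three states:

Normal: the database is available and data is redundantly backed up. All processes are running and the primary/standby relationship is normal.
Degraded: the database is available, but data has no redundant backup.
Unavailable: the database is unavailable.

Query the status of all instances and services in the cluster: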

gs_om -t status --detail
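When scripting around these states, the `key : value` lines of the status report can be pulled out with a small helper. This is a minimal sketch that assumes the `[ Cluster State ]` layout shown later in this article; `status_field` and `describe_cluster_state` are hypothetical names, not GaussDB tools.

```shell
# Print the value of one "key : value" line (e.g. cluster_state, balanced)
# from `gs_om -t status --detail` output read on stdin.
status_field() {
  awk -F':' -v k="$1" '{
    name = $1; gsub(/[ \t]+/, "", name)                  # trim the key
    if (name == k) {
      val = $2; gsub(/^[ \t]+|[ \t]+$/, "", val)         # trim the value
      print val
      exit
    }
  }'
}

# Translate a cluster_state value into the meaning listed above.
describe_cluster_state() {
  case "$1" in
    Normal)      echo "available, data redundant" ;;
    Degraded)    echo "available, no redundancy" ;;
    Unavailable) echo "not available" ;;
    *)           echo "unknown state: $1"; return 1 ;;
  esac
}

# Usage (on a cluster host):
#   state=$(gs_om -t status --detail | status_field cluster_state)
#   describe_cluster_state "$state"
```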

Restart the instances and services on all nodes:

# Method 1: gs_om
gs_om -t stop
gs_om -t start

# Method 2: cm_ctl
cm_ctl stop 
cm_ctl start
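A full restart is usually followed by polling the status until the cluster is back to Normal. The sketch below wraps method 1 (`gs_om`) that way; the function name and the poll count/interval defaults are illustrative assumptions, not GaussDB settings.

```shell
# Restart every node with gs_om, then poll until cluster_state is Normal.
# $1 = maximum number of polls (default 30), $2 = seconds between polls (default 10).
restart_cluster_and_wait() {
  local max_tries="${1:-30}" interval="${2:-10}" i=0 state=""
  gs_om -t stop  || return 1
  gs_om -t start || return 1
  while [ "$i" -lt "$max_tries" ]; do
    i=$((i + 1))
    # Take the value from the "cluster_state   : <value>" line.
    state=$(gs_om -t status --detail |
      awk -F':' '/^cluster_state/ { gsub(/[ \t]/, "", $2); print $2; exit }')
    echo "poll $i: cluster_state=${state:-unknown}"
    [ "$state" = "Normal" ] && return 0
    sleep "$interval"
  done
  return 1   # still not Normal after max_tries polls
}
```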

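⭐️ Stopping a single standby node stops the cm_server, etcd, and database instance on that node, and the cluster state drops to Degraded.

Restarting a single node with cm_ctl (-n takes the node id):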

[omm@gaussdb001 ~]$ cm_ctl stop -n 2
cm_ctl: stop the node: 2. 
cm_ctl: stop node, nodeid: 2
..............
cm_ctl: stop node successfully. 
cm_ctl: stopping the ETCD instance. 
cm_ctl: stop the ETCD instance in this node, nodeid: 2.
.
cm_ctl: The ETCD instance stopped successfully in node: 2.

[omm@gaussdb001 ~]$ cm_ctl start -n 2
cm_ctl: start ETCD in node, nodeid: 2
.
cm_ctl: the ETCD instance in this node starts successfully done.
cm_ctl: checking the ETCD cluster status
.
cm_ctl: check ETCD cluster finished.
cm_ctl: start the node:2. 
.........
cm_ctl: start node successfully.
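The Degraded behavior above describes a standby node, so a cautious script may want to confirm that the target node does not currently host the Primary before stopping it. A minimal sketch, assuming the `[ Datanode State ]` row layout shown in this article (instances separated by `|`, node id in the first field and current role in the seventh); `node_role` and `restart_standby_node` are hypothetical names.

```shell
# Print the current role (Primary, Standby, Down, ...) of the datanode on
# node $1, reading `gs_om -t status --detail` output from stdin.
node_role() {
  awk -v id="$1" '
    /^\[/ { in_dn = ($0 ~ /Datanode State/); next }   # track the current section
    in_dn {
      n = split($0, seg, "|")                          # one segment per instance
      for (i = 1; i <= n; i++)
        if (split(seg[i], f, " ") >= 7 && f[1] == id) { print f[7]; exit }
    }'
}

# Stop and start node $1 with cm_ctl, refusing if it hosts the Primary.
restart_standby_node() {
  local node_id="$1" role
  case "$node_id" in
    ''|*[!0-9]*) echo "invalid node id: $node_id" >&2; return 2 ;;
  esac
  role=$(gs_om -t status --detail | node_role "$node_id")
  if [ "$role" = "Primary" ]; then
    echo "node $node_id hosts the Primary; not stopping it" >&2
    return 1
  fi
  cm_ctl stop -n "$node_id" && cm_ctl start -n "$node_id"
}
```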

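Restarting a single node with gs_om (-h takes the node's IP address; as the output shows, gs_om runs cm_ctl stop -n 2 -m fast underneath):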

[omm@gaussdb001 ~]$ gs_om -t stop -h 22.69.77.32
Stopping node.
=========================================
the Stop cmd is source /home/omm/gauss_env_file ; cm_ctl stop -n 2 -m fast.
Successfully stopped node.
=========================================
End stop node.

[omm@gaussdb001 ~]$ gs_om -t start -h 22.69.77.32
Starting node.
======================================================================
Successfully started node.
======================================================================
End start node.
Successfully started node.
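Since `gs_om -t stop -h <ip>` resolves the host to a node id before calling `cm_ctl`, the same lookup can be done from the `[ Datanode State ]` table, for example to build the matching `cm_ctl` command by hand. A sketch assuming the row layout shown in this article (node_ip in the second field of each `|`-separated segment); `node_id_for_ip` is a hypothetical name.

```shell
# Print the node id whose datanode row lists node_ip $1
# (stdin: `gs_om -t status --detail` output).
node_id_for_ip() {
  awk -v ip="$1" '
    /^\[/ { in_dn = ($0 ~ /Datanode State/); next }
    in_dn {
      n = split($0, seg, "|")
      for (i = 1; i <= n; i++)
        if (split(seg[i], f, " ") >= 7 && f[2] == ip) { print f[1]; exit }
    }'
}

# Usage:
#   id=$(gs_om -t status --detail | node_id_for_ip 22.69.77.32)
#   cm_ctl stop -n "$id" -m fast    # the command gs_om reported above
```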

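Instance Switchover

Check the data directory of the target standby:

gs_om -t status --detail

Check whether the standby is still catching up on logs (a state of Streaming means replication is in sync):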

select * from pg_get_senders_catchup_time;
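The check can be scripted by fetching only the state values and requiring every sender to report Streaming. A sketch that assumes the view exposes a `state` column and that `gsql` accepts the psql-style `-d`, `-p`, `-t`, `-A`, `-c` options; the database name and port in the usage comment are placeholders, and `all_senders_streaming` is a hypothetical name.

```shell
# Succeed only if every non-empty input line (one sender state per line)
# is "Streaming"; fail if there is no sender at all.
all_senders_streaming() {
  awk 'NF { total++; if ($1 == "Streaming") ok++ }
       END { exit !(total > 0 && ok == total) }'
}

# Usage (placeholders: database postgres, port 8000):
#   gsql -d postgres -p 8000 -t -A \
#        -c "select state from pg_get_senders_catchup_time;" |
#     all_senders_streaming && echo "standbys are in sync"
```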

For a planned switchover, run the following on the target standby node:

gs_ctl switchover -D /gsdata/dn/dn_6002

If the primary has failed, run the following on the target standby node instead:

gs_ctl failover -D /gsdata/dn/dn_6002

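Running the command on a node other than the target standby fails, because that data directory does not exist there: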

[gs_ctl]: can't create lock file "/gsdata/dn/dn_6002/pg_ctl.lock" : No such file or directory
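Because the error comes from the missing data directory, a wrapper can check for the directory before calling `gs_ctl`. A minimal sketch; `promote_standby` is a hypothetical name, the path is this article's example, and the check only guards against the wrong-node case seen here, where each node uses a different data directory path.

```shell
# Run "gs_ctl switchover" or "gs_ctl failover" for data directory $2,
# but only if that directory exists on this host (the target standby).
promote_standby() {
  local mode="$1" datadir="$2"
  case "$mode" in
    switchover|failover) ;;
    *) echo "mode must be switchover or failover" >&2; return 2 ;;
  esac
  if [ ! -d "$datadir" ]; then
    echo "$datadir not found on $(uname -n); run this on the target standby" >&2
    return 1
  fi
  gs_ctl "$mode" -D "$datadir"
}

# Planned switchover, run on gaussdb002:
#   promote_standby switchover /gsdata/dn/dn_6002
```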

Switchover example:

[omm@gaussdb002 ~]$ gs_ctl switchover -D /gsdata/dn/dn_6002
[2024-12-25 10:00:18.315][1278722][][gs_ctl]: gs_ctl switchover ,datadir is /gsdata/dn/dn_6002 
[2024-12-25 10:00:18.315][1278722][][gs_ctl]: switchover term (1)
[2024-12-25 10:00:18.326][1278722][][gs_ctl]: waiting for server to switchover.........
[2024-12-25 10:00:24.394][1278722][][gs_ctl]: done
[2024-12-25 10:00:24.394][1278722][][gs_ctl]: switchover completed (/gsdata/dn/dn_6002)

Check the switchover log:

[omm@gaussdb002 ~]$ tail -n50 $GAUSSLOG/bin/gs_ctl/gs_ctl-2024-12-16_163928-current.log 
...
[2024-12-25 10:00:18]
[2024-12-25 10:00:18.315][1278722][][gs_ctl]: gs_ctl switchover ,datadir is /gsdata/dn/dn_6002 
[2024-12-25 10:00:18.315][1278722][][gs_ctl]: switchover term (1)
[2024-12-25 10:00:18.326][1278722][][gs_ctl]: waiting for server to switchover.........
[2024-12-25 10:00:24.394][1278722][][gs_ctl]: done
[2024-12-25 10:00:24.394][1278722][][gs_ctl]: switchover completed (/gsdata/dn/dn_6002)
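To confirm the outcome from a script instead of reading the tail by eye, one option is to check that the most recent `gs_ctl switchover` entry in the log is followed by a `switchover completed` line. A sketch that matches the message text shown above; `switchover_completed` is a hypothetical name, and the wording may differ between GaussDB versions.

```shell
# Succeed if, in gs_ctl log file $1, the last "gs_ctl switchover" run
# is followed by a "switchover completed" line.
switchover_completed() {
  awk '
    /gs_ctl switchover ,datadir/ { started = 1; finished = 0 }  # new attempt
    started && /switchover completed/ { finished = 1 }
    END { exit !(started && finished) }' "$1"
}

# Usage (newest "-current" log, file name pattern as in this article):
#   log=$(ls -t "$GAUSSLOG"/bin/gs_ctl/gs_ctl-*-current.log | head -n 1)
#   switchover_completed "$log" && echo "switchover finished"
```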

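🦁 Handling a dual-primary situation:

# (1) Check the instance status; two instances both report Primary:
gs_om -t status --detail

# (2) Decide which node to demote to standby, and stop its instance on that node:
gs_ctl stop -D /gsdata/dn/dn_6001

# (3) Start the instance stopped above in standby mode:
gs_ctl start -D /gsdata/dn/dn_6001 -M standby

# (4) Check the instance status again to confirm it has recovered:
gs_om -t status --detail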
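Step (1) can be automated by counting the datanodes that report Primary; a count above one indicates the dual-primary state. A sketch assuming the `[ Datanode State ]` row layout shown in this article (current role in the seventh field of each `|`-separated segment); `count_primaries` is a hypothetical name.

```shell
# Count datanodes whose current role is Primary
# (stdin: `gs_om -t status --detail` output).
count_primaries() {
  awk '
    /^\[/ { in_dn = ($0 ~ /Datanode State/); next }
    in_dn {
      n = split($0, seg, "|")
      for (i = 1; i <= n; i++)
        if (split(seg[i], f, " ") >= 7 && f[7] == "Primary") c++
    }
    END { print c + 0 }'
}

# Usage:
#   if [ "$(gs_om -t status --detail | count_primaries)" -gt 1 ]; then
#     echo "dual primary detected" >&2
#   fi
```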

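Resetting Instance State

After a switchover, the cluster's balance status changes from balanced: Yes to balanced: No. This does not affect normal database access, but it has to be reset manually. In the output below, instance 6001 (marked P, its originally configured primary role) is running as Standby, while 6002 (marked S) is running as Primary.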

[   Cluster State   ]

cluster_state   : Normal
redistributing  : No
balanced        : No
current_az      : AZ_ALL

[  Datanode State   ]

node            node_ip         instance                state            | node            node_ip         instance                state            | node            node_ip         instance                state
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1  22.69.77.30 22.69.77.30    6001 /gsdata/dn/dn_6001 P Standby Normal | 2  22.69.77.32 22.69.77.32    6002 /gsdata/dn/dn_6002 S Primary Normal | 3  22.69.77.28 22.69.77.28    6003 /gsdata/dn/dn_6003 S Standby Normal

Log in to any host in the cluster and restore the instance roles manually:

gs_om -t switch --reset --time-out=300

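This operation resets the primary/standby roles, i.e. it switches back to the original primary, so make sure all primary and standby instances are healthy before running it.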

Example:

[omm@gaussdb001 ~]$ gs_om -t switch --reset --time-out=300
Operating: Switch reset.
cm_ctl: cmserver is rebalancing the cluster automatically.
....
cm_ctl: switchover successfully.
Operation succeeded: Switch reset.

[omm@gaussdb001 ~]$ gs_om -t status --detail
...
[   Cluster State   ]

cluster_state   : Normal
redistributing  : No
balanced        : Yes
current_az      : AZ_ALL

[  Datanode State   ]

node            node_ip         instance                state            | node            node_ip         instance                state            | node            node_ip         instance                state
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1  22.69.77.30 22.69.77.30    6001 /gsdata/dn/dn_6001 P Primary Normal | 2  22.69.77.32 22.69.77.32    6002 /gsdata/dn/dn_6002 S Standby Normal | 3  22.69.77.28 22.69.77.28    6003 /gsdata/dn/dn_6003 S Standby Normal
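Because the reset swaps roles back, a script might run it only when the status reports balanced: No together with cluster_state: Normal. A minimal sketch; treating cluster_state: Normal as "all primary and standby instances are healthy" is a simplifying assumption, and `reset_if_unbalanced` is a hypothetical name.

```shell
# Run "gs_om -t switch --reset" only if the cluster is Normal but unbalanced.
reset_if_unbalanced() {
  local report cluster balanced
  report=$(gs_om -t status --detail) || return 1
  cluster=$(printf '%s\n' "$report" |
    awk -F':' '/^cluster_state/ { gsub(/[ \t]/, "", $2); print $2; exit }')
  balanced=$(printf '%s\n' "$report" |
    awk -F':' '/^balanced/ { gsub(/[ \t]/, "", $2); print $2; exit }')
  if [ "$balanced" != "No" ]; then
    echo "already balanced"
    return 0
  fi
  if [ "$cluster" != "Normal" ]; then
    echo "cluster_state is ${cluster:-unknown}; repair instances before resetting" >&2
    return 1
  fi
  gs_om -t switch --reset --time-out=300
}
```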

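Simulating Instance Restarts

🐯 When a standby server is rebooted, the cluster state and the state of the rebooted standby change as follows:

(1) cluster_state: Normal, primary: Primary Normal, standby 1: Standby Normal, standby 2: Standby Normal
(2) cluster_state: Degraded, primary: Primary Normal, standby 1: Down Unknown, standby 2: Standby Normal
(3) cluster_state: Degraded, primary: Primary Normal, standby 1: Pending Starting, standby 2: Standby Normal
(4) cluster_state: Normal, primary: Primary Normal, standby 1: Standby Normal, standby 2: Standby Normal

In other words, rebooting a standby does not trigger an automatic switchover, and the standby recovers on its own once the server is back up.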

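🦁 When the primary server is rebooted, the cluster state and the primary/standby states change as follows:

(1) cluster_state: Normal, primary: Primary Normal, standby 1: Standby Normal, standby 2: Standby Normal
(2) cluster_state: Unavailable, primary: Down Unknown, standby 1: Standby Need repair(Disconnected), standby 2: Standby Need repair(Disconnected)
(3) cluster_state: Degraded, primary: Down Unknown, standby 1: Primary Normal, standby 2: Standby Normal
(4) cluster_state: Degraded, primary: Down Starting, standby 1: Primary Normal, standby 2: Standby Normal
(5) cluster_state: Normal, primary: Standby Normal, standby 1: Primary Normal, standby 2: Standby Normal

Unlike the standby case, the cluster is briefly Unavailable and then fails over automatically: standby 1 is promoted to Primary, and the old primary rejoins as a Standby once it has restarted.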
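Transitions like these can be captured by polling the status and printing a line whenever cluster_state changes. A sketch; the function name, poll count, and interval are illustrative assumptions, and it stops once the cluster returns to Normal after having left it.

```shell
# Poll `gs_om -t status --detail` and print each cluster_state change.
# Returns 0 once the state is back to Normal after having left it, 1 if that
# has not happened within $1 polls (default 120, every $2 seconds, default 5).
watch_cluster_state() {
  local max_polls="${1:-120}" interval="${2:-5}" i=0 prev="" cur left_normal=0
  while [ "$i" -lt "$max_polls" ]; do
    i=$((i + 1))
    cur=$(gs_om -t status --detail |
      awk -F':' '/^cluster_state/ { gsub(/[ \t]/, "", $2); print $2; exit }')
    if [ "$cur" != "$prev" ]; then
      echo "$(date '+%Y-%m-%d %H:%M:%S') cluster_state: ${prev:-<start>} -> ${cur:-unknown}"
      prev="$cur"
    fi
    if [ "$cur" != "Normal" ]; then
      left_normal=1
    elif [ "$left_normal" = 1 ]; then
      return 0
    fi
    sleep "$interval"
  done
  return 1
}
```

Run it from a node other than the one being rebooted, for example `watch_cluster_state 120 5`, so the watcher survives the reboot.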