GaussDB实例重启和主备切换

数据库重启

GaussDB数据库集群状态有以下三种情况:

  • Normal:表示数据库可用,且数据有冗余备份。所有进程都在运行,主备关系正常。
  • Degraded:表示数据库可用,但数据没有冗余备份。
  • Unavailable:表示数据库不可用。

查询集群中所有实例和服务的状态:

gs_om -t status --detail

重启所有节点上的实例和服务:

# 方法一:gs_om命令
gs_om -t stop
gs_om -t start

# 方法二:cm_ctl命令
cm_ctl stop 
cm_ctl start 

⭐️ 停止单个备机实例,该备机节点上的cm_server、etcd和数据库实例会停止服务,集群状态降级为Degraded。

cm_ctl命令重启单个节点:

[omm@gaussdb001 ~]$ cm_ctl stop -n 2
cm_ctl: stop the node: 2. 
cm_ctl: stop node, nodeid: 2
..............
cm_ctl: stop node successfully. 
cm_ctl: stopping the ETCD instance. 
cm_ctl: stop the ETCD instance in this node, nodeid: 2.
.
cm_ctl: The ETCD instance stopped successfully in node: 2.

[omm@gaussdb001 ~]$ cm_ctl start -n 2
cm_ctl: start ETCD in node, nodeid: 2
.
cm_ctl: the ETCD instance in this node starts successfully done.
cm_ctl: checking the ETCD cluster status
.
cm_ctl: check ETCD cluster finished.
cm_ctl: start the node:2. 
.........
cm_ctl: start node successfully.

gs_om命令重启单个节点:

[omm@gaussdb001 ~]$ gs_om -t stop -h 22.69.77.32
Stopping node.
=========================================
the Stop cmd is source /home/omm/gauss_env_file ; cm_ctl stop -n 2 -m fast.
Successfully stopped node.
=========================================
End stop node.

[omm@gaussdb001 ~]$ gs_om -t start -h 22.69.77.32
Starting node.
======================================================================
Successfully started node.
======================================================================
End start node.
Successfully started node.

实例主备切换

检查备机数据目录:

gs_om -t status --detai1   

检查主备有无日志追赶(状态为Streaming表示同步正常):

select * from pg_get_senders_catchup_time;

计划内切换,在目标备库节点执行:

gs_ctl switchover -D /gsdata/dn/dn_6002

主库发生故障时,可以在目标备库节点执行:

gs_ctl failover -D /gsdata/dn/dn_6002

在非目标备库节点执行切换会报错:

[gs_ctl]: can't create lock file "/gsdata/dn/dn_6002/pg_ctl.lock" : No such file or directory

切换示例:

[omm@gaussdb002 ~]$ gs_ctl switchover -D /gsdata/dn/dn_6002
[2024-12-25 10:00:18.315][1278722][][gs_ctl]: gs_ctl switchover ,datadir is /gsdata/dn/dn_6002 
[2024-12-25 10:00:18.315][1278722][][gs_ctl]: switchover term (1)
[2024-12-25 10:00:18.326][1278722][][gs_ctl]: waiting for server to switchover.........
[2024-12-25 10:00:24.394][1278722][][gs_ctl]: done
[2024-12-25 10:00:24.394][1278722][][gs_ctl]: switchover completed (/gsdata/dn/dn_6002)

查看切换日志:

[omm@gaussdb002 ~]$ tail -n50 $GAUSSLOG/bin/gs_ctl/gs_ctl-2024-12-16_163928-current.log 
...
[2024-12-25 10:00:18]
[2024-12-25 10:00:18.315][1278722][][gs_ctl]: gs_ctl switchover ,datadir is /gsdata/dn/dn_6002 
[2024-12-25 10:00:18.315][1278722][][gs_ctl]: switchover term (1)
[2024-12-25 10:00:18.326][1278722][][gs_ctl]: waiting for server to switchover.........
[2024-12-25 10:00:24.394][1278722][][gs_ctl]: done
[2024-12-25 10:00:24.394][1278722][][gs_ctl]: switchover completed (/gsdata/dn/dn_6002)

🦁 出现双主状态后的处理流程:

#(1)检查数据库实例状态,显示两个实例的状态都为Primary:
gs_om -t status --detai1
  
#(2)确定降为备机的节点,在节点上执行命令关闭服务:
gs_ctl stop -D /gsdata/dn/dn_6001

#(3)以Standby模式启动上面关闭的备节点:
gs_ctl start -D /gsdata/dn/dn_6001 -M standby

#(4)检查数据库实例状态,确认实例状态恢复:
gs_om -t status --detai1

重置实例状态

发生过主备切换后,集群的数据均衡状态会从balanced: Yes变为balanced: No(不影响数据库正常访问),需要手动重置。

[   Cluster State   ]

cluster_state   : Normal
redistributing  : No
balanced        : No
current_az      : AZ_ALL

[  Datanode State   ]

node            node_ip         instance                state            | node            node_ip         instance                state            | node            node_ip         instance                state
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1  22.69.77.30 22.69.77.30    6001 /gsdata/dn/dn_6001 P Standby Normal | 2  22.69.77.32 22.69.77.32    6002 /gsdata/dn/dn_6002 S Primary Normal | 3  22.69.77.28 22.69.77.28    6003 /gsdata/dn/dn_6003 S Standby Normal

登录数据库任意主机,手动恢复实例状态:

gs_om -t switch --reset --time-out=300

该操作会重置主备实例角色(即切回到原来的主库),因此操作前需要保证主备实例状态正常。

示例:

[omm@gaussdb001 ~]$ gs_om -t switch --reset --time-out=300
Operating: Switch reset.
cm_ctl: cmserver is rebalancing the cluster automatically.
....
cm_ctl: switchover successfully.
Operation succeeded: Switch reset.

[omm@gaussdb001 ~]$ gs_om -t status --detail
...
[   Cluster State   ]

cluster_state   : Normal
redistributing  : No
balanced        : Yes
current_az      : AZ_ALL

[  Datanode State   ]

node            node_ip         instance                state            | node            node_ip         instance                state            | node            node_ip         instance                state
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1  22.69.77.30 22.69.77.30    6001 /gsdata/dn/dn_6001 P Primary Normal | 2  22.69.77.32 22.69.77.32    6002 /gsdata/dn/dn_6002 S Standby Normal | 3  22.69.77.28 22.69.77.28    6003 /gsdata/dn/dn_6003 S Standby Normal

模拟实例重启

🐯 重启备机服务器,集群状态和被重启备机的状态变化过程如下:

  • (1) cluster_state: Normal, 主库: Primary Normal, 备机1: Standby Normal, 备机2: Standby Normal
  • (2) cluster_state: Degraded, 主库: Primary Normal, 备机1: Down Unknown, 备机2: Standby Normal
  • (3) cluster_state: Degraded, 主库: Primary Normal, 备机1: Pending Starting, 备机2: Standby Normal
  • (4) cluster_state: Normal, 主库: Primary Normal, 备机1: Standby Normal, 备机2: Standby Normal

即重启备机过程中不会发生主备自动切换,备机重启后会自动恢复。

🦁 重启主库服务器,集群状态和主备机的状态变化过程如下:

  • (1) cluster_state: Normal, 主库: Primary Normal, 备机1: Standby Normal, 备机2: Standby Normal
  • (2) cluster_state: Unavailable, 主库: Down Unknown, 备机1: Standby Need repair(Disconnected), 备机2: Standby Need repair(Disconnected)
  • (3) cluster_state: Degraded, 主库: Down Unknown, 备机1: Primary Normal, 备机2: Standby Normal
  • (4) cluster_state: Degraded, 主库: Down Starting, 备机1: Primary Normal, 备机2: Standby Normal
  • (5) cluster_state: Normal, 主库: Standby Normal, 备机1: Primary Normal, 备机2: Standby Normal

即重启主库服务器过程中会发生主备自动切换,某个备库会自动升级为主库。主库重启成功后会以备库身份加入集群,并且集群的balanced状态会变为No。

执行下面的命令来重置所有实例的状态和角色:

gs_om -t switch --reset --time-out=300
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包

打赏作者

GottdesKrieges

你的鼓励将是我创作的最大动力

¥1 ¥2 ¥4 ¥6 ¥10 ¥20
扫码支付:¥1
获取中
扫码支付

您的余额不足,请更换扫码支付或充值

打赏作者

实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值