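Background: In a production Greenplum cluster (Greenplum 4.3.8), the 4 mirror instances on data host seg12 and the 4 mirror instances on host seg13 (whose primaries run on seg12) went down. Shortly afterward, host seg12 itself crashed, taking down the 4 primaries whose mirrors on seg13 had already failed, so 4 segments had lost both primary and mirror. With no complete copy of the data left, the cluster could no longer serve requests and SQL statements failed with errors. At that point 8 mirrors and 4 primaries were down.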
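To make the failure state easier to reason about, here is a minimal Python sketch that sorts segments into "both copies down" and "one copy down". The `role` (`p`/`m`) and `status` (`u`/`d`) values follow the columns of the `gp_segment_configuration` catalog table. The content IDs, the host name `seg11`, and the `classify` helper are hypothetical and used only for illustration; they were not taken from the incident.

```python
from collections import defaultdict

# Hypothetical snapshot: (content_id, role, hostname, status).
# role: "p" = primary, "m" = mirror; status: "u" = up, "d" = down.
# Content IDs and the host "seg11" are illustrative assumptions.
segments = [
    # Primaries on seg12 (down after the host crash), mirrors on seg13 (already down).
    (0, "p", "seg12", "d"), (0, "m", "seg13", "d"),
    (1, "p", "seg12", "d"), (1, "m", "seg13", "d"),
    (2, "p", "seg12", "d"), (2, "m", "seg13", "d"),
    (3, "p", "seg12", "d"), (3, "m", "seg13", "d"),
    # Primaries on another host (still up), mirrors on seg12 (down).
    (4, "p", "seg11", "u"), (4, "m", "seg12", "d"),
    (5, "p", "seg11", "u"), (5, "m", "seg12", "d"),
    (6, "p", "seg11", "u"), (6, "m", "seg12", "d"),
    (7, "p", "seg11", "u"), (7, "m", "seg12", "d"),
]


def classify(rows):
    """Return (double_fault, degraded) lists of content IDs."""
    by_content = defaultdict(dict)
    for content, role, _host, status in rows:
        by_content[content][role] = status

    double_fault, degraded = [], []
    for content, roles in sorted(by_content.items()):
        primary_down = roles.get("p") == "d"
        mirror_down = roles.get("m") == "d"
        if primary_down and mirror_down:
            double_fault.append(content)  # no surviving copy of this segment
        elif primary_down or mirror_down:
            degraded.append(content)      # one copy survives
    return double_fault, degraded


double_fault, degraded = classify(segments)
print("both copies down:", double_fault)
print("one copy down:", degraded)
```

In this model, segments in the second list still have a live copy to resynchronize from, while those in the first list have none, which matches the 8 failed mirrors and 4 failed primaries described above.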
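Process: We urgently contacted the data center to bring the crashed host back. No hardware fault was found, and the host powered on normally.
1. Ran gprecoverseg. Recovery was not possible because 4 segments had both their primary and mirror down.
2. Tried stopping the cluster with gpstop -M fast and then restarting it. The cluster stopped, but startup failed: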
ERROR:-gpstart error: Do not have enough valid segments to start the array
3. Checked the logs of the instances that failed to start:
2021-09-28 18:23:13.605756 CST,,,p21903,th-1181391072,,,,0,,,seg-1,,,,,"LOG","00000","database system was not properly shut down; automatic recovery in progress",,,,,,,0,,"xlog.c",6721,
2021-09-28 18:23:13.625746 CST,,,p21903,th-1181391072,,,,0,,,seg-1,,,,,"LOG","00000","redo starts at 2379/10E603E8",,,,,,,0,,"xlog.c",6853,
2021-09-28 18:23:13.949767 CST,,,p21903,th-1181391072,,,,0,,,seg-1,,,,,"LOG","00000","unexpected pageaddr 2378/C7418000 in log file 9081, segment 4, offset 5