数据库版本
11.2.0.4.0
操作系统版本
CentOS release 6.3
事件描述:
1)数据库做完备份恢复后,使用alter database open resetlogs打开数据库。
2)添加OGG抽取进程
delete extract E_E2P_01
add extract E_E2P_01 ,tranlog,begin now
add exttrail /home/erpdev7/ggs/dirdat/e1 ,extract E_E2P_01
start E_E2P_01
3)启动OGG进程时ggserr.log会刷出如下日志:
2021-09-11 18:10:17 INFO OGG-01515 Oracle GoldenGate Capture for Oracle, e_e2p_01.prm: Positioning to begin time Sep 11, 2021 6:09:48 PM.
2021-09-11 18:10:18 INFO OGG-01515 Oracle GoldenGate Capture for Oracle, e_e2p_01.prm: Positioning to begin time Sep 11, 2021 6:09:48 PM.
2021-09-11 18:10:20 INFO OGG-01515 Oracle GoldenGate Capture for Oracle, e_e2p_01.prm: Positioning to begin time Sep 11, 2021 6:09:48 PM.
2021-09-11 18:10:22 INFO OGG-01515 Oracle GoldenGate Capture for Oracle, e_e2p_01.prm: Positioning to begin time Sep 11, 2021 6:09:48 PM.
2021-09-11 18:10:24 INFO OGG-01515 Oracle GoldenGate Capture for Oracle, e_e2p_01.prm: Positioning to begin time Sep 11, 2021 6:09:48 PM.
2021-09-11 18:10:26 INFO OGG-01515 Oracle GoldenGate Capture for Oracle, e_e2p_01.prm: Positioning to begin time Sep 11, 2021 6:09:48 PM.
2021-09-11 18:10:27 INFO OGG-01515 Oracle GoldenGate Capture for Oracle, e_e2p_01.prm: Positioning to begin time Sep 11, 2021 6:09:48 PM.
2021-09-11 18:10:29 INFO OGG-01515 Oracle GoldenGate Capture for Oracle, e_e2p_01.prm: Positioning to begin time Sep 11, 2021 6:09:48 PM.
2021-09-11 18:10:31 INFO OGG-01515 Oracle GoldenGate Capture for Oracle, e_e2p_01.prm: Positioning to begin time Sep 11, 2021 6:09:48 PM.
2021-09-11 18:10:33 INFO OGG-01515 Oracle GoldenGate Capture for Oracle, e_e2p_01.prm: Positioning to begin time Sep 11, 2021 6:09:48 PM.
2021-09-11 18:10:34 INFO OGG-01515 Oracle GoldenGate Capture for Oracle, e_e2p_01.prm: Positioning to begin time Sep 11, 2021 6:09:48 PM.
2021-09-11 18:10:36 INFO OGG-01515 Oracle GoldenGate Capture for Oracle, e_e2p_01.prm: Positioning to begin time Sep 11, 2021 6:09:48 PM.
给人的感觉就是OGG初始化的时候,找不到这个点对应的redo的位置。然后过一段时间or我们手工执行alter system switch logfile的时候,数据库就挂掉了。
数据库alert日志输出
Sat Sep 11 18:15:48 2021
Errors in file /home/erpdev7/product/11.2.0/admin/DEV7_bjkjy-erp-dev7db01/diag/rdbms/dev7/DEV7/trace/DEV7_lgwr_3191.trc:
ORA-00316: log 116 of thread 1, type 0 in header is not log file
ORA-00312: online log 116 thread 1: '/u01/erpdev7data/group_116.2711.1053734237'
Errors in file /home/erpdev7/product/11.2.0/admin/DEV7_bjkjy-erp-dev7db01/diag/rdbms/dev7/DEV7/trace/DEV7_lgwr_3191.trc:
ORA-00316: log 116 of thread 1, type 0 in header is not log file
ORA-00312: online log 116 thread 1: '/u01/erpdev7data/group_116.2711.1053734237'
LGWR (ospid: 3191): terminating the instance due to error 316
Sat Sep 11 18:15:49 2021
System state dump requested by (instance=1, osid=3191 (LGWR)), summary=[abnormal instance termination].
System State dumped to trace file /home/erpdev7/product/11.2.0/admin/DEV7_bjkjy-erp-dev7db01/diag/rdbms/dev7/DEV7/trace/DEV7_diag_3170_20210911181549.trc
Dumping diagnostic data in directory=[cdmp_20210911181549], requested by (instance=1, osid=3191 (LGWR)), summary=[abnormal instance termination].
Instance terminated by LGWR, pid = 3191
很不幸的是 报错的这个logfile是current logfile。。当我们启动数据库的时候,日志输出:
Sat Sep 11 18:19:32 2021
ALTER DATABASE OPEN
Beginning crash recovery of 1 threads
parallel recovery started with 32 processes
Started redo scan
Errors in file /home/erpdev7/product/11.2.0/admin/DEV7_bjkjy-erp-dev7db01/diag/rdbms/dev7/DEV7/trace/DEV7_ora_37350.trc:
ORA-00316: log 116 of thread 1, type 0 in header is not log file
ORA-00312: online log 116 thread 1: '/u01/erpdev7data/group_116.2711.1053734237'
Aborting crash recovery due to error 316
Errors in file /home/erpdev7/product/11.2.0/admin/DEV7_bjkjy-erp-dev7db01/diag/rdbms/dev7/DEV7/trace/DEV7_ora_37350.trc:
ORA-00316: log 116 of thread 1, type 0 in header is not log file
ORA-00312: online log 116 thread 1: '/u01/erpdev7data/group_116.2711.1053734237'
Errors in file /home/erpdev7/product/11.2.0/admin/DEV7_bjkjy-erp-dev7db01/diag/rdbms/dev7/DEV7/trace/DEV7_ora_37350.trc:
ORA-00316: log 116 of thread 1, type 0 in header is not log file
ORA-00312: online log 116 thread 1: '/u01/erpdev7data/group_116.2711.1053734237'
ORA-316 signalled during: ALTER DATABASE OPEN...
Sat Sep 11 18:19:33 2021
Errors in file /home/erpdev7/product/11.2.0/admin/DEV7_bjkjy-erp-dev7db01/diag/rdbms/dev7/DEV7/trace/DEV7_m000_37424.trc:
ORA-00316: log 116 of thread 1, type 0 in header is not log file
ORA-00312: online log 116 thread 1: '/u01/erpdev7data/group_116.2711.1053734237'
Checker run found 2 new persistent data failures
4)尝试使用隐含参数_allow_resetlogs_corruption和_allow_error_simulation,重建控制文件,recover database,然后open resetlogs,不行 直接00600,推scn也不行,最后只能重新还原数据库。
问题分析:
我们有很多线下环境,也做过很多次恢复,这是第一次出现这个问题,横向对比了一下环境的变化,就是这次的存放数据文件的文件系统变成了xfs,之前一直是ext4(修改的原因是由于库的数据文件大小超过16T,ext4无法继续扩展)。
尝试在备份还原后重建控制文件的时候,修改redo的存放路径,将其放到ext4的文件系统里,就一切正常了,配置OGG的时候也没有再次触发数据库异常。
看了网上也有很多oracle使用xfs的,开始怀疑不是xfs的问题了,且存放数据文件的盘是存储划上来的,也存在存储I/O异常导致redo损坏的可能,这方面也是要考虑的。
本文记录了一次数据库备份恢复后,使用OGG进行数据抽取时遇到的问题。在启动OGG进程时,日志反复显示定位到同一开始时间,随后数据库在尝试ALTER DATABASE OPEN RESETLOGS时发生错误,导致实例终止。经过排查,发现可能与文件系统从EXT4改为XFS有关,但并非唯一原因,存储I/O异常也被考虑在内。尝试多种恢复方法无效后,最终通过将redo日志路径改回EXT4文件系统解决了问题。这提示我们在不同文件系统间迁移数据库时需要谨慎,关注可能带来的兼容性和稳定性问题。
2727

被折叠的 条评论
为什么被折叠?



