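I. Background
During a routine inspection over the National Day holiday, I found entries in the database alert log indicating an abnormal instance restart, and analyzed and handled the error right away.
II. Troubleshooting Process
(1) Database alert log analysis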
node1 alert:
Sat Oct 05 13:05:14 2024
Thread 1 advanced to log sequence 6981 (LGWR switch)
Current log# 11 seq# 6981 mem# 0: +DATA/ybqddb/onlinelog/group_11.302.1144593261
Sat Oct 05 13:05:15 2024
Archived Log entry 12130 added for thread 1 sequence 6980 ID 0x8d497377 dest 1:
Sat Oct 05 14:50:48 2024
Reconfiguration started (old inc 27, new inc 29)
List of instances:
1 (myinst: 1)
Global Resource Directory frozen
* dead instance detected - domain 0 invalid = TRUE
Communication channels reestablished
Master broadcasted resource hash value bitmaps
Non-local Process blocks cleaned out
Sat Oct 05 14:50:48 2024
Sat Oct 05 14:50:48 2024
LMS 3: 1 GCS shadows cancelled, 1 closed, 0 Xw survived
LMS 0: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
Sat Oct 05 14:50:48 2024
LMS 1: 1 GCS shadows cancelled, 0 closed, 0 Xw survived
Sat Oct 05 14:50:48 2024
LMS 2: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
Set master node info
Submitted all remote-enqueue requests
Dwn-cvts replayed, VALBLKs dubious
All grantable enqueues granted
Post SMON to start 1st pass IR
Sat Oct 05 14:50:48 2024
Instance recovery: looking for dead threads
Beginning instance recovery of 1 threads
Submitted all GCS remote-cache requests
Post SMON to start 1st pass IR
Fix write in gcs resources
Reconfiguration complete
parallel recovery started with 32 processes
Started redo scan
Completed redo scan
read 76 KB redo, 16 data blocks need recovery
Started redo application at
Thread 2: logseq 5168, block 86069
Sat Oct 05 14:50:53 2024
Setting Resource Manager plan SCHEDULER[0x32DE]:DEFAULT_MAINTENANCE_PLAN via scheduler window
Setting Resource Manager plan DEFAULT_MAINTENANCE_PLAN via parameter
Recovery of Online Redo Log: Thread 2 Group 10 Seq 5168 Reading mem 0
Mem# 0: +DATA/ybqddb/onlinelog/group_10.300.1144593255
Completed redo application of 0.01MB
Completed instance recovery at
Thread 2: logseq 5168, block 86222, scn 245107035
16 data blocks read, 16 data blocks written, 76 redo k-bytes read
Sat Oct 05 14:50:53 2024
minact-scn: master found reconf/inst-rec before recscn scan old-inc#:29 new-inc#:29
Thread 2 advanced to log sequence 5169 (thread recovery)
Redo thread 2 internally disabled at seq 5169 (SMON)
Sat Oct 05 14:50:54 2024
Archived Log entry 12131 added for thread 2 sequence 5168 ID 0x8d497377 dest 1:
Sat Oct 05 14:50:54 2024
ARC2: Archiving disabled thread 2 sequence 5169
Archived Log entry 12132 added for thread 2 sequence 5169 ID 0x8d497377 dest 1:
minact-scn: master continuing after IR
minact-scn: Master considers inst:2 dead
Sat Oct 05 14:51:49 2024
Decreasing number of real time LMS from 4 to 0
Sat Oct 05 14:52:11 2024
Reconfiguration started (old inc 29, new inc 31)
List of instances:
1 2 (myinst: 1)
Global Resource Directory frozen
Communication channels reestablished
Master broadcasted resource hash value bitmaps
Non-local Process blocks cleaned out
Sat Oct 05 14:52:11 2024
LMS 3: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
Sat Oct 05 14:52:11 2024
LMS 1: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
Sat Oct 05 14:52:11 2024
LMS 0: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
Sat Oct 05 14:52:11 2024
LMS 2: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
Set master node info
Submitted all remote-enqueue requests
Dwn-cvts replayed, VALBLKs dubious
All grantable enqueues granted
Sat Oct 05 14:52:11 2024
minact-scn: Master returning as live inst:2 has inc# mismatch instinc:0 cur:31 errcnt:0
Submitted all GCS remote-cache requests
Fix write in gcs resources
Reconfiguration complete
Sat Oct 05 14:53:26 2024
Increasing number of real time LMS from 0 to 4
Sat Oct 05 17:05:20 2024
ALTER SYSTEM ARCHIVE LOG
Sat Oct 05 17:05:21 2024
Thread 1 advanced to log sequence 6982 (LGWR switch)
Current log# 5 seq# 6982 mem# 0: +DATA/ybqddb/onlinelog/group_5.290.1144593225
Sat Oct 05 17:05:21 2024
Archived Log entry 12134 added for thread 1 sequence 6981 ID 0x8d497377 dest 1:
Sat Oct 05 21:05:22 2024
ALTER SYSTEM ARCHIVE LOG
Sat Oct 05 21:05:22 2024
Thread 1 advanced to log sequence 6983 (LGWR switch)
Current log# 7 seq# 6983 mem# 0: +DATA/ybqddb/onlinelog/group_7.294.1144593235
Sat Oct 05 21:05:23 2024
Archived Log entry 12135 added for thread 1 sequence 6982 ID 0x8d497377 dest 1:
Sun Oct 06 01:08:47 2024
ALTER SYSTEM ARCHIVE LOG
Sun Oct 06 01:08:49 2024
Thread 1 advanced to log sequence 6984 (LGWR switch)
Current log# 9 seq# 6984 mem# 0: +DATA/ybqddb/onlinelog/group_9.298.1144593249
Sun Oct 06 01:08:49 2024
Archived Log entry 12138 added for thread 1 sequence 6983 ID 0x8d497377 dest 1:
Sun Oct 06 05:05:18 2024
ALTER SYSTEM ARCHIVE LOG
Sun Oct 06 05:05:18 2024
Thread 1 advanced to log sequence 6985 (LGWR switch)
Current log# 11 seq# 6985 mem# 0: +DATA/ybqddb/onlinelog/group_11.302.1144593261
Archived Log entry 12139 added for thread 2 sequence 5173 ID 0x8d497377 dest 1:
Sun Oct 06 05:05:19 2024
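In the node1 log above, the entries that matter are sparse: a reconfiguration starting at 14:50:48 with a dead instance detected, instance recovery of thread 2, and a second reconfiguration at 14:52:11 in which instance 2 is listed again. A small script can pull such lines out of a long alert log together with their timestamps. The following is only a minimal sketch, assuming the 11g-style plain-text alert log format shown above (a timestamp on its own line, followed by message lines); the function name `extract_events` and the keyword list are illustrative choices, not part of any Oracle tooling.

```python
import re
import sys

# Timestamp line as it appears in the excerpt, e.g. "Sat Oct 05 14:50:48 2024".
# Assumption: the timestamp always occupies a line of its own.
TS_RE = re.compile(r"^[A-Z][a-z]{2} [A-Z][a-z]{2} \d{2} \d{2}:\d{2}:\d{2} \d{4}$")

# Substrings copied from the node1 excerpt; illustrative, not exhaustive.
KEYWORDS = (
    "Reconfiguration started",
    "dead instance detected",
    "Beginning instance recovery",
    "Completed instance recovery",
    "Reconfiguration complete",
    "Master considers inst",
)


def extract_events(lines):
    """Return (timestamp, message) pairs for lines containing a keyword.

    The timestamp is the most recent timestamp line seen before the
    message, or None if no timestamp line has been seen yet.
    """
    events = []
    current_ts = None
    for raw in lines:
        line = raw.strip()
        if TS_RE.match(line):
            current_ts = line
            continue
        if any(keyword in line for keyword in KEYWORDS):
            events.append((current_ts, line))
    return events


if __name__ == "__main__":
    # Usage: python extract_events.py <path to alert log>
    if len(sys.argv) > 1:
        with open(sys.argv[1], errors="replace") as fh:
            for ts, msg in extract_events(fh):
                print(f"{ts} | {msg}")
```

Keeping the timestamp as a plain string avoids locale-dependent date parsing; if sorting or time arithmetic is needed, the string can be converted afterwards.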
node2 alert:
Sat Oct 05 13:05:14 2024
Archived Log entry 12129 added for thread 2 sequence 5167 ID 0x8d497377 dest 1:
Sat Oct 05 14:50:47 2024
NOTE: ASMB terminating
Errors in file /u01/app/oracle/diag/rdbms/ybqddb/ybqddb2/trace/ybqddb2_asmb_15097.trc:
ORA-15064: communication failure with ASM instance
ORA-03113: end-of-file on communication channel
Process ID:
Session ID: 2109 Serial number: 3
Errors in file /u01/app/oracle/diag/rdbms/ybqddb/ybqddb2/trace/ybqddb2_asmb_15097.trc:
ORA-15064: communication failure with ASM instance
ORA-03113: end-of-file on communication channel
Process ID:
Session ID: 2109 Serial number: 3
ASMB (ospid: 15097): terminating the instance due to error 15064
Instance terminated by ASMB, pid = 15097
Sat Oct 05 14:51:59 2024
Starting ORACLE instance (normal)
************************ Large Pages Information *******************
Per process system memlock (soft) limit = UNLIMITED
Total Shared Global Region in Large Pages = 0 KB (0%)
Large Pages used by this instance: 0 (0 KB)
Large Pages unused system wide =