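On Oracle RAC 11.2.0.4 for Solaris 10, ASM and the DB restart because the cluster heartbeat is lost.
This problem is caused by bugs 10194190 and 18740837, and fixing it requires patch 25142535. One customer
had already applied the patch to the DB on every node of a Solaris Oracle 11.2.0.4 RAC, yet after the hosts
had been up for 248 days, the Oracle 11.2.0.4 instances running on them still restarted because of cluster
heartbeat loss. According to the official reply from Oracle MOS, one of the patches that fixes this bug,
patch 18740837, must also be applied to the GI software, not only to the DB.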
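Since the fix has to be present in both the DB home and the GI home, it can help to check every home's patch inventory for the same patch number. The following is a minimal sketch, not an Oracle-provided tool: it assumes the inventory text comes from `opatch lsinventory -oh <home>`, the helper names are hypothetical, and the home paths in the usage note are placeholders. It simply looks for the number as a whole token, so it matches a patch entry as well as a mention in a list of fixed bugs.

```python
import re
import subprocess


def patch_listed(inventory_text, patch_id):
    """Return True if patch_id appears as a whole number in the inventory text."""
    pattern = r"(?<!\d)" + re.escape(patch_id) + r"(?!\d)"
    return re.search(pattern, inventory_text) is not None


def homes_missing_patch(inventories, patch_id):
    """Given {home_path: inventory_text}, return the homes that do not mention patch_id."""
    return [home for home, text in inventories.items()
            if not patch_listed(text, patch_id)]


def read_inventory(home):
    """Run OPatch for one home and return its output (assumes OPatch lives under the home)."""
    result = subprocess.run(
        [home + "/OPatch/opatch", "lsinventory", "-oh", home],
        stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
        universal_newlines=True, check=False,
    )
    return result.stdout
```

A possible use, with placeholder paths: build `{home: read_inventory(home) for home in (grid_home, db_home)}` and pass it to `homes_missing_patch(..., "18740837")`. An empty list means both homes mention the patch; in the customer case described here, the GI home would have been reported.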
The following summarizes the problem analysis and resolution for this customer case.
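For context on the 248-day uptime: a signed 32-bit counter that advances once every 10 ms runs out of positive range after about 248.55 days. The source does not describe the mechanism behind the bug, so the 100 Hz tick rate and the 31 value bits below are assumptions, used only to show where a figure of this size can come from.

```python
SECONDS_PER_DAY = 86400


def days_until_overflow(value_bits=31, ticks_per_second=100):
    """Days until a tick counter with `value_bits` usable bits wraps.

    Both defaults are assumptions for illustration: 31 bits is the positive
    range of a signed 32-bit integer, and 100 ticks per second is a 10 ms tick.
    """
    total_ticks = 2 ** value_bits
    return total_ticks / ticks_per_second / SECONDS_PER_DAY


if __name__ == "__main__":
    print("%.2f days" % days_until_overflow())
```

Under these assumptions the result falls between 248 and 249 days, which lines up with the uptime at which the restart occurred.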
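1. Errors reported in the DB alert log on the problem node
2. Errors reported in the ASM alert log on the problem node
3. Contents of the +ASM2_lmhb_4609_i78497.trc file
4. OCSSD log reporting heartbeat timeouts
5. Excerpt from the +ASM2_lmhb_4609_i78497.trc file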
===[ Session State Object ]===
----------------------------------------
SO: 0x3ffdb38d8, type: 4, owner: 0x400b0c258, flag: INIT/-/-/0x00 if: 0x3 c: 0x3
proc=0x400b0c258, name=session, file=ksu.h LINE:12729 ID:, pg=0
(session) sid: 145 ser: 1 trans: 0x0, creator: 0x400b0c258
flags: (0x51) USR/- flags_idl: (0x1) BSY/-/-/-/-/-
flags2: (0x409) -/-/INC
DID: , short-term DID:
txn branch: 0x0
edition#: 0 oct: 0, prv: 0, sql: 0x0, psql: 0x0, user: 0/SYS
ksuxds FALSE at location: 0
service name: SYS$BACKGROUND
Current Wait Stack:
0: waiting for 'rdbms ipc message'
timeout=0xa, =0x0, =0x0
wait_id=432476951 seq_num=11521 snap_id=1
wait times: snap=2 min 46 sec, exc=2 min 46 sec, total=2 min 46 sec
wait times: max=0.100000 sec, heur=2 min 46 sec
wait counts: calls=1 os=1
in_wait=1 iflags=0x5a8
Wait State:
fixed_waits=0 flags=0x22 boundary=0x0/-1
Session Wait History:
elapsed time of 0.000015 sec since current wait
0: waited for 'CGS wait for IPC msg'
=0x0, =0x0, =0x0
wait_id=432476950 seq_num=11520 snap_id=1
wait times: snap=0.000027 sec, exc=0.000027 sec, total=0.000027 sec
wait times: max=0.000000 sec
wait counts: calls=1 os=1
occurred after 0.000138 sec of elapsed time
1: waited for 'rdbms ipc message'
timeout=0xa, =0x0, =0x0
wait_id=432476949 seq_num=11519 snap_id=1
wait times: snap=0.102094 sec, exc=0.102094 sec, total=0.102094 sec
wait times: max=0.100000 sec
wait counts: calls=1 os=1
occurred after 0.000015 sec of elapsed time
2: waited for 'CGS wait for IPC msg'
=0x0, =0x0, =0x0
wait_id=432476948 seq_num=11518 snap_id=1
wait