oracle -15064,由 BUG 引发 ORA-15064 进而导致数据库实例意外中止

Mon Jan 05 01:40:50 2015

WARNING: ASM communication error: op 18 state 0x50 (3113)

ERROR: slave communication error with ASM

NOTE: Deferred communication with ASM instance

Errors in file /u01/app/oracle/diag/rdbms/test5/test5/trace/test5_ora_16581034.trc:

ORA-03113: end-of-file on communication channel

Process ID:

Session ID: 288 Serial number: 5649

NOTE: deferred map free for map id 4422

Mon Jan 05 01:40:55 2015

NOTE: ASMB terminating

Mon Jan 05 01:40:55 2015

***********************************************************************

Fatal NI connect error 12170.

VERSION INFORMATION:

TNS for IBM/AIX RISC System/6000: Version 11.2.0.3.0 - Production

TCP/IP NT Protocol Adapter for IBM/AIX RISC System/6000: Version 11.2.0.3.0 - Production

Oracle Bequeath NT Protocol Adapter for IBM/AIX RISC System/6000: Version 11.2.0.3.0 - Production

Time: 05-JAN-2015 01:40:55

Tracing not turned on.

Tns error struct:

ns main err code: 12535

TNS-12535: TNS:operation timed out

ns secondary err code: 12606

nt main err code: 0

nt secondary err code: 0

nt OS err code: 0

Client address: (ADDRESS=(PROTOCOL=tcp)(HOST=test2)(PORT=64460))

WARNING: inbound connection timed out (ORA-3136)

Mon Jan 05 01:40:55 2015

***********************************************************************

Fatal NI connect error 12170.

VERSION INFORMATION:

TNS for IBM/AIX RISC System/6000: Version 11.2.0.3.0 - Production

TCP/IP NT Protocol Adapter for IBM/AIX RISC System/6000: Version 11.2.0.3.0 - Production

Oracle Bequeath NT Protocol Adapter for IBM/AIX RISC System/6000: Version 11.2.0.3.0 - Production

Time: 05-JAN-2015 01:40:55

Tracing not turned on.

Tns error struct:

ns main err code: 12535

TNS-12535: TNS:operation timed out

ns secondary err code: 12606

nt main err code: 0

nt secondary err code: 0

nt OS err code: 0

Mon Jan 05 01:40:55 2015

***********************************************************************

Client address: (ADDRESS=(PROTOCOL=tcp)(HOST=test2)(PORT=64530))

Errors in file /u01/app/oracle/diag/rdbms/test5/test5/trace/test5_asmb_5898342.trc:

ORA-15064: communication failure with ASM instance

ORA-03135: connection lost contact

Process ID:

Session ID: 216 Serial number: 7

Fatal NI connect error 12170.

WARNING: inbound connection timed out (ORA-3136)

Errors in file /u01/app/oracle/diag/rdbms/test5/test5/trace/test5_asmb_5898342.trc:

ORA-15064: communication failure with ASM instance

ORA-03135: connection lost contact

Process ID:

Session ID: 216 Serial number: 7

VERSION INFORMATION:

TNS for IBM/AIX RISC System/6000: Version 11.2.0.3.0 - Production

TCP/IP NT Protocol Adapter for IBM/AIX RISC System/6000: Version 11.2.0.3.0 - Production

Oracle Bequeath NT Protocol Adapter for IBM/AIX RISC System/6000: Version 11.2.0.3.0 - Production

Time: 05-JAN-2015 01:40:55

Tracing not turned on.

Tns error struct:

ASMB (ospid: 5898342): terminating the instance due to error 15064

ns main err code: 12535

TNS-12535: TNS:operation timed out

Mon Jan 05 01:40:55 2015

***********************************************************************

ns secondary err code: 12606

Fatal NI connect error 12170.

nt main err code: 0

VERSION INFORMATION:

TNS for IBM/AIX RISC System/6000: Version 11.2.0.3.0 - Production

TCP/IP NT Protocol Adapter for IBM/AIX RISC System/6000: Version 11.2.0.3.0 - Production

Oracle Bequeath NT Protocol Adapter for IBM/AIX RISC System/6000: Version 11.2.0.3.0 - Production

nt secondary err code: 0

Time: 05-JAN-2015 01:40:55

Tracing not turned on.

nt OS err code: 0

Mon Jan 05 01:40:56 2015

System state dump requested by (instance=1, osid=5898342 (ASMB)), summary=[abnormal instance termination].

System State dumped to trace file /u01/app/oracle/diag/rdbms/test5/test5/trace/test5_diag_20578640.trc

Mon Jan 05 01:40:59 2015

ORA-1092 : opitsk aborting process

Mon Jan 05 01:41:00 2015

ORA-1092 : opitsk aborting process

Instance terminated by ASMB, pid = 5898342

Mon Jan 05 08:46:45 2015

Starting ORACLE instance (normal)

由于在故障点之前也会报一些ORA-3136的错误,故这里不做说明,与本次错误干系不大。从alert日志中发现,数据库实例在1:40:55时,被ASMB(进程号:5898342)进程终止,直到8点46分左右才开始启动(客户自己手动重启)。

2.相关trace文件分析

test5_ora_16581034.trc:

*** 2015-01-05 01:40:50.900

*** SESSION ID:(424.19913) 2015-01-05 01:40:50.900

*** CLIENT ID:() 2015-01-05 01:40:50.900

*** SERVICE NAME:(SYS$USERS) 2015-01-05 01:40:50.900

*** MODULE NAME:(backup incr datafile) 2015-01-05 01:40:50.900

*** ACTION NAME:(0000097 STARTED16) 2015-01-05 01:40:50.900

WARNING: ASM communication error: op 18 state 0x50 (3113)

ERROR: slave communication error with ASM

NOTE: Deferred communication with ASM instance

ORA-03113: end-of-file on communication channel

Process ID:

Session ID: 288 Serial number: 5649

基本与alert日志中报的错误相同,就是获知在1:40分这个故障点,数据库与ASM实例的连接中断了

test5_asmb_5898342.trc:

*** 2015-01-05 01:40:55.870

NOTE: ASMB terminating

*** 2015-01-05 01:40:55.922

ORA-15064: communication failure with ASM instance

ORA-03135: connection lost contact

Process ID:

Session ID: 216 Serial number: 7

error 15064 detected in background process

ORA-15064: communication failure with ASM instance

ORA-03135: connection lost contact

Process ID:

Session ID: 216 Serial number: 7

kjzduptcctx: Notifying DIAG for crash event

----- Abridged Call Stack Trace -----

ksedsts()+360

----- End of Abridged Call Stack Trace -----

*** 2015-01-05 01:40:55.954

ASMB (ospid: 5898342): terminating the instance due to error 15064

ksuitm: waiting up to [5] seconds before killing DIAG(20578640)

从这个trace文件可以获知,由于遭遇ora-15064错误,ASMB进程终止了数据库实例。

test5_diag_20578640.trc:

Session Wait History:

elapsed time of 1.543354 sec since last wait

0: waited for 'ASM background timer'

=0x0, =0x0, =0x0

wait_id=881966 seq_num=30034 snap_id=1

wait times: snap=0.000422 sec, exc=0.000422 sec, total=0.000422 sec

wait times: max=infinite

wait counts: calls=0 os=0

occurred after 0.000016 sec of elapsed time

1: waited for 'ASM file metadata operation'

msgop=0x2, locn=0x3, =0x0

wait_id=881964 seq_num=30033 snap_id=2

wait times: snap=0.000082 sec, exc=0.000127 sec, total=4 min 4 sec

wait times: max=infinite

wait counts: calls=0 os=0

occurred after 0.000000 sec of elapsed time

2: waited for 'latch: shared pool'

address=0x700000000107048, number=0x133, tries=0x0

wait_id=881965 seq_num=30032 snap_id=1

wait times: snap=4 min 4 sec, exc=4 min 4 sec, total=4 min 4 sec

wait times: max=infinite

wait counts: calls=0 os=0

occurred after 0.000000 sec of elapsed time

3: waited for 'ASM file metadata operation'

msgop=0x2, locn=0x3, =0x0

wait_id=881964 seq_num=30031 snap_id=1

wait times: snap=0.000045 sec, exc=0.000045 sec, total=0.000045 sec

wait times: max=infinite

wait counts: calls=0 os=0

occurred after 0.000019 sec of elapsed time

4: waited for 'ASM background timer'

=0x0, =0x0, =0x0

wait_id=881963 seq_num=30030 snap_id=1

wait times: snap=3.285876 sec, exc=3.285876 sec, total=3.285876 sec

wait times: max=infinite

wait counts: calls=0 os=0

occurred after 0.000011 sec of elapsed time

5: waited for 'ASM file metadata operation'

msgop=0xc, locn=0x3, =0x0

wait_id=881962 seq_num=30029 snap_id=1

wait times: snap=0.000034 sec, exc=0.000034 sec, total=0.000034 sec

wait times: max=infinite

wait counts: calls=0 os=0

occurred after 0.000019 sec of elapsed time

6: waited for 'ASM background timer'

=0x0, =0x0, =0x0

wait_id=881961 seq_num=30028 snap_id=1

wait times: snap=5.000528 sec, exc=5.000528 sec, total=5.000528 sec

wait times: max=infinite

wait counts: calls=0 os=0

occurred after 0.000009 sec of elapsed time

7: waited for 'ASM file metadata operation'

msgop=0x0, locn=0x3, =0x0

wait_id=881960 seq_num=30027 snap_id=1

wait times: snap=0.000003 sec, exc=0.000003 sec, total=0.000003 sec

wait times: max=infinite

wait counts: calls=0 os=0

occurred after 0.000016 sec of elapsed time

8: waited for 'ASM background timer'

=0x0, =0x0, =0x0

wait_id=881959 seq_num=30026 snap_id=1

wait times: snap=5.000493 sec, exc=5.000493 sec, total=5.000493 sec

wait times: max=infinite

wait counts: calls=0 os=0

occurred after 0.000011 sec of elapsed time

9: waited for 'ASM file metadata operation'

msgop=0x0, locn=0x3, =0x0

wait_id=881958 seq_num=30025 snap_id=1

wait times: snap=0.000002 sec, exc=0.000002 sec, total=0.000002 sec

wait times: max=infinite

wait counts: calls=0 os=0

occurred after 0.000017 sec of elapsed time

sample interval: 1 sec, max history 120 sec

---------------------------------------------------

[120 samples,                                        01:38:56 - 01:40:55]

waited for 'latch: shared pool', seq_num: 30032

p1: 'address'=0x700000000107048

p2: 'number'=0x133

p3: 'tries'=0x0

time_waited: >= 119 sec (still in wait)

---------------------------------------------------

Sampled Session History Summary:

longest_non_idle_wait: 'latch: shared pool'

[120 samples, 01:38:56 - 01:40:55]

time_waited: >= 119 sec (still in wait)

在ASMB进程的的dump文件中,发现有2个4分4秒的等待,waited for 'ASM file metadata operation'和waited for 'latch: shared pool',而其他的相关操作都是只有几微秒,甚至0秒的

3.ASM实例的alert日志:

Mon Jan 05 01:40:56 2015

***********************************************************************

TNS-12537: TNS:connection closed

Fatal NI connect error 12537, connecting to:

(DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))

ns secondary err code: 12560

VERSION INFORMATION:

TNS for IBM/AIX RISC System/6000: Version 11.2.0.3.0 - Production

Oracle Bequeath NT Protocol Adapter for IBM/AIX RISC System/6000: Version 11.2.0.3.0 - Production

nt main err code: 0

Time: 05-JAN-2015 01:40:56

nt secondary err code: 0

Tracing not turned on.

nt OS err code: 0

Tns error struct:

ns main err code: 12537

TNS-12537: TNS:connection closed

ns secondary err code: 12560

nt main err code: 0

nt secondary err code: 0

nt OS err code: 0

Mon Jan 05 04:42:23 2015

NOTE: ASMB process exiting due to lack of ASM file activity for 305 seconds

Mon Jan 05 08:46:53 2015

NOTE: client test5:test5 registered, osid 23593052, mbr 0x0

Mon Jan 05 08:47:24 2015

Starting background process ASMB

Mon Jan 05 08:47:24 2015

ASMB started with pid=26, OS id=20185292

Mon Jan 05 08:47:25 2015

NOTE: client +ASM:+ASM registered, osid 16384406, mbr 0x1

在故障点之前,ASM的alert日志中未报过TNS错误,但故障点的时候,ASM中也出现了TNS连接关闭的错误,因此,也不排除网络因素引起的ASM实例故障。4点多的时候有一个NOTE: ASMB process exiting due to lack of ASM file activity for 305 seconds,根据文档“NOTE ASMB process exiting due to lack of ASM file activity (文档 ID 754110.1)”,这个NOTE可以忽略。到了8:46的时候,ASM实例随着数据库的重启也正常启动了

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值