Oracle Special Recovery: Handling an ORA-600 [kfrValAcd30] Failure Caused by an Unexpected Power Loss

Original · License: CC 4.0 BY-SA
First published 2024-04-23 10:18:52 · Last modified 2024-04-23 10:25:29
Tags: #programmer-life #oracle #database #backend

1. Problem Description

Symptom: after a hardware power loss, the Oracle cluster failed to start.

[root@rac2 ~]# crsctl stat res -t
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4000: Command Status failed, or completed with errors.
[root@rac2 ~]# crsctl start crs
CRS-4640: Oracle High Availability Services is already active
CRS-4000: Command Start failed, or completed with errors.

2. Troubleshooting

Checking the lower-stack cluster resources shows that ora.asm is OFFLINE; ora.crsd is OFFLINE as well.

[root@rac2 ~]# crsctl stat res -t -init
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS       
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
      1        ONLINE  OFFLINE                               Instance Shutdown   
ora.cluster_interconnect.haip
      1        ONLINE  ONLINE       rac2                                         
ora.crf
      1        ONLINE  ONLINE       rac2                                         
ora.crsd
      1        ONLINE  OFFLINE                                                   
ora.cssd
      1        ONLINE  ONLINE       rac2                                         
ora.cssdmonitor
      1        ONLINE  ONLINE       rac2                                         
ora.ctssd
      1        ONLINE  ONLINE       rac2                     OBSERVER            
ora.diskmon
      1        OFFLINE OFFLINE                                                   
ora.drivers.acfs
      1        ONLINE  ONLINE       rac2                                         
ora.evmd
      1        ONLINE  INTERMEDIATE rac2                                         
ora.gipcd
      1        ONLINE  ONLINE       rac2                                         
ora.gpnpd
      1        ONLINE  ONLINE       rac2                                         
ora.mdnsd
      1        ONLINE  ONLINE       rac2
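
With many resources in the listing, the ones that matter are those whose STATE does not match a TARGET of ONLINE. The following is a minimal Python sketch, not part of the original procedure, that filters saved `crsctl stat res -t -init` output for such resources. The function name `unhealthy_resources` and the shortened `SAMPLE` text are illustrative assumptions; the parser only assumes the two-line layout shown above (a resource name line followed by an instance line).

```python
import re

# Shortened copy of the listing above, used only as sample input.
SAMPLE = """\
ora.asm
      1        ONLINE  OFFLINE                               Instance Shutdown
ora.crsd
      1        ONLINE  OFFLINE
ora.cssd
      1        ONLINE  ONLINE       rac2
ora.diskmon
      1        OFFLINE OFFLINE
ora.evmd
      1        ONLINE  INTERMEDIATE rac2
"""

# Instance line: "<n>  <TARGET>  <STATE>  [server and/or state details]"
INSTANCE_LINE = re.compile(r"^\d+\s+(\S+)\s+(\S+)\s*(.*)$")


def unhealthy_resources(output):
    """Return (name, target, state, remainder) for every resource whose
    TARGET is ONLINE but whose STATE is anything else."""
    results = []
    name = None
    for line in output.splitlines():
        stripped = line.strip()
        if stripped.startswith("ora."):
            name = stripped  # remember the resource this instance line belongs to
            continue
        match = INSTANCE_LINE.match(stripped)
        if match and name:
            target, state, remainder = match.groups()
            # remainder holds the SERVER and STATE_DETAILS columns, unsplit
            if target == "ONLINE" and state != "ONLINE":
                results.append((name, target, state, remainder.strip()))
    return results


if __name__ == "__main__":
    for name, target, state, remainder in unhealthy_resources(SAMPLE):
        print(f"{name}: target={target} state={state} {remainder}".rstrip())
```

Resources whose target is OFFLINE, such as ora.diskmon here, are deliberately skipped because their state matches what the cluster expects.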

Checking the Grid alert log shows that the disk group was not mounted:

[ohasd(4329)]CRS-2769:Unable to failover resource 'ora.diskmon'.
2018-05-08 04:12:24.940:
[cssd(4576)]CRS-1707:Lease acquisition for node rac2 number 2 completed
2018-05-08 04:12:26.188:
[cssd(4576)]CRS-1605:CSSD voting file is online: /dev/asmdisk/oraasm-OCR_0000; details in /u01/app/11.2.0/grid/log/rac2/cssd/ocssd.log.
2018-05-08 04:12:28.723:
[cssd(4576)]CRS-1601:CSSD Reconfiguration complete. Active nodes are rac1 rac2 .
2018-05-08 04:12:30.617:
[ctssd(4660)]CRS-2401:The Cluster Time Synchronization Service started on host rac2.
2018-05-08 04:12:30.617:
[ctssd(4660)]CRS-2407:The new Cluster Time Synchronization Service reference node is host rac1.
2018-05-08 04:12:32.348:
[ohasd(4329)]CRS-2767:Resource state recovery not attempted for 'ora.diskmon' as its target state is OFFLINE
2018-05-08 04:12:32.348:
[ohasd(4329)]CRS-2769:Unable to failover resource 'ora.diskmon'.
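
A long Grid alert log is easier to scan when each entry is paired with its timestamp and split into daemon, PID, and CRS message code. The sketch below is one possible way to do that in Python; it assumes only the layout visible in the excerpt above (a timestamp line ending in a colon, followed by a `[daemon(pid)]CRS-nnnn:` message line). The names `parse_grid_alert` and `SAMPLE_LOG` are hypothetical.

```python
import re

# A few lines copied from the excerpt above, used only as sample input.
SAMPLE_LOG = """\
[ohasd(4329)]CRS-2769:Unable to failover resource 'ora.diskmon'.
2018-05-08 04:12:26.188:
[cssd(4576)]CRS-1605:CSSD voting file is online: /dev/asmdisk/oraasm-OCR_0000; details in /u01/app/11.2.0/grid/log/rac2/cssd/ocssd.log.
2018-05-08 04:12:32.348:
[ohasd(4329)]CRS-2767:Resource state recovery not attempted for 'ora.diskmon' as its target state is OFFLINE
"""

STAMP = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}:$")
ENTRY = re.compile(
    r"^\[(?P<daemon>\w+)\((?P<pid>\d+)\)\](?P<code>CRS-\d+):(?P<text>.*)$"
)


def parse_grid_alert(text):
    """Split alert-log text into dicts with time, daemon, pid, code, text.

    An entry with no preceding timestamp line (for example, the first line
    of a truncated excerpt) gets time=None.
    """
    entries = []
    stamp = None
    for raw in text.splitlines():
        line = raw.strip()
        if STAMP.match(line):
            stamp = line[:-1]  # drop the trailing colon
            continue
        match = ENTRY.match(line)
        if match:
            entries.append({"time": stamp, **match.groupdict()})
            stamp = None  # each timestamp belongs to one message only
    return entries


if __name__ == "__main__":
    # Example: show only what ohasd reported.
    for entry in parse_grid_alert(SAMPLE_LOG):
        if entry["daemon"] == "ohasd":
            print(entry["time"], entry["code"], entry["text"])
```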

Checking the ASM alert log shows an ORA-00600 [kfrValAcd30] error:

NOTE: GMON heartbeating for grp 2
GMON querying group 2 at 6 for pid 23, osid 5727
NOTE: cache opening disk 0 of grp 2: DATA_0000 path:/dev/asmdisk/oraasm-ASM_0000
NOTE: F1X0 found on disk 0 au 2 fcn 0.0
NOTE: cache opening disk 1 of grp 2: DATA_0001 path:/dev/asmdisk/oraasm-ASM_0001
NOTE: F1X0 found on disk 1 au 2 fcn 0.0
NOTE: cache opening disk 2 of grp 2: DATA_0002 path:/dev/asmdisk/oraasm-ASM_0002
NOTE: F1X0 found on disk 2 au 2 fcn 0.0
NOTE: cache opening disk 3 of grp 2: DATA_0003 path:/dev/asmdisk/oraasm-ASM_0003
NOTE: cache mounting (first) normal redundancy group 2/0x877A96CD (DATA)
* allocate domain 2, invalid = TRUE
NOTE: attached to recovery domain 2
NOTE: starting recovery of thread=1 ckpt=8.390 group=2 (DATA)
Errors in file /u01/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_ora_5727.trc  (incident=50111):
ORA-00600: internal error code, arguments: [kfrValAcd30], [DATA], [1], [8], [390], [9], [390], [], [], [], [], []
ORA-15017: diskgroup "ASM" cannot be mounted
ORA-15063: ASM discovered an insufficient number of disks for diskgroup "ASM"
Incident details in: /u01/app/grid/diag/asm/+asm/+ASM2/incident/incdir_50111/+ASM2_ora_5727_i50111.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file /u01/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_ora_5727.trc:
ORA-00600: internal error code,