This Oracle 12.1.0.2 bug can cause frequent instance crashes; fix it promptly

Environment:

DB: Oracle 12.1.0.2 ADG
OS: Windows NT Version V6.2

Symptoms:

The Oracle database instance terminated on its own. The alert log shows:

Archived Log entry 174990 added for thread 1 sequence 804504 ID 0x59d7346b dest 1:
Thu Jul 31 15:30:02 2025
Exception [type: ACCESS_VIOLATION, UNABLE_TO_WRITE] [ADDR:0x1181C97878] [PC:0x7FFEB886A5BC, 00007FFEB886A5BC]
Errors in file E:\APP\CJC\diag\rdbms\cjc\cjc\trace\cjc_m000_21380.trc  (incident=790563) (PDBNAME=CDB$ROOT):
ORA-07445: exception encountered: core dump [PC:0x7FFEB886A5BC] [ACCESS_VIOLATION] [ADDR:0x1181C97878] [PC:0x7FFEB886A5BC] [UNABLE_TO_WRITE] []
Incident details in: E:\APP\CJC\diag\rdbms\cjc\cjc\incident\incdir_790563\cjc_m000_21380_i790563.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Thu Jul 31 15:30:23 2025
Dumping diagnostic data in directory=[cdmp_20250731153023], requested by (instance=1, osid=21380 (M000)), summary=[incident=790563].
Thu Jul 31 15:30:32 2025
Sweep [inc][790563]: completed
Sweep [inc2][790563]: completed
Thu Jul 31 15:31:07 2025
Exception [type: ACCESS_VIOLATION, UNABLE_TO_WRITE] [ADDR:0x11AA8B44F0] [PC:0x7FFEB886A331, 00007FFEB886A331]
Errors in file E:\APP\CJC\diag\rdbms\cjc\cjc\trace\cjc_m000_21772.trc  (incident=790875) (PDBNAME=CDB$ROOT):
ORA-07445: exception encountered: core dump [PC:0x7FFEB886A331] [ACCESS_VIOLATION] [ADDR:0x11AA8B44F0] [PC:0x7FFEB886A331] [UNABLE_TO_WRITE] []
Incident details in: E:\APP\CJC\diag\rdbms\cjc\cjc\incident\incdir_790875\cjc_m000_21772_i790875.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Thu Jul 31 15:31:14 2025
Dumping diagnostic data in directory=[cdmp_20250731153114], requested by (instance=1, osid=21772 (M000)), summary=[incident=790875].
Thu Jul 31 15:31:31 2025
Sweep [inc][790875]: completed
Sweep [inc2][790875]: completed
Thu Jul 31 15:32:05 2025
Exception [type: ACCESS_VIOLATION, UNABLE_TO_WRITE] [ADDR:0x119E2AAC88] [PC:0x7FFEB886A5BC, 00007FFEB886A5BC]
Errors in file E:\APP\CJC\diag\rdbms\cjc\cjc\trace\cjc_m000_16864.trc  (incident=789667) (PDBNAME=CDB$ROOT):
ORA-07445: exception encountered: core dump [PC:0x7FFEB886A5BC] [ACCESS_VIOLATION] [ADDR:0x119E2AAC88] [PC:0x7FFEB886A5BC] [UNABLE_TO_WRITE] []
Incident details in: E:\APP\CJC\diag\rdbms\cjc\cjc\incident\incdir_789667\cjc_m000_16864_i789667.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Thu Jul 31 15:32:12 2025
Dumping diagnostic data in directory=[cdmp_20250731153212], requested by (instance=1, osid=16864 (M000)), summary=[incident=789667].
Thu Jul 31 15:32:32 2025
Sweep [inc][789667]: completed
Sweep [inc2][789667]: completed
Thu Jul 31 15:33:04 2025
Exception [type: ACCESS_VIOLATION, UNABLE_TO_READ] [ADDR:0x1089EF0000] [PC:0x7FFEB886A323, 00007FFEB886A323]
Errors in file E:\APP\CJC\diag\rdbms\cjc\cjc\trace\cjc_m000_5000.trc  (incident=789763) (PDBNAME=CDB$ROOT):
ORA-07445: exception encountered: core dump [PC:0x7FFEB886A323] [ACCESS_VIOLATION] [ADDR:0x1089EF0000] [PC:0x7FFEB886A323] [UNABLE_TO_READ] []
Incident details in: E:\APP\CJC\diag\rdbms\cjc\cjc\incident\incdir_789763\cjc_m000_5000_i789763.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Thu Jul 31 15:33:14 2025
Dumping diagnostic data in directory=[cdmp_20250731153314], requested by (instance=1, osid=5000 (M000)), summary=[incident=789763].
Thu Jul 31 15:33:34 2025
Sweep [inc][789763]: completed
Sweep [inc2][789763]: completed
Thu Jul 31 15:33:54 2025
Exception [type: ACCESS_VIOLATION, UNABLE_TO_READ] [ADDR:0x74] [PC:0x7FF662D84E1F, qecinisub()+63]
Errors in file E:\APP\CJC\diag\rdbms\cjc\cjc\trace\cjc_ora_34180.trc  (incident=791099) (PDBNAME=PDBcjc):
ORA-07445: 出现异常错误: 核心转储 [qecinisub()+63] [ACCESS_VIOLATION] [ADDR:0x74] [PC:0x7FF662D84E1F] [UNABLE_TO_READ] []
Incident details in: E:\APP\CJC\diag\rdbms\cjc\cjc\incident\incdir_791099\cjc_ora_34180_i791099.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file E:\APP\CJC\diag\rdbms\cjc\cjc\trace\cjc_ora_34180.trc  (incident=791100) (PDBNAME=PDBcjc):
ORA-00600: 内部错误代码, 参数: [qkexrXCopn1], [0], [], [], [], [], [], [], [], [], [], []
ORA-07445: 出现异常错误: 核心转储 [qecinisub()+63] [ACCESS_VIOLATION] [ADDR:0x74] [PC:0x7FF662D84E1F] [UNABLE_TO_READ] []
Incident details in: E:\APP\CJC\diag\rdbms\cjc\cjc\incident\incdir_791100\cjc_ora_34180_i791100.trc
Thu Jul 31 15:34:04 2025
Errors in file E:\APP\CJC\diag\rdbms\cjc\cjc\trace\cjc_m000_12928.trc  (incident=789691) (PDBNAME=CDB$ROOT):
ORA-00600: internal error code, arguments: [qkexrXCopn1], [0], [], [], [], [], [], [], [], [], [], []
Incident details in: E:\APP\CJC\diag\rdbms\cjc\cjc\incident\incdir_789691\cjc_m000_12928_i789691.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Thu Jul 31 15:34:05 2025
Errors in file E:\APP\CJC\diag\rdbms\cjc\cjc\incident\incdir_791099\cjc_ora_34180_i791099.trc:
ORA-00600: 内部错误代码, 参数: [qkexrXCopn1], [0], [], [], [], [], [], [], [], [], [], []
ORA-07445: 出现异常错误: 核心转储 [qecinisub()+63] [ACCESS_VIOLATION] [ADDR:0x74] [PC:0x7FF662D84E1F] [UNABLE_TO_READ] []
Thu Jul 31 15:34:05 2025
Dumping diagnostic data in directory=[cdmp_20250731153405], requested by (instance=1, osid=34180), summary=[incident=791100].
Thu Jul 31 15:34:13 2025
Sweep [inc][789691]: completed
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file E:\APP\CJC\diag\rdbms\cjc\cjc\trace\cjc_m000_12928.trc  (incident=789692) (PDBNAME=CDB$ROOT):
ORA-00600: internal error code, arguments: [qkexrXCopn1], [0], [], [], [], [], [], [], [], [], [], []
Incident details in: E:\APP\CJC\diag\rdbms\cjc\cjc\incident\incdir_789692\cjc_m000_12928_i789692.trc
Sweep [inc][791100]: completed
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Thu Jul 31 15:34:18 2025
Errors in file E:\APP\CJC\diag\rdbms\cjc\cjc\trace\cjc_m000_12928.trc:
ORA-00600: internal error code, arguments: [qkexrXCopn1], [0], [], [], [], [], [], [], [], [], [], []
Thu Jul 31 15:34:19 2025
Errors in file E:\APP\CJC\diag\rdbms\cjc\cjc\trace\cjc_pmon_25648.trc  (incident=788659) (PDBNAME=PDBcjc):
ORA-00600: internal error code, arguments: [kghfrh:ds], [0x1314CD428], [], [], [], [], [], [], [], [], [], []
Incident details in: E:\APP\CJC\diag\rdbms\cjc\cjc\incident\incdir_788659\cjc_pmon_25648_i788659.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Sweep [inc][791099]: completed
Sweep [inc2][791100]: completed
Sweep [inc2][789691]: completed
Sweep [inc][789692]: completed
Sweep [inc][788659]: completed
Sweep [inc2][789692]: completed
Thu Jul 31 15:34:55 2025
Errors in file E:\APP\CJC\diag\rdbms\cjc\cjc\trace\cjc_pmon_25648.trc:
ORA-00600: internal error code, arguments: [kghfrh:ds], [0x1314CD428], [], [], [], [], [], [], [], [], [], []
Errors in file E:\APP\CJC\diag\rdbms\cjc\cjc\trace\cjc_pmon_25648.trc  (incident=788660) (PDBNAME=CDB$ROOT):
KSBRDP-472 [] [] [] [] [] [] [] [] [] [] [] []
Incident details in: E:\APP\CJC\diag\rdbms\cjc\cjc\incident\incdir_788660\cjc_pmon_25648_i788660.trc
Thu Jul 31 15:34:56 2025
Dumping diagnostic data in directory=[cdmp_20250731153456], requested by (instance=1, osid=25648 (PMON)), summary=[incident=788659].
Thu Jul 31 15:34:58 2025
Errors in file E:\APP\CJC\diag\rdbms\cjc\cjc\trace\cjc_ora_8428.trc  (incident=790107) (PDBNAME=PDBcjc):
ORA-00600: 内部错误代码, 参数: [kghfrh:ds], [0x13159E5C8], [], [], [], [], [], [], [], [], [], []
Incident details in: E:\APP\CJC\diag\rdbms\cjc\cjc\incident\incdir_790107\cjc_ora_8428_i790107.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Thu Jul 31 15:34:59 2025
USER (ospid: 25648): terminating the instance due to error 472
Thu Jul 31 15:35:13 2025
System state dump requested by (instance=1, osid=25648 (PMON)), summary=[abnormal instance termination].
System State dumped to trace file E:\APP\CJC\diag\rdbms\cjc\cjc\trace\cjc_diag_25976_20250731153513.trc
Thu Jul 31 15:35:50 2025
Instance terminated by USER, pid = 25648

Analysis:

During the incident, multiple ORA-00600 and ORA-07445 errors were raised:

ORA-07445: exception encountered: core dump [PC:0x7FFEB886A5BC] [ACCESS_VIOLATION] [ADDR:0x1181C97878] [PC:0x7FFEB886A5BC] [UNABLE_TO_WRITE] []
ORA-07445: exception encountered: core dump [PC:0x7FFEB886A331] [ACCESS_VIOLATION] [ADDR:0x11AA8B44F0] [PC:0x7FFEB886A331] [UNABLE_TO_WRITE] []
ORA-07445: 出现异常错误: 核心转储 [qecinisub()+63] [ACCESS_VIOLATION] [ADDR:0x74] [PC:0x7FF662D84E1F] [UNABLE_TO_READ] []
ORA-00600: internal error code, arguments: [qkexrXCopn1], [0], [], [], [], [], [], [], [], [], [], []
ORA-00600: 内部错误代码, 参数: [qkexrXCopn1], [0], [], [], [], [], [], [], [], [], [], []
ORA-00600: internal error code, arguments: [kghfrh:ds], [0x1314CD428], [], [], [], [], [], [], [], [], [], []
ORA-00600: 内部错误代码, 参数: [kghfrh:ds], [0x13159E5C8], [], [], [], [], [], [], [], [], [], []
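
To see which internal errors dominate, the alert log can be tallied by error code and first argument. The following is a minimal sketch, not a supported tool: the function name `count_internal_errors` and the regular expression are assumptions made for illustration, and the sample lines are copied from the list above.

```python
import re
from collections import Counter

# Matches English and localized alert-log lines and captures the first
# bracketed argument, e.g. "kghfrh:ds" or "qecinisub()+63".
ORA_LINE = re.compile(r"^(ORA-(?:00600|07445)):[^\[]*\[([^\]]*)\]")

def count_internal_errors(alert_text):
    """Count ORA-00600/ORA-07445 lines keyed by (error code, first argument)."""
    counts = Counter()
    for line in alert_text.splitlines():
        m = ORA_LINE.match(line.strip())
        if m:
            counts[(m.group(1), m.group(2))] += 1
    return counts

sample = """\
ORA-07445: exception encountered: core dump [PC:0x7FFEB886A5BC] [ACCESS_VIOLATION] [ADDR:0x1181C97878] [PC:0x7FFEB886A5BC] [UNABLE_TO_WRITE] []
ORA-00600: internal error code, arguments: [qkexrXCopn1], [0], [], [], [], [], [], [], [], [], [], []
ORA-00600: 内部错误代码, 参数: [qkexrXCopn1], [0], [], [], [], [], [], [], [], [], [], []
ORA-00600: internal error code, arguments: [kghfrh:ds], [0x1314CD428], [], [], [], [], [], [], [], [], [], []
"""

result = count_internal_errors(sample)
for key, n in sorted(result.items()):
    print(key, n)
```

Grouping by the first argument keeps the English and Chinese variants of the same error together, which makes it easier to pick the signature to search for on MOS.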

Finally, PMON itself failed and requested a system state dump:

System state dump requested by (instance=1, osid=25648 (PMON)), summary=[abnormal instance termination].

A system state dump (SSD) trace was generated automatically:

System State dumped to trace file E:\APP\CJC\diag\rdbms\cjc\cjc\trace\cjc_diag_25976_20250731153513.trc
Thu Jul 31 15:35:50 2025
Instance terminated by USER, pid = 25648

Root cause:

The last ORA-00600 before the crash was:
ORA-00600: 内部错误代码, 参数: [kghfrh:ds], [0x13159E5C8], [], [], [], [], [], [], [], [], [], []

Based on this error, search My Oracle Support (support.oracle.com) with the lookup tool:
ORA-600/ORA-7445/ORA-700 Error Look-up Tool (Doc ID 153788.1)
The lookup tool returned no match.

Searching directly for the keyword ORA-00600: [kghfrh:ds] finds:
ORA-600 [kghfrh:ds] (Doc ID 300602.1), which lists 27 bugs.
Filtering by 12.1.0.2 and the keyword "Crash" gives no exact match; the closest are Bug 18388363 and Bug 22243719.
Bug 22243719 - Several Internal Errors Due to Shared Pool Memory Corruptions in 11.2.0.4 and Later. Instance May Crash (Doc ID 22243719.8)

The relevant MOS note is:
Instance Termination with ORA-07445 [kghsrch()+144], ORA-00600 [kghfrh:ds] (Doc ID 2128933.1)

Solution:

Drawing on several MOS notes:
Permanent fix:
Apply the latest 12.1.0.2 patches, or upgrade to 12.2, or upgrade to 19c (the most thorough option).

Workaround:

alter system set "_enable_shared_pool_durations"=false scope=spfile;

The change takes effect after an instance restart; it is not guaranteed to resolve the problem.

Log analysis:

Logs required:

1.alert_cjc.log
2.E:\APP\CJC\diag\rdbms\cjc\cjc\trace\cjc_pmon_25648.trc
3.E:\APP\CJC\diag\rdbms\cjc\cjc\incident\incdir_788660\cjc_pmon_25648_i788660.trc
4.E:\APP\CJC\diag\rdbms\cjc\cjc\trace\cjc_ora_8428.trc
5.E:\APP\CJC\diag\rdbms\cjc\cjc\incident\incdir_790107\cjc_ora_8428_i790107.trc
6.E:\APP\CJC\diag\rdbms\cjc\cjc\trace\cjc_diag_25976_20250731153513.trc
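
The trace and incident files in this list are all named in the alert log, so they can be collected mechanically. The sketch below assumes the alert log uses the "Errors in file" and "Incident details in:" line formats shown earlier; the function name `trace_files` is hypothetical.

```python
import re

# Paths follow "Errors in file" or "Incident details in:" and end in .trc.
TRACE_REF = re.compile(r"(?:Errors in file|Incident details in:)\s+(\S+\.trc)")

def trace_files(alert_text):
    """Return unique trace file paths in order of first appearance."""
    seen = {}
    for m in TRACE_REF.finditer(alert_text):
        seen.setdefault(m.group(1), None)
    return list(seen)

sample = r"""Errors in file E:\APP\CJC\diag\rdbms\cjc\cjc\trace\cjc_pmon_25648.trc  (incident=788660) (PDBNAME=CDB$ROOT):
Incident details in: E:\APP\CJC\diag\rdbms\cjc\cjc\incident\incdir_788660\cjc_pmon_25648_i788660.trc
Errors in file E:\APP\CJC\diag\rdbms\cjc\cjc\trace\cjc_pmon_25648.trc:
"""

files = trace_files(sample)
for path in files:
    print(path)
```

The system state dump file is announced on a different line ("System State dumped to trace file"), so it would need its own pattern or be added by hand.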

Analysis based on cjc_pmon_25648.trc:
Basic information:

Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options
Windows NT Version V6.2  
CPU                 : 16 - type 8664, 16 Physical Cores
Process Affinity    : 0x0x0000000000000000
Memory (Avail/Total): Ph:46765M/130237M, Ph+PgF:81254M/168109M 
Instance name: cjc
Redo thread mounted by this instance: 1
Oracle process number: 2
Windows thread id: 25648, image: ORACLE.EXE (PMON)

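Oracle version: 12.1.0.2.
Operating system: Windows Server 2012 (NT 6.2).
Resources: 16 CPU cores; 130,237 MB of physical memory with 46,765 MB available, so there is no severe memory pressure.
Process: PMON (Oracle process number 2), which cleans up failed processes and sessions.
Container: the operation took place in PDBcjc (Container ID=3).

Error trigger point: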

Incident 788659 created, dump file: E:\APP\CJC\diag\rdbms\cjc\cjc\incident\incdir_788659\cjc_pmon_25648_i788659.trc
ORA-00600: internal error code, arguments: [kghfrh:ds], [0x1314CD428], [], [], [], [], [], [], [], [], [], []

PMON: fatal error while deleting s.o. 0000000580827100 in this tree:
----------------------------------------
SO: 0x000000106900CCD0, type: 2, owner: 0x0000000000000000, flag: INIT/-/-/0x00 if: 0x3 c: 0x3
 proc=0x000000106900CCD0, name=process, file=ksu.h LINE:14165, pg=0 conuid=0
(process) Oracle pid:307, ser:251, calls cur/top: 0x0000000FE5880488/0x0000000FE5880488
          flags : (0x1) DEAD  icon_uid:0
          flags2: (0x8000),  flags3: (0x10) 
          intr error: 0, call error: 0, sess error: 0, txn error 0
          intr queue: empty
  ksudlp FALSE at location: 0
  Cleanup details:
    Found dead = 37 sec ago
    Total Cleanup attempts = 1, Total time = 0 sec,
      Cleanup timer = 0.000000 sec
    Last Cleanup attempt (full) started 37 sec ago, Length = in progress,
      Cleanup timer = (Total = 0.000000 sec, Current = 0.000000 sec, Timeouts = 0)
  (post info) last post received: 140 0 2
              last post received-location: ksl2.h LINE:3108 ID:kslpsr
              last process to post me: 0x105901aae8 1 6
              last post sent: 0 0 26
              last post sent-location: ksa2.h LINE:290 ID:ksasnd
              last process posted by me: 0x105901aae8 1 6
              waiter on post event: 0
  (latch info) hold_bits=0x0
  Process Group: DEFAULT, pseudo proc: 0x0000001059163D78
  O/S info: user: SYSTEM, term: CJC-SVR, ospid: 34180 (DEAD)

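Error code: ORA-00600 [kghfrh:ds].
kghfrh: Kernel Generic Heap Free Heap, part of Oracle's memory management layer.
ds: Data Structure, indicating that the operation touched a corrupted in-memory data structure.
The argument 0x1314CD428 is the target memory address, which points to an invalid heap structure.
Trigger scenario: PMON was deleting a dead process (Oracle pid 307, OS pid 34180). The process had been marked DEAD for 37 seconds, and the cleanup accessed invalid memory.
Likely root cause: corruption of, or a concurrency conflict on, a library object in the shared pool, which prevented PMON from freeing the memory safely.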

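Detailed explanation:
PMON: fatal error while deleting s.o. 0000000580827100 in this tree:
(1) PMON: the Process Monitor, one of Oracle's core background processes. It cleans up failed user processes, rolls back their transactions, and releases locks and resources.
(2) fatal error: PMON hit a severe, unrecoverable error while performing cleanup.
(3) deleting s.o.: s.o. stands for state object, the structure Oracle uses to track a process, session, call, and similar resources. PMON was trying to delete this object from an internal tree structure.
(4) 0000000580827100: the hexadecimal SGA address of the specific s.o. that PMON was trying to delete.
(5) in this tree: the tree of state objects that this s.o. belongs to, which is dumped after the message. PMON ran into the problem while traversing or modifying this structure.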

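Details of the process state object (SO) being cleaned up:
SO: 0x000000106900CCD0, type: 2, owner: 0x0000000000000000, flag: INIT/-/-/0x00 if: 0x3 c: 0x3
(1) SO: 0x000000106900CCD0: the SGA address of the top-level state object describing the target process (Oracle pid 307). It is a different object from the s.o. that PMON was trying to delete (0000000580827100), but the two are closely related; the s.o. is most likely a component inside or attached to this SO.
(2) type: 2: the object type code. 2 usually denotes PROCESS, that is, a server or background process object.
(3) owner: 0x0000000000000000: the owner of the object. 0 usually means that no other structure currently owns or holds it, which is consistent with a top-level process object.
(4) flag: INIT/-/-/0x00: the object's status flags.
(5) INIT: the object is flagged as initialized. The original reading treats this as abnormal for a dead process under cleanup and as a possible cause of the fatal error. This flag alone is weak evidence, so it should be compared against other state objects in the same dump before drawing that conclusion.
(6) -/-/0x00: the remaining flag bits are unset.
(7) if: 0x3: Instance Flags. The exact meaning of 0x3 requires Oracle internal documentation; it usually indicates that the object belongs to the current instance.
(8) c: 0x3: likely a reference count. If so, 0x3 means three references to this SO remain, and PMON cannot safely free it until the count drops to 0; a count that never drops would leave the cleanup stuck.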

proc=0x000000106900CCD0, name=process, file=ksu.h LINE:14165, pg=0 conuid=0
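(1) proc points to the same SO address (0x000000106900CCD0).
(2) name=process: confirms again that this is a process object.
(3) file=ksu.h LINE:14165: the source location recorded for this SO, around line 14165 of the ksu.h header in the kernel service utility layer.
(4) pg=0: Process Group ID 0, usually the default process group.
(5) conuid=0: Container UID 0, which usually means the root container (CDB$ROOT), or is unused in a non-multitenant environment.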

(process) Oracle pid:307, ser:251, calls cur/top: 0x0000000FE5880488/0x0000000FE5880488
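(1) Oracle pid:307: the process identifier inside the Oracle instance.
(2) ser:251: the serial number. Together with the pid it uniquely identifies a process (for example in the V$PROCESS view), so 307,251 identifies this one.
(3) calls cur/top: 0x0000000FE5880488/0x0000000FE5880488: the addresses of the current call and the top-level call. They are identical, which suggests the process was not inside a nested (recursive) call when it died. That is usually a good sign for PMON cleanup, because it implies no complex rollback of unfinished work.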

flags : (0x1) DEAD icon_uid:0
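(1) flags: (0x1) DEAD: one of the most important flags. 0x1 means the DEAD bit is set, which tells the Oracle kernel (including PMON) that the process was terminated by the operating system or detected as failed internally, and must be cleaned up.
(2) icon_uid:0: Instance Connection UID, usually 0.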

intr error: 0, call error: 0, sess error: 0, txn error 0
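These fields record specific kinds of errors raised during the life of the process (interrupt, call, session, and transaction errors). All zeros means no such internal error was recorded before the process died. Its death was therefore probably caused by an external factor (forced termination, network interruption) or by an internal problem that these fields did not capture.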

intr queue: empty
The interrupt queue is empty, so the process had no pending interrupt requests when it died. This also simplifies cleanup.

ksudlp FALSE at location: 0
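(1) ksudlp is read here as Kernel Service User Process Dead Process Cleanup, the core function PMON uses to clean up dead processes.
(2) FALSE: ksudlp returned FALSE at some checkpoint, meaning it did not complete the cleanup of this process.
(3) at location: 0: the code point inside ksudlp that returned FALSE. 0 usually denotes the function entry or a very early checkpoint, which suggests the cleanup failed as soon as it was attempted, before making any progress.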

Cleanup attempt statistics:
Cleanup details:
Found dead = 37 sec ago
PMON first detected, or was told, 37 seconds earlier that this process (pid 307, ser 251) was dead (DEAD).

Total Cleanup attempts = 1, Total time = 0 sec,
Cleanup timer = 0.000000 sec
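(1) PMON has attempted to clean up this process only once so far.
(2) That single attempt consumed 0 seconds in total, which is highly unusual. It suggests the attempt failed or was blocked almost immediately, consistent with ksudlp FALSE at location: 0.
(3) Cleanup timer is the timer value for this cleanup (also 0 seconds).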

Last Cleanup attempt (full) started 37 sec ago, Length = in progress,
Cleanup timer = (Total = 0.000000 sec, Current = 0.000000 sec, Timeouts = 0)
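The last (and only) cleanup attempt:
(1) Started 37 sec ago: it began 37 seconds earlier, the same as Found dead, so the cleanup was attempted as soon as the death was detected.
(2) Length = in progress: the attempt is reported as still in progress. This is the key contradiction: the timers show 0 seconds and ksudlp returned FALSE, yet PMON's internal state still considers the attempt ongoing. This inconsistency is the immediate trigger of the fatal error, because PMON can neither advance the cleanup nor abandon it.
(3) Cleanup timer: confirms again a total of 0 seconds, a current value of 0 seconds, and 0 timeouts.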

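Inter-process communication (post) information:
(post info) last post received: 140 0 2
last post received-location: ksl2.h LINE:3108 ID:kslpsr
This records the last post the process received from another process (a post is an inter-process signal used to wake a waiter).
(1) 140 0 2: encoded sender information and post type; decoding it requires internal knowledge.
(2) ksl2.h LINE:3108 ID:kslpsr: the post was received in kslpsr (Kernel Service Latch Post Receive), which suggests the process may have been waiting on a latch or a post event before it died.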

last process to post me: 0x105901aae8 1 6
The SO address (0x105901aae8) of the process that last posted this process, followed by its identifying values (1 6, possibly pid/ser or something else).

last post sent: 0 0 26
last post sent-location: ksa2.h LINE:290 ID:ksasnd
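This records the last post the process sent to another process.
0 0 26: encoded receiver information and post type; decoding it requires internal knowledge.
ksa2.h LINE:290 ID:ksasnd: the post was sent from ksasnd (Kernel Service Asynchronous Send).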

last process posted by me: 0x105901aae8 1 6
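The SO address (0x105901aae8) of the process this process last posted, followed by its identifying values (1 6). Note that both the address and the values are identical to those in last process to post me, so before it died pid 307 exchanged posts in both directions with the same process, most likely a key background process.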

waiter on post event: 0
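The process was not waiting on any post event when it died. This does not contradict last post received, which shows it had received a post earlier; it only means it was not in a wait at the moment of death.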

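Latch information:
(latch info) hold_bits=0x0
hold_bits: 0x0: the process held no latches when it died. This is good news: its death does not directly leak latches or block other processes that need them, and PMON has no latch release to handle during cleanup.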

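Process group information:
Process Group: DEFAULT, pseudo proc: 0x0000001059163D78
Process Group: DEFAULT: the process belongs to the default process group.
pseudo proc: 0x0000001059163D78: the SGA address of the pseudo process structure for that process group.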

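Operating system (O/S) information:
O/S info: user: SYSTEM, term: CJC-SVR, ospid: 34180 (DEAD)
user: SYSTEM: at the OS level the process ran as the SYSTEM user (the account that runs the Oracle software).
term: CJC-SVR: terminal or client information (it may be unreliable, especially for background processes). Here CJC-SVR is probably the server host name or an identifier.
ospid: 34180 (DEAD): the operating system process ID is 34180. The (DEAD) tag confirms that Oracle detected OS process 34180 no longer exists, which is the direct evidence for the process being marked DEAD.

Common causes include:
(1) Termination by the operating system under memory exhaustion (for example the OOM Killer on Linux).
(2) Forced termination (kill -9 on Linux, or ending the process on Windows).
(3) A crash of the process itself (access violation, segmentation fault, abort, and so on). In this case the alert log shows cjc_ora_34180.trc raising ORA-07445 [qecinisub()+63] at 15:33:54, which matches this cause.
(4) A network disconnect that leads the operating system to close the connection.
(5) An operating system restart or resource limits.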

  service name: pdbcjc
  client details:
    O/S info: user: sys_cjc, term: CJC-SVR, ospid: 33496:2
    machine: IIS APPPOOL\CJC-SVR program: w3wp.exe
  Current Wait Stack:
    Not in wait; last wait ended 1 min 1 sec ago 
  Wait State:
    fixed_waits=0 flags=0x21 boundary=0x0000000000000000/-1
  Session Wait History:
      elapsed time of 1 min 1 sec since last wait
   0: waited for 'SQL*Net message from client'
      driver id=0x28444553, #bytes=0x1, =0x0
      wait_id=6 seq_num=7 snap_id=1
      wait times: snap=0.005437 sec, exc=0.005437 sec, total=0.005437 sec
      wait times: max=infinite
      wait counts: calls=0 os=0
      occurred after 0.000003 sec of elapsed time
   1: waited for 'SQL*Net message to client'
      driver id=0x28444553, #bytes=0x1, =0x0
      wait_id=5 seq_num=6 snap_id=1
      wait times: snap=0.000001 sec, exc=0.000001 sec, total=0.000001 sec
      wait times: max=infinite
      wait counts: calls=0 os=0
      occurred after 0.000017 sec of elapsed time
   2: waited for 'SQL*Net message from client'
      driver id=0x28444553, #bytes=0x1, =0x0
      wait_id=4 seq_num=5 snap_id=1
      wait times: snap=0.000481 sec, exc=0.000481 sec, total=0.000481 sec
      wait times: max=infinite
      wait counts: calls=0 os=0
      occurred after 0.000038 sec of elapsed time
   3: waited for 'SQL*Net message to client'
      driver id=0x28444553, #bytes=0x1, =0x0
      wait_id=3 seq_num=4 snap_id=1
      wait times: snap=0.000002 sec, exc=0.000002 sec, total=0.000002 sec
      wait times: max=infinite
      wait counts: calls=0 os=0
      occurred after 0.000036 sec of elapsed time
   4: waited for 'log file sync'
      buffer#=0xca56, sync scn=0xecdc383, =0x0
      wait_id=2 seq_num=3 snap_id=1
      wait times: snap=0.067455 sec, exc=0.067455 sec, total=0.067455 sec
      wait times: max=infinite
      wait counts: calls=1 os=1
      occurred after 0.005086 sec of elapsed time
   5: waited for 'SQL*Net message from client'
      driver id=0x28444553, #bytes=0x1, =0x0
      wait_id=1 seq_num=2 snap_id=1
      wait times: snap=0.045936 sec, exc=0.045936 sec, total=0.045936 sec
      wait times: max=infinite
      wait counts: calls=0 os=0
      occurred after 0.000016 sec of elapsed time
   6: waited for 'SQL*Net message to client'
      driver id=0x28444553, #bytes=0x1, =0x0
      wait_id=0 seq_num=1 snap_id=1
      wait times: snap=0.000002 sec, exc=0.000002 sec, total=0.000002 sec
      wait times: max=infinite
      wait counts: calls=0 os=0
      occurred after 0.000000 sec of elapsed time

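Service and client information
service name: pdbcjc
The process connected through the database service pdbcjc (typically a PDB-level service name), so the connection came from a specific application or tenant.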

client details:
  O/S info: user: sys_cjc, term: CJC-SVR, ospid: 33496:2
  machine: IIS APPPOOL\CJC-SVR program: w3wp.exe

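Client details:
user: sys_cjc: the OS user name is sys_cjc (probably an application service account).
term: CJC-SVR: the terminal or host name is CJC-SVR.
ospid: 33496:2: the client's OS process ID is 33496; the 2 may be a thread ID or a sub-process identifier.
machine: IIS APPPOOL\CJC-SVR: the client machine is identified as an IIS application pool (Microsoft's web server) on host CJC-SVR.
program: w3wp.exe: the client program is the IIS worker process (which runs ASP.NET and similar web applications), so the database connection was opened by a web application.
Key conclusion: this Oracle process was created by a web application (ASP.NET or similar) running on IIS and connecting to the PDB through the service name pdbcjc.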

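Current wait state
Current Wait Stack:
Not in wait; last wait ended 1 min 1 sec ago

Not currently waiting: the process was not in any wait when it died.
Last wait ended: 1 min 1 sec earlier, so the session had recorded no new wait for over a minute by the time of the dump.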

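Session Wait History
The last 7 wait events before death, newest first (0 = most recent, 6 = oldest):
0. Most recent wait: SQL*Net message from client (ended about 1 minute earlier)
waited for 'SQL*Net message from client'
driver id=0x28444553, #bytes=0x1, =0x0
wait times: snap=0.005437 sec, total=0.005437 sec
occurred after 0.000003 sec of elapsed time
Event: waiting for the client to send a request (SQL*Net message from client).
Driver: 0x28444553, whose four bytes are the ASCII characters "(DES" (the Oracle TNS driver).
Parameters: #bytes=0x1, that is 1 byte (possibly a heartbeat or control signal); the unnamed third parameter is 0x0.
Wait time: 5.4 ms (a short wait).
Timing: the wait began only 0.000003 sec after the preceding activity.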
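
The driver id can be checked by reading the 32-bit value as four ASCII bytes. This is a small sketch; `decode_driver_id` is a hypothetical helper name, and big-endian byte order is assumed because that is the order in which the hex digits are printed.

```python
def decode_driver_id(driver_id):
    """Interpret a 32-bit driver id as four big-endian ASCII bytes."""
    return driver_id.to_bytes(4, byteorder="big").decode("ascii")

decoded = decode_driver_id(0x28444553)
print(decoded)  # prints "(DES"
```

The bytes 0x28, 0x44, 0x45, 0x53 are "(", "D", "E", "S", so the value carries no closing parenthesis.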

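Key event: log file sync (transaction commit wait)
waited for 'log file sync'
buffer#=0xca56, sync scn=0xecdc383
wait times: snap=0.067455 sec, total=0.067455 sec
wait counts: calls=1 os=1
occurred after 0.005086 sec of elapsed time

Event: log file sync, the wait for redo to be written to disk when a transaction commits.
Parameters:
buffer#=0xca56: the redo log buffer number (51798 in decimal).
sync scn=0xecdc383: the SCN (System Change Number) of the commit.
Wait time: 67.5 ms (markedly longer than the other events).
System calls: calls=1 os=1, which indicates the wait involved an OS-level call, consistent with disk I/O.
Impact: this commit was the last effective operation the web application performed in this session.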
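
The hexadecimal wait parameters can be converted to decimal programmatically when comparing them with views that report decimal values. The sketch below assumes the comma-separated `name=0x...` layout shown in the trace; `parse_wait_params` is a hypothetical helper, not part of any Oracle tooling.

```python
def parse_wait_params(line):
    """Parse 'name=0x..' pairs from a wait-history parameter line into ints."""
    params = {}
    for part in line.split(","):
        name, _, value = part.strip().partition("=")
        if name and value:  # skip the unnamed third parameter ("=0x0")
            params[name] = int(value, 16)
    return params

params = parse_wait_params("buffer#=0xca56, sync scn=0xecdc383, =0x0")
wait_ms = 0.067455 * 1000  # snap time from the trace, in milliseconds

print(params["buffer#"])    # 51798
print(params["sync scn"])
print(f"{wait_ms:.1f} ms")
```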

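Summary:
Process dies abnormally (DEAD) -> PMON cleans up the dead process -> attempts to free an INVL cursor -> shared pool memory corruption [kghfrh:ds] -> instance terminates automatically with error 472

References: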

Instance Termination with ORA-07445 [kghsrch()+144], ORA-00600 [kghfrh:ds] (Doc ID 2128933.1)

Follow my WeChat official account: IT小Chen
