💫《博主介绍》:✨又是一天没白过,我是奈斯,从事IT领域✨
💫《擅长领域》:✌️擅长阿里云AnalyticDB for MySQL(分布式数据仓库)、Oracle、MySQL、Linux、prometheus监控;并对SQLserver、NoSQL(MongoDB)有了解✌️
💖💖💖大佬们都喜欢静静的看文章,并且也会默默的点赞收藏加关注💖💖💖
博主一边喝茶一边听歌,突然 Prometheus 告警:数据库宕机了!实例挂了,业务全瘫,一级故障啊😭!立马摘下耳机,开始加紧排查问题。当你阅读到此篇文章,那么想必也是遇到了同样的问题,别慌,问题已经解决,过程分享给大家,希望能帮到各位大佬!
故障现象:
Oracle版本:11.2.0.4
架构:单节点,非RAC架构
系统:Linux-x86 64位
排查流程:
此报错直接导致数据库实例给宕机了,那么首先第一时间先分析alter日志,切换到oracle系统用户后通过echo $ORACLE_BASE确定数据库的安装路径,详细的alter日志路径为:$ORACLE_BASE/diag/rdbms/实例名/trace/alert_实例名.log,然后查看最后的几百条日志,日志中关键的报错信息如下:
Tue Feb 18 14:43:01 2025 Exception [type: SIGSEGV, SI_KERNEL(general_protection)] [ADDR:0x0] [PC:0x184E447, opiaba()+639] [flags: 0x0, count: 1] Errors in file /u01/app/oracle/diag/rdbms/jyracdb27/jyracdb/trace/jyracdb_ora_9416.trc (incident=332273): ORA-07445: exception encountered: core dump [opiaba()+639] [SIGSEGV] [ADDR:0x0] [PC:0x184E447] [SI_KERNEL(general_protection)] [] Incident details in: /u01/app/oracle/diag/rdbms/jyracdb27/jyracdb/incident/incdir_332273/jyracdb_ora_9416_i332273.trc Use ADRCI or Support Workbench to package the incident. See Note 411.1 at My Oracle Support for error and packaging details. Tue Feb 18 14:43:09 2025 Dumping diagnostic data in directory=[cdmp_20250218144309], requested by (instance=1, osid=9416), summary=[incident=332273]. Tue Feb 18 14:43:12 2025 Sweep [inc][332273]: completed Sweep [inc2][332273]: completed Tue Feb 18 14:43:51 2025 Errors in file /u01/app/oracle/diag/rdbms/jyracdb27/jyracdb/trace/jyracdb_pmon_8713.trc (incident=331217): ORA-00600: internal error code, arguments: [17147], [0x263C0EEAB0], [], [], [], [], [], [], [], [], [], [] Incident details in: /u01/app/oracle/diag/rdbms/jyracdb27/jyracdb/incident/incdir_331217/jyracdb_pmon_8713_i331217.trc Use ADRCI or Support Workbench to package the incident. See Note 411.1 at My Oracle Support for error and packaging details. Errors in file /u01/app/oracle/diag/rdbms/jyracdb27/jyracdb/trace/jyracdb_pmon_8713.trc: ORA-00600: internal error code, arguments: [17147], [0x263C0EEAB0], [], [], [], [], [], [], [], [], [], [] PMON (ospid: 8713): terminating the instance due to error 472 Tue Feb 18 14:43:52 2025 System state dump requested by (instance=1, osid=8713 (PMON)), summary=[abnormal instance termination]. System State dumped to trace file /u01/app/oracle/diag/rdbms/jyracdb27/jyracdb/trace/jyracdb_diag_8782_20250218144352.trc Instance terminated by PMON, pid = 8713
故障原因分析:
在日志中主要的错误是ORA-07445: exception encountered: core dump [opiaba()+639] [SIGSEGV] [ADDR:0x0] [PC:0x184E447] [SI_KERNEL(general_protection)] [],并根据错误输出的trc文件找出了对应执行的SQL语句:
[oracle@lipei ~]$ more /u01/app/oracle/diag/rdbms/jyracdb27/jyracdb/incident/incdir_332273/jyracdb_ora_9416_i332273.trc Dump file /u01/app/oracle/diag/rdbms/jyracdb27/jyracdb/incident/incdir_332273/jyracdb_ora_9416_i332273.trc Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production With the Partitioning, OLAP, Data Mining and Real Application Testing options ORACLE_HOME = /u01/app/oracle/product/11.2.0/dbhome_1 System name: Linux Node name: lipei Release: 3.10.0-1160.el7.x86_64 Version: #1 SMP Mon Oct 19 16:18:59 UTC 2020 Machine: x86_64 Instance name: jyracdb Redo thread mounted by this instance: 1 Oracle process number: 134 Unix process pid: 9416, image: oracle@lipei *** 2025-02-18 14:43:01.551 *** SESSION ID:(1323.1) 2025-02-18 14:43:01.551 *** CLIENT ID:() 2025-02-18 14:43:01.551 *** SERVICE NAME:(jyracdb) 2025-02-18 14:43:01.551 *** MODULE NAME:(JDBC Thin Client) 2025-02-18 14:43:01.551 *** ACTION NAME:() 2025-02-18 14:43:01.551 Dump continued from file: /u01/app/oracle/diag/rdbms/jyracdb27/jyracdb/trace/jyracdb_ora_9416.trc ORA-07445: exception encountered: core dump [opiaba()+639] [SIGSEGV] [ADDR:0x0] [PC:0x184E447] [SI_KERNEL(general_protection)] [] ========= Dump for incident 332273 (ORA 7445 [opiaba()+639]) ======== ----- Beginning of Customized Incident Dump(s) ----- Exception [type: SIGSEGV, SI_KERNEL(general_protection)] [ADDR:0x0] [PC:0x184E447, opiaba()+639] [flags: 0x0, count: 1] Registers: %rax: 0x000000265f57f620 %rbx: 0x000000263de25f78 %rcx: 0x0000000000003200 %rdx: 0x0000000000000000 %rdi: 0x0000000000004080 %rsi: 0x0000000000004080 %rsp: 0x00007ffc6c490eb0 %rbp: 0x00007ffc6c490f20 %r8: 0x000000263c0ce8a8 %r9: 0x0000000000000031 %r10: 0x0000000000000b32 %r11: 0x000000265f57ee10 %r12: 0x000000265f51be70 %r13: 0x0000000000000000 %r14: 0x02000000265f57f6 %r15: 0x0000000000000005 %rip: 0x000000000184e447 %efl: 0x0000000000010206 opiaba()+625 (0x184e439) jmp 0x184e476 opiaba()+627 (0x184e43b) movzbl 0x2c(%rbx),%r15d opiaba()+632 (0x184e440) mov %r14,-0x60(%rbp) opiaba()+636 (0x184e444) mov %rax,%r14 > opiaba()+639 (0x184e447) movswq 0xa(%r14),%rcx opiaba()+644 (0x184e44c) cmp %ecx,%r15d opiaba()+647 (0x184e44f) jne 0x184e46a opiaba()+649 (0x184e451) mov 0x18(%rbx),%rdi opiaba()+653 (0x184e455) lea 0xc(%r14),%rsi *** 2025-02-18 14:43:01.552 dbkedDefDump(): Starting a non-incident diagnostic dump (flags=0x3, level=3, mask=0x0) ----- Current SQL Statement for this session (sql_id=gxawvm3g5askm) ----- begin insert into PM_LOCAL_RECORD_ZLIC01 (ID,PRICE_AGENCY_SOURCE_ID ,PRICE_AGENCY_SOURCE_CODE,BRAND_ID, BRAND_CODE,ORIGINAL_PARTS,PRICE,PRICE_UPDATE_TIME,PRICE_UPDATE_USER,STATUS,PRICE_UPDATE_MODE ,REMARK,CREATE_USER,CREATE_TIME,DEL_FLAG,PRICE_AGENCY_SOURCE_NAME,BRAND_NAME,ORIGINAL_PARTS_NAME, GROUP_NAME,ORIGINAL_PARTS_CODE) values ( PM_SEQUENCE.NEXTVAL, :1 , :2 ,:3 , :4 , :5 , :6 , :7 , :8 ,:9 , :10 , :11 , :12 , :13 , :14 ,:15 , :16 ,:17 , :18 ,:19 ) ; insert into PM_LOCAL_RECORD_ZLIC01 (ID,PRICE_AGENCY_SOURCE_ID ,PRICE_AGENCY_SOURCE_CODE,BRAND_ID, BRAND_CODE,ORIGINAL_PARTS,PRICE,PRICE_UPDATE_TIME,PRICE_UPDATE_USER,STATUS,PRICE_UPDATE_MODE ,REMARK,CREATE_USER,CREATE_TIME,DEL_FLAG,PRICE_AGENCY_SOURCE_NAME,BRAND_NAME,ORIGINAL_PARTS_NAME, GROUP_NAME,ORIGINAL_PARTS_CODE) values ( PM_SEQUENCE.NEXTVAL, :20 , :21 ,:22 , :23 , :24 , :25 , :26 , :27 ,:28 , :29 , :30 , :31 , :32 , :33 ,:34 , :35 ,:36 , :37 ,:38 ) ; insert into PM_LOCAL_RECORD_ZLIC01 (ID,PRICE_AGENCY_SOURCE_ID ,PRICE_AGENCY_SOURCE_CODE,BRAND_ID, BRAND_CODE,ORIGINAL_PARTS,PRICE,PRICE_UPDATE_TIME,PRICE_UPDATE_USER,STATUS,PRICE_UPDATE_MODE ,REMARK,CREATE_USER,CREATE_TIME,DEL_FLAG,PRICE_AGENCY_SOURCE_NAME,BRAND_NAME,ORIGINAL_PARTS_NAME, GROUP_NAME,ORIGINAL_PARTS_CODE) values ( PM_SEQUENCE.NEXTVAL, :39 , :40 ,:41 , :42 , :43 , :44 , :45 , :46 ,:47 , :48 , :49 , :50 , :51 , :52 ,:53 , :54 ,:55 , :56 ,:57 ) ; .....................................
那么到现在为止已经清楚是gxawvm3g5askm这个PL/SQL语句块中嵌套了多个insert语句导致了ORA-07445: exception encountered: core dump [opiaba()+639] [SIGSEGV] [ADDR:0x0] [PC:0x184E447] [SI_KERNEL(general_protection)] []这个报错。那么通过此报错在mos上进行搜索,排查到此问题是触发了一个从10.2.0.5到12.1.0.1版本中的一个bug,当前的数据库版本为11.2.0.4,好巧不巧在这个bug范围内😨
Oracle官网对此报错的原文如下:mos文档 ID 12578873.8
Bug 12578873 ORA-7445 [opiaba] when using more than 65535 bind variables
This note gives a brief overview of bug 12578873.
The content was last updated on: 08-MAR-2022
Click here for details of each of the sections below.
Affects:
Product (Component)
PL/SQL (Plsql)
Range of versions believed to be affected
Versions BELOW 12.1.0.2
Versions confirmed as being affected
Platforms affected
Generic (all / most platforms affected)
Fixed:
The fix for 12578873 is first included in
Description
This bug is only relevant when using PL/SQL
When using more than 65535 bind variables from PL/SQL we could
get a crash under opiaba(). This issue can also cause a memory corruption
and a subsequent ORA-600 and/or instance failure.
Workaround
Reduce the number of bind variables
Please note: The above is a summary description only. Actual symptoms can vary. Matching to any symptoms here does not confirm that you are encountering this problem. For questions about this bug please consult Oracle Support.
References
Bug:12578873 (This link will only work for PUBLISHED bugs)
Note:245840.1 Information on the sections in this article
解决办法:
方式一:应用补丁12578873,避免实例崩溃。关于补丁,优快云也有其他博主提供,小伙伴可以自行在优快云上寻找,或者私信我获取哦。
方式二:减少绑定变量个数,将同一条SQL/PLSQL的绑定变量个数控制在65535以内。
小提示💥:
上面两种方式都可以解决此问题,方式二只需要开发优化SQL逻辑减少绑定变量个数即可,DBA不需要干预。但为了避免此类问题在人为不知情的情况下再此触发(因为此次问题就是在不知情的情况下触发的bug,导致的宕机),需要停机打上相应的补丁包永久解决。
安装 patch 12578873补丁教程:
这里以在Linux-x86 64位的补丁为例,本地先解压一下补丁,然后进入到12578873目录后,就可以看到README.txt
第一部分:先决条件,在安装或卸载补丁之前,需要确保满足何种要求
第二部分:安装补丁,这里就是安装补丁的详细步骤
第三部分:卸载补丁,这里就是卸载补丁的详细步骤。当然只有安装补丁后后问题的话才需要这步,一般情况下安装好补丁就没有没有问题了
第四部分:描述,描述了此补丁修复的错误bug
完结,撒花🌹。以上就是关于ORA-00600:ORA-07445: exception encountered: core dump [opiaba()+639] [SIGSEGV] [ADDR:0x0] [PC:0x184E447] [SI_KERNEL(general_protection)] []报错的全部内容了,希望对有遇到此问题的小伙伴有借鉴意义。