Database instance crashes on its own with ORA-27157, ORA-27300 and related errors



Reposted from the original article.

After a 12c RAC database was installed on RHEL 7.2, one of its database instances kept crashing on its own. The alert log showed the following errors:

 
```
Errors in file /d12/app/oracle/diag/rdbms/rac12c/rac12c2/trace/rac12c2_j000_21047.trc:
ORA-27157: OS post/wait facility removed
ORA-27300: OS system dependent operation:semop failed with status: 43
ORA-27301: OS failure message: Identifier removed
ORA-27302: failure occurred at: sskgpwwait1
Fri Sep 09 16:50:53 2016
Errors in file /d12/app/oracle/diag/rdbms/rac12c/rac12c2/trace/rac12c2_rmv0_20798.trc:
ORA-27157: OS post/wait facility removed
Fri Sep 09 16:50:53 2016
Errors in file /d12/app/oracle/diag/rdbms/rac12c/rac12c2/trace/rac12c2_q005_21328.trc:
ORA-27157: OS post/wait facility removed
ORA-27300: OS system dependent operation:semop failed with status: 43
ORA-27301: OS failure message: Identifier removed
ORA-27302: failure occurred at: sskgpwwait1
```

Cause:

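In RHEL 7.2, the systemd-logind service introduced a new feature: once a user has completely logged out of the OS, all of that user's IPC objects are removed.
This behavior is controlled by the RemoveIPC option in /etc/systemd/logind.conf. For details, see man logind.conf(5).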

In RHEL 7.2, RemoveIPC defaults to yes.
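To check what a particular host is configured with, the value can be read back from the file. The following is a minimal bash sketch, not an official tool: the function name `removeipc_value` is hypothetical, it reads only the single file passed to it (drop-in configuration is ignored), and when no uncommented `RemoveIPC=` line exists it reports `yes`, the RHEL 7.2 default stated above.

```bash
#!/usr/bin/env bash
# Sketch: print the RemoveIPC value set in a logind.conf-style file.
# Assumptions: only this one file is read, and "yes" (the RHEL 7.2 default
# described above) is reported when no uncommented RemoveIPC= line exists.
removeipc_value() {
    local conf="${1:-/etc/systemd/logind.conf}"
    local value
    # Collect every uncommented RemoveIPC= assignment and keep the last one,
    # since a later assignment overrides an earlier one.
    value=$(sed -n -E 's/^[[:space:]]*RemoveIPC[[:space:]]*=[[:space:]]*([^[:space:]]*).*/\1/p' "$conf" 2>/dev/null | tail -n 1)
    echo "${value:-yes}"
}

# Usage:
#   removeipc_value                              # checks /etc/systemd/logind.conf
#   removeipc_value /path/to/other/logind.conf
```

The stock file ships the option commented out (`#RemoveIPC=...`), which is why commented lines are skipped and the default is assumed.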

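As a result, when the last session of the oracle or grid user logs out, the OS removes that user's shared memory segments and semaphores.
The SGA of Oracle ASM and database instances lives in shared memory segments, and the instances synchronize through semaphores, so removing these objects crashes the ASM and database instances. The `semop failed with status: 43` line in the log above is the Linux errno EIDRM ("Identifier removed"), meaning the semaphore set was deleted while the instance was still using it.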
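To see which shared memory segments and semaphore sets the oracle and grid users own, the output of `ipcs -m` and `ipcs -s` can be filtered on the owner column. This bash sketch assumes the default Linux `ipcs` table layout, where data rows start with a hexadecimal key and the third column is the owner name; `count_ipc_owned_by` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Sketch: count the IPC objects owned by one user in `ipcs -m` or `ipcs -s` output.
# Assumption: default Linux ipcs layout, where data rows start with a hex key
# (0x...) and the third column is the owner name.
count_ipc_owned_by() {
    local user="$1"
    awk -v u="$user" '$1 ~ /^0x/ && $3 == u { n++ } END { print n + 0 }'
}

# Usage on a live host:
#   ipcs -m | count_ipc_owned_by oracle    # shared memory segments
#   ipcs -s | count_ipc_owned_by grid      # semaphore sets
```

Comparing the counts before and after the last oracle or grid session logs out is one way to observe the behavior. With RemoveIPC=yes in effect, the user's objects are expected to go away; a shared memory segment that running processes still have attached may instead be listed with a `dest` status until it is detached.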

See Red Hat bug 1264533: https://bugzilla.redhat.com/show_bug.cgi?id=1264533

This issue affects every application that uses shared memory segments and semaphores, so both Oracle ASM instances and Oracle Database instances are affected.
To avoid the problem, OEL 7.2 explicitly sets RemoveIPC=no in /etc/systemd/logind.conf.

Symptoms caused by this issue:

 
1) Installing 11.2 and 12c GI/CRS fails, because ASM crashes towards the end of the installation.
2) Upgrading to 11.2 and 12c GI/CRS fails.
3) After Red Hat Linux is upgraded to 7.2, 11.2 and 12c ASM and database instances crash.

systemd-logind can remove IPC objects at any moment, so the errors logged when the failure occurs differ from case to case. For example:

 
The most common error is the following, found in the ASM or database alert.log:

```
ORA-27157: OS post/wait facility removed
ORA-27300: OS system dependent operation:semop failed with status: 43
ORA-27301: OS failure message: Identifier removed
ORA-27302: failure occurred at: sskgpwwait1
```

 
The second observed error occurs during installation and upgrade, when asmca fails with the following error:

```
KFOD-00313: No ASM instances available. CSS group services were successfully initilized by kgxgncin
KFOD-00105: Could not open pfile 'init@.ora'
```

 
The third observed error occurred during installation and upgrade:

```
Creation of ASM password file failed. Following error occurred: Error in Process: /d12/app/12.1.0/grid/bin/orapwd
Enter password for SYS:
OPW-00009: Could not establish connection to Automatic Storage Management instance
2015/11/20 21:38:45 CLSRSC-184: Configuration of ASM failed
2015/11/20 21:38:46 CLSRSC-258: Failed to configure and start ASM
```

 
   
The fourth observed error is that the following message appears in the /var/log/messages file around the time the ASM or database instance crashed:

```
Nov 20 21:38:43 testc201 kernel: traps: oracle[24861] trap divide error ip:3896db8 sp:7ffef1de3c40 error:0 in oracle[400000+ef57000]
```
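When triaging a host, the four kinds of messages above can be searched for in one pass. The bash sketch below is a plain `grep` over whatever files are passed in; `find_removeipc_symptoms` is a hypothetical name, and the pattern list is copied from the messages quoted in this article, so it is not exhaustive and a match (for example a kernel `trap divide error`) is only a hint, not proof that RemoveIPC was the cause.

```bash
#!/usr/bin/env bash
# Sketch: print file:line:text for lines that match the RemoveIPC symptoms quoted above.
# The pattern list only covers the messages shown in this article.
find_removeipc_symptoms() {
    grep -HnE 'ORA-27157|ORA-2730[0-2]|KFOD-00313|OPW-00009|CLSRSC-(184|258)|trap divide error' "$@"
}

# Usage (the alert log name assumes the usual alert_<SID>.log naming):
#   find_removeipc_symptoms /d12/app/oracle/diag/rdbms/rac12c/rac12c2/trace/alert_rac12c2.log /var/log/messages
```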
       
 
 

Fix:

1) Set RemoveIPC=no in /etc/systemd/logind.conf.
2) Reboot the server or restart systemd-logind.
To restart systemd-logind:

 
```
# systemctl daemon-reload
# systemctl restart systemd-logind
```
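Step 1 can also be scripted. The bash sketch below is an illustration built on stated assumptions, not a vendor procedure: `set_removeipc_no` is a hypothetical helper; it keeps a `.bak` copy, rewrites any existing (commented or active) `RemoveIPC=` line, and otherwise appends `RemoveIPC=no` at the end of the file, which assumes that `[Login]` is the last (in the stock file, the only) section.

```bash
#!/usr/bin/env bash
# Sketch: force RemoveIPC=no in a logind.conf-style file (run as root for the real file).
# Assumption: [Login] is the last section of the file, so an appended line lands in it.
set_removeipc_no() {
    local conf="${1:-/etc/systemd/logind.conf}"
    local pattern='^[[:space:]]*#?[[:space:]]*RemoveIPC[[:space:]]*='

    if [ ! -e "$conf" ]; then
        # No file yet: create one containing only the setting we need.
        printf '[Login]\nRemoveIPC=no\n' > "$conf"
        return
    fi

    cp -p "$conf" "$conf.bak"   # keep a backup of the original file

    if grep -qE "$pattern" "$conf"; then
        # Replace every commented or active RemoveIPC line with RemoveIPC=no.
        sed -i -E "s/${pattern}.*/RemoveIPC=no/" "$conf"
    else
        # Make sure the file ends with a newline before appending.
        if [ -n "$(tail -c 1 "$conf")" ]; then
            printf '\n' >> "$conf"
        fi
        printf 'RemoveIPC=no\n' >> "$conf"
    fi
}

# Usage on the real host (as root), followed by the restart commands above:
#   set_removeipc_no /etc/systemd/logind.conf
#   systemctl daemon-reload
#   systemctl restart systemd-logind
```

Afterwards, `grep RemoveIPC /etc/systemd/logind.conf` should show only `RemoveIPC=no`.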

MOS Doc:

ALERT: Setting RemoveIPC=yes on Redhat 7.2 Crashes ASM and Database Instances as Well as Any Application That Uses a Shared Memory Segment (SHM) or Semaphores (SEM) (Doc ID 2081410.1)