Troubleshooting a Failure After Restarting One Node of an Oracle RAC Cluster

范一刀

Copyright: CC 4.0 BY-SA

Original article: https://blog.youkuaiyun.com/mfanoffice2012/article/details/80285148

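Because of some storage-related work, the customer needed to restart a RAC node manually. That restart, however, set off the incident described below.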

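The customer was in touch with me remotely. I replied that in a RAC environment a single node can be restarted on its own, so the customer confidently went ahead and rebooted it. The following errors appeared: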

[grid@hxdb01 ~]$  srvctl start nodeapps -n hxdb01
PRKH-1010 : Unable to communicate with CRS services.
PRKH-3003 : An attempt to communicate with the CSS daemon failed


[grid@hxdb01 ~]$ crsctl start cluster

CRS-2672: Attempting to start 'ora.gpnpd' on 'hxdb01'
CRS-5017: The resource action "ora.gpnpd start" encountered the following error: 
Start action for daemon aborted. For details refer to "(:CLSN00107:)" in "/u01/app/grid/11.2/log/hxdb01/agent/ohasd/oraagent_grid//oraagent_grid.log".
CRS-2674: Start of 'ora.gpnpd' on 'hxdb01' failed
CRS-2679: Attempting to clean 'ora.gpnpd' on 'hxdb01'
CRS-2681: Clean of 'ora.gpnpd' on 'hxdb01' succeeded
CRS-2672: Attempting to start 'ora.gpnpd' on 'hxdb01'
CRS-5017: The resource action "ora.gpnpd start" encountered the following error: 
Start action for daemon aborted. For details refer to "(:CLSN00107:)" in "/u01/app/grid/11.2/log/hxdb01/agent/ohasd/oraagent_grid//oraagent_grid.log".
CRS-2674: Start of 'ora.gpnpd' on 'hxdb01' failed
CRS-2679: Attempting to clean 'ora.gpnpd' on 'hxdb01'
CRS-2681: Clean of 'ora.gpnpd' on 'hxdb01' succeeded
CRS-4000: Command Start failed, or completed with errors.

[grid@hxdb01 ~]$  crsctl query crs activeversion
Oracle Cluster Registry initialization failed accessing Oracle Cluster Registry device: PROC-26: Error while accessing the physical storage
ORA-29701: unable to connect to Cluster Synchronization Service

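Given these errors, my first guess was that a fault in the RAC voting disk or the ASM disks was keeping the node from joining the cluster. The customer's DBA consulted documentation and searched online at length, without result. Since I had built this RAC environment, my company sent me on site to resolve it.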

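On site, I first checked the base environment and confirmed it was fine, then worked through the error messages and the logs. The fixes posted online for these errors are very detailed, but applying them did not help at all, so I had to go back to the logs and look for anything suspicious.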

In gpnpd.log, one error caught my attention: a write to /u01/app/grid/11.2/gpnp/init/hxdb01.pid had failed.
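If a file cannot be written, the first suspect is file permissions. Running ll on the file showed that hxdb01.pid was owned by root, and so was the entire directory. That could only be the result of a manual change. It reminded me that during an earlier go-live data import, a developer had mistakenly changed the owner of the whole oracle directory tree to root. The ownership was restored afterwards, but some directories used by the RAC services were missed. This restart was the first one since then, so the RAC services could not write to those files and the cluster services failed to start.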
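The check that exposed the cause was simply looking at who owns the files. Below is a minimal sketch for doing the same across a whole directory tree instead of one file at a time. It assumes a `find` that supports `! -user` and `-exec ... {} +` (GNU find on Linux does), and a Grid home of /u01/app/grid/11.2 as seen in the log paths above; `report_wrong_owner` is a hypothetical helper name, not an Oracle tool.

```shell
#!/usr/bin/env bash
# report_wrong_owner is a hypothetical helper, not part of Oracle Grid Infrastructure.
# It lists every file or directory under DIR whose owner is not EXPECTED,
# in the same long format that `ll` (ls -l) prints.
report_wrong_owner() {
  local dir="$1" expected="$2"
  # `! -user` matches entries NOT owned by the given user name or numeric UID;
  # `-exec ls -ld {} +` prints permissions, owner and group for each match,
  # and runs nothing at all when there are no matches.
  find "$dir" ! -user "$expected" -exec ls -ld {} +
}

# Example for this article's node (assumed path); prints nothing if grid owns everything.
if [ -d /u01/app/grid/11.2/gpnp ]; then
  report_wrong_owner /u01/app/grid/11.2/gpnp grid
fi
```

Pointed at the whole Grid home instead of gpnp/, the same check would also surface other directories the earlier chown missed, but some files in a Grid home are legitimately owned by root, so its output needs reviewing rather than fixing blindly.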

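Once the cause was found, the fix was simple:

Under /u01/app/grid/11.2/gpnp/,
change the owner of the four directories hxdb01, init, profiles and wallets from "root" to "grid".
After the server was rebooted, RAC returned to normal.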
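The ownership fix can also be scripted. Below is a minimal sketch meant to be run as root. It assumes the Grid home /u01/app/grid/11.2, the owner grid, and the node name hxdb01, all taken from this article. The function name `fix_gpnp_owner` is hypothetical. The recursive `-R` is an assumption, since the article only names the four directories, but it also covers files already inside them, such as init/hxdb01.pid. Only the owner is changed and the group is left alone, matching what the article describes.

```shell
#!/usr/bin/env bash
# fix_gpnp_owner is a hypothetical helper. Arguments:
#   $1 gpnp directory, $2 new owner (user name or numeric UID), $3 node name
# Returns non-zero if any of the four directories is missing or chown fails.
fix_gpnp_owner() {
  local base="$1" owner="$2" node="$3" d rc=0
  for d in "$node" init profiles wallets; do
    if [ ! -d "$base/$d" ]; then
      echo "missing directory: $base/$d" >&2
      rc=1
      continue
    fi
    # -R (an assumption) also fixes files already inside, e.g. init/<node>.pid;
    # only the owner changes, the group is left as it is.
    chown -R "$owner" "$base/$d" || rc=1
  done
  return "$rc"
}

# Example for this article's node (run as root), then reboot as the article did:
#   fix_gpnp_owner /u01/app/grid/11.2/gpnp grid hxdb01
#   find /u01/app/grid/11.2/gpnp/{hxdb01,init,profiles,wallets} ! -user grid -exec ls -ld {} +
# The find should print nothing once the four directories belong to grid.
```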