Oracle RAC CRS-0184 -- Cannot communicate with the CRS daemon

After installing Oracle 11gR2 RAC, starting CRS produced the following error:

 

[root@rac1 bin]# ./crsctl check crs

CRS-4638: Oracle High Availability Services is online

CRS-4535: Cannot communicate with Cluster Ready Services

CRS-4529: Cluster Synchronization Services is online

CRS-4533: Event Manager is online

 

The output shows that CRS (Cluster Ready Services) failed to start. CRSD is a critical daemon: if it cannot start, the Clusterware stack cannot come up completely. There are many possible causes for this problem.
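Before digging into individual logs, it helps to see exactly which of the lower-stack (ohasd-managed) resources failed. A minimal sketch of that check, using the Grid home path that appears in the log below (adjust for your environment):

# List the startup ("init") resources managed by ohasd -- ora.cssd, ora.asm,
# ora.crsd and so on -- together with their current state on this node (run as root).
/u01/app/11.2.0/grid/bin/crsctl stat res -t -init

# Re-check the overall stack after any corrective action.
/u01/app/11.2.0/grid/bin/crsctl check crs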

 

The crsd.log shows the following:

[root@rac1 rac1]# tail -50 /u01/app/11.2.0/grid/log/rac1/crsd/crsd.log

ORA-15077: could not locate ASM instance serving a required diskgroup

 

2010-11-16 17:13:44.286: [  OCRASM][3046411024]proprasmo: kgfoCheckMount returned [7]

2010-11-16 17:13:44.286: [  OCRASM][3046411024]proprasmo: The ASM instance is down

2010-11-16 17:13:44.287: [  OCRRAW][3046411024]proprioo: Failed to open [+CRS]. Returned proprasmo() with [26]. Marking location as UNAVAILABLE.

2010-11-16 17:13:44.287: [  OCRRAW][3046411024]proprioo: No OCR/OLR devices are usable

2010-11-16 17:13:44.287: [  OCRASM][3046411024]proprasmcl: asmhandle is NULL

2010-11-16 17:13:44.287: [  OCRRAW][3046411024]proprinit: Could not open raw device

2010-11-16 17:13:44.287: [  OCRASM][3046411024]proprasmcl: asmhandle is NULL

2010-11-16 17:13:44.287: [  OCRAPI][3046411024]a_init:16!: Backend init unsuccessful : [26]

2010-11-16 17:13:44.288: [  CRSOCR][3046411024] OCR context init failure.  Error: PROC-26: Error while accessing the physical storage ASM error [SLOS: cat=7, opn=kgfoAl06, dep=15077, loc=kgfokge

ORA-15077: could not locate ASM instance serving a required diskgroup

] [7]

2010-11-16 17:13:44.288: [    CRSD][3046411024][PANIC] CRSD exiting: Could not init OCR, code: 26

2010-11-16 17:13:44.288: [    CRSD][3046411024] Done.

 

 

       The log indicates that the failure is caused by the ASM instance not being up: the OCR lives in the +CRS diskgroup, which CRSD cannot open. The underlying causes can be fairly involved.
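Since crsd.log points at ASM, a reasonable next step is to check the ASM instance and its ohasd resource directly. A minimal sketch, assuming an 11gR2 node; the instance name +ASM1 and the alert-log path are illustrative, not taken from this environment:

# Is an ASM instance running on this node at all?
ps -ef | grep asm_pmon | grep -v grep

# State of the ASM resource in the lower (ohasd) stack.
crsctl stat res ora.asm -init

# If it is OFFLINE, try starting it and watch the ASM alert log for disk/diskgroup errors.
crsctl start res ora.asm -init
tail -f $ORACLE_BASE/diag/asm/+asm/+ASM1/trace/alert_+ASM1.log    # illustrative path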

 

       This post does not analyze that problem in depth. There is an Oracle Support note that explains it very thoroughly, and I have reposted it to my blog. See:

       How to Troubleshoot Grid Infrastructure Startup Issues [ID 1050908.1]

       http://blog.youkuaiyun.com/tianlesoftware/archive/2010/11/17/6013763.aspx

 

In this Document

  Goal

  Solution

     Start up sequence:

     Cluster status

     Case 1: OHASD.BIN does not start

     Case 2: OHASD Agents does not start

     Case 3: CSSD.BIN does not start

     Case 4: CRSD.BIN does not start

     Case 5: GPNPD.BIN does not start

     Case 6: Various other daemons does not start

     Case 7: CRSD Agents does not start

     Network and Naming Resolution Verification

     Log File Location, Ownership and Permission

     Network Socket File Location, Ownership and Permission

     Diagnostic file collection

  References

 

 

Here is the thought process I follow when analyzing this kind of problem:

 

1. Start with the logs and see whether the cause can be identified. If the problem cannot be pinned down clearly from the logs, continue with the steps below.

 

2. Work along the CRS startup sequence.

       During startup the ASM instance has to come up first, which brings the storage into the picture:

       (1) Is the network working properly?

       (2) Is the storage mapped to the expected location? My test setup uses multipath, which maps the storage under /dev/mapper/*. When a problem occurs, check whether those mappings are present.

       (3) Are the device permissions correct? After mapping, the devices are owned by root by default, so I added a script to /etc/rc.d/rc.local that changes ownership of the mapped device files to the oracle user at boot.

 

3.  If all of the above look normal, try restarting CRS, or reboot the operating system. (A sketch of checks (2)-(3) and this restart follows below.)
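A minimal sketch of checks (2) and (3) above and of the restart in step 3. The device names under /dev/mapper and the oracle:oinstall ownership are illustrative, not taken from my environment:

# (2) Confirm the multipath mappings that back the ASM disks are present.
multipath -ll
ls -l /dev/mapper/

# (3) Confirm the mapped devices are owned by the Oracle software owner.
ls -l /dev/mapper/ocr_vote*              # illustrative device names

# Example lines for /etc/rc.d/rc.local to reset ownership and mode at boot:
#   chown oracle:oinstall /dev/mapper/ocr_vote*
#   chmod 660 /dev/mapper/ocr_vote*

# Step 3: restart the clusterware stack on this node (run as root).
crsctl stop crs
crsctl start crs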

 

 

Additional note:

 

       While searching online I also found another cause of CSSD startup failure. What interests me here is the knowledge point it covers: the role of the /tmp/.oracle and /var/tmp/.oracle directories. Each time the server boots, lock (socket) files are written into these directories. If, after a reboot, the old files there cannot be removed, the locks cannot be refreshed and the stack fails to start.

 

This also explains why these two directories have to be removed when deinstalling Clusterware.
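A quick way to see what lives in these directories (a sketch; do any cleanup only with the clusterware stack completely stopped on that node):

# Socket/lock files left behind by CSSD, EVMD, CRSD and friends.
ls -la /tmp/.oracle /var/tmp/.oracle 2>/dev/null

# Check whether any process still has them open before touching anything.
fuser /tmp/.oracle/* /var/tmp/.oracle/* 2>/dev/null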

 

The post on removing RAC also mentions that these two directories must be deleted when uninstalling RAC. See:

       RAC 卸载 说明 (RAC deinstallation notes)

       http://blog.youkuaiyun.com/tianlesoftware/archive/2010/09/18/5892225.aspx

 

Contents of crs.log from that case:

2007-04-11 14:37:34.020: [ COMMCRS][1693]clsc_connect: (100f78610) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_drdb1_crs))

2007-04-11 14:37:34.020: [ CSSCLNT][1]clsssInitNative: connect failed, rc 9

2007-04-11 14:37:34.021: [ CRSRTI][1] CSS is not ready. Received status 3 from CSS. Waiting for good status ..

2007-04-11 14:37:35.740: [ COMMCRS][1695]clsc_connect: (100f78610) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_drdb1_crs))

2007-04-11 14:37:35.740: [ CSSCLNT][1]clsssInitNative: connect failed, rc 9

When we checked ocssd.log, it contained the following:

[ CSSD]2007-04-11 12:53:56.211 [6] >TRACE: clssnmDiskStateChange: state from 2 to 4 disk (0//dev/rdsk/c5t8d0s5)
[ CSSD]2007-04-11 12:53:56.211 [10] >TRACE: clssnmvKillBlockThread: spawned for disk 1 (/dev/rdsk/c5t9d0s5) initial sleep interval (1000)ms
[ CSSD]2007-04-11 12:53:56.211 [11] >TRACE: clssnmvKillBlockThread: spawned for disk 0 (/dev/rdsk/c5t8d0s5) initial sleep interval (1000)ms
[ CSSD]2007-04-11 12:53:56.228 [1] >TRACE: clssnmFatalInit: fatal mode enabled
[ CSSD]2007-04-11 12:53:56.269 [13] >TRACE: clssnmconnect: connecting to node 1, flags 0x0001, connector 1
[ CSSD]2007-04-11 12:53:56.274 [13] >TRACE: clssnmClusterListener: Listening on (ADDRESS=(PROTOCOL=tcp)(HOST=drdb1-priv)(PORT=49895))

[ CSSD]2007-04-11 12:53:56.274 [13] >TRACE: clssnmconnect: connecting to node 0, flags 0x0000, connector 1
[ CSSD]2007-04-11 12:53:56.279 [14] >TRACE: clsclisten: Permission denied for (ADDRESS=(PROTOCOL=ipc)(KEY=Oracle_CSS_LclLstnr_crs_1))

[ CSSD]2007-04-11 12:53:56.279 [14] >ERROR: clssgmclientlsnr: listening failed for (ADDRESS=(PROTOCOL=ipc)(KEY=Oracle_CSS_LclLstnr_crs_1)) (3)
[ CSSD]2007-04-11 12:53:56.279 [14] >TRACE: clssgmclientlsnr: listening on (ADDRESS=(PROTOCOL=ipc)(KEY=Oracle_CSS_LclLstnr_crs_1))
[ CSSD]2007-04-11 12:53:56.279 [14] >TRACE: clssgmclientlsnr: listening on (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_drdb1_crs))
[ CSSD]2007-04-11 13:07:36.516 >USER: Oracle Database 10g CSS Release 10.2.0.2.0 Production Copyright 1996, 2004 Oracle. All rights reserved.
[ clsdmt]Fail to listen to (ADDRESS=(PROTOCOL=ipc)(KEY=drdb1DBG_CSSD))
[ CSSD]2007-04-11 13:07:36.516 >USER: CSS daemon log for node drdb1, number 1, in cluster crs
[ clsdmt]Terminating clsdm listening thread
[ CSSD]2007-04-11 13:07:36.536 [1] >TRACE: clssscmain: local-only set to false
[ CSSD]2007-04-11 13:07:36.545 [1] >TRACE: clssnmReadNodeInfo: added node 1 (drdb1) to cluster
[ CSSD]2007-04-11 13:07:36.588 [5] >TRACE: clssnm_skgxnmon: skgxn init failed, rc 1
[ CSSD]2007-04-11 13:07:36.588 [1] >TRACE: clssnm_skgxnonline: Using vacuous skgxn monitor

 

 

Solution (from that post):

 

       By checking the above logs, we realized that the listener of the CSS daemon was unable to start.

       The reason it could not start is that each time the server reboots, sockets are created in the /tmp/.oracle or /var/tmp/.oracle directory.

 

       If sockets from a previous boot are still present, they cannot be reused, and they are not deleted automatically from the .oracle directory.

 

       Therefore the problem was solved by deleting all the files inside the .oracle directory under /var/tmp or /tmp.

 

With that done, CRS started and the cluster came up.
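In shell form, the fix described above amounts to something like the following sketch (run as root, with the stack stopped; moving the directories aside instead of deleting them, as in the next case, is a safer variant):

# Stop the stack, clear the stale socket files, then bring the stack back up.
crsctl stop crs
rm -rf /tmp/.oracle/* /var/tmp/.oracle/*
crsctl start crs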



2. Another case found online:



Fixing CRS-0184: Cannot communicate with the CRS daemon
2013-05-09 08:59:09
Tags: CRS-0184, oracle
Original work by summervast; reproduction permitted with attribution to the original source: http://summervast.blog.51cto.com/690507/1196143
Coming in in the morning and starting RAC, node 1 reported the CRS-0184 error:
 [grid@node1 ~]$ crs_stat -t -v
CRS-0184: Cannot communicate with the CRS daemon.
Node 2, on the other hand, was completely normal.




A fairly simple method found online: there is a .oracle directory under /tmp/ and /var/tmp; deleting it may solve the problem.

Logging on to node 1, the .oracle directory was found under both /tmp and /var/tmp.
A backup directory was created in each location, .oracle was moved into it, and node 1 was rebooted.

Fix:

mv  /var/log/.oracle  /var/log/.oracle.bak

After the reboot, everything was back to normal.



I used this second method, and it did indeed work.