CRS-4639: Could not contact Oracle High Availability Services

This article describes how the Oracle CRS-4639 error was resolved: deconfiguring with roothas.pl, reconfiguring the Oracle Restart stack, and then starting the cluster services.

Starting the ASM instance failed with the following errors:

[grid@b1 ~]$ sqlplus / as sysasm

SQL*Plus: Release 11.2.0.1.0 Production on Thu Sep 12 18:14:13 2013

Copyright (c) 1982, 2009, Oracle.  All rights reserved.

Connected to an idle instance.

SQL> startup
ORA-01078: failure in processing system parameters
ORA-29701: unable to connect to Cluster Synchronization Service

Checking CSS with crsctl check css then returned:

[grid@b1 ~]$ crsctl check css
CRS-4639: Could not contact Oracle High Availability Services
CRS-4000: Command Check failed, or completed with errors.
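Each failure above is reported as a numbered CRS message. CRS-4639 means crsctl could not reach OHASD at all. CRS-4000 only says that the command failed.

The sketch below is a minimal bash helper. It maps the CRS codes discussed in this article, plus the standard "is online" codes CRS-4537, CRS-4529 and CRS-4533, to a per-daemon status. It assumes the output has already been captured as text. The function name `crs_status_from_output` is hypothetical; it is not part of any Oracle tool.

```bash
# Hypothetical helper (not an Oracle tool): read `crsctl check ...` output on
# stdin and print one "<component> <status>" line per recognised message code.
crs_status_from_output() {
  local line code
  # `|| [[ -n $line ]]` also processes a final line that lacks a newline.
  while IFS= read -r line || [[ -n $line ]]; do
    code=${line%%:*}   # text before the first colon, e.g. "CRS-4639"
    case $code in
      CRS-4638) echo "OHAS online" ;;
      CRS-4639) echo "OHAS unreachable" ;;
      CRS-4537) echo "CRS online" ;;
      CRS-4535) echo "CRS unreachable" ;;
      CRS-4529) echo "CSS online" ;;
      CRS-4530) echo "CSS unreachable" ;;
      CRS-4533) echo "EVM online" ;;
      CRS-4534) echo "EVM unreachable" ;;
      CRS-4000) echo "command failed" ;;
    esac
  done
}
```

For example: `crsctl check crs 2>&1 | crs_status_from_output`. Lines without a recognised code are ignored.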

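The CRS-4639 ("Could not contact Oracle High Availability Services") error was resolved as follows. First, as root, deconfigure the Oracle Restart stack with roothas.pl: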

[root@b1 grid]# cd /u01/app/11.2.0/grid/crs/install
[root@b1 install]#  ./roothas.pl -deconfig -force -verbose
2013-09-12 19:25:05: Checking for super user privileges
2013-09-12 19:25:05: User has super user privileges
2013-09-12 19:25:05: Parsing the host name
Using configuration parameter file: ./crsconfig_params
CRS-4639: Could not contact Oracle High Availability Services
CRS-4000: Command Stop failed, or completed with errors.
CRS-4639: Could not contact Oracle High Availability Services
CRS-4000: Command Delete failed, or completed with errors.
Failure at scls_scr_getval with code 1
Internal Error Information:
  Category: -2
Operation: opendir
  Location: scrsearch1
  Other: cant open scr home dir scls_scr_getval
  System Dependent Information: 2

CRS-4544: Unable to connect to OHAS
CRS-4000: Command Stop failed, or completed with errors.
ACFS-9200: Supported
Successfully deconfigured Oracle Restart stack
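The deconfiguration still ended with "Successfully deconfigured Oracle Restart stack", even though its stop and delete commands failed with CRS-4639/CRS-4000. Presumably this is because OHAS could not be contacted in the first place. When reviewing a saved copy of such output, two checks are often enough: a tally of the message codes, and a check for that final line.

The following is a minimal sketch. It assumes the output was saved with something like `./roothas.pl -deconfig -force -verbose 2>&1 | tee deconfig.log`. Both the log file name and the function name `summarize_deconfig_log` are hypothetical.

```bash
# Hypothetical helper: tally Oracle message codes (CRS-, ACFS-, ORA-, PROT-)
# in a saved log, then report whether the deconfig success line is present.
summarize_deconfig_log() {
  local log=$1
  # Count each distinct message code, most frequent first.
  grep -oE '(CRS|ACFS|ORA|PROT)-[0-9]+' "$log" | sort | uniq -c | sort -rn
  if grep -q 'Successfully deconfigured Oracle Restart stack' "$log"; then
    echo "deconfig: success"
  else
    echo "deconfig: NOT confirmed"
  fi
}
```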


[root@b1 install]# cd /u01/app/11.2.0/grid/
[root@b1 grid]# ./root.sh

Performing root user operation for Oracle 11g

The following environment variables are set as:
    ORACLE_OWNER= grid
    ORACLE_HOME=  /opt/oracrs/product/11gR2/grid
Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.

To configure Grid Infrastructure for a Stand-Alone Server run the following command as the root user:
/opt/oracrs/product/11gR2/grid/perl/bin/perl -I/opt/oracrs/product/11gR2/grid/perl/lib -I/opt/oracrs/product/11gR2/grid/crs/install /opt/oracrs/product/11gR2/grid/crs/install/roothas.pl


To configure Grid Infrastructure for a Cluster execute the following command:
/opt/oracrs/product/11gR2/grid/crs/config/config.sh
This command launches the Grid Infrastructure Configuration Wizard. The wizard also supports silent operation, and the parameters can be passed through the response file that is available in the installation media.

Because this is a standalone server (Oracle Restart) configuration, run the roothas.pl command printed above as root. The Grid home in this output (/opt/oracrs/product/11gR2/grid) differs from the one used earlier (/u01/app/11.2.0/grid); use the path of your own installation.

[root@b1 grid]# /opt/oracrs/product/11gR2/grid/perl/bin/perl -I/opt/oracrs/product/11gR2/grid/perl/lib -I/opt/oracrs/product/11gR2/grid/crs/install /opt/oracrs/product/11gR2/grid/crs/install/roothas.pl

Using configuration parameter file: /opt/oracrs/product/11gR2/grid/crs/install/crsconfig_params
User ignored Prerequisites during installation
LOCAL ADD MODE
Creating OCR keys for user 'grid', privgrp 'oinstall'..
Operation successful.
LOCAL ONLY MODE
Successfully accumulated necessary OCR keys.
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
CRS-4664: Node linux140 successfully pinned.
Adding Clusterware entries to inittab

linux140     2013/11/12 15:33:48     /opt/oracrs/product/11gR2/grid/cdata/linux140/backup_20131112_153348.olr
Successfully configured Oracle Grid Infrastructure for a Standalone Server
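Here root.sh did not configure the stack itself. Instead, it printed the roothas.pl command to run as root for a Stand-Alone Server. Rather than copying that long line by hand, you could build it from the Grid home.

The following is a minimal sketch. The function name `roothas_cmd` is hypothetical. It only prints the command, which you still review and run as root.

```bash
# Hypothetical helper: print the roothas.pl invocation that root.sh shows
# for a Stand-Alone Server, given the Grid Infrastructure home.
roothas_cmd() {
  local gh=${1%/}   # drop a trailing slash, if any
  printf '%s/perl/bin/perl -I%s/perl/lib -I%s/crs/install %s/crs/install/roothas.pl\n' \
    "$gh" "$gh" "$gh" "$gh"
}
```

For example, `roothas_cmd /opt/oracrs/product/11gR2/grid` prints the same command that root.sh displayed above.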

[grid@b1 ~]$ crs_stat -t
Name           Type           Target    State     Host       
------------------------------------------------------------
ora.cssd       ora.cssd.type  OFFLINE   OFFLINE              
ora.diskmon    ora....on.type OFFLINE   OFFLINE              
[grid@b1 ~]$ crs_start -all
Attempting to start `ora.diskmon` on member `b1`
Attempting to start `ora.cssd` on member `b1`
Start of `ora.diskmon` on member `b1` succeeded.
Start of `ora.cssd` on member `b1` succeeded.
[grid@b1 ~]$ sqlplus / as sysasm

SQL*Plus: Release 11.2.0.1.0 Production on Thu Sep 12 19:34:50 2013

Copyright (c) 1982, 2009, Oracle.  All rights reserved.

Connected to an idle instance.

SQL> startup
ASM instance started

Total System Global Area  283930624 bytes
Fixed Size                  2212656 bytes
Variable Size             256552144 bytes
ASM Cache                  25165824 bytes
ASM diskgroups mounted
ASM diskgroups volume enabled
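Before crs_start -all, crs_stat -t listed ora.cssd and ora.diskmon as OFFLINE. In 11.2, crs_stat and crs_start are deprecated. The supported equivalents are crsctl status resource (for example `crsctl stat res -t`, which uses a different, multi-line layout) and crsctl start resource.

The following is a minimal sketch that picks out OFFLINE resources from crs_stat -t output. It assumes the layout shown above (Name, Type, Target, State, Host, with no spaces inside a field). The function name `offline_resources` is hypothetical.

```bash
# Hypothetical helper: print resource names whose State column is OFFLINE
# in `crs_stat -t` output read from stdin (header and dashed rule skipped).
offline_resources() {
  awk '$1 != "Name" && $1 !~ /^-+$/ && $4 == "OFFLINE" { print $1 }'
}
```

For example: `crs_stat -t | offline_resources`.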

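The error messages you are seeing mean that some **Cluster Ready Services (CRS)** components in Oracle Grid Infrastructure (or an Oracle RAC environment) are running. However, the key cluster communication services are not working. Specifically:

```bash
CRS-4638: Oracle High Availability Services is online
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4530: Communications failure contacting Cluster Synchronization Services daemon
CRS-4534: Cannot communicate with Event Manager
```

### Explanation:

- `CRS-4638`: OHASD (Oracle High Availability Services Daemon) is up, so the basic high-availability services on the local node are running.
- `CRS-4535`: CRSD (the Cluster Ready Services daemon) is not responding. It may have crashed, be blocked, or have failed to start.
- `CRS-4530`: Communication with CSSD (Cluster Synchronization Services Daemon) failed, so the node cannot talk to the cluster synchronization service. This is serious: if CSSD stays unavailable on a cluster node, it can lead to node eviction.
- `CRS-4534`: EVM (Event Manager) cannot be reached. EVM normally depends on CRSD and CSSD, so this usually follows from the failures above.

---

## ✅ Possible causes

1. **Network problems**: the private interconnect has failed or is misconfigured.
2. **OCR/voting disks inaccessible**: storage paths disconnected, permission problems, or an ASM instance fault.
3. **CSSD hung or crashed**: often caused by resource shortages or incorrect kernel parameter settings.
4. **Time not synchronized (NTP/CTSS)**: excessive clock skew between nodes can trigger the cluster's split-brain protection.
5. **Firewall or SELinux interference**: required ports are blocked.
6. **Node evicted from the cluster**: for example, because of missed heartbeats.
7. **Damaged Grid Infrastructure or a failed upgrade**.

---

## ✅ Troubleshooting and resolution steps

### Step 1: Check the local CRS component status

```bash
crsctl check crs
```

> The output is as shown above. Use it to establish the scope of the problem.

---

### Step 2: Check the daemon processes

```bash
ps -ef | grep cssd
ps -ef | grep crsd
ps -ef | grep ohasd
```

The following processes should be present. Not all of them run as the `grid` user; `ohasd.bin` and `crsd.bin`, for example, run as root.

- `/etc/init.d/init.ohasd run` and `ohasd.bin` (OHASD)
- `cssdagent`, `cssdmonitor`, `ocssd.bin` (CSSD-related)
- `crsd.bin` (CRSD)

If `ocssd.bin` or `crsd.bin` is missing, that service has not started.

---

### Step 3: Start CSSD manually (with caution)

If `ocssd.bin` is not running, try starting it manually:

```bash
sudo -u grid $GRID_HOME/bin/crsctl start res ora.cssd -init
```

> Note: `-init` indicates an initialization (lower-stack) resource managed by OHASD.

You can also restart the whole OHASD stack as root. This affects all GI services:

```bash
# Stop OHASD
crsctl stop has
# Start OHASD
crsctl start has
```

Then run `crsctl check crs` again to see whether the stack has recovered.

---

### Step 4: Check the log files

#### Key log locations:

```bash
$GRID_HOME/log/<hostname>/alert<hostname>.log
$GRID_HOME/log/<hostname>/cssd/ocssd.log
$GRID_HOME/log/<hostname>/crsd/crsd.log
$GRID_HOME/log/<hostname>/ohasd/ohasd.log
```

These paths are the 11.2/12.1 layout. From 12.2 onward, Clusterware logs are written to the ADR under `$ORACLE_BASE/diag/crs/<hostname>/crs/trace/` (for example `alert.log` and `ocssd.trc`).

For example, on a 19c node:

```bash
tail -f $ORACLE_BASE/diag/crs/hisrac1/crs/trace/ocssd.trc
```

Search for the keywords `FATAL`, `reboot`, `misscount`, `timeout` and `communication failure`.

---

### Step 5: Check the private network (heartbeat interconnect)

Make sure the private network interfaces are working and can ping the other nodes.

```bash
# Show the interface configuration
oifcfg getif
# Example output should look similar to:
# eth1  192.168.1.0  global  cluster_interconnect
# eth2  10.10.1.0    global  cluster_interconnect
```

Test connectivity to the peer node's private IP:

```bash
ping <peer_node_private_ip>
```

Also check for problems such as IP forwarding settings or an MTU mismatch between nodes.

---

### Step 6: Check the voting disk and OCR status

```bash
# Check voting disk status
crsctl query css votedisk
# Check OCR status
ocrcheck
```

Voting disks should be listed as `ONLINE`, and `ocrcheck` should report that the integrity check succeeded. A `PROT-xx` error indicates a problem accessing the OCR.

---

### Step 7: Check time synchronization

```bash
date
ntpq -p   # if NTP is used
```

Or check the CTSS status:

```bash
crsctl check ctss
```

The output tells you which mode CTSS is in:

- `Observer` mode: an NTP daemon is configured and CTSS only observes.
- `Active` mode: CTSS itself is synchronizing the clocks, so confirm that the reported offset is not large.

---

### Step 8: Check the firewall and SELinux

To test, temporarily stop the firewall and put SELinux in permissive mode:

```bash
systemctl stop firewalld      # stops the firewall until the next reboot
systemctl disable firewalld   # keeps it off after reboots (not temporary; optional)
setenforce 0                  # SELinux permissive until the next reboot
```

Also make sure the following ports are open. They vary by version, so verify them against the Oracle documentation for your release:

- UDP/TCP: 12345 (CSS), 2013 (CRS), 9625 (EVM)
- ASM: 1521
- Database: 1521, 5500, 5520, etc.

---

### Step 9: Force the node to rejoin the cluster (last resort)

If this node has been evicted from the cluster, you can try resetting and restarting it.

⚠️ Warning: this may cause a brief service outage!

```bash
# Stop CRS (as root)
crsctl stop has -f
# Remove this node's configuration (dangerous! only for a rebuild)
# Do not run this on the master node
# (rootcrs.sh is the 12.2+ name; 11.2/12.1 use rootcrs.pl)
$GRID_HOME/crs/install/rootcrs.sh -deconfig -force
# Reconfigure
$GRID_HOME/crs/config/config.sh
```

Alternatively, delete the node from the cluster and then add it back.

---

## ✅ Summary

| Item | What to check |
|------|---------------|
| ✅ Logs | `ocssd.log`, `crsd.log`, `alert.log` |
| ✅ Processes | whether `ocssd.bin` and `crsd.bin` are running |
| ✅ Storage | OCR/voting disks accessible (is ASM healthy?) |
| ✅ Network | private interconnect reachable, no packet loss, consistent MTU |
| ✅ Time | NTP/CTSS synchronizing correctly |
| ✅ Firewall | disabled, or required ports allowed |

---

### Example: repair flow summary

```bash
# 1. Check the status
crsctl check crs
# 2. Check the processes
ps -ef | grep ocssd
# 3. Try restarting OHASD (as root)
crsctl stop has
crsctl start has
# 4. Read the logs (11.2/12.1 layout)
tail -100f $GRID_HOME/log/hisrac1/cssd/ocssd.log
# 5. Query the voting disks
crsctl query css votedisk
# 6. Check the OCR
ocrcheck
```

---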
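The grep commands in Step 2 also match the grep process itself, which can make a missing daemon look present. The following is a minimal sketch that checks a single `ps -ef` snapshot for the daemon binaries named in Step 2, plus `evmd.bin` for EVM. The function name `check_gi_daemons` is hypothetical.

```bash
# Hypothetical helper: given `ps -ef` output on stdin, report which key
# Grid Infrastructure daemon binaries appear in that snapshot.
check_gi_daemons() {
  local ps_out daemon
  ps_out=$(cat)   # read the whole snapshot once, so our greps are not in it
  for daemon in ohasd.bin ocssd.bin crsd.bin evmd.bin; do
    if grep -qF -- "$daemon" <<<"$ps_out"; then
      echo "$daemon running"
    else
      echo "$daemon MISSING"
    fi
  done
}
```

For example: `ps -ef | check_gi_daemons`.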
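Step 4 suggests searching the Clusterware logs for FATAL, reboot, misscount, timeout and communication failure. The following is a minimal sketch that counts matching lines per keyword, case-insensitively, across one or more log files. The function name `scan_gi_log` is hypothetical. Pass it whichever log paths apply to your release.

```bash
# Hypothetical helper: for each keyword from Step 4, print how many lines
# across the given log files contain it (case-insensitive, fixed string).
scan_gi_log() {
  local kw
  for kw in FATAL reboot misscount timeout 'communication failure'; do
    printf '%s: %s\n' "$kw" "$(cat -- "$@" | grep -ciF -- "$kw")"
  done
}
```

For example, on 11.2: `scan_gi_log $GRID_HOME/log/$(hostname -s)/cssd/ocssd.log`.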
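For Step 7, the mode reported by `crsctl check ctss` is what matters. Observer means a time-synchronization daemon such as NTP is configured. Active means CTSS is adjusting the clocks itself.

The following is a minimal sketch that classifies saved output by matching the "Observer mode" / "Active mode" wording. That wording is an assumption about the message text, and the function name `ctss_mode` is hypothetical.

```bash
# Hypothetical helper: classify `crsctl check ctss` output read from stdin
# by matching the mode wording in the message text.
ctss_mode() {
  local out
  out=$(cat)
  if grep -q 'Observer mode' <<<"$out"; then
    echo "observer: a time-sync daemon (e.g. NTP) is configured; check it (ntpq -p)"
  elif grep -q 'Active mode' <<<"$out"; then
    echo "active: CTSS is synchronizing the clocks; check the reported offset"
  else
    echo "unknown: CTSS status not recognized"
  fi
}
```

For example: `crsctl check ctss | ctss_mode`.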
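For Step 6, both checks can be reduced to pass/fail tests on their text output. The following is a minimal sketch that rests on two assumptions:

- `ocrcheck` prints "Cluster registry integrity check succeeded" on success.
- `crsctl query css votedisk` lists each disk as a numbered line such as " 1. ONLINE ...".

The function names `ocr_ok` and `online_votedisks` are hypothetical.

```bash
# Hypothetical helpers for Step 6; both read saved command output on stdin.

# ocr_ok: succeed only if the integrity check passed and no PROT- error appears.
ocr_ok() {
  local out
  out=$(cat)
  grep -q 'integrity check succeeded' <<<"$out" && ! grep -q 'PROT-' <<<"$out"
}

# online_votedisks: count numbered voting disk lines whose state is ONLINE.
online_votedisks() {
  grep -cE '^[[:space:]]*[0-9]+\.[[:space:]]+ONLINE'
}
```

For example: `ocrcheck | ocr_ok && echo "OCR OK"` and `crsctl query css votedisk | online_votedisks`.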