How to Troubleshoot Grid Infrastructure Startup Issues

本文提供了解决 Oracle 11gR2 Grid Infrastructure 中集群软件启动问题的方法,包括不同环境下(如 root.sh 或 root 升级过程中及现有不健康环境中)的故障排除步骤。

摘要生成于 C知道 ,由 DeepSeek-R1 满血版支持, 前往体验 >

How to Troubleshoot Grid Infrastructure Startup Issues [ID 1050908.1]

Modified 25-JUN-2010Type HOWTOStatus PUBLISHED

In this Document
Goal
Solution
Start up sequence:
Cluster status
Case 1: OHASD.BIN does not start
Case 2: OHASD Agents does not start
Case 3: CSSD.BIN does not start
Case 4: CRSD.BIN does not start
Case 5: GPNPD.BIN does not start
Case 6: Various other daemons does not start
Case 7: CRSD Agents does not start
Network and Naming Resolution Verification
Log File Location, Ownership and Permission
Network Socket File Location, Ownership and Permission
Diagnostic file collection
References


Applies to:

Oracle Server - Enterprise Edition - Version: 11.2.0.1 and later[Release: 11.2 and later ]
Information in this document applies to any platform.

Goal

This goal of the note is to provide reference to troubleshoot 11gR2 Grid Infrastructure clusterware startup issues. It applies to issues in both new environments (during root.sh or rootupgrade.sh) and unhealthy existing environments. To look specifically at root.sh issues, see Note: 1053970.1 for more information.

Solution

Start up sequence:

In a nutshell, the operating system starts ohasd, ohasd starts agents to start up daemons (gipcd, mdnsd, gpnpd, ctssd, ocssd, crsd, evmd asm etc), and crsd starts agents that start user resources (database, SCAN, listener etc).

For detailed Grid Infrastructure clusterware startup sequence, please refer to note 1053147.1

Cluster status


To find out cluster and daemon status:

$GRID_HOME/crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online

$GRID_HOME/crsctl stat res -t -init
--------------------------------------------------------------------------------
NAME TARGET STATE SERVER STATE_DETAILS
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
1 ONLINE ONLINE rac1 Started
ora.crsd
1 ONLINE ONLINE rac1
ora.cssd
1 ONLINE ONLINE rac1
ora.cssdmonitor
1 ONLINE ONLINE rac1
ora.ctssd
1 ONLINE ONLINE rac1 OBSERVER
ora.diskmon
1 ONLINE ONLINE rac1
ora.drivers.acfs
1 ONLINE ONLINE rac1
ora.evmd
1 ONLINE ONLINE rac1
ora.gipcd
1 ONLINE ONLINE rac1
ora.gpnpd
1 ONLINE ONLINE rac1
ora.mdnsd
1 ONLINE ONLINE rac1


Case 1: OHASD.BIN does not start


As ohasd.bin is responsible to start up all other cluserwareprocesses directly or indirectly, it needs tostart up properly for the rest of the stack to come up.

Automatic ohasd.bin start up depends on the following:

1. OS is at appropriate run level:

OS need to be at specified run level before CRS will try to start up.

To find out at which run levelthe clusterwareneeds to come up:

cat /etc/inittab|grep init.ohasd
h1: 35 :respawn:/etc/init.d/init.ohasd run >/dev/null 2>&1 </dev/null



Above example shows CRS suppose to run at run level 3 and 5; please note depend on platform, CRS comes up at different run level.

To find out current run level:

who -r



2. "init.ohasd run" is up

On Linux/UNIX, as "init.ohasd run" is configured in /etc/inittab, process init (pid 1, /sbin/init on Linux, Solaris and hp-ux, /usr/sbin/init on AIX) will start and respawn "init.ohasd run" if it fails. Without "init.ohasd run" up and running, ohasd.bin will not start:

ps -ef|grep init.ohasd|grep -v grep
root 2279 1 0 18:14 ? 00:00:00 /bin/sh /etc/init.d/init.ohasd run



3. Cluserware auto start is enabled - its enabled by default

By default CRS is enabled for auto start upon node reboot, to enable:

$GRID_HOME/bin/crsctl enable crs


To verify whether its currently enabled or not:

cat $SCRBASE/$HOSTNAME/root/ohasdstr
enable


SCRBASE is /etc/oracle/scls_scr on Linux and AIX, /var/opt/oracle/scls_scr on hp-ux and Solaris

Note: NEVER EDIT THE FILE MANUALLY, use "crsctl enable/disable crs" command instead.

4. syslogd is up and OS is able to execute init script S96ohasd

OS may stuck with some other Snn script while node is coming up, thus never get chance to execute S96ohasd; if that's the case, following message will not be in OS messages:

Jan 20 20:46:51 rac1 logger: Oracle HA daemon is enabled for autostart.


If you don't see above message, the other possibility is syslogd(/usr/sbin/syslogd) is not fully up. Grid may fail to come up in that case as well. This may not apply to AIX.

To find out whether OS is able to execute S96ohasd while node is coming up, modify ohasd:

From:

case `$CAT $AUTOSTARTFILE` in
enable*)
$LOGERR "Oracle HA daemon is enabled for autostart."


To:

case `$CAT $AUTOSTARTFILE` in
enable*)
/bin/touch /tmp/ohasd.start."`date`"
$LOGERR "Oracle HA daemon is enabled for autostart."


After a node reboot, if you don't see /tmp/ohasd.start.timestamp get created, it means OS stuck with some other Snn script. If you do see /tmp/ohasd.start.timestamp but not "Oracle HA daemon is enabled for autostart" in messages, likely syslogd is not fully up. For both case, you will need engage System Administrator to find out the issue on OS level. For latter case, the workaround is to "sleep" for about 2 minutes, modify ohasd:

From:

case `$CAT $AUTOSTARTFILE` in
enable*)
$LOGERR "Oracle HA daemon is enabled for autostart."


To:

case `$CAT $AUTOSTARTFILE` in
enable*)
/bin/sleep 120
$LOGERR "Oracle HA daemon is enabled for autostart."
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值