
In this Document
Purpose
Troubleshooting Steps
Advanced Root.sh Troubleshooting
Community Discussions
References
APPLIES TO:
Oracle Database - Enterprise Edition - Version 11.2.0.1 to 11.2.0.3 [Release 11.2]
Information in this document applies to any platform.
PURPOSE
This document provides a reference for troubleshooting root.sh issues after installing an 11.2 Grid Infrastructure home for a cluster. For versions prior to 11.2, see Note 240001.1.
TROUBLESHOOTING STEPS
At the end of a Grid Infrastructure installation, the user is prompted to run the "root.sh" script, which configures and starts the Oracle Clusterware stack. The root.sh script can fail under any of the following conditions:
- Problem with the network configuration.
- Problem with the storage location for the OCR and/or voting files.
- Permission problem with /var/tmp (specifically /var/tmp/.oracle).
- Problem with the vendor clusterware (if used).
- Some other configuration issue.
- An Oracle bug.
Most configuration issues should be detectable by running the Cluster Verification Utility with the following syntax (substitute your node list for <nodelist>):
./cluvfy stage -pre crsinst -n <nodelist> -r 11gR2 -verbose
Additional options can be used for a more thorough check:
cluvfy stage -pre crsinst -n <node_list> [-r {10gR1|10gR2|11gR1|11gR2}]
[-c <ocr_location_list>] [-q <voting_disk_list>]
[-osdba <osdba_group>]
[-orainv <orainventory_group>]
[-asm -asmgrp <asmadmin_group>]
[-asm -asmdev <asm_device_list>]
[-fixup [-fixupdir <fixup_dir>]] [-verbose]
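As an illustration, a filled-in invocation for a hypothetical two-node cluster might look like the following. The node names, OCR/voting locations, and ASM group below are placeholders, not values from your environment; the sketch assembles the command string so you can review it before running it yourself as the grid owner:

```shell
#!/bin/sh
# Assemble a thorough cluvfy pre-crsinst check. All values below are
# examples only -- replace them with your own node list, OCR/voting
# locations, and OS group name before running the command.
NODES="racbde1,racbde2"   # hypothetical node list
OCR_LOCS="+SYSTEMDG"      # hypothetical OCR location
VOTE_LOCS="+SYSTEMDG"     # hypothetical voting disk location
ASM_GRP="asmadmin"        # hypothetical ASM admin group

CMD="./cluvfy stage -pre crsinst -n $NODES -r 11gR2 \
-c $OCR_LOCS -q $VOTE_LOCS -asm -asmgrp $ASM_GRP -fixup -verbose"

# Print the command for review; run it manually once it looks right.
echo "$CMD"
```

Reviewing the assembled command before executing it avoids accidentally running fixup scripts against the wrong devices or groups.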
If the Cluster Verification Utility does not find a configuration problem and your root.sh still fails, see the "Advanced Root.sh Troubleshooting" section below; you may also need the assistance of Oracle Support to troubleshoot further.
Advanced Root.sh Troubleshooting
root.sh is simply a parent script that calls the following scripts:
<GRID_HOME>/install/utl/rootmacro.sh # small - validates home and user
<GRID_HOME>/install/utl/rootinstall.sh # small - creates some local files
<GRID_HOME>/network/install/sqlnet/setowner.sh # small - opens up /tmp permissions
<GRID_HOME>/rdbms/install/rootadd_rdbms.sh # small - misc file/permission checks
<GRID_HOME>/rdbms/install/rootadd_filemap.sh # small - misc file/permission checks
<GRID_HOME>/crs/install/rootcrs.pl # MAIN CLUSTERWARE CONFIG SCRIPT
If your root.sh is failing in one of the first five scripts, the problem should be easy to fix, since those scripts are small and simple to troubleshoot. Most problems, however, occur in the rootcrs.pl script, which is the main clusterware configuration script. This script logs useful trace data to <GRID_HOME>/cfgtoollogs/crsconfig/rootcrs_<nodename>.log. Before digging into that trace, check the clusterware alert log under <GRID_HOME>/log/<nodename> for any obvious problems or errors.
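As a first pass over those logs, a simple filter for common failure indicators can save time. The pattern below is only a heuristic, and some flagged messages (for example CRS-2302) can also appear transiently in a healthy run, as the reference logs later in this note show:

```shell
#!/bin/sh
# scan_crs_log: print lines from a clusterware alert log or rootcrs trace
# that commonly indicate trouble. The search pattern is a heuristic, not
# a definitive list of failure messages.
scan_crs_log() {
    grep -E '[Ee]rror|[Ff]ail|CRS-1636|CRS-2302' "$1"
}

# Example (alert log path per the 11.2 layout described above):
# scan_crs_log /u01/app/grid/log/racbde1/alertracbde1.log
```

Lines that match should then be chased down in the detailed log referenced by the message itself (for example, the ocssd.log or ocrconfig log paths that CRS messages print).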
In the following section I will show the clusterware alert log output of a new installation on a two-node cluster (racbde1 and racbde2), where the OCR and voting files are stored in ASM on a diskgroup called +SYSTEMDG. This log information is posted for reference: it may be useful to compare the clusterware alert log from a working root.sh (mine) to a failing one (yours) to see where it went wrong. Watch for the major landmarks in the clusterware alert log to see how far you got:
Node 1 (racbde1) Clusterware Alert Log During root.sh:
2009-12-23 19:24:33.844
[client(17368)]CRS-2106:The OLR location /u01/app/grid/cdata/racbde1.olr is inaccessible. Details in /u01/app/grid/log/racbde1/client/ocrconfig_17368.log.
2009-12-23 19:24:33.956
[client(17368)]CRS-2101:The OLR was formatted using version 3.
2009-12-23 19:25:02.495
[ohasd(17767)]CRS-2112:The OLR service started on node racbde1.
2009-12-23 19:25:02.833
[ohasd(17767)]CRS-2772:Server 'racbde1' has been assigned to pool 'Free'.
2009-12-23 19:25:34.801
[cssd(18791)]CRS-1713:CSSD daemon is started in exclusive mode
2009-12-23 19:25:37.126
[cssd(18791)]CRS-1709:Lease acquisition failed for node racbde1 because no voting file has been configured; Details at (:CSSNM00031:) in /u01/app/grid/log/racbde1/cssd/ocssd.log
2009-12-23 19:25:54.705
[cssd(18791)]CRS-1601:CSSD Reconfiguration complete. Active nodes are racbde1 .
2009-12-23 19:25:55.431
[ctssd(18848)]CRS-2403:The Cluster Time Synchronization Service on host racbde1 is in observer mode.
2009-12-23 19:25:55.575
[ctssd(18848)]CRS-2407:The new Cluster Time Synchronization Service reference node is host racbde1.
2009-12-23 19:25:56.312
[ctssd(18848)]CRS-2401:The Cluster Time Synchronization Service started on host racbde1.
[client(19034)]CRS-10001:ACFS-9327: Verifying ADVM/ACFS devices.
[client(19038)]CRS-10001:ACFS-9322: done.
2009-12-23 19:30:26.790
[client(19423)]CRS-1006:The OCR location +SYSTEMDG is inaccessible. Details in /u01/app/grid/log/racbde1/client/ocrconfig_19423.log.
2009-12-23 19:30:27.883
[client(19423)]CRS-1001:The OCR was formatted using version 3.
2009-12-23 19:30:40.473
[crsd(19480)]CRS-1012:The OCR service started on node racbde1.
2009-12-23 19:31:53.331
[cssd(18791)]CRS-1605:CSSD voting file is online: /dev/sdb1; details in /u01/app/grid/log/racbde1/cssd/ocssd.log.
2009-12-23 19:31:53.373
[cssd(18791)]CRS-1605:CSSD voting file is online: /dev/sdb2; details in /u01/app/grid/log/racbde1/cssd/ocssd.log.
2009-12-23 19:31:53.417
[cssd(18791)]CRS-1605:CSSD voting file is online: /dev/sdb3; details in /u01/app/grid/log/racbde1/cssd/ocssd.log.
2009-12-23 19:31:54.413
[cssd(18791)]CRS-1626:A Configuration change request completed successfully
2009-12-23 19:31:54.424
[cssd(18791)]CRS-1601:CSSD Reconfiguration complete. Active nodes are racbde1 .
2009-12-23 19:32:10.831
[ctssd(18848)]CRS-2405:The Cluster Time Synchronization Service on host racbde1 is shutdown by user
2009-12-23 19:32:26.536
[cssd(18791)]CRS-1603:CSSD on node racbde1 shutdown by user.
2009-12-23 19:32:26.856
[cssd(18791)]CRS-1625:Node racbde1, number 1, was manually shut down
2009-12-23 19:32:44.826
[cssd(20125)]CRS-1713:CSSD daemon is started in clustered mode
2009-12-23 19:34:07.568
[cssd(20125)]CRS-1707:Lease acquisition for node racbde1 number 1 completed
2009-12-23 19:34:07.690
[cssd(20125)]CRS-1605:CSSD voting file is online: /dev/sdb3; details in /u01/app/grid/log/racbde1/cssd/ocssd.log.
2009-12-23 19:34:07.731
[cssd(20125)]CRS-1605:CSSD voting file is online: /dev/sdb2; details in /u01/app/grid/log/racbde1/cssd/ocssd.log.
2009-12-23 19:34:07.774
[cssd(20125)]CRS-1605:CSSD voting file is online: /dev/sdb1; details in /u01/app/grid/log/racbde1/cssd/ocssd.log.
2009-12-23 19:34:25.380
[cssd(20125)]CRS-1601:CSSD Reconfiguration complete. Active nodes are racbde1 .
2009-12-23 19:34:26.324
[ctssd(20269)]CRS-2403:The Cluster Time Synchronization Service on host racbde1 is in observer mode.
2009-12-23 19:34:26.448
[ctssd(20269)]CRS-2407:The new Cluster Time Synchronization Service reference node is host racbde1.
2009-12-23 19:34:27.278
[ctssd(20269)]CRS-2401:The Cluster Time Synchronization Service started on host racbde1.
2009-12-23 19:34:41.941
[crsd(20392)]CRS-1012:The OCR service started on node racbde1.
2009-12-23 19:34:44.734
[crsd(20392)]CRS-1201:CRSD started on node racbde1.
Node 2 (racbde2) Clusterware Alert Log During root.sh:
2009-12-23 19:33:43.687
[client(12019)]CRS-2106:The OLR location /u01/app/grid/cdata/racbde2.olr is inaccessible. Details in /u01/app/grid/log/racbde2/client/ocrconfig_12019.log.
2009-12-23 19:33:43.700
[client(12019)]CRS-2101:The OLR was formatted using version 3.
2009-12-23 19:33:50.660
[ohasd(12058)]CRS-2112:The OLR service started on node racbde2.
2009-12-23 19:33:50.946
[ohasd(12058)]CRS-2772:Server 'racbde2' has been assigned to pool 'Free'.
2009-12-23 19:34:15.140
[ohasd(12058)]CRS-2302:Cannot get GPnP profile. Error CLSGPNP_NO_DAEMON (GPNPD daemon is not running).
2009-12-23 19:34:17.910
[cssd(13108)]CRS-1713:CSSD daemon is started in exclusive mode
2009-12-23 19:35:39.019
[cssd(13108)]CRS-1707:Lease acquisition for node racbde2 number 2 completed
[cssd(13108)]CRS-1636:The CSS daemon was started in exclusive mode but found an active CSS daemon on node racbde1 and is terminating; details at (:CSSNM00006:) in /u01/app/grid/log/racbde2/cssd/ocssd.log
2009-12-23 19:35:39.043
[cssd(13108)]CRS-1603:CSSD on node racbde2 shutdown by user.
2009-12-23 19:35:39.152
[ohasd(12058)]CRS-2765:Resource 'ora.cssdmonitor' has failed on server 'racbde2'.
2009-12-23 19:36:00.206
[cssd(13376)]CRS-1713:CSSD daemon is started in clustered mode
2009-12-23 19:36:19.648
[cssd(13376)]CRS-1707:Lease acquisition for node racbde2 number 2 completed
2009-12-23 19:36:19.762
[cssd(13376)]CRS-1605:CSSD voting file is online: /dev/sdb1; details in /u01/app/grid/log/racbde2/cssd/ocssd.log.
2009-12-23 19:36:19.810
[cssd(13376)]CRS-1605:CSSD voting file is online: /dev/sdb3; details in /u01/app/grid/log/racbde2/cssd/ocssd.log.
2009-12-23 19:36:19.857
[cssd(13376)]CRS-1605:CSSD voting file is online: /dev/sdb2; details in /u01/app/grid/log/racbde2/cssd/ocssd.log.
2009-12-23 19:36:31.342
[cssd(13376)]CRS-1601:CSSD Reconfiguration complete. Active nodes are racbde1 racbde2 .
2009-12-23 19:36:32.707
[ctssd(13443)]CRS-2403:The Cluster Time Synchronization Service on host racbde2 is in observer mode.
2009-12-23 19:36:32.860
[ctssd(13443)]CRS-2407:The new Cluster Time Synchronization Service reference node is host racbde1.
2009-12-23 19:36:33.600
[ctssd(13443)]CRS-2401:The Cluster Time Synchronization Service started on host racbde2.
[client(13473)]CRS-10001:ACFS-9327: Verifying ADVM/ACFS devices.
[client(13477)]CRS-10001:ACFS-9322: done.
2009-12-23 19:39:27.166
[crsd(13606)]CRS-1012:The OCR service started on node racbde2.
2009-12-23 19:39:30.419
[crsd(13606)]CRS-1201:CRSD started on node racbde2.
If further analysis is needed, it might be useful to compare a working rootcrs log (mine) to one that is failing (yours) to see what went wrong. Again, the rootcrs log is in <GRID_HOME>/cfgtoollogs/crsconfig/rootcrs_<nodename>.log. I will divide the log into the following rootcrs sections:
- First Node Initial Setup (racbde1)
- First Node Setup OLR for storing Oracle local registry data
- First Node Setup GPnP wallet and profile
- First Node Setup and copy files for OHASD daemon
- First Node Start OHASD Daemon
- First Node Copy required CRS resources for OHASD to start
- First Node Start in Exclusive Mode and Configure Diskgroup
- First Node Push GPnP Profile to Remote Node(s)
- First Node Start Full Clusterware Stack
- First Node Adding Clusterware Resources
- Secondary Node Initial Setup (racbde2)
- Secondary Node Get GPnP Profile
- Secondary Node Setup OLR for storing Oracle local registry data
- Secondary Node Setup and copy files for OHASD daemon
- Secondary Node Start OHASD Daemon
- Secondary Node Copy required CRS resources for OHASD to start
- Secondary Node Start Full Clusterware Stack
- Secondary Node Adding Clusterware Resources
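To find where your own rootcrs.pl run stopped relative to these sections, one approach (a sketch; the exact failure wording varies across 11.2 patch levels, and the options used assume GNU grep) is to show the first failing line in the rootcrs trace along with a few lines of preceding context:

```shell
#!/bin/sh
# show_first_failure: print the first line in a rootcrs trace that looks
# like a failure, plus three lines of context before it. The search
# pattern is an assumption -- failure wording differs between versions.
# Requires GNU grep for the -m (max-count) and -B (before-context) options.
show_first_failure() {
    grep -n -m 1 -B 3 -E '[Ff]ail(ed|ure)?|[Ee]rror' "$1"
}

# Example (path per the layout described above):
# show_first_failure /u01/app/grid/cfgtoollogs/crsconfig/rootcrs_racbde1.log
```

The context lines before the first failure usually identify which of the rootcrs sections listed above was in progress when the script stopped.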