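A friend in the DBA chat group had a RAC environment where the ONS process would not start. The platform was Red Hat 5.3, 64-bit.
The ONS log looked like this: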
2010-10-18 09:42:11.384: [RACG][3041022624] [16815][3041022624][ora.rac1.ons]: clsrcexecut: env ORACLE_CONFIG_HOME=/u01/oracle/product/10.2.0/crs_1
2010-10-18 09:42:11.384: [RACG][3041022624] [16815][3041022624][ora.rac1.ons]: clsrcexecut: cmd = /u01/oracle/product/10.2.0/crs_1/bin/racgeut -e _USR_ORA_DEBUG=0 540 /u01/oracle/product/10.2.0/crs_1/bin/onsctl stop
2010-10-18 09:42:11.384: [RACG][3041022624] [16815][3041022624][ora.rac1.ons]: clsrcexecut: rc = 99, time = 540.630s
2010-10-18 10:55:44.720: [RACG][1604653728] [18288][1604653728][ora.rac1.ons]:timeout: killed the spawned process
2010-10-18 10:55:44.721: [RACG][1604653728] [18288][1604653728][ora.rac1.ons]:clsrcexecut: env ORACLE_CONFIG_HOME=/u01/oracle/product/10.2.0/crs_1
2010-10-18 10:55:44.721: [RACG][1604653728] [18288][1604653728][ora.rac1.ons]: clsrcexecut: cmd = /u01/oracle/product/10.2.0/crs_1/bin/racgeut -e _USR_ORA_DEBUG=0 540 /u01/oracle/product/10.2.0/crs_1/bin/onsctl start
2010-10-18 10:55:44.721: [RACG][1604653728] [18288][1604653728][ora.rac1.ons]:clsrcexecut: rc = 99, time = 540.410s
2010-10-18 10:59:12.517: [RACG][1604653728] [18288][1604653728][ora.rac1.ons]: /u01/oracle/product/10.2.0/crs_1/bin/onsctl: line 81: 31584 Terminated $ONSADMIN ping
ons is not running ...
2010-10-18 10:59:12.517: [RACG][1604653728] [18288][1604653728][ora.rac1.ons]: clsrcexecut: env ORACLE_CONFIG_HOME=/u01/oracle/product/10.2.0/crs_1
2010-10-18 10:59:12.517: [RACG][1604653728] [18288][1604653728][ora.rac1.ons]: clsrcexecut: cmd = /u01/oracle/product/10.2.0/crs_1/bin/racgeut -e _USR_ORA_DEBUG=0 540 /u01/oracle/product/10.2.0/crs_1/bin/onsctl ping
2010-10-18 10:59:12.517: [RACG][1604653728] [18288][1604653728][ora.rac1.ons]: clsrcexecut: rc = 1, time = 207.800s
2010-10-18 10:59:12.517: [RACG][1604653728] [18288][1604653728][ora.rac1.ons]: end for resource = ora.rac1.ons, action = start, status = 1, time = 748.230s
2010-10-18 10:59:13.781: [RACG][1366147744] [1357][1366147744][ora.rac1.ons]: onsctl: shutting down ons daemon ...
/u01/oracle/product/10.2.0/crs_1/bin/onsctl: line 118: 1362 Terminated $ONSADMIN shutdown
onsctl: shutdown of ons failed!
The crsd.log showed:
timeout for ora.rac1.ons timeout=600
start resource error for ora.rac1.ons error code=-2
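Judging from the errors, this was a connection timeout. RAC itself was running normally, but there were many ONS processes, and they were consuming a lot of CPU: CPU usage was at 100%. Since this was a production database, we proceeded carefully. We added Budou (布豆) from the DBA1 group to the discussion, since Budou has a lot of RAC experience.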
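To make the timeout pattern in the ONS log easier to see, here is a minimal sketch that pairs each `clsrcexecut: cmd = ...` line with the `rc = ..., time = ...` line that follows it, and flags calls that ran for at least the time limit passed to `racgeut` (540 in the log above). The function name `summarize_onsctl_calls` and the `hit_limit` field are my own, and the regular expressions are only an assumption based on the handful of lines shown here, not a description of the full log format.

```python
import re

# Patterns are assumptions derived from the log excerpt above.
CMD_RE = re.compile(r"clsrcexecut:\s*cmd = \S*racgeut .* (\d+) \S*onsctl (\w+)")
RC_RE = re.compile(r"clsrcexecut:\s*rc = (\d+), time = ([\d.]+)s")


def summarize_onsctl_calls(lines):
    """Pair each onsctl command line with the rc/time line that follows it."""
    results = []
    pending = None  # (limit_seconds, action) of the last command seen
    for line in lines:
        cmd = CMD_RE.search(line)
        if cmd:
            pending = (int(cmd.group(1)), cmd.group(2))
            continue
        rc = RC_RE.search(line)
        if rc and pending is not None:
            limit, action = pending
            elapsed = float(rc.group(2))
            results.append({
                "action": action,
                "limit": limit,
                "rc": int(rc.group(1)),
                "elapsed": elapsed,
                # True when the call ran for at least the racgeut limit
                "hit_limit": elapsed >= limit,
            })
            pending = None
    return results


if __name__ == "__main__":
    import sys

    # Usage: python summarize_ons.py < ons_log_file
    for call in summarize_onsctl_calls(sys.stdin):
        print(call)
```

Run against the excerpt above, this should flag the `stop` and `start` calls as having reached the 540 second limit, while the `ping` call returned earlier with a non-zero code.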
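According to Budou (布豆), Oracle RAC processes sometimes misbehave for no apparent reason, and even Oracle itself cannot always explain why. After my friend rebooted the node 1 server, ONS started normally, and he then rebooted node 2 as well. My friend suspects that a change to the network policy had affected the system.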
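Once the problem was solved, the three of us chatted for a while, and one of the topics was backups. For a database, backups matter more than anything else. Back up the database, the control files, and the spfile; all of these are critical for recovery. Only with valid backups can you keep the damage to a minimum when something goes wrong.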
------------------------------------------------------------------------------