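[RAC] Node 1 fails to start because the cssfatal file is missing

This post documents a problem in an AIX 5.3 + Oracle 10.2.0.5 RAC environment: after the cluster was shut down and restarted, node 1 failed to start while node 2 came up normally. The logs first suggested that a voting disk problem on node 1 was keeping OCSSD from starting. Further investigation showed that the cssfatal file was missing under /etc/oracle/scls_scr, and creating it by hand resolved the problem.

Environment: AIX 5.3 + Oracle 10.2.0.5 RAC

Scenario: after the RAC was shut down and restarted, node 1 failed to start while node 2 started normally.

Troubleshooting:

1. Try to start the CRS service on node 1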

root# ./init.crs start crs

2. Watch the CRS logs during startup

OCSSD.log output:

[    CSSD]2014-01-16 09:27:54.730 >USER:    Copyright 2014, Oracle version 10.2.0.5.0

[    CSSD]2014-01-16 09:27:54.730 >USER:    Starting CSS daemon on node nxjcdb1, number 1, in cluster crs_dljc

[  clsdmt]Listening to (ADDRESS=(PROTOCOL=ipc)(KEY=nxjcdb1DBG_CSSD))

[    CSSD]2014-01-16 09:27:54.790 [1]>TRACE:   clssscmain: RT queue setting: ON

[    CSSD]2014-01-16 09:27:55.081 [1]>TRACE:   clssscmain: local-only set to false

[    CSSD]2014-01-16 09:27:55.349 [1]>TRACE:   clssnmReadNodeInfo: added node 1 (nxjcdb1) to cluster

[    CSSD]2014-01-16 09:27:55.672 [1]>TRACE:   clssnmReadNodeInfo: added node 2 (nxjcdb2) to cluster

[    CSSD]2014-01-16 09:27:55.673 [1]>TRACE:   clssnmInitNMInfo: Initialized with unique 1389835674

[    CSSD]2014-01-16 09:27:55.704 [1]>TRACE:   clssNMInitialize: Initializing with OCR id (1516675067)

[    CSSD]2014-01-16 09:27:55.705 [1029] >TRACE:   clssnm_skgxninit: HACMP clusterware detected

[    CSSD]2014-01-16 09:27:56.822 [1]>TRACE:   clssnmNMInitialize: misscount set to (30)

[    CSSD]2014-01-16 09:27:56.900 [1]>TRACE:   clssnmStartNM: reboottime set to (3) sec

[    CSSD]2014-01-16 09:27:56.900 [1]>TRACE:   clssnmNMInitialize: Network heartbeat thresholds are: impending reconfig 15000 ms, reconfig start (misscount) 30000 ms

[    CSSD]2014-01-16 09:27:57.108 [1]>TRACE:   clssnmDiskStateChange: state from 1 to 2 disk (0//dev/rlvjc_voting)

[    CSSD]2014-01-16 09:27:57.108 [1030]>TRACE:   clssnmvDPT: spawned for disk 0 (/dev/rlvjc_voting)

[    CSSD]2014-01-16 09:27:57.146 [1030]>TRACE:   clssnmvDiskOpen: Overwrote kill block for voting disk /dev/rlvjc_voting

[    CSSD]2014-01-16 09:27:59.163 [1030]>TRACE:   clssnmDiskStateChange: state from 2 to 4 disk (0//dev/rlvjc_voting)

[    CSSD]2014-01-16 09:27:59.164 [1]>ERROR:   Internal Error Information:

  Category: 1234

  Operation: scls_scr_setval

  Location: open

  Other: cant open file

  Dep: 2

 

[    CSSD]2014-01-16 09:27:59.164 [1]>ERROR:   clssscSclsFatal: failure 8 reading fatal mode

[    CSSD]2014-01-16 09:27:59.164 [1]>ERROR:  ###################################

[    CSSD]2014-01-16 09:27:59.164 [1]>ERROR:   clssscExit: CSSD aborting from thread Main

[    CSSD]2014-01-16 09:27:59.164 [1]>ERROR:  ###################################

Based on the error messages, the initial diagnosis was that OCSSD could not start because node 1 could not access the voting disk.

[    CSSD]--- DUMP GROCK STATE DB ---

[    CSSD]--- END OF GROCK STATE DUMP ---

[    CSSD]2014-01-16 09:27:59.169 [1030]>TRACE:   clssnmvReadDskHeartbeat: read ALL for Joining

[    CSSD]2014-01-16 09:27:59.169 [1030]>TRACE:   clssnmvReadDskHeartbeat: node(2) is down. rcfg(2) wrtcnt(126947) LATS(1038806686) Disk lastSeqNo(126947)

[    CSSD]------- Begin Dump -------

 

[    CSSD]

[    CSSD]

[    CSSD]

[    CSSD]

[    CSSD]

[    CSSD]2014-01-16 09:28:00.166 [1]>TRACE:   0x1100863c0 00 00 00 00 0000 00 00 - 00 00 00 00 00 00 00 00 ................

[    CSSD]2014-01-16 09:28:00.166 [1]>TRACE:   0x1100863d0 00 00 00 00 0000 00 00 - 00 00 00 00 00 00 00 00 ................

[    CSSD]2014-01-16 09:28:00.166 [1]>TRACE:   0x1100863e0 00 00 00 00 0000 00 00 - 00 00 00 00 00 00 00 00 ................

[    CSSD]2014-01-16 09:28:00.166 [1]>TRACE:   0x1100863f0 00 00 00 00 0000 00 00 - 00 00 00 00 00 00 00 00 ................

[    CSSD]2014-01-16 09:28:00.166 [1]>TRACE:   0x110086400 00 00 00 00 0000 00 00 - 00 00 00 00 00 00 00 00 ................

[    CSSD]2014-01-16 09:28:00.166 [1]>TRACE:   0x110086410 00 00 00 00 0000 00 00 - 00 00 00 00 00 00 00 00 ................

[    CSSD]2014-01-16 09:28:00.166 [1]>TRACE:   0x110086420 00 00 00 00 0000 00 00 - 00 00 00 00 00 00 00 00 ................

[    CSSD]2014-01-16 09:28:00.166 [1]>TRACE:   0x110086430 00 00 00 00 0000 00 00 - 00 00 00 00 00 00 00 00 ................

[    CSSD]2014-01-16 09:28:00.166 [1]>TRACE:   0x110086440 00 00 00 00 0000 00 00 - 00 00 00 00 00 00 00 00 ................

[    CSSD]2014-01-16 09:28:00.166 [1]>TRACE:   0x110086450 00 00 00 00 0000 00 00 - 00 00 00 00 00 00 00 00 ................

[    CSSD]2014-01-16 09:28:00.166 [1]>TRACE:   0x110086460 00 00 00 01 1008 61 98 - 00 00 00 01 10 c6 0b c0 ......a.........

[    CSSD]2014-01-16 09:28:00.166 [1]>TRACE:   0x110086470 00 00 00 00 00 0000 01 - 00 00 00 00 00 02 00 03 ................

[    CSSD]2014-01-16 09:28:00.166 [1]>TRACE:   0x110086480 00 00 00 01 1096 4a b0 - 00 00 00 00 00 00 00 00 ......J.........

[    CSSD]2014-01-16 09:28:00.166 [1]>TRACE:   0x110086490 00 00 00 00 0000 00 00 - 00 00 00 00 00 00 00 00 ................

[    CSSD]2014-01-16 09:28:00.166 [1]>TRACE:   0x1100864a0 00 00 00 00 0000 00 00 - 00 00 00 00 00 00 00 00 ................

[    CSSD]2014-01-16 09:28:00.166 [1]>TRACE:   0x1100864b0 00 00 00 00 0000 00 00 - 00 00 00 00 00 00 00 01 ................

[    CSSD]2014-01-16 09:28:00.166 [1]>TRACE:   0x1100864c0 00 00 00 00 0000 00 05 - 00 00 00 00 00 00 00 fa ................

[    CSSD]2014-01-16 09:28:00.166 [1]>TRACE:   0x1100864d0 00 00 00 00 0000 00 00 - 00 00 00 00 00 00 00 00 ................

[    CSSD]2014-01-16 09:28:00.166 [1]>TRACE:   0x1100864e0 00 00 00 00 0000 00 00 - 00 00 00 00 00 00 00 00 ................

[    CSSD]2014-01-16 09:28:00.166 [1]>TRACE:   0x1100864f0 00 00 00 00 0000 00 00 - 00 00 00 00 00 00 00 00 ................

[    CSSD]2014-01-16 09:28:00.166 [1]>TRACE:   0x110086500 00 00 00 00 0000 00 00 - 00 00 00 00 00 00 00 00 ................

[    CSSD]2014-01-16 09:28:00.166 [1]>TRACE:   0x110086510 00 00 00 00 0000 00 00 - 00 00 00 00 00 00 00 00 ................

[    CSSD]2014-01-16 09:28:00.167 [1]>TRACE:   0x110086520 00 00 00 00 0000 00 00 - 00 00 00 00 00 00 00 00 ................

[    CSSD]2014-01-16 09:28:00.167 [1]>TRACE:   0x110086530 00 00 00 00 0000 00 00 - 00 00 00 00 00 00 00 00 ................

[    CSSD]2014-01-16 09:28:00.167 [1]>TRACE:   0x110086540 00 00 00 00 0000 00 00 - 00 00 00 00 00 00 00 00 ................

[    CSSD]2014-01-16 09:28:00.167 [1]>TRACE:   0x110086550 00 00 00 00 0000 00 00 - 00 00 00 00 00 00 00 00 ................

[    CSSD]2014-01-16 09:28:00.167 [1]>TRACE:   0x110086560 00 00 00 00 0000 00 00 - 00 00 00 00 00 00 00 00 ................

[    CSSD]2014-01-16 09:28:00.167 [1]>TRACE:   0x110086570 00 00 00 00 0000 00 00 - 00 00 00 00 00 00 00 00 ................

[    CSSD]2014-01-16 09:28:00.167 [1]>TRACE:   0x110086580 00 00 00 00 0000 00 00 - 00 00 00 00 00 00 00 00 ................

[    CSSD]2014-01-16 09:28:00.167 [1]>TRACE:   0x110086590 00 00 00 00 0000 00 00 - 00 00 00 00 00 00 00 00 ................

[    CSSD]2014-01-16 09:28:00.167 [1]>TRACE:   0x1100865a0 00 00 00 00 0000 00 00 - 00 00 00 0e 00 00 00 24 ...............$

[    CSSD]2014-01-16 09:28:00.167 [1]>TRACE:   0x1100865b0 00 00 00 00 0000 00 00 - 00 00 00 00 00 00 00 00 ................

[    CSSD]2014-01-16 09:28:00.167 [1]>TRACE:   0x1100865c0 00 00 00 00 0000 00 00 - 00 00 00 00 00 00 00 00 ................

[    CSSD]2014-01-16 09:28:00.167 [1]>TRACE:   0x1100865d0 00 00 00 00 0000 00 00 - 00 00 00 00 00 00 00 00 ................

[    CSSD]2014-01-16 09:28:00.167 [1]>TRACE:   0x1100865e0 00 00 00 00 0000 00 00 - 00 00 00 00 00 00 00 00 ................

[    CSSD]2014-01-16 09:28:00.167 [1]>TRACE:   0x1100865f0 00 00 00 00 0000 00 00 - 00 00 00 00 00 00 00 00 ................

[    CSSD]2014-01-16 09:28:00.167 [1]>TRACE:   0x110086600 00 00 00 00 0000 00 00 - 00 00 00 00 00 00 00 00 ................

[    CSSD]2014-01-16 09:28:00.167 [1]>TRACE:   0x110086610 00 00 00 00 0000 00 00 - 00 00 00 00 00 00 00 00 ................

[    CSSD]2014-01-16 09:28:00.167 [1]>TRACE:   0x110086620 00 00 00 00 0000 00 00 - 00 00 00 00 00 00 00 00 ................

[    CSSD]2014-01-16 09:28:00.167 [1]>TRACE:   0x110086630 00 00 00 00 0000 00 00 - 00 00 00 00 00 00 00 00 ................

[    CSSD]2014-01-16 09:28:00.167 [1]>TRACE:   0x110086640 00 00 00 00 0000 00 00 - 00 00 00 00 00 00 00 00 ................

[    CSSD]2014-01-16 09:28:00.167 [1]>TRACE:   0x110086650 00 00 00 00 0000 00 00 - 00 00 00 00 00 00 00 00 ................

[    CSSD]2014-01-16 09:28:00.167 [1]>TRACE:   0x110086660 00 00 00 00 0000 00 00 - 00 00 00 00 00 00 00 00 ................

[    CSSD]2014-01-16 09:28:00.167 [1]>TRACE:   0x110086670 00 00 00 00 0000 00 00 - 00 00 00 00 00 00 00 00 ................

[    CSSD]2014-01-16 09:28:00.167 [1]>TRACE:   0x110086680 00 00 00 00 0000 00 00 - 00 00 00 00 00 00 00 00 ................

[    CSSD]2014-01-16 09:28:00.167 [1]>TRACE:   0x110086690 00 00 00 00 0000 00 00 - 00 00 00 00 00 00 00 00 ................

[    CSSD]2014-01-16 09:28:00.167 [1]>TRACE:   0x1100866a0 00 00 00 00 0000 00 00 - 00 00 00 00 00 00 00 00 ................

[    CSSD]2014-01-16 09:28:00.167 [1]>TRACE:   0x1100866b0 00 00 10 00 0000 00 97 - 00 00 00 00 00 00 00 00 ................

[    CSSD]2014-01-16 09:28:00.167 [1]>TRACE:   0x1100866c0 00 00 00 01 105d f6 10 - 00 00 00 01 10 95 ca 50 .....].........P

[    CSSD]2014-01-16 09:28:00.167 [1]>TRACE:   0x1100866d0 00 00 00 01 1096 0a 70 - 00 00 00 3c 00 00 00 00 .......p...<....

[    CSSD]2014-01-16 09:28:00.167 [1]>TRACE:   0x1100866e0 00 00 00 01 1096 2a 90 - 00 00 00 01 00 00 00 01 ......*.........

[    CSSD]2014-01-16 09:28:00.167 [1]>TRACE:   0x1100866f0 00 00 00 28 0000 00 00 - 00 00 00 01 10 00 16 08 ...(............

[    CSSD]2014-01-16 09:28:00.168 [1]>TRACE:   0x110086700 00 00 00 00 0000 00 00 - 00 00 00 01 10 4c 1a 90 .............L..

[    CSSD]2014-01-16 09:28:00.168 [1]>TRACE:   0x110086710 00 00 00 01 104c 1a 50 - 00 00 00 01 10 08 67 18 .....L.P......g.

[    CSSD]2014-01-16 09:28:00.168 [1]>TRACE:   0x110086720 00 00 00 01 1008 67 18 - 00 00 00 00 00 00 00 00 ......g.........

[    CSSD]2014-01-16 09:28:00.168 [1]>TRACE:   0x110086730 00 00 00 00 0000 00 00 - 00 00 00 00 00 00 00 00 ................

[    CSSD]2014-01-16 09:28:00.168 [1]>TRACE:   0x110086740 00 00 00 01 10cf 76 b0 - 00 00 00 00 00 00 00 00 ......v.........

[    CSSD]2014-01-16 09:28:00.168 [1]>TRACE:   0x110086750 00 00 00 00 0000 00 00 - 00 00 00 00 00 00 00 00 ................

[    CSSD]2014-01-16 09:28:00.168 [1]>TRACE:   0x110086760 00 00 00 01 10cf 76 b0 - 00 00 00 00 00 00 00 00 ......v.........

[    CSSD]2014-01-16 09:28:00.168 [1]>TRACE:   0x110086770 00 00 00 00 0000 00 00 - 00 00 00 00 00 00 00 00 ................

[    CSSD]2014-01-16 09:28:00.168 [1]>TRACE:   0x110086780 00 00 01 00 0000 00 00 - 00 00 00 01 10 bd f7 90 ................

[    CSSD]2014-01-16 09:28:00.168 [1]>TRACE:   0x110086790 00 00 00 01 104c 17 50 - 00 00 00 00 00 00 00 00 .....L.P........

[    CSSD]2014-01-16 09:28:00.168 [1]>TRACE:   0x1100867a0 00 00 00 00 0000 00 00 - 00 00 00 03 00 00 00 00 ................

[    CSSD]2014-01-16 09:28:00.168 [1]>TRACE:   0x1100867b0 00 00 00 01 1000 09 78 - 00 00 00 01 00 00 00 00 .......x........

[    CSSD]2014-01-16 09:28:00.168 [1]>TRACE:   0x1100867c0 00 00 00 00 0000 00 01 - 00 00 00 01 10 3e df b0 .............>..

[    CSSD]2014-01-16 09:28:00.168 [1]>TRACE:   0x1100867d0 00 00 00 01 1046 49 90 - 00 00 00 01 10 46 4a 30 .....FI......FJ0

[    CSSD]2014-01-16 09:28:00.168 [1]>TRACE:   0x1100867e0 00 00 00 01 1046 4c 90 - 6e 78 6a 63 64 62 31 00 .....FL.nxjcdb1.

[    CSSD]2014-01-16 09:28:00.168 [1]>TRACE:   0x1100867f0 00 00 00 00 0000 00 00 - 00 00 00 00 00 00 00 00 ................

[    CSSD]2014-01-16 09:28:00.168 [1]>TRACE:   0x110086800 00 00 00 00 0000 00 00 - 00 00 00 00 00 00 00 00 ................

[    CSSD]2014-01-16 09:28:00.168 [1]>TRACE:   0x110086810 00 00 00 00 0000 00 00 - 00 00 00 00 00 00 00 00 ................

[    CSSD]2014-01-16 09:28:00.168 [1]>TRACE:   0x110086820 00 00 00 00 0000 00 00 - 00 00 00 00 00 00 00 00 ................

[    CSSD]2014-01-16 09:28:00.168 [1]>TRACE:   0x110086830 00 00 00 00 0000 00 00 - 00 00 00 00 00 00 00 00 ................

[    CSSD]2014-01-16 09:28:00.168 [1]>TRACE:   0x110086840 00 00 00 00 0000 00 00 - 00 00 00 00 00 00 00 00 ................

[    CSSD]2014-01-16 09:28:00.168 [1]>TRACE:   0x110086850 00 00 00 00 0000 00 00 - 00 00 00 00 00 00 00 00 ................

[    CSSD]2014-01-16 09:28:00.168 [1]>TRACE:   0x110086860 00 00 00 00 0000 00 00 - 00 00 00 00 00 00 00 00 ................

[    CSSD]2014-01-16 09:28:00.168 [1]>TRACE:   0x110086870 00 00 00 00 0000 00 00 - 00 00 00 00 00 00 00 00 ................

[    CSSD]2014-01-16 09:28:00.168 [1]>TRACE:   0x110086880 00 00 00 00 0000 00 00 - 00 00 00 00 00 00 00 00 ................

[    CSSD]2014-01-16 09:28:00.168 [1]>TRACE:   0x110086890 00 00 00 00 0000 00 00 - 00 00 00 00 00 00 00 00 ................

[    CSSD]2014-01-16 09:28:00.168 [1]>TRACE:   0x1100868a0 00 00 00 00 0000 00 00 - 00 00 00 00 00 00 00 00 ................

[    CSSD]2014-01-16 09:28:00.168 [1]>TRACE:   0x1100868b0 00 00 00 00 0000 00 00 - 00 00 00 00 00 00 00 00 ................

[    CSSD]2014-01-16 09:28:00.168 [1]>TRACE:   0x1100868c0 00 00 00 00 0000 00 00 - 00 00 00 00 00 00 00 00 ................

[    CSSD]2014-01-16 09:28:00.168 [1]>TRACE:   0x1100868d0 00 00 00 00 0000 00 00 - 00 00 00 00 00 00 00 00 ................

[    CSSD]2014-01-16 09:28:00.168 [1]>TRACE:   0x1100868e0 00 00 00 00 0000 00 00 - 6e 78 6a 63 64 62 31 2d ........nxjcdb1-

[    CSSD]2014-01-16 09:28:00.169 [1]>TRACE:   0x1100868f0 70 72 69 00 0000 00 00 - 00 00 00 00 00 00 00 00 pri.............

[    CSSD]2014-01-16 09:28:00.169 [1]>TRACE:   0x110086900 00 00 00 00 0000 00 00 - 00 00 00 00 00 00 00 00 ................

[    CSSD]2014-01-16 09:28:00.169 [1]>TRACE:   0x110086910 00 00 00 00 0000 00 00 - 00 00 00 00 00 00 00 00 ................

[    CSSD]2014-01-16 09:28:00.169 [1]>TRACE:   0x110086920 00 00 00 00 0000 00 00 - 00 00 00 00 00 00 00 00 ................

[    CSSD]2014-01-16 09:28:00.169 [1]>TRACE:   0x110086930 00 00 00 00 0000 00 00 - 00 00 00 00 00 00 00 00 ................

[    CSSD]2014-01-16 09:28:00.169 [1]>TRACE:   0x110086940 00 00 00 00 0000 00 00 - 00 00 00 00 00 00 00 00 ................

[    CSSD]2014-01-16 09:28:00.169 [1]>TRACE:   0x110086950 00 00 00 00 0000 00 00 - 00 00 00 00 00 00 00 00 ................

[    CSSD]2014-01-16 09:28:00.169 [1]>TRACE:   0x110086960 00 00 00 00 0000 00 00 - 00 00 00 00 00 00 00 00 ................

[    CSSD]2014-01-16 09:28:00.169 [1]>TRACE:   0x110086970 00 00 00 00 0000 00 00 - 00 00 00 00 00 00 00 00 ................

[    CSSD]2014-01-16 09:28:00.169 [1]>TRACE:   0x110086980 00 00 00 00 0000 00 00 - 00 00 00 00 00 00 00 00 ................

[    CSSD]2014-01-16 09:28:00.169 [1]>TRACE:   0x110086990 00 00 00 00 0000 00 00 - 00 00 00 00 00 00 00 00 ................

[    CSSD]2014-01-16 09:28:00.169 [1]>TRACE:   0x1100869a0 00 00 00 00 0000 00 00 - 00 00 00 00 00 00 00 00 ................

[    CSSD]2014-01-16 09:28:00.169 [1]>TRACE:   0x1100869b0 00 00 00 00 0000 00 00 - 00 00 00 00 00 00 00 00 ................

[    CSSD]2014-01-16 09:28:00.169 [1]>TRACE:   0x1100869c0 00 00 00 00 0000 00 00 - 00 00 00 00 00 00 00 00 ................

[    CSSD]2014-01-16 09:28:00.169 [1]>TRACE:   0x1100869d0 00 00 00 00 0000 00 00 - 00 00 00 00 00 00 00 00 ................

[    CSSD]2014-01-16 09:28:00.169 [1]>TRACE:   0x1100869e0 00 00 00 00 0000 00 00 - 2f 6f 72 61 63 6c 65 2f ......../oracle/

[    CSSD]2014-01-16 09:28:00.169 [1]>TRACE:   0x1100869f0 70 72 6f 64 7563 74 2f - 31 30 2e 32 2e 30 2f 63 product/10.2.0/c

[    CSSD]2014-01-16 09:28:00.169 [1]>TRACE:   0x110086a00 72 73 5f 31 0000 00 00 - 00 00 00 00 00 00 00 00 rs_1............

[    CSSD]2014-01-16 09:28:00.169 [1]>TRACE:   0x110086a10 00 00 00 00 0000 00 00 - 00 00 00 00 00 00 00 00 ................

[    CSSD]2014-01-16 09:28:00.169 [1] >TRACE:   0x110086a20 00 00 00 00 00 00 00 00 - 00 0000 00 00 00 00 00 ................

[    CSSD]2014-01-16 09:28:00.169 [1]>TRACE:   0x110086a30 00 00 00 00 0000 00 00 - 00 00 00 00 00 00 00 00 ................

[    CSSD]2014-01-16 09:28:00.169 [1] >TRACE:   0x110086a40 00 00 00 00 00 00 00 00 - 00 0000 00 00 00 00 00 ................

[    CSSD]2014-01-16 09:28:00.169 [1]>TRACE:   0x110086a50 00 00 00 00 0000 00 00 - 00 00 00 00 00 00 00 00 ................

[    CSSD]2014-01-16 09:28:00.169 [1] >TRACE:   0x110086a60 00 00 00 00 00 00 00 00 - 00 0000 00 00 00 00 00 ................

[    CSSD]2014-01-16 09:28:00.169 [1]>TRACE:   0x110086a70 00 00 00 00 0000 00 00 - 00 00 00 00 00 00 00 00 ................

[    CSSD]2014-01-16 09:28:00.169 [1] >TRACE:   0x110086a80 00 00 00 00 00 00 00 00 - 00 0000 00 00 00 00 00 ................

[    CSSD]2014-01-16 09:28:00.169 [1]>TRACE:   0x110086a90 00 00 00 00 0000 00 00 - 00 00 00 00 00 00 00 00 ................

[    CSSD]2014-01-16 09:28:00.169 [1] >TRACE:  0x110086aa0 00 00 00 00 00 00 00 00 -00 00 00 00 00 00 00 00 ................

/dev/rlvjc_voting

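3. The log reports errors and the CRS service on node 1 fails to start.

   Based on the error messages, the initial diagnosis was that OCSSD could not start because node 1 could not access the voting disk.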
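
To pull only the failure out of a long OCSSD.log, a filter along the following lines may help. This is a minimal sketch: the function name css_errors is hypothetical, and the patterns are taken from the log lines quoted in step 2, not from any Oracle documentation.

```shell
# Print the ERROR entries of a CSSD log, plus the indented detail lines
# (Category, Operation, Location, Other, Dep) that follow
# "Internal Error Information:".
# $1: path to the log file
css_errors() {
    grep -E '>ERROR:|^ +(Category|Operation|Location|Other|Dep):' "$1"
}
```

It would be called with the path to the log, for example `css_errors ocssd.log` from the directory that holds the file. On the log above, the output includes the `Operation: scls_scr_setval` and `Other: cant open file` lines, which are what step 4 follows up on.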

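4. Locating the problem

The failing operation in the log, scls_scr_setval with "cant open file", points at the scls_scr directory and not at the voting disk itself. The directory /etc/oracle/scls_scr/ballontt/oracle (ballontt stands for the hostname) turned out to be missing the cssfatal file. This file contains only the single word "enable":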

~cat cssfatal

enable
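
The same check can be scripted so that it can be run on both nodes and the results compared. The sketch below assumes the directory layout described in this step. The function name check_cssfatal and its two parameters are made up for illustration.

```shell
# Report whether the cssfatal flag file exists for one host.
# $1: base directory (normally /etc/oracle/scls_scr)
# $2: hostname as it appears under that directory
check_cssfatal() {
    f="$1/$2/oracle/cssfatal"
    if [ ! -f "$f" ]; then
        echo "MISSING: $f"
        return 1
    fi
    # Show the content so it can be compared with the healthy node.
    echo "OK: $f contains '$(cat "$f")'"
}
```

For example, `check_cssfatal /etc/oracle/scls_scr nxjcdb1` on node 1 and the equivalent call on node 2 would show at a glance which node lacks the file.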

5. Fixing the problem

Create the file by hand:

~vi cssfatal

enable
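
Instead of editing with vi, the file can also be written non-interactively. The following is only a sketch under assumptions: restore_cssfatal is a hypothetical helper, it writes the word enable as quoted in step 4, and it does not touch ownership or permissions, which are best compared against the same file on the healthy node with `ls -l`.

```shell
# Recreate cssfatal with the single word "enable" if it is missing.
# An existing file is left untouched.
# $1: directory that should contain cssfatal
restore_cssfatal() {
    f="$1/cssfatal"
    if [ -f "$f" ]; then
        echo "already present: $f"
        return 0
    fi
    echo enable > "$f" && echo "created: $f"
}
```

Run as root, for example `restore_cssfatal /etc/oracle/scls_scr/nxjcdb1/oracle`, then retry the CRS start from step 1.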

6. CRS on node 1 then started successfully.


ballontt
2014/01/27

---The End---
Weibo: weibo.com/ballontt
If you repost this article, please credit the source and include a link. Thank you!
