The Diagwait Function

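This article describes a diagnostic aid for Oracle Clusterware node eviction problems: setting the diagwait attribute gives the operating system extra time to finish writing its logs, so that more diagnostic information can be collected. It applies to specific versions of Oracle Server and Oracle Clusterware.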

ORACLE_CLUSTER_Diagwait

Using Diagwait as a diagnostic to get more information for diagnosing Oracle Clusterware Node evictions [ID 559365.1]

Applies to:

Oracle Server - Standard Edition - Version: 10.1.0.5 to 11.2.0.1 - Release: 10.1 to 11.2
Oracle Server - Enterprise Edition - Version: 10.1.0.5 to 11.2.0.1.0 [Release: 10.1 to 11.2]
Linux x86
HP-UX PA-RISC (64-bit)
IBM AIX on POWER Systems (64-bit)
Oracle Solaris on SPARC (64-bit)
HP-UX Itanium
Red Hat Enterprise Linux Advanced Server x86-64 (AMD Opteron Architecture)
Red Hat Enterprise Linux Advanced Server Itanium
Oracle Solaris on x86-64 (64-bit)
Linux x86-64
UnitedLinux Itanium
Oracle Server Enterprise Edition - Version: 10.1.0.5 to 11.1.0.7
Oracle Clusterware

Symptoms

Oracle Clusterware evicts the node from the cluster when

  • Node is not pinging via the network heartbeat
  • Node is not pinging the Voting disk
  • Node is hung/busy and is unable to perform either of the earlier tasks

In most cases when a node is evicted, information is written to the logs that can be used to analyze the cause of the eviction. In certain cases, however, this information may be missing. The steps documented in this note are for those cases where there is not enough information, or none at all, to diagnose the cause of the eviction on Clusterware versions earlier than 11gR2 (11.2.0.1).

Starting with 11.2.0.1, customers do not need to set diagwait because the architecture has been changed.

Changes

None

Cause

When a node is evicted while it is extremely busy in terms of CPU (or starved of it), the OS may not get time to flush the logs and traces to the file system. It may be useful to set the diagwait attribute to delay the node reboot, giving the OS additional time to write the traces. This setting provides more time for diagnostic data to be collected safely and will NOT increase the probability of corruption. After setting diagwait, the Clusterware will wait an additional 10 seconds (diagwait - reboottime). Customers can unset diagwait by following the steps documented below after fixing their OS scheduling issues.
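
The arithmetic behind the 10-second figure can be sketched as follows. This is only an illustration: it assumes the CSS reboottime is at its usual default of 3 seconds, which this note does not state, and the function name `additional_wait` is made up for the example.

```shell
#!/bin/sh
# Sketch: extra time the OS gets to flush logs once diagwait is set.
# Assumption: reboottime is 3 seconds (the usual CSS default; not stated in this note).

additional_wait() {
    diagwait=$1      # value passed to "crsctl set css diagwait"
    reboottime=$2    # CSS reboottime, in seconds
    echo $((diagwait - reboottime))
}

additional_wait 13 3   # prints 10
```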

* -- Diagwait can be set on Windows, but it does not change the behaviour as it does on Unix/Linux platforms.

@ For internal Support Staff

The diagwait attribute was introduced in 10.2.0.3 and is included in 10.2.0.4, 11.1.0.6 and higher releases. It has also been backported to 10.1.0.5 on most platforms. This means it is possible to set diagwait on 10.1.0.5 (or higher), 10.2.0.3 (or higher) and 11.1.0.6 (or higher). If the command crsctl set/get css diagwait reports "unrecognized parameter diagwait specified", it can be safely assumed that the Clusterware version does not have the fixes needed to implement diagwait. In that case the customer is advised to apply the latest patchset available before attempting to set diagwait.
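
As a rough sketch, the messages quoted in this note can be used to classify what `crsctl get css diagwait` prints. The function name and the returned labels are illustrative and not part of crsctl. Treating a purely numeric output as "set" rests on the note's statement that the command "should return 13"; the exact output format may differ between versions.

```shell
#!/bin/sh
# Sketch: classify the text printed by "crsctl get css diagwait".
# The function name and the labels it prints are hypothetical.

diagwait_status() {
    output=$1
    case "$output" in
        *"unrecognized parameter diagwait specified"*)
            echo "unsupported" ;;   # this version lacks the diagwait fixes
        *"Configuration parameter diagwait is not defined"*)
            echo "unset" ;;         # supported, but no value configured
        ''|*[!0-9]*)
            echo "unknown" ;;       # empty or non-numeric: inspect manually
        *)
            echo "set" ;;           # digits only, e.g. 13
    esac
}

# Usage on a cluster node, as root (not run here):
#   diagwait_status "$(crsctl get css diagwait 2>&1)"
```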

Solution

The clusterware stack must be down on all the nodes when changing diagwait. The following steps provide step-by-step instructions for setting diagwait.

  1. Execute as root

#crsctl stop crs
#<CRS_HOME>/bin/oprocd stop

  2. Ensure that the Clusterware stack is down on all nodes by executing

#ps -ef | egrep "crsd.bin|ocssd.bin|evmd.bin|oprocd"

This should return no processes. If there are clusterware processes running and you proceed to the next step, you will corrupt your OCR. Do not continue until the clusterware processes are down on all the nodes of the cluster.

  3. From one node of the cluster, change the value of the "diagwait" parameter to 13 seconds by issuing the command as root:

#crsctl set css diagwait 13 -force

  4. Check that diagwait was set successfully by executing the following command. The command should return 13. If diagwait is not set, the following message is returned: "Configuration parameter diagwait is not defined"

#crsctl get css diagwait

  5. Restart the Oracle Clusterware on all the nodes by executing:

#crsctl start crs

  6. Validate that the node is running by executing:

#crsctl check crs
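
The process check in the steps above guards against OCR corruption, so it can help to turn it into a yes/no test rather than reading the `ps` listing by eye. The following is a minimal sketch; the function name `clusterware_is_down` is illustrative, and it only inspects the node it runs on, so it would have to be repeated on every node.

```shell
#!/bin/sh
# Sketch: guard for the "stack is down" check. Reads "ps -ef" output on
# stdin and succeeds only if no Clusterware daemon is listed.
# The function name is hypothetical.

clusterware_is_down() {
    # "grep -v grep" drops the grep/egrep process itself from the listing;
    # "grep -Eq" matches the same daemon names as the egrep command above.
    if grep -v grep | grep -Eq 'crsd\.bin|ocssd\.bin|evmd\.bin|oprocd'; then
        return 1   # at least one daemon is still running: do not continue
    fi
    return 0
}

# Usage on each node, as root (not run here):
#   ps -ef | clusterware_is_down && echo "no clusterware processes on this node"
```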

Unsetting/Removing diagwait

Customers should not unset diagwait without fixing the OS scheduling issues, as that can lead to node evictions via reboot. Diagwait delays the node eviction (and reconfiguration) by diagwait (13) seconds, and as such setting diagwait does not affect most customers. If diagwait does need to be removed, follow the steps above, but replace step 3 with the following command

#crsctl unset css diagwait -f

(Note: the -f option must be used when unsetting diagwait since CRS will be down when doing so)
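
To make the difference between the two procedures explicit, the sketch below prints, without executing, the root commands from this note in order for either case. The function name `diagwait_plan` is illustrative, and `<CRS_HOME>` is left as the placeholder used in this note. The stop, check, and start commands apply to every node, while the set or unset command is issued from one node only.

```shell
#!/bin/sh
# Sketch: print (do not run) the commands for setting or removing diagwait.
# The function name is hypothetical; <CRS_HOME> is a placeholder to fill in.

diagwait_plan() {
    mode=$1   # "set" or "unset"
    case "$mode" in
        set)   step3='crsctl set css diagwait 13 -force' ;;
        unset) step3='crsctl unset css diagwait -f' ;;
        *)     echo "usage: diagwait_plan set|unset" >&2; return 2 ;;
    esac
    # Only the fourth printed line differs between the two procedures.
    cat <<EOF
crsctl stop crs
<CRS_HOME>/bin/oprocd stop
ps -ef | egrep "crsd.bin|ocssd.bin|evmd.bin|oprocd"
$step3
crsctl get css diagwait
crsctl start crs
crsctl check crs
EOF
}

diagwait_plan unset
```

After an unset, the `crsctl get css diagwait` check is expected to report "Configuration parameter diagwait is not defined" rather than 13.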

References

NOTE:726833.1 - Hangcheck-Timer Module Requirements for Oracle 9i, 10g, and 11g RAC on Linux
