VIP Failover Take Long Time After Network Cable Pulled

在一个10g RAC环境中测试拔掉网线,vip的切换花了很长时间,在oracle support上搜索如下:

[ID 403743.1]

Applies to:

Oracle Server – Enterprise Edition – Version: 10.2.0.1 to 11.1.0.7

This problem can occur on any platform.

Symptoms

This example is based on SUN Solaris platform, with IPMP configured for the public network. In this case, VIP failover takes almost 4 minutes to complete when both network cables of the public network are pulled from one node.

crsd.log shows:

2006-12-07 13:14:05.401: [ CRSAPP][4588] CheckResource error for ora.node1.vip error code = 1

2006-12-07 13:14:05.408: [ CRSRES][4588] In stateChanged, ora.node1.vip target is ONLINE

2006-12-07 13:14:05.409: [ CRSRES][4588] ora.node1.vip on node1 went OFFLINE unexpectedly

<<< detect network cable failure and VIP OFFLINE immediately

2006-12-07 13:14:05.410: [ CRSRES][4588] StopResource: setting CLI values

2006-12-07 13:14:05.420: [ CRSRES][4588] Attempting to stop `ora.node1.vip` on member `node1`

2006-12-07 13:14:06.651: [ CRSRES][4588] Stop of `ora.node1.vip` on member `node1` succeeded.

2006-12-07 13:14:06.652: [ CRSRES][4588] ora.node1.vip RESTART_COUNT=0 RESTART_ATTEMPTS=0

2006-12-07 13:14:06.667: [ CRSRES][4588] ora.node1.vip failed on node1 relocating.

2006-12-07 13:14:06.758: [ CRSRES][4588] StopResource: setting CLI values

2006-12-07 13:14:06.766: [ CRSRES][4588] Attempting to stop `ora.node1.LISTENER_NODE1.lsnr` on member `node1`

2006-12-07 13:17:41.399: [ CRSRES][4588] Stop of `ora.node1.LISTENER_NODE1.lsnr` on member `node1` succeeded.

<<< takes 3.5 minutes to stop listener

2006-12-07 13:17:41.402: Attempting to stop `ora.node1.ASM1.asm` on member `node1`

<<< stop dependant inst and ASM

2006-12-07 13:17:55.610: [ CRSRES][4588] Stop of `ora.node1.ASM1.asm` on member `node1` succeeded.

2006-12-07 13:17:55.661: [ CRSRES][4588] Attempting to start `ora.node1.vip` on member `node2`

2006-12-07 13:18:00.260: [ CRSRES][4588] Start of `ora.node1.vip` on member `node2` succeeded.

<<< now VIP failover complete after almost 4 mins

ora.node1.LISTENER_NODE1.lsnr.log shows:

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=node1vip)(PORT=1521)(IP=FIRST)))

TNS-12535: TNS:operation timed

2006-12-07 13:17:41.329: [ RACG][1] [23916][1][ora.node1.LISTENER_NODE1.lsnr]: out

TNS-12560: TNS:protocol adapter error

TNS-00505: Operation timed out

Solaris Error: 145: Connection timed out

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=10.1.10.100)(PORT=1521)(IP=FIRST)))

The command completed successfully

Client connection hang during this failover time.

Changes

This may be a new setup, or a setup that was migrated from an earlier release.

Cause

This problem is caused by the first address in the listener.ora configuration being an address that uses the TCP protocol.

In this circumstance, when a network cable is pulled, “lsnrctl stop” listener has to wait for TCP timeout before it can check next address. On the Solaris platform, TCP timeout is defined by tcp_ip_abort_cinterval with a default value of 180000 (3 minutes).   That is why shutting down listener almost took 3.5 minutes. (TCP timeout on other platforms may vary).  The error message “Solaris Error: 145: Connection timed out” in ora.node1.LISTENER_NODE1.lsnr.log also indicates it is waiting for tcp timeout.

The listener.ora in this scenario is defined as:

Solution

To prevent this, move the IPC address to be the first address for the listener in the listener.ora, eg:

 
    

When lsnrctl tries to stop the listener, it will now connect to the IPC address first, which is available during that time. It will not have to wait for tcp timeout.

After the above change, the VIP failover only takes 48 to 50 seconds to complete regardless of the tcp_ip_abort_cinterval setting.

Please note, listener.ora files newly created from 10.2.0.3 to 11.1.0.7 should have the IPC protocol as the first address in listener.ora in most cases.  However, if you have upgraded from a previous release, or manually modified/copied over a listener.ora from a previous install, you may not have the IPC protocol as the first address, regardless of your version. Manual modification is required to move IPC protocol to be the first address to avoid the problem described in this note.

来自 “ ITPUB博客 ” ,链接:http://blog.itpub.net/13558130/viewspace-625632/,如需转载,请注明出处,否则将追究法律责任。

转载于:http://blog.itpub.net/13558130/viewspace-625632/

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值