IPS阻挡未知archlog导致standbyDB无法同步(ORA-03135,ORA-16055)

本文介绍了一起因IPS误判导致归档日志无法正常传输的问题案例。问题发生在HP-UX环境下,Oracle 10.2.0.4版本的RAC集群中,涉及到主备库间的归档日志同步失败。最终通过调整IPS策略解决了问题。

摘要生成于 C知道 ,由 DeepSeek-R1 满血版支持, 前往体验 >

IPS阻挡未知archlog导致standbyDB无法同步

EnvironmentHPUX 11.31 ORACLE 10.2.0.4 PDB 3nodes RAC Standby 2nodes RAC

SymptomstandbyDB无法同步,越来越多的archlog未被apply,且存在gapstandby DB在等待113171这个log,但是这个log传不过来。primary DB

2号节点alertlog报错如下:

ORA-03135: connection lost contact

Tue Jul 24 12:18:06 2012

FAL[server, ARC5]: FAL archival, error 3135 closing archivelog file 'sfc12stb'

FAL[server, ARC5]: FAL archive failed, see trace file.

Tue Jul 24 12:18:06 2012

Errors in file /apps/oracle/admin/sfc12db/bdump/sfc12db2_arc5_16226.trc:

ORA-16055: FAL request rejected

ARCH: FAL archive failed. Archiver continuing Tue Jul 24 12:18:06 2012 ORACLE Instance sfc12db2 - Archival Error. Archiver continuing.

Tue Jul 24 12:19:59 2012

 

Solution

 根据alertlog中的保存检索MOS,如下

ORA - 03135 : connection lost contact while shipping from Primary Server to Standby server [ID 739522.1]附录1

Changes There may be a firewall rule changed or the firewall is newly installed.

 根据这个信息联系网络部门同事,可网络部门同事严重误导了我,说我的主库和备库之间是纯路由网络没有firewall,这样我就开始不断尝试其他的办法试图解决这个问题。

1,       重启2号节点。

StandbyDB:Alter database recover managed standby database cancel

alter database recover managed standby database disconnect;

     依然无效。问题就是当standby视图获取archive gap 113171时直接被

     二号节点拒绝ORA-03135: connection lost contact

2.      轮流重启了所有节点,且重启的standby DBos没有效果gap 113171这个log无法传送过来。

3.        尝试手动传输113171这个log

           A.将sequence=113171的这个logASM传到filesystem步骤如下:

        1.Log ontothe target database that is local to the ASM instance as the sys user.

 

2.create source directory with in the target database.

 

SQL>create or replace directory SOURCE_DIR as '+DGARCH/SFC12DB/ARCHIVELOG/2012_07_24';

Directory created.

(In this example +DGARCH/SFC12DB/ARCHIVELOG/is the source directory where the datafile is located and where you wish to copy the file from.)

 

3.create destination directory with in database.

SQL>createor replace directory ORACLE_DEST as '/tmp';

Directory created.

(In this example /restoreisthe destination directory where the datafile is to be copied to.)

4.Executethe dbms_file_transfer package.

SQL>

BEGIN

dbms_file_transfer.copy_file(source_directory_object =>

' SOURCE_DIR ', source_file_name => 'thread_2_seq_113171.540.789475773',

destination_directory_object =>'ORACLE_DEST',

destination_file_name =>'thread_2_seq_113171.540.789475773');

END;

/

Ps:oracle 11GASM的文件是可以直接copy的就不用这么麻烦了。

B.利用FTP传输到standbyDB

  问题是我这个archlog 17M

ftp传输到13.9M时卡住不动。无法传输ok…..,我有试着传输大于17M的文件可以传送。小于17M的也可以。难道是这个log有问题?我又尝试在同一个网段传送这个log居然可以传送(primaryDB 172.16.50.31/32/33 standbyDB 172.16.51.151/152),纠结了。这时已经是凌晨一点多了,距离问题发生已经13个小时,这十个来小时高度紧张,思考疲惫死了,睡觉去明天再找网络部门排查。

 

 

 

4.     一夜没睡好,第二天找网络部门同事一起来看,为什么跨网段后,这个113171 log就无法完整传输?和两位网络部门的同事一起尝试不同网段之间传送这个文件,还是无法完整传输。这事网络部门同事想到我们公司还有一个IPSintrusion prevention system入侵防御系统。检查日志终于找到问题的源头,未知的原因导致IPS113171这个文件当成DOS攻击,阻止了主库和备库之间的传输。(我有想掐死他们的冲动,昨天问过他们好多次这两个网段间是否有firewall,他们一直肯定的说没有,好吧别给我解释IPSfirewall不是一回事…..

  

  

5.  问题的原因找到网络同事修改IPS策略之后可以完整传送这个log

  standby DB手动注册这个log

  Alter database register logfile'/apps/oracle/1_117472_679867280.dbf';

   这个gap消除之后 standbydb 又报了两个gap是其他两个thread

  可是这两个log已经没有(备份策略是备份后删除archivelog只保留最近六个小时的),这样需要从磁带中抓取那两个log

 脚本如下:

 RUN

{

      allocate channel dev0 type'sbt_tape'

  parms'SBT_LIBRARY=/opt/omni/lib/libob2oracle8_64bit.so,ENV=(OB2BARTYPE=Oracle8,OB2APPNAME=sfc12db,OB2BARLIST=sfc12rc3_fullDB)';

      SETARCHIVELOG DESTINATIONTO'/tmp';

      RESTORE ARCHIVELOG SEQUENCE 113854 thread 3 ; --只是恢复需要的log

      release channel dev0;

}

   附录1.使用From Until语句

restore archivelog from sequence 69346 until sequence 69397 thread 1;
restore archivelog from sequence 75193 until sequence 75263 thread 2;

2.使用Between语句

restore archivelog sequence between 134 and 136 thread 1;
restore archivelog sequence between 56 and 58 thread 2;

 

  

对备库进行设置:

alter database recover managed standby database cancel;

alter database recover managed standby database disconnect;

当所有的log都被applied之后

alter database recover managed standby database cancel;

alter database recover managed standby database using current logfile disconnect;---启用real time apply处理OK

 

 

 思考:原来的archivelog 都可以正常的传送而不会被挡掉,发生了什么使得IPS将这个log当作了DOS攻击?.......

 

 

 

 

 

附录一

ORA - 03135 : connection lost contact while shipping from Primary Server to Standby server [ID 739522.1]


 

Modified 19-AUG-2011     Type PROBLEM     Status MODERATED

 

In this Document
  Symptoms
  Changes
  Cause
  Solution
  References


This document is being delivered to you via Oracle Support's Rapid Visibility (RaV) process and therefore has not been subject to an independent technical review.

Applies to:

Oracle Net Services - Version: 10.1.0.2.0 to 11.2.0.2 - Release: 10.1 to 11.2
Information in this document applies to any platform.

Symptoms

customer reported intermittently the archiver process hits error and
following seen in the primary node alert log

Errors in file /opt/oracle/admin/ORCL/bdump/orcl1_arc4_28059.trc:
ORA-03135: connection lost contact
Mon Sep 22 22:24:51 2008
FAL[server, ARC4]: FAL archive failed, see trace file.
Mon Sep 22 22:24:51 2008
Errors in file /opt/oracle/admin/ORCL/bdump/orcl1_arc4_28059.trc:
ORA-16055: FAL request rejected



.

Changes

There may be a firewall rule changed or the firewall is newly installed.

Cause

Mainly ORA-3135 occurs when the connection is broken because of underlying network issues.

In  this case, an intermediate firewall between primary and secondary server is altering the data inside the sqlnet packet.

The trace clearly shows that primary sends a data with

[22-SEP-2008 22:24:51:414] nspsend: 53 53 3D 28 50 52 4F 54 |SS=(PROT|
[22-SEP-2008 22:24:51:414] nspsend: 4F 43 4F 4C 3D 54 43 50 |OCOL=TCP|
[22-SEP-2008 22:24:51:414] nspsend: 29 28 48 4F 53 54 3D 70 |)(HOST=h|
[22-SEP-2008 22:24:51:414] nspsend: 6E 79 6D 65 72 63 75 72 |ostname1|
[22-SEP-2008 22:24:51:414] nspsend: 79 64 62 30 39 2D 76 69 |_db011-v|
[22-SEP-2008 22:24:51:414] nspsend: 72 74 29 28 50 4F 52 54 |ip)(PORT|
[22-SEP-2008 22:24:51:414] nspsend: 3D 31 35 32 31 29 29 61 |=1521))a|
[22-SEP-2008 22:24:51:414] nspsend: 20 4C 0B 05 01 00 0D 00 |.L......|


and the firewall / intermediate network device is changing the hostname to IP address (possibly a
rule set in the firewall).

[22-SEP-2008 22:24:51:586] nsprecv: 53 53 3D 28 50 52 4F 54 |SS=(PROT|
[22-SEP-2008 22:24:51:586] nsprecv: 4F 43 4F 4C 3D 54 43 50 |OCOL=TCP|
[22-SEP-2008 22:24:51:586] nsprecv: 29 28 48 4F 53 54 3D 31 |)(HOST=1|
[22-SEP-2008 22:24:51:586] nsprecv: 36 31 2E 32 32 31 2E 32 |11.111.1|
[22-SEP-2008 22:24:51:586] nsprecv: 31 37 2E 35 35 29 28 50 |23.45)(P|
[22-SEP-2008 22:24:51:586] nsprecv: 4F 52 54 3D 37 31 36 39 |ORT=7171|
[22-SEP-2008 22:24:51:586] nsprecv: 29 29 61 20 4C 0B 05 01 |))a.L...|

Solution

Involve the firewall administration team to check whether the firewall /intermediate network device can be stopped from altering the sqlnet packet.

If this is not the case, then we need to take matching sqlnet server traces at support level to narrow the issue.

Enabling the server trace on the primary and restarting the archiver process will generate sqlnet trace for the archiver process.

 Solution to tracing archiver process without getting tons of trace for other background processes and connections is to

1. Enabled Sever tracing on primary
2. Kill the archive process that is archiving log files remote (dont kill the wrong archiver)
3. Disable trace

Do these 3 actions as quickly as possible. Tie the server trace to the new archive process via PID.
Oracle net server trace will have the PID in net trace name
Archive process will be traced untill the process is stopped/killed. So you may wish to kill the process again, once error reproduced.

WARNING !!!Small chance the process wont re-spawn



Then enable the sqlnet tracing on the standby server to get the corresponding process which takes the request from this archiver process.

Once error is hit match the archiver sqlnet trace file to see the network issue if any.

Note: you can easily cross check the sqlnet server trace and archiver process by means of Process ID which is appended to the trace file name in the bdump.

 

For enabling tracing referNote 395525.1  How to Enable Oracle SQLNet Client , Server , Listener , Kerberos and External procedure Tracing from Net Manager

 

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包

打赏作者

潇湘秦

你的鼓励将是我创作的最大动力

¥1 ¥2 ¥4 ¥6 ¥10 ¥20
扫码支付:¥1
获取中
扫码支付

您的余额不足,请更换扫码支付或充值

打赏作者

实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值