BlockRecover之ORA-00494

凌晨00:10左右接到监控电话,说库连接异常,随即起床登陆主机一看,库实例已然crash掉啦,查告警日志:
Mon Mar 23 20:45:24 2015
Thread 1 advanced to log sequence 234601 (LGWR switch)
  Current log# 3 seq# 234601 mem# 0: /dev/oravg/rgsinas_redo13
Mon Mar 23 20:48:28 2015
KCF: write/open error block=0x840 online=1
     file=204 /dev/vg01/rora_data207
     error=65535 txt: ''
Automatic datafile offline due to write error on
file 204: /dev/vg01/rora_data207
Mon Mar 23 21:16:30 2015
Errors in file /oracle/admin/jhdb/bdump/jhdb_lgwr_4176.trc:
ORA-00494: enqueue [CF] held for too long (more than 900 seconds) by 'inst 1, osid 4169'
Mon Mar 23 21:16:32 2015
System State dumped to trace file /oracle/admin/jhdb/bdump/jhdb_lgwr_4176.trc
Killing enqueue blocker (pid=4169) on resource CF-00000000-00000000
 by killing session 825.1
Mon Mar 23 21:21:42 2015
Errors in file /oracle/admin/jhdb/bdump/jhdb_lgwr_4176.trc:
ORA-00494: enqueue [CF] held for too long (more than 900 seconds) by 'inst 1, osid 4169'
Mon Mar 23 21:21:44 2015
System State dumped to trace file /oracle/admin/jhdb/bdump/jhdb_lgwr_4176.trc
Killing enqueue blocker (pid=4169) on resource CF-00000000-00000000
 by terminating the process
LGWR: terminating instance due to error 2103
Mon Mar 23 21:21:55 2015
Instance terminated by LGWR, pid = 4176


随即用dbv检查file 204发现:
Corrupt block relative dba: 0x33000840 (file 204, block 2112)
gsinas01[/oracle]$dbv file='/dev/vg01/rora_data207';
DBVERIFY: Release 10.2.0.4.0 - Production on Fri Mar 27 10:24:43 2015
Copyright (c) 1982, 2007, Oracle.  All rights reserved.
DBVERIFY - Verification starting : FILE = /dev/vg01/rora_data207
Page 2106 is influx - most likely media corrupt
Corrupt block relative dba: 0x3300083a (file 204, block 2112)
Fractured block found during dbv: 
Data in bad block:
 type: 6 format: 2 rdba: 0x3300083a
 last change scn: 0x0cd1.9978968d seq: 0x2 flg: 0x06
 spare1: 0x0 spare2: 0x0 spare3: 0x0
 consistency value in tail: 0x74ab0602
 check value in block header: 0x414f
 computed block checksum: 0x73bb
DBVERIFY - Verification complete
Total Pages Examined         : 1048575
Total Pages Processed (Data) : 1020
Total Pages Failing   (Data) : 0
Total Pages Processed (Index): 1145
Total Pages Failing   (Index): 0
Total Pages Processed (Other): 18
Total Pages Processed (Seg)  : 0
Total Pages Failing   (Seg)  : 0
Total Pages Empty            : 1046391
Total Pages Marked Corrupt   : 1
Total Pages Influx           : 1
Highest block SCN            : 2574816070 (3281.2574816070)
file为204,blk id为2112;

入库检查对应204 file为/dev/vg01/rora_data207,其状态如下:
select STATUS,name from v$datafile;
ONLINE  /dev/oravg/rora_data204
ONLINE  /dev/oravg/rora_data205
ONLINE  /dev/oravg/rora_data189
ONLINE  /dev/oravg/rora_data23
ONLINE  /dev/vg01/rora_data206
RECOVER /dev/vg01/rora_data207
ONLINE  /dev/vg01/rora_data208
ONLINE  /dev/vg01/rora_data209
ONLINE  /dev/vg01/rora_data210
ONLINE  /dev/vg01/rora_data211
SQL> SELECT segment_name,segment_type FROM dba_extents  WHERE file_id=204 AND 2112 BETWEEN block_id AND block_id + blocks -1;

SEGMENT_NAME                          SEGMENT_TYPE
--------------------------------------------------------------------------------- ------------------
IDX_CONSTRUCTIONCRM_SUCCFLAG 

SQL> select count(*) from inas.CONSTRUCTIONCRM;
select count(*) from inas.CONSTRUCTIONCRM
                          *
ERROR at line 1:
ORA-00376: file 204 cannot be read at this time
ORA-01110: data file 204: '/dev/vg01/rora_data207'

SQL> SELECT owner,segment_name,segment_type FROM dba_extents  WHERE file_id=204 AND 2112 BETWEEN block_id AND block_id + blocks -1;

OWNER                          SEGMENT_NAME                          SEGMENT_TYPE
------------------------------ --------------------------------------------------------------------------------- ------------------
INAS                           IDX_CONSTRUCTIONCRM_SUCCFLAG          INDEX
所幸的是对应坏块的是个应用索引。

果断进行块恢复:
SQL> host
$ rman target /
Recovery Manager: Release 10.2.0.4.0 - Production on Tue Mar 24 10:00:34 2015
Copyright (c) 1982, 2007, Oracle.  All rights reserved.
connected to target database: JHDB (DBID=923949890)
RMAN> 

RMAN> RUN {
2> ALLOCATE CHANNEL ch00 TYPE 'SBT_TAPE' parms='ENV=(NB_ORA_CLIENT=gsinas01,NB_ORA_POLICY=OSS_gsinas01_Oracle_Full_Backup)'; 
3> BLOCKRECOVER DATAFILE 204 BLOCK  2112;
4> }

released channel: ORA_DISK_1
allocated channel: ch00
channel ch00: sid=1491 devtype=SBT_TAPE
channel ch00: Veritas NetBackup for Oracle - Release 7.6 (2014042414)
Starting blockrecover at 24-MAR-15
channel ch00: restoring block(s)
channel ch00: specifying block(s) to restore from backup set
restoring blocks of datafile 00204
channel ch00: reading from backup piece bk_20333_1_874973991
channel ch00: restored block(s) from backup piece 1
piece handle=bk_20333_1_874973991 tag=HOT_DB_BK_LEVEL0
channel ch00: block restore complete, elapsed time: 00:05:55
starting media recovery
media recovery complete, elapsed time: 00:03:46

Finished blockrecover at 24-MAR-15
released channel: ch00
RMAN> 
RMAN>  backup validate datafile 204;

Starting backup at 24-MAR-15
allocated channel: ORA_DISK_1
channel ORA_DISK_1: sid=1491 devtype=DISK
channel ORA_DISK_1: starting full datafile backupset
channel ORA_DISK_1: specifying datafile(s) in backupset
input datafile fno=00204 name=/dev/vg01/rora_data207
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:04
Finished backup at 24-MAR-15
RMAN> 

RMAN> recover datafile 204;

Starting recover at 24-MAR-15
using channel ORA_DISK_1

starting media recovery
archive log thread 1 sequence 234600 is already on disk as file /arch1/1_234600_704593860.dbf
archive log thread 1 sequence 234601 is already on disk as file /arch1/1_234601_704593860.dbf
archive log thread 1 sequence 234602 is already on disk as file /arch1/1_234602_704593860.dbf
archive log thread 1 sequence 234603 is already on disk as file /arch1/1_234603_704593860.dbf
archive log thread 1 sequence 234604 is already on disk as file /arch1/1_234604_704593860.dbf
archive log thread 1 sequence 234605 is already on disk as file /arch1/1_234605_704593860.dbf
archive log thread 1 sequence 234606 is already on disk as file /arch1/1_234606_704593860.dbf
archive log filename=/arch1/1_234600_704593860.dbf thread=1 sequence=234600
archive log filename=/arch1/1_234601_704593860.dbf thread=1 sequence=234601
archive log filename=/arch1/1_234602_704593860.dbf thread=1 sequence=234602
media recovery complete, elapsed time: 00:00:06
Finished recover at 24-MAR-15
RMAN> 
入库查询:
SQL> alter database datafile 204 online;
Database altered.
SQL> select count(*) from inas.CONSTRUCTIONCRM;
  COUNT(*)
----------
   1499634

STATUS  NAME
------- ------------------------------
ONLINE  /dev/oravg/rora_data204
ONLINE  /dev/oravg/rora_data205
ONLINE  /dev/oravg/rora_data189
ONLINE  /dev/oravg/rora_data23
ONLINE  /dev/vg01/rora_data206
ONLINE  /dev/vg01/rora_data207
ONLINE  /dev/vg01/rora_data208
ONLINE  /dev/vg01/rora_data209
ONLINE  /dev/vg01/rora_data210
ONLINE  /dev/vg01/rora_data211

再用dbv检查file情况:
$ dbv file='/dev/vg01/rora_data207';

DBVERIFY: Release 10.2.0.4.0 - Production on Tue Mar 24 10:14:30 2015
Copyright (c) 1982, 2007, Oracle.  All rights reserved.
DBVERIFY - Verification starting : FILE = /dev/vg01/rora_data207
DBVERIFY - Verification complete

Total Pages Examined         : 1048575
Total Pages Processed (Data) : 1020
Total Pages Failing   (Data) : 0
Total Pages Processed (Index): 1146
Total Pages Failing   (Index): 0
Total Pages Processed (Other): 18
Total Pages Processed (Seg)  : 0
Total Pages Failing   (Seg)  : 0
Total Pages Empty            : 1046391
Total Pages Marked Corrupt   : 0
Total Pages Influx           : 0
Highest block SCN            : 2567939499 (3281.2567939499)

无报错,都恢复正常。

原本的错误:
ORA-01110: 数据文件 204: '/dev/vg01/rora_data207'
ORA-06512: 在 "INASQM.P_BACK_UP_EWS_QM", line 16
ORA-06512: 在 line 1
Tue Mar 24 08:14:54 2015
Errors in file /oracle/admin/jhdb/bdump/jhdb_j000_10062.trc:
ORA-12012: 自动执行作业 425 出错
ORA-00376: 此时无法读取文件 204
ORA-01110: 数据文件 204: '/dev/vg01/rora_data207'
ORA-06512: 在 "INASQM.P_BACK_UP_EWS_QM", line 16
ORA-06512: 在 line 1

已经终止报错:

Tue Mar 24 10:46:50 2015
Thread 1 advanced to log sequence 234608 (LGWR switch)
  Current log# 2 seq# 234608 mem# 0: /dev/oravg/rgsinas_redo12
  
至此故障已经恢复,库恢复正常,无中断;然而之前的告警报错仍然有问题可循,以下供参考:

1 ORA-00494: enqueue [CF] held for too long (more than 900 seconds); LGWR:Terminating Instance Due to Error 2103 (文档 ID 1089733.1)

Applies to: 
 Oracle Database - Enterprise Edition - Version 10.2.0.4 and later
Information in this document applies to any platform.
Symptoms
You see the following issue in the alert.log:

KCF: write/open error block=0x9a989 online=1
file=7 /Oratemp/oradata/M6DTH/temp07.dbf
error=27063 txt: 'IBM AIX RISC System/6000 Error: 28: No space left on device
Additional information: -1
Additional information: 24576'
Automatic tempfile offline due to write error on
file 7: /Oratemp/oradata/M6DTH/temp07.dbf

ORA-00494: enqueue [CF] held for too long (more than 900 seconds) by 'inst 1, osid 2876448'

LGWR: terminating instance due to error 2103
ORA-02103: PCC: inconsistent cursor cache (out-of-range cuc ref)

Some or all of the following function codes may be found in the 'stack trace' portion of the associated trace file.:
------------- Call Stack ---------------
sskgpwwait
skgpwwait
ksliwat
kslwaitns_timed
kskthbwt
kslwait
kcbbpibr
kcfbsy
kcfbof
kcfofa
ksbabs
ksbrdp
opirip
opidrv
sou2o
opimai_real
main
__start() 

Cause

Bug 7600269 closed as a duplicate of  Bug:7411568
Bug:9273547 closed as a duplicate of Bug:7411568

Trace file/alert log matches that in the bug
1) Starts with space error and file offline
2) Sessions are waiting for CF enqueue
3) Arguments ksliwat   kslwaitns_timed  kskthbwt  kslwait... match that in the call stack of bug
4) There is a wait on kcbzps and occurs with DBWR

Solution
 Apply Patch 7411568
2 如果不打补丁用隐含参数也可以避免,但没有测试是否稳妥:

通过隐含参数禁止Oracle主动kill CF锁的持有进程,禁止Oracle去kill所有持有CF锁的进程:

_kill_controlfile_enqueue_blocker=false

也可以通过设置另外一个隐含参数禁止Oracle去kill持有锁的后台进程:

_kill_enqueue_blocker=1(默认为3)

进程请求CF enqueue的time out时间默认是900秒,这是由下面的参数控制的:

_controlfile_enqueue_timeout=900

进程持有CF enqueue的时间最长不能超过120秒:

_controlfile_enqueue_holding_time=120


3 网友惜分飞的不补充说明:
补充说明
1.在9i中使用blockrecover会在Archive destination中生产block的备份文件,类似469_519791_3063_2442393528.bkd(file 469 block 519791),需要注意Archive destination目录结尾需要”/”,不然可能出现直接写入和该目录并列的Archive destination+blockrecover产生文件;10g中不生成该文件;如果该block在9i中未备份,也不会在生产相关文件,而是利用归档恢复.
2.blockrecover需要还原从该block从备份之后的所有归档(如果被备份起来需要还原出来),如果缺少归档可能导致恢复失败,包括基于scn/seq/time的等
3.blockrecover在执行过程中或者执行异常终止,该block依然是坏块,不会对其他block产生影响.在使用blockrecover出现异常终止后的block修复,在后续blog中提供解决方法
4.对于不含事务的坏块,如果数据允许丢失可以通过设置event跳过坏块(特殊block除外),然后重建对象;对于含event的block 坏块,建议使用blockrecover处理或者直接恢复数据文件,如果没有备份,考试使用event跳过事务回滚,然后屏蔽坏块处理

  
         

来自 “ ITPUB博客 ” ,链接:http://blog.itpub.net/13750068/viewspace-1471545/,如需转载,请注明出处,否则将追究法律责任。

转载于:http://blog.itpub.net/13750068/viewspace-1471545/

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值