WCDMA测试库故障处理过程

最新推荐文章于 2025-05-01 23:24:04 发布

cuanye4002

最新推荐文章于 2025-05-01 23:24:04 发布

阅读量147

点赞数

文章标签：数据库

WCDMA测试库故障处理过程

1.问题定位
初步观察，数据库进程全部关闭，ipc资源也被释放，但alert里面没有关闭和报错信息。

2.尝试启动数据库
SQL>sqlplus / as sysdba

SQL>startup

Errors in file /opt/oracle/admin/GDWDB1/udump/gdwdb1_ora_13740.trc:
ORA-00320: cannot read file header from log 1 of thread 1
ORA-00312: online log 1 thread 1: '/data/GDWDB1/GDWDB1/redo01.log'
ORA-27091: unable to queue I/O
ORA-27069: attempt to do I/O beyond the range of the file
Additional information: 1
Additional information: 1
Thu Oct 8 16:53:48 2009
Aborting crash recovery due to error 320
Thu Oct 8 16:53:48 2009
Errors in file /opt/oracle/admin/GDWDB1/udump/gdwdb1_ora_13740.trc:
ORA-00320: cannot read file header from log 1 of thread 1
ORA-00312: online log 1 thread 1: '/data/GDWDB1/GDWDB1/redo01.log'
ORA-27091: unable to queue I/O
ORA-27069: attempt to do I/O beyond the range of the file
Additional information: 1
Additional information: 1
ORA-320 signalled during: ALTER DATABASE OPEN...

报的是redo文件错误，无法读取文件头。
查看了一下redo文件，发现三个redo文件的大小都是0。

再查看跟踪文件/opt/oracle/admin/GDWDB1/udump/gdwdb1_ora_13740.trc，也没有看到什么线索。
Ioctl ASYNC_CONFIG error, errno = 1
/opt/oracle/admin/GDWDB1/udump/gdwdb1_ora_17405.trc
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
ORACLE_HOME = /opt/oracle/OraHome_1
System name:    HP-UX
Node name:      GDWDB
Release:        B.11.31
Version:        U
Machine:        ia64
Instance name: GDWDB1
Redo thread mounted by this instance: 1
Oracle process number: 0
Unix process pid: 17405, image: oracle@GDWDB

数据库没有归档也没有备份。没办法，XX的东西，见怪不怪了。

3.既然redo文件都没有了，就无法保证数据的一致性了，先清理一下redo，故障依然，但具体错误也忘记了。
SQL>alter database clear unarchived logfile group 1;
SQL>alter database clear unarchived logfile group 2;
SQL>alter database clear unarchived logfile group 3;

4.使用resetlogs打开数据库
SQL> shutdown immediate
ORA-01507: database not mounted

ORACLE instance shut down.
SQL> startup mount
ORACLE instance started.

Total System Global Area 2.1475E+10 bytes
Fixed Size                  2241992 bytes
Variable Size            5192837688 bytes
Database Buffers         1.6274E+10 bytes
Redo Buffers                5857280 bytes
Database mounted.
SQL> recover database using backup controlfile until cancel;
ORA-00279: change 62463922 generated at 10/08/2009 18:52:37 needed for thread 1
ORA-00289: suggestion : /opt/oracle/OraHome_1/dbs/arch1_1_699734980.dbf
ORA-00280: change 62463922 for thread 1 is in sequence #1

Specify log: {=suggested | filename | AUTO | CANCEL}
cancel
ORA-01547: warning: RECOVER succeeded but OPEN RESETLOGS would get error below
ORA-01194: file 1 needs more recovery to be consistent
ORA-01110: data file 1: '/data/GDWDB1/GDWDB1/system01.dbf'

ORA-01112: media recovery not started

SQL> alter database open resetlogs;
alter database open resetlogs
*
ERROR at line 1:
ORA-03113: end-of-file on communication channel

5.使用隐含参数_allow_resetlogs_corruption 以resetlogs方式打开数据库
SQL> shutdown immediate
ORA-01507: database not mounted

ORACLE instance shut down.
SQL> startup mount
ORACLE instance started.

ORA-01112: media recovery not started

SQL> alter database open resetlogs;
alter database open resetlogs
*
ERROR at line 1:
ORA-03113: end-of-file on communication channel

查看alert日志，发现如下错误
Errors in file /opt/oracle/admin/GDWDB1/udump/gdwdb1_ora_1370.trc:
ORA-00600: internal error code, arguments: [2662], [0], [62464000], [0], [62512366], [8388617], [], []
Thu Oct 8 20:12:00 2009
Errors in file /opt/oracle/admin/GDWDB1/udump/gdwdb1_ora_1370.trc:
ORA-00600: internal error code, arguments: [2662], [0], [62464000], [0], [62512366], [8388617], [], []
Thu Oct 8 20:12:00 2009
Error 600 happened during db open, shutting down database
USER: terminating instance due to error 600
Instance terminated by USER, pid = 1370
ORA-1092 signalled during: ALTER DATABASE OPEN...

看看关于这个错误的资料，原因是scn不一致引起的
ORA-00600: internal error code, arguments: [2662], [0], [62464000], [0], [62512366], [8388617], [], []
就是说当前scn：62464000 < 文件scn：62512366

6.强行推进scn
SQL>alter system set "_ALLOW_RESETLOGS_CORRUPTION"=false scope=spfile;
SQL>shutdown immediate
SQL>startup mount
SQL>alter session set events 'IMMEDIATE trace name ADJUST_SCN level 2';--将scn推进2百万。
SQL>alter database open;

新的错误又出来了，
ksedmp: internal or fatal errorORA-00600: internal error code, arguments: [4193], [1171], [1187], [], [], [], [], []
Current SQL statement for this session:
UPDATE SMON_SCN_TIME SET SCN_WRP=:1, SCN_BAS=:2, TIME_MP=:3, TIME_DP=:4
WHERE TIME_MP = :5 AND THREAD = :6 AND ROWNUM <= 1
还有部分错误报undo的错误。

再看看资料，说这错误是上一错误ORA-00600: internal error code, arguments: [2662]引起的副作用，需要重建一下undo表空间。

7.重建undo表空间
SQL>create undo tablespace undotbs02 as datafile '/data/GDWDB1/undotbs02.dbf' size 500M;
SQL>alter system set undo_tablespace = undotbs02 scope=both;
SQL>shutdown immediate
SQL>startup
数据库成功打开。

8.使用expdp将数据备份。

来自 “ ITPUB博客 ” ，链接：http://blog.itpub.net/13605188/viewspace-616196/，如需转载，请注明出处，否则将追究法律责任。

转载于:http://blog.itpub.net/13605188/viewspace-616196/