欢迎关注“数据库运维之道”公众号,一起学习数据库技术! 本期将为大家分享“备份失败RMAN-06169及ORA-30013错误处理”的案例。
关键词:RMAN-06169、ORA-30013、ORA-01157
客户巡检发现数据库备份日志出现RMAN-06169报错,即:could not read file header for datafile 65 error reason 4。根据数据文件编号进一步定位是UNDOTBS2表空间下的一个数据文件存放目录不规范。数据库备份脚本配置在节点1,数据文件存放到节点2的磁盘上,导致节点1上的数据库实例无法读到数据文件触发这个错误。
DBA看到6169错误信息后,基本可以确认是数据文件头访问有异常。根据reason code信息可以初步判断大致原因,reason 4表示dbwr进程无法找到数据文件。
6169, 3, "could not read file header for datafile %(1)s error reason %(2)s"
// *Cause: The specified data file could not be accessed. The reason codes are:
// 1 - file name is MISSINGxx in the control file
// 2 - file is offline
// 3 - file is not verified
// 4 - DBWR could not find the file
// 5 - unable to open file
// 6 - I/O error during read
// 7 - file header is corrupt
// 8 - file is not a data file
// 9 - file does not belong to this database
// 10 - file number is incorrect
// 12 - wrong file version
// 15 - control file is not current
接着要核实下数据文件65是归属于哪个表空间、数据库运行状态和备份日志。
set linesize 1000
col name for a70
select TS#,file#,name,status from v$datafile order by 1;
col tablespace_name for a20
col file_name for a60
select tablespace_name,file_id,file_name,status,online_status
from dba_data_files order by 1;
根据上述的查询结果,可以确认UNDOTBS2表空间增加的数据文件存放目录不规范,即数据文件未存放到ASM共享磁盘组。然后进一步确认两个实例对应的默认UNDO表空间信息。
set linesize 1000
col name for a20
col value for a20
select name,value from v$parameter
where name in ('undo_management','undo_tablespace');
NAME VALUE
-------------------- --------------------
undo_management AUTO
undo_tablespace UNDOTBS2
set linesize 1000
col tablespace_name for a20
select TABLESPACE_NAME, CONTENTS,
EXTENT_MANAGEMENT, ALLOCATION_TYPE,
SEGMENT_SPACE_MANAGEMENT
from dba_tablespaces where contents='UNDO';
TABLESPACE_NAME CONTENTS EXTENT_MAN ALLOCATIO SEGMEN
-------------------- --------- ---------- --------- ------
UNDOTBS1 UNDO LOCAL SYSTEM MANUAL
UNDOTBS2 UNDO LOCAL SYSTEM MANUAL
根据以上的初步排查,在不重启数据库的情况下可采取两种处置方案:1、新建UNDO表空间,删除旧的UNDO表空间; 2、将UNDO数据文件调整到ASM磁盘组;
解决方案一:新建UNDO表空间,删除旧的UNDO表空间
1、查看undo表空间默认参数值,查看数据文件目录信息。undo表空间是用来存储数据被改之前的前镜像。
show parameter undo_tablespace
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
undo_tablespace string UNDOTBS2
col file_name for a60
set linesize 100
select tablespace_name,file_name from dba_data_files
where tablespace_name='UNDOTBS2';
2、创建新的undo表空间,并设置自动扩展参数,调整undo_tablespace参数
CREATE SMALLFILE UNDO TABLESPACE "UNDOTBS02" DATAFILE '+DATA' SIZE 200M AUTOEXTEND ON;
alter system set undo_tablespace='UNDOTBS02' SCOPE=both sid='ywzd2';
3、检查待删除的回滚段有没有正在执行的事务,如果没有会相对简单,可以直接删除掉该回滚段即可,并且没有数据丢失。
--检查回滚段状态
select tablespace_name,segment_id,segment_name,file_id,block_id,status,instance_num
from dba_rollback_segs where tablespace_name='UNDOTBS2' and status = 'ONLINE';
TABLESPACE_NAME SEGMENT_ID SEGMENT_NAME FILE_ID BLOCK_ID STATUS INSTANCE_NUM
------------------------------ ---------- ------------------------------ ---------- ---------- ---------------- ----------------------------------------
UNDOTBS2 12 _SYSSMU12_3316011814$ 6 144 ONLINE
UNDOTBS2 14 _SYSSMU14_3105766383$ 6 176 ONLINE
--调整undo_tablespace参数后,活动会话会切换到新的undo表空间。
--alert日志文件会出现相关的提示信息
[6114] Successfully onlined Undo Tablespace 2.
[6114] **** active transactions found in undo Tablespace 5 - moved to Pending Switch-Out state.
[6114] active transactions found/affinity dissolution incompletein undo tablespace 5 during switch-out.
4、进一步定位具体的会话信息
select s.SID, s.serial#,s.machine,s.program,s.sql_id,s.status,substr(s.username, 1, 10) username, g.tablespace_name,g.segment_name
from v$transaction, dba_rollback_segs g, v$session s where saddr = ses_addr and xidusn = segment_id;
5、跟客户确认待清理的会话信息。
alter system kill session '829,31003' immediate;
6、删除原有的UNDO表空间。
drop tablespace UNDOTBS2 including contents and datafiles;
正常情况下,上述步骤是可以解决问题的。可是实际情况比这个复杂,删除undo表空间出现ORA-30013错误,即表空间在使用中后台进程还在使用这个表空间。
--删除命令
SQL> drop tablespace UNDOTBS2 including contents;
drop tablespace UNDOTBS2 including contents
*
ERROR at line 1:
ORA-30013: undo tablespace 'UNDOTBS2' is currently in use
--alert日志信息
Sat Mar 30 23:30:08 2024
drop tablespace UNDOTBS2 including contents
ORA-30013 signalled during: drop tablespace UNDOTBS2 including contents...
Sat Mar 30 23:30:15 2024
[26385] **** active transactions found in undo Tablespace 5 - moved to Pending Switch-Out state.
Sat Mar 30 23:32:50 2024
[26385] **** active transactions found in undo Tablespace 5 - moved to Pending Switch-Out state.
查看数据库正在运行的事务,包括对应的会话信息。
--活动事务信息
select ADDR, XIDUSN,STATUS, START_TIME from v$transaction;
--PENDING OFFLINE状态的会话信息
set linesize 1000
col name for a30
col username for a15
col machine for a20
col osuser for a10
SELECT
a.usn,
a.name,
b.status,
c.tablespace_name,
d.addr,
e.sid,
e.serial#,
e.username,
e.program,
e.machine,
e.osuser
FROM
v$rollname a,
v$rollstat b,
dba_rollback_segs c,
v$transaction d,
v$session e
WHERE
a.usn=b.usn AND
a.name=c.segment_name AND
a.usn=d.xidusn AND
d.addr=e.taddr AND
b.status='PENDING OFFLINE';
--基于上面查出来的sid,serial#信息,清理掉相关的会话
ALTER SYSTEM KILL SESSION '<sid#>, <serial#>' immediate;
--最后再执行表空间删除命令(如果这些步骤还不能解决问题,可以查看推荐二的解决方案)
drop tablespace UNDOTBS2 including contents and datafiles;
解决方案二:使用ASMCMD CP 将UNDO数据文件调整到ASM磁盘组
使用ASMCMDCP命令将数据文件从文件系统迁移到ASM
1、为了保证数据的一致性,先将数据文件设置为OFFLINIE状态。
--切换日志文件
SQL> alter system switch logfile;
--查看数据文件信息
select tablespace_name,file_name,file_id,bytes/(1024*1024*1024)g,autoextensible
from dba_data_files where tablespace_name like '%UNDO%';
--将数据文件离线
SQL> alter database datafile 6 offline;
2、拷贝数据文件到ASM共享磁盘
--查看数据文件信息是否处于RECOVER状态。
SQL> select file_name, file_id,online_status from dba_data_files where file_id=6;
--ASMCMD复制文件,使用ASMCMD将数据文件从文件系统复制到磁盘组。
ASMCMD> cp/u01/app/oracle/product/12.1.0/dbhome_1/dbs/SYSAUX02.DBF +YWZD/YWZD/SYSAUX02.dbf
Copying/u01/app/oracle/product/12.1.0/dbhome_1/dbs/SYSAUX02.DBF->+YWZD/YWZD/SYSAUX02.dbf
ASMCMD> ls –lt
ASMCMD> pwd
3、修改控制文件并恢复数据文件
--rename数据文件;数据文件复制之后,通过rename进行数据文件重命名
SQL> alter database rename file '/u01/app/oracle/product/12.1.0/dbhome_1/dbs/SYSAUX02.DBF'to '+YWZD/YWZD/SYSAUX02.dbf';
--恢复数据文件:对数据文件进行recover恢复
12c版本
SQL> alter database recover datafile 6;
11g版本
RMAN> recover datafile 6;
--将数据文件设置为在线
SQL> alter database datafile 6 online;
--验证数据文件路径
SQL> select file_name, file_id,online_status from dba_data_files where file_id=6;
FILE_NAME FILE_ID ONLINE_STATUS
---------------- ------- ------------------
+YWZD/YWZD/SYSAUX02.dbf 6 ONLINE
推荐阅读一:12c在线迁移数据文件
1、若集群数据库的数据文件没有放到ASM共享磁盘组,查询dba_data_files视图可能会遇到ORA-01157错误。
--在集群环境下,多节点实例共享数据文件。如果数据文件存放在文件系统,会导致其中一个节点无法访问数据文件。
--假如:数据文件创建在节点1的文件系统,节点2访问数据文件报错。
SQL> select file_name from dba_data_files;
select file_name from dba_data_files
*
ERROR at line 1:
ORA-01157: cannot identify/lock data file 6- see DBWR trace file
ORA-01110: data file 6:
'/u01/app/oracle/product/12.1.0/dbhome_1/dbs/D:appOracleoradatahbgmsSYSAUX02.DBF
'
--官方建议:
$oerr ora 01157
01157, 00000, "cannot identify/lock data file %s - see DBWR trace file"
// *Cause: The background process was either unable to find one of the data
// files or failed to lock it because the file was already in use.
// The database will prohibit access to this file but other files will
// be unaffected. However the first instance to open the database will
// need to access all online data files. Accompanying error from the
// operating system describes why the file could not be identified.
// *Action: Have operating system make file available to database. Then either
// open the database or do ALTER SYSTEM CHECK DATAFILES.
2、12c数据库支持在线数据文件迁移,不需要对数据文件进行离线处理。
--查看当前数据文件路径和状态
col file_name for a70
set linesize 200
select file_id,file_name,status from dba_data_files order by file_id;
--将数据文件调整至指定的路径目录,然后运行上面的SQL验证文件路径与状态
alter database move datafile '/u01/app/oracle/oradata/PROD4/system01.dbf' to '/u01/app/oracle/oradata/system01.dbf';
推荐阅读二:分布式事务排查与处理
如果上述的解决方案一无法释放UNDO表空间里的活动事务,那么数据库可以存在分布式事务。
--问题依旧存在
SQL> drop tablespace undotbs2 including contents and datafiles;
drop tablespace undotbs2 including contents and datafiles
*
ERROR at line 1:
ORA-30013: undo tablespace 'UNDOTBS2' is currently in use
--进一步收集分布式事务信息
Select segment_name,status,tablespace_name from dba_rollback_segs where status not in ('ONLINE','OFFLINE') ;
select KTUXEUSN, KTUXESLT, KTUXESQN, KTUXESTA, KTUXECFL, KTUXESIZ from x$ktuxe where KTUXESTA='ACTIVE' and KTUXECFL='DEAD';
SELECT LOCAL_TRAN_ID,GLOBAL_TRAN_ID,STATE,MIXED,COMMIT# FROM DBA_2PC_PENDING;
--local_tran_id is a trio combination of USN , Slot , Seq
SELECT LOCAL_TRAN_ID,IN_OUT,DATABASE,INTERFACE FROM dba_2pc_neighbors;
通过以下脚本清理掉分布式事务会话信息。
declare
v_txn VARCHAR2(22);
CURSOR trans_cursor is
SELECT LOCAL_TRAN_ID FROM DBA_2PC_PENDING WHERE STATE='forced rollback';
begin
open trans_cursor ;
LOOP
fetch trans_cursor into v_txn;
EXIT WHEN trans_cursor%NOTFOUND ;
Commit;
DBMS_TRANSACTION.PURGE_LOST_DB_ENTRY(v_txn) ;
END LOOP;
COMMIT;
end;
/
推荐阅读三:X$KTUXE回滚段事务表
1、X$KTUXE是数据库内部表,可以通过desc x$ktuxe查看结构信息。KTUXE代表的就是:[K]ernel [T]ransaction [U]ndo Transa[x]tion [E]ntry (table)。在这个表中,数据库展示了回滚段头的事务表信息。通过这个结构体可以获得数据库事务相关的重要信息。X$KTUXE表的另外一个重要功能是可以获得无法通过v$transaction来观察的死事务信息。当数据库发生异常中断,或者进行延迟事务恢复时,数据库启动后,无法通过V$TRANSACTION来观察事务信息,但是X$KTUXE可以帮助我们获得这些信息。
--通过V$TYPE_SIZE视图可以找到Oracle的自解释信息
select * from v$type_size where component='KTU';
COMPONEN TYPE DESCRIPTION TYPE_SIZE
-------- -------- -------------------------------- ----------
KTU KTUBH UNDO HEADER 16
KTU KTUXE UNDO TRANSACTION ENTRY 40
KTU KTUXC UNDO TRANSACTION CONTROL 104
--表中的KTUXECFL代表了事务的Flag标记,通过这个标记可以找到那些Dead事务
select distinct KTUXECFL,count(*) from x$ktuxe group by KTUXECFL;
KTUXECFL COUNT(*)
------------------------ ----------
NONE 778
--KTUXESIZ用来记录事务使用的回滚段块数,可以通过观察这个字段来评估恢复进度:
select ADDR,KTUXEUSN,KTUXESLT,KTUXESQN,KTUXESIZ from x$ktuxe where KTUXECFL ='DEAD';
2、错误信息为大量的DML不仅会产生大量的REDO记录还会导致UNDO空间增长,进而占用大量的磁盘空间。11g恢复数据文件需要使用RMAN工具,否则会出现以下错误。
SQL> alter database recover datafile 65;
alter database recover datafile 65
*
ERROR at line 1:
ORA-00279: change 12597056539218 generated at 03/30/2024 21:09:53 needed for thread 2
ORA-00289: suggestion : +FRA/urpdb/archivelog/2024_03_30/thread_2_seq_146434.1195.1165007427
ORA-00280: change 12597056539218 for thread 2 is in sequence #146434
- RMAN-06169: could not read file header for datafile XX error reason 5 (Doc ID 2868000.1)
- Backup Fails RMAN-06169: could not read file header for datafile % error reason % (Doc ID 2991317.1)
- Troubleshooting ORA-30013 Error (Doc ID 1578717.1)
- Unable to Drop Undo Tablespace ORA-30013 (Doc ID 835944.1)
- Undo Tablespace Moved To Pending Switch-Out State (Doc ID 341372.1)
- How to Shrink the datafile of Undo Tablespace (Doc ID 268870.1)
- Managing Rollback/Undo Segments in AUM (Automatic Undo Management) (Doc ID 135090.1)
- Unable to Drop Undo Tablespace ORA-30013 (Doc ID 835944.1)
- How to clear huge number of In-doubt transactions in PREPARED state (Doc ID 2489254.1)
- DBA手记:X$KTUXE与Oracle的死事务恢复-优快云博客
以上就是本期关于“备份失败RMAN-06169及ORA-30013错误处理”的处置案例。希望能给大家带来帮助。
欢迎关注“数据库运维之道”公众号,一起学习数据库技术!