How to Obtain a System State Trace BEFORE the Error ORA-2049 Occurs

本文介绍ORA-02049错误的原因及诊断方法,提供了一个监测并捕获系统状态转储的实用程序OERR_2049_SSTATE,帮助定位资源锁定的会话。

In this Document

Goal
Fix
References

Applies to:

Oracle Server - Enterprise Edition - Version 10.1.0.2 to 11.2.0.2 [Release 10.1 to 11.2]
Information in this document applies to any platform.
Oracle Server Enterprise Edition - Version: 10.1.0.2 to 11.2.0.2


Goal

This document is intended to explain under what conditions an ORA-02049 'timeout: distributed transaction waiting for lock' error can be signaled, and to provide atoolto diagnose situations when this occurs. The challenge with troubleshooting these types of errors is that by the time the error is thrown, the contention no longer is present. A type of trace used byOracleSupport called a system state dump is useful for finding the session holding the resources that the session(s) signaling ORA-02049 are waiting on, but it must be obtained before the error is signaled. Therefore the contention cannot be captured even by cascading a system state trace off of an ORA-02049 errorstack trace, which is automatically written when the ORA-02049 error occurs.

An ORA-02049 'timeout: distributed transaction waiting for lock' error occurs when a session that is involved in a distributed transaction waits for another session's TX enqueue (row lock) for longer than the database distributed_lock_timeout. What constitutes a 'distributed transaction' is what generates confusion on this topic. Any operation that uses a database link - even a query - starts a distributed transaction. For example, consider the following:

Session 1:
-----------
SQL> select * from global_name;

GLOBAL_NAME
---------------------------------------------------------------------------
ORCL102A.WORLD

SQL> update scott.dept set loc='NOME' where deptno=10;

1 row updated.


Session 2:
----------
SQL> select * from global_name;

GLOBAL_NAME
---------------------------------------------------------------------------
ORCL102A.WORLD

SQL> select 1 from dual@orcl102b.world;

1
----------
1

SQL> update scott.dept set loc='NOME' where deptno=10;
update scott.dept set loc='NOME' where deptno=10
*
ERROR at line 1:
ORA-02049: timeout: distributed transaction waiting for lock

If a 'commit' is issued in session 2 after the distributed query but before the update operation, no ORA-02049 error is encountered.

Common methods of dealing with this problem are:

  1. Increase the database initialization parameter distributed_lock_timeout. This will enable sessions involved in a distributed transaction to wait longer for a row lock, in hopes that the session holding the row lock will release it in the increased span of time.

  2. Code the application so that the error is a 'try again' exception - if it hits the error, keep trying the action until it succeeds. Eventually the session holding the contended row lock should release it and the waiting session should be able to proceed.

However, it is a common desire for customers to want to identify the session holding the resources. The procedure presented below allows us to do this five seconds before an ORA-02049 will be signaled.

NOTE: "enq: TX - row lock contention" is based on how your application has been designed, coded and being executed by different sessions to cause row level contention. This wait can only be fixed by changing the application code, design or usage of the application.

Fix

The procedure below monitors v$session for any sessions waiting on 'enq: TX - row lock contention' for seconds_in_wait. If seconds_in_wait for a session waiting on 'enq: TX - row lock contention' advances to a user-specified number of seconds before distributed_lock_timeout will expire, and the session is involved in a distributed transaction, it will write a system state trace to background_dump_dest (10g) or diagnostic_dest (11g). It will stop running after either writing three system state traces, or the time specified for the procedure to run expires, which ever occurs first.

The OERR_2049_SSTATE procedure takes two input parameters:

  • run_minutes: The number of minutes the procedure will execute before executing. This should be set high enough to capture the problem on the system.
  • sec_b4_timeout: This specifies the number of seconds before the database distributed_lock_timeout expires that the procedure will dump the systemstate. On busy systems it can takes quite some time to dump a systemstate, so it may be necessary to experiment with this parameter - the higher the value the more likely that the systemstate will capture the problem, but it is also less likely that an ORA-02049 will be signaled since we are dumping the systemstate that much sooner than the database distributed_lock_timeout would expire.

To implement the procedure:

1) To prepare for running the procedure, the schema it will be created in needs privileges to be able to select from v$ views, for creating and running procedures, etc.

CREATE USER oerr IDENTIFIED BY oerr;
GRANT CONNECT, RESOURCE, CREATE PROCEDURE, ALTER SESSION TO oerr;
GRANT SELECT ON SYS.V_$SESSION TO oerr;
GRANT SELECT ON SYS.V_$LOCK TO oerr;
GRANT SELECT ON SYS.V_$PARAMETER TO oerr;
GRANT EXECUTE ON DBMS_LOCK TO oerr;

2) Create the following table in the schema that owns the procedure. It allows the procedure to be terminated at any time. Otherwise the procedure must be killed administratively from another session.

create table on_off (run varchar2(1));
insert into on_off values ('Y');
commit;

To stop the procedure before the time specified for the procedure to run expires, execute:

update on_off set run = 'N';
commit;

3) Create the procedure in the appropriate schema:

CREATE OR REPLACE PROCEDURE OERR_2049_SSTATE (run_minutes number,
sec_b4_timeout number)
IS

v_start_date date := sysdate;
dist_lock_timeout number;
flag integer := 1;
v_wait number;
v_run on_off.run%type := 'Y';

cursor sess_wait is select sid, saddr, seconds_in_wait
from v$session where event = 'enq: TX - row lock contention';

v_seconds_in_wait number;
v_sid number;
v_lock_type varchar2(2);
NODATA BOOLEAN := FALSE;

BEGIN

execute immediate 'update on_off set run = ''Y''';
execute immediate 'commit';

dbms_output.put_line(' ');

execute immediate 'alter session set nls_date_format =
''YYYY-MM-DD HH24:MI:SS''';
dbms_output.put_line('Procedure started at ' || v_start_date);

select to_number(value) into dist_lock_timeout
from v$parameter where name = 'distributed_lock_timeout';

dbms_output.put_line('Value of distributed_lock_timeout =
' || dist_lock_timeout);

IF dist_lock_timeout < 10 THEN
dbms_output.put_line('Value of distributed_lock_timeout is too small,
exiting...');
v_run := 'N';
ELSE
v_wait := dist_lock_timeout - sec_b4_timeout;
END IF;

IF dist_lock_timeout - sec_b4_timeout <= 5 THEN
dbms_output.put_line('Value of sec_b4_timeout is too small, exiting...');
v_run := 'N';
END IF;

WHILE flag < 4 and sysdate - v_start_date < (run_minutes/1440)
and v_run = 'Y' LOOP
FOR r in sess_wait LOOP
v_seconds_in_wait := r.seconds_in_wait;
IF v_seconds_in_wait > v_wait THEN
v_sid := r.sid;
BEGIN
select type into v_lock_type from v$lock where sid = v_sid
and type = 'DX';
dbms_output.put_line('DX lock found...');
EXCEPTION
WHEN NO_DATA_FOUND THEN
NODATA := TRUE;
END;
IF NODATA THEN
NODATA := FALSE;
END IF;
IF v_lock_type = 'DX' THEN
dbms_output.put_line('Dumping systemstate #' || flag || '...');
execute immediate 'alter session set
tracefile_identifier=''SSTATE''';
execute immediate 'alter session set events
''immediate trace name SYSTEMSTATE level 10''';
flag := flag + 1;
dbms_lock.sleep(10);
END IF;
END IF;
END LOOP;
select run into v_run from on_off;
commit;
END LOOP;
dbms_output.put_line('Procedure ended at ' || sysdate);

END;
/

4) To run the procedure, specify the time in minutes for the procedure to run, and the time in seconds to dump the systemstate(s) before the database distributed_lock_timeout expires as input. For example, to run the procedure for 300 minutes (5 hours) and to dump the systemstate 20 seconds before the database distributed_lock_timeout would expire, execute:

exec oerr_2049_sstate(300,20);

The traces will have 'SSTATE' as part of the file name. As stated previously, the output will be written to background_dump_dest (10g) or diagnostic_dest (11g). The procedure will stop running after either writing three system state traces, or the time specified for the procedure to run expires, which ever occurs first.

RAC Support

If the database is running on RAC, In order to generate the SYSTEMSTATE dumps simultaneously on all nodes of the cluster, a script. named oradbg needs to be created in all nodes of the cluster as follows:








#!/bin/sh


ORADBG_LOG_FILE=/home/oracle/log/oradbg.log <= Create this directory or change the name to an existing one


PATH=$ORACLE_HOME/bin:/bin


(


echo "======================================="


echo "`date` - ORADBG called SID=$ORACLE_SID by OS pid $*" \ >> $ORADBG_LOG_FILE


echo " "


echo "Running SQLPLUS to execute debug script"


sqlplus -s '/ as sysdba' << EOF


oradebug setmypid


oradebug -G all dump systemstate 267


EOF


echo " "


)





give execution privileges on the script. at the os level usingm and make sure the scripts belongs to the oracle user:





chmod 700 oradbg





Set the following parameter on all the instances of the RAC:





alter system set "_oradbg_pathname"='/oradbg' scope=spfile sid='*';
Where  is the directory where the oradbg script. is located.





Restart the database instances and verify the script. is working executing:





sqlplus / as sysdba


alter session set events='immediate debug';





This should show in the screen the following messages:





Running SQLPLUS to execute debug script.
Statement processed.


Statement processed.





And when completed, should generate the SYSTEMSTATE dump on the bdump directory of each instance. The trace will belong to the diag background process.





If everything is working then modify the procedure oerr_2049_sstate as follows to execute the diag script. instead of the systemstate dump as follows:





CREATE OR REPLACE PROCEDURE OERR_2049_SSTATE (run_minutes number,


   sec_b4_timeout number)


IS





  v_start_date date := sysdate;


  dist_lock_timeout number;


  flag integer := 1;


  v_wait number;


  v_run on_off.run%type := 'Y';





  cursor sess_wait is select sid, saddr, seconds_in_wait


  from v$session where event = 'enq: TX - row lock contention';





  v_seconds_in_wait number;


  v_sid number;


  v_lock_type varchar2(2);


  NODATA BOOLEAN := FALSE;





BEGIN





  execute immediate 'update on_off set run = ''Y''';


  execute immediate 'commit';





  dbms_output.put_line(' ');





  execute immediate 'alter session set nls_date_format = ''YYYY-MM-DD HH24:MI:SS''';


  dbms_output.put_line('Procedure started at ' || v_start_date);





  select to_number(value) into dist_lock_timeout from v$parameter where name = 'distributed_lock_timeout';





  dbms_output.put_line('Value of distributed_lock_timeout = ' || dist_lock_timeout);





  IF dist_lock_timeout < 10 THEN


    dbms_output.put_line('Value of distributed_lock_timeout is too small, exiting...');


    v_run := 'N';


  ELSE


     v_wait := dist_lock_timeout - sec_b4_timeout;


  END IF;





  IF dist_lock_timeout - sec_b4_timeout <= 5 THEN


    dbms_output.put_line('Value of sec_b4_timeout is too small, exiting...');


    v_run := 'N';


  END IF;





  WHILE flag < 4 and sysdate - v_start_date < (run_minutes/1440) and v_run = 'Y' LOOP


     FOR r in sess_wait LOOP


        v_seconds_in_wait := r.seconds_in_wait;


        IF v_seconds_in_wait > v_wait THEN


           v_sid := r.sid;


           BEGIN


              select type into v_lock_type from v$lock where sid = v_sid and type = 'DX';


              dbms_output.put_line('DX lock found...');


              EXCEPTION


                WHEN NO_DATA_FOUND THEN


                NODATA := TRUE;


           END;


           IF NODATA THEN


              NODATA := FALSE;


           END IF;


           IF v_lock_type = 'DX' THEN


              dbms_output.put_line('Dumping systemstate #' || flag || '...');


              execute immediate 'alter session set events ''immediate debug''';


              flag := flag + 1;


              dbms_lock.sleep(10);


           END IF;


        END IF;


      END LOOP;


   select run into v_run from on_off;


   commit;


   END LOOP;


   dbms_output.put_line('Procedure ended at ' || sysdate);


END;


/

Still have questions ?

To discuss this information further with Oracle experts and industry peers, we encourage you to review, join or start a discussion via My Oracle Support Streams and Distributed Database Community

来自 “ ITPUB博客 ” ,链接:http://blog.itpub.net/22308399/viewspace-749917/,如需转载,请注明出处,否则将追究法律责任。

转载于:http://blog.itpub.net/22308399/viewspace-749917/

你遇到的错误是 Oracle 数据库在启动过程中无法找到或打开控制文件(control file),导致数据库无法 `mount`,最终启动失败。以下是详细的分析与解决方案。 --- ### 🔍 错误日志分析 ```bash ORA-00202: control file: '/u01/app/oracle/fast_recovery_area/ORCL/control02.ctl' ORA-27037: unable to obtain file status Linux-x86_64 Error: 2: No such file or directory ``` - **ORA-00202**:指明了控制文件路径。 - **Linux Error: 2**:表示系统层面找不到该文件(No such file or directory)。 - **ORA-00210**:无法打开指定的控制文件。 - **ORA-205**:在 `alter database mount` 阶段发生,说明控制文件缺失或损坏。 这意味着 Oracle 实例尝试挂载数据库时,发现初始化参数中列出的控制文件之一或全部 **不存在于磁盘上**。 --- ## ✅ 解决方案 ### 步骤 1:确认控制文件路径配置 查看当前实例期望的控制文件路径: ```sql -- 如果实例还能启动到 nomount 状态,可用以下命令: SQL> SHOW PARAMETER control_files; ``` 或者查看 pfile/spfile: ```bash $ cd $ORACLE_HOME/dbs $ strings spfileORCL.ora | grep control_files # 或者查看 init.ora 文件 ``` 典型输出可能为: ``` *.control_files='/u01/app/oracle/oradata/ORCL/control01.ctl','/u01/app/oracle/fast_recovery_area/ORCL/control02.ctl' ``` 这就是 Oracle 寻找控制文件的位置。 --- ### 步骤 2:检查这些文件是否真实存在 运行如下 shell 命令: ```bash ls -l /u01/app/oracle/oradata/ORCL/control01.ctl ls -l /u01/app/oracle/fast_recovery_area/ORCL/control02.ctl ``` 如果返回: ``` ls: cannot access ...: No such file or directory ``` 说明文件确实丢失。 --- ### 步骤 3:恢复控制文件的方法 #### 情况 A:有备份(推荐方式) 如果你使用 RMAN 并且有备份,可以进行恢复: ```bash $ rman target / RMAN> STARTUP NOMOUNT; RMAN> RESTORE CONTROLFILE FROM AUTOBACKUP; -- 或者从特定备份集还原 RMAN> RESTORE CONTROLFILE FROM '/path/to/controlfile_backup.ctl'; RMAN> ALTER DATABASE MOUNT; ``` > ⚠️ 注意:需要开启自动备份或知道确切的备份位置。 #### 情况 B:只有一个副本丢失,另一个仍存在 假设 `control01.ctl` 存在,但 `control02.ctl` 被误删。 你可以: 1. 复制存在的控制文件到缺失路径: ```bash cp /u01/app/oracle/oradata/ORCL/control01.ctl \ /u01/app/oracle/fast_recovery_area/ORCL/control02.ctl chown oracle:oinstall /u01/app/oracle/fast_recovery_area/ORCL/control02.ctl chmod 640 /u01/app/oracle/fast_recovery_area/ORCL/control02.ctl ``` 2. 启动数据库: ```sql SQL> STARTUP NOMOUNT; SQL> ALTER DATABASE MOUNT; ``` 3. (可选)修改参数保留单份或多份: ```sql SQL> ALTER SYSTEM SET control_files='/u01/app/oracle/oradata/ORCL/control01.ctl' SCOPE=SPFILE; ``` 然后重启。 #### 情况 C:所有控制文件都丢失,但你知道结构 — 手动重建 如果没有任何备份,但你知道数据库的大致结构(如数据文件、日志文件名),可以手动创建控制文件。 ##### 示例:重建控制文件 ```sql STARTUP NOMOUNT CREATE CONTROLFILE REUSE DATABASE "ORCL" RESETLOGS FORCE LOGGING ARCHIVELOG MAXLOGFILES 16 MAXLOGMEMBERS 3 MAXDATAFILES 100 MAXINSTANCES 1 MAXLOGHISTORY 292 LOGFILE GROUP 1 '/u01/app/oracle/oradata/ORCL/redo01.log' SIZE 50M BLOCKSIZE 512, GROUP 2 '/u01/app/oracle/oradata/ORCL/redo02.log' SIZE 50M BLOCKSIZE 512, GROUP 3 '/u01/app/oracle/oradata/ORCL/redo03.log' SIZE 50M BLOCKSIZE 512 DATAFILE '/u01/app/oracle/oradata/ORCL/system01.dbf', '/u01/app/oracle/oradata/ORCL/sysaux01.dbf', '/u01/app/oracle/oradata/ORCL/undotbs01.dbf', '/u01/app/oracle/oradata/ORCL/pdbseed/system01.dbf', '/u01/app/oracle/oradata/ORCL/pdbseed/sysaux01.dbf', '/u01/app/oracle/oradata/ORCL/users01.dbf' CHARACTER SET AL32UTF8; ``` 然后: ```sql -- 挂载并打开重置日志 ALTER DATABASE OPEN RESETLOGS; ``` 📌 提示:此方法要求你准确知道数据文件和日志文件路径。建议提前通过 `alert.log` 或旧备份获取信息。 --- ### 步骤 4:预防措施 1. **确保多路复用控制文件**: 至少两个不同磁盘路径保存控制文件,防止单点故障。 2. **定期备份控制文件**: ```sql -- 自动备份 RMAN> CONFIGURE CONTROLFILE AUTOBACKUP ON; -- 手动生成 ALTER DATABASE BACKUP CONTROLFILE TO TRACE; -- 生成文本版建控件脚本 ALTER DATABASE BACKUP CONTROLFILE TO '/backup/control.bak'; ``` 3. **保存 `CREATE CONTROLFILE` 脚本**: 运行: ```sql ALTER DATABASE BACKUP CONTROLFILE TO TRACE; ``` 输出路径通常在 `user_dump_dest`,是一个 `.trc` 文件,包含重建控制文件的 SQL 模板。 --- ### 🧪 补充诊断命令 ```bash # 查看 alert 日志中更多上下文 tail -100 alert_ORCL.log | grep -i control # 查看参数文件位置 echo $ORACLE_SID strings $ORACLE_HOME/dbs/spfile$ORACLE_SID.ora | head # 查看 dump 目录 SQL> SHOW PARAMETER dump; ``` --- ## ✅ 总结 | 问题 | 原因 | 解法 | |------|------|-------| | 控制文件找不到 | 文件被删除、路径错误、权限问题 | 检查路径、复制恢复、RMAN还原、重建 | 你现在的问题核心是:**控制文件物理缺失**,解决的关键在于是否有备份或能否重建。 ---
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值