SMON (system monitor process) is the system monitor background process. It is sometimes also called the system cleanup process, because it is responsible for a large number of cleanup tasks. Anyone who has studied Oracle fundamentals has at least some idea of what this background process does.
There was a time when familiarity with SMON was treated as an important yardstick of a DBA's theoretical knowledge, and to this day many companies still ask "what does SMON do?" in DBA interviews. It is an open-ended question: nobody expects a complete answer (a complete answer is practically impossible, even after you have read this article). How much a candidate can list measures breadth of knowledge (and if the candidate has prepared specifically for this question, that is fine too, since it shows a systematic approach to the problem); the interviewer can then drill into any one function to gauge depth. But that is a digression.
The SMON we know is a diligent worker that takes care of a whole series of system-level tasks. Unlike the PMON (Process Monitor) background process, SMON handles more work that concerns the system as a whole, which means it ends up doing a lot of little-known "chores". When the system generates these "junk tasks" too frequently, SMON may not be able to keep up. For this reason SMON became a little lazy in 10g: if it is posted too often within a short period (SMON: system monitor process posted), it may deliberately slack off so as not to overload itself (SMON: Posted too frequently, trans recovery disabled). This is covered in more detail later.
The main functions of SMON include:
1. Cleaning up temporary segments (SMON cleanup temporary segments)
Trigger scenario
Many people misunderstand what "temporary segments" means here and assume it refers to the sort segments in a temporary tablespace. In fact the temporary segments in question live mainly in permanent tablespaces. Temporary segments in a temporary tablespace are indeed also cleaned up by SMON, but that cleanup happens only at instance startup.
Temporary segments also exist in permanent tablespaces. For example, when we create a table or index in a permanent tablespace with a DDL command such as CREATE TABLE/INDEX, the server process first allocates enough extents in that tablespace; until the statement completes these are temporary extents, and only when the table/index is fully built is the temporary segment converted into a permanent segment. Likewise, when a segment is dropped with a DROP command, the segment is first converted into a temporary segment and that temporary segment is then cleaned up (DROP object converts the segment to temporary and then cleans up the temporary segment). Under normal circumstances cleanup follows the rule "whoever creates the temporary segment cleans it up": the temporary segment produced by a server process during an index rebuild should be cleaned up by that same server process once the rebuild completes. If the server process dies unexpectedly before it has cleaned up its temporary segment, or if the statement fails with some ORA- error along the way, SMON is posted to finish the temporary segment cleanup.
For temporary segments in permanent tablespaces, SMON performs a cleanup every three minutes (provided it has been posted); if SMON is too busy, temporary segments may remain uncleaned for a long time. A typical problem this causes: after a REBUILD INDEX ONLINE fails, a subsequent rebuild requires that the temporary segments produced earlier have already been cleaned up, and it has to keep waiting until that cleanup happens. In 10gR2 we can use dbms_repair.online_index_clean to manually clean up what a failed online index rebuild leaves behind:
The dbms_repair.online_index_clean function has been created to cleanup online index rebuilds. Use the dbms_repair.online_index_clean function to resolve the issue. Please note if you are unable to run the dbms_repair.online_index_clean function it is due to the fact that you have not installed the patch for Bug 3805539 or are not running on a release that includes this fix. The fix for this bug is a new function in the dbms_repair package called dbms_repair.online_index_clean, which has been created to cleanup online index [[sub]partition] [re]builds. New functionality is not allowed in patchsets; therefore, this is not available in a patchset but is available in 10gR2. Check your patch list to verify the database is patched for Bug 3805539 using the following command and patch for the bug if it is not listed: opatch lsinventory -detail. Cleanup after a failed online index [re]build can be slow to occur, preventing subsequent such operations until the cleanup has occurred.
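As a quick illustration, here is a minimal sketch of invoking dbms_repair.online_index_clean manually (assuming 10gR2 or a database patched for Bug 3805539; the constants used are the documented defaults, so verify them against your own release):

set serveroutput on

declare
  cleaned boolean;
begin
  -- try to clean up the leftovers of all failed online index [re]builds,
  -- without waiting on the locks they may still hold
  cleaned := dbms_repair.online_index_clean(
               object_id     => dbms_repair.all_index_id,
               wait_for_lock => dbms_repair.lock_nowait);
  if cleaned then
    dbms_output.put_line('online index cleanup completed');
  else
    dbms_output.put_line('cleanup incomplete, retry later');
  end if;
end;
/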
Now let us use a hands-on test to see how SMON cleans up temporary segments in a permanent tablespace:

Set event 10500 to trace the SMON process (this diagnostic event is described later):

SQL> alter system set events '10500 trace name context forever,level 10';
System altered.

In the first session run a CREATE TABLE command, which produces a certain number of temporary extents:

SQL> create table smon as select * from ymon;

In another session query the DBA_EXTENTS view to see how many temporary extents were produced:

SQL> SELECT COUNT(*) FROM DBA_EXTENTS WHERE SEGMENT_TYPE='TEMPORARY';

  COUNT(*)
----------
       117

Kill the session running the CREATE TABLE, wait a while, and the SMON trace file shows the following:

*** 2011-06-07 21:18:39.817
SMON: system monitor process posted msgflag:0x0200 (-/-/-/-/TMPSDROP/-/-)
*** 2011-06-07 21:18:39.818
SMON: Posted, but not for trans recovery, so skip it.
*** 2011-06-07 21:18:39.818
SMON: clean up temp segments in slave

SQL> SELECT COUNT(*) FROM DBA_EXTENTS WHERE SEGMENT_TYPE='TEMPORARY';

  COUNT(*)
----------
         0

As we can see, SMON cleaned up the temporary segments through a slave process.
Unlike temporary segments in permanent tablespaces, extents in a temporary tablespace are not released or deallocated as soon as the operation completes, for performance reasons. Instead, these temporary extents are simply marked as available for the next sort operation. SMON still cleans up these temporary segments, but only at instance startup:
For performance reasons, extents in TEMPORARY tablespaces are not released or deallocated once the operation is complete. Instead, the extent is simply marked as available for the next sort operation. SMON cleans up the segments at startup. A sort segment is created by the first statement that uses a TEMPORARY tablespace for sorting after startup. A sort segment created in a TEMPORARY tablespace is only released at shutdown. The large number of EXTENTS is caused when the STORAGE clause has been incorrectly calculated.
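You can watch this behaviour in v$sort_segment: the sort segment's extents stay allocated after the sort finishes and are merely recycled between used and free. A simple sketch (the column selection is my own):

SELECT tablespace_name, total_extents, used_extents, free_extents, extent_hits
  FROM v$sort_segment;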
Symptoms
The following query reports the total number of temporary extents in the database. Compare the total over an interval of time; if it is decreasing, SMON is cleaning up temporary segments:
SELECT COUNT(*) FROM DBA_EXTENTS WHERE SEGMENT_TYPE='TEMPORARY';
You can also use the 'SMON posted for dropping temp segment' statistic in the v$sysstat view to see how often SMON has been posted to do this cleanup:
SQL> select name,value from v$sysstat where name like '%SMON%';

NAME                                                                  VALUE
---------------------------------------------------------------- ----------
total number of times SMON posted                                         8
SMON posted for undo segment recovery                                     0
SMON posted for txn recovery for other instances                          0
SMON posted for instance recovery                                         0
SMON posted for undo segment shrink                                       0
SMON posted for dropping temp segment                                     1
In addition, during this cleanup SMON holds the Space Transaction (ST) enqueue for long periods, and other sessions may time out waiting for the ST enqueue and report ORA-01575:

01575, 00000, "timeout waiting for space management resource"
// *Cause:  failed to acquire necessary resource to do space management.
// *Action: Retry the operation.
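If sessions start hitting ORA-01575, a quick way to confirm ST enqueue contention is to look for the ST lock in v$lock (a sketch; on 10g the wait also shows up as the 'enq: ST - contention' event):

SELECT sid, type, lmode, request, ctime
  FROM v$lock
 WHERE type = 'ST';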
How to stop SMON from cleaning up temporary segments
Setting the diagnostic event 10061 (event='10061 trace name context forever, level 10') disables SMON from cleaning temp segments.
alter system set events '10061 trace name context forever, level 10';
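This setting is intended for diagnosis only; to revert it, the same event can simply be switched off again:

alter system set events '10061 trace name context off';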
SMON functions you didn't know about (2): coalescing free extents
SMON is also responsible for coalescing free extents (coalesce free extents).
Trigger scenario
Early Oracle releases used dictionary-managed tablespaces (DMT), which, unlike today's locally managed tablespaces (LMT), manage extents through recursive operations on the FET$ and UET$ dictionary base tables. SMON wakes itself every 5 minutes and checks for dictionary-managed tablespaces whose default storage parameter pctincrease != 0. Note that this work applies only to DMTs; LMTs need no coalescing. SMON coalesces adjacent free extents in these DMT tablespaces into a single larger free extent, which also means SMON has to maintain the FET$ base table.
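If you do not want to wait for SMON, adjacent free extents in a dictionary-managed tablespace can also be coalesced manually. A sketch (the tablespace name USERS is only a placeholder; first check which tablespaces are actually dictionary managed):

SELECT tablespace_name, extent_management, pct_increase
  FROM dba_tablespaces
 WHERE extent_management = 'DICTIONARY';

ALTER TABLESPACE users COALESCE;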
Symptoms
The following query reports the total number of free extents in the database; if this total keeps decreasing, SMON is coalescing free space:
SELECT COUNT(*) FROM DBA_FREE_SPACE;
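DBA_FREE_SPACE_COALESCED gives a more direct picture of how well the free extents in each tablespace are already coalesced (a sketch, assuming the view is available in your release):

SELECT tablespace_name, total_extents, extents_coalesced, percent_extents_coalesced
  FROM dba_free_space_coalesced;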
While coalescing extents SMON must hold the ST (Space Transaction) enqueue exclusively, so other sessions may time out waiting for the ST enqueue and report ORA-01575. SMON may also consume 100% of a CPU during a lengthy coalesce operation.
How to stop SMON from coalescing free extents
Setting the diagnostic event 10269 (event='10269 trace name context forever, level 10') disables free-space coalescing by SMON (Don't do coalesces of free space in SMON):
10269, 00000, "Don't do coalesces of free space in SMON"
// *Cause: setting this event prevents SMON from doing free space coalesces

alter system set events '10269 trace name context forever, level 10';
SMON functions you didn't know about (3): cleaning up the obj$ base table
SMON is also responsible for cleaning up the obj$ data dictionary base table (cleanup obj$).
The OBJ$ base table is one of the key objects in Oracle's bootstrap process:
SQL> set linesize 80 ;
SQL> select sql_text from bootstrap$ where sql_text like 'CREATE TABLE OBJ$%';
SQL_TEXT
--------------------------------------------------------------------------------
CREATE TABLE OBJ$("OBJ#" NUMBER NOT NULL,"DATAOBJ#" NUMBER,"OWNER#" NUMBER NOT N
ULL,"NAME" VARCHAR2(30) NOT NULL,"NAMESPACE" NUMBER NOT NULL,"SUBNAME" VARCHAR2(
30),"TYPE#" NUMBER NOT NULL,"CTIME" DATE NOT NULL,"MTIME" DATE NOT NULL,"STIME"
DATE NOT NULL,"STATUS" NUMBER NOT NULL,"REMOTEOWNER" VARCHAR2(30),"LINKNAME" VAR
CHAR2(128),"FLAGS" NUMBER,"OID$" RAW(16),"SPARE1" NUMBER,"SPARE2" NUMBER,"SPARE3
" NUMBER,"SPARE4" VARCHAR2(1000),"SPARE5" VARCHAR2(1000),"SPARE6" DATE) PCTFREE
10 PCTUSED 40 INITRANS 1 MAXTRANS 255 STORAGE ( INITIAL 16K NEXT 1024K MINEXTEN
TS 1 MAXEXTENTS 2147483645 PCTINCREASE 0 OBJNO 18 EXTENTS (FILE 1 BLOCK 121))
Trigger scenario
OBJ$ is a low-level data dictionary table that contains a row for nearly every object in the database (tables, indexes, packages, views and so on). In many cases these entries represent non-existent objects; one possible cause is that the object itself has already been dropped from the database while its entry is kept to satisfy the negative dependency mechanism. Because such entries make OBJ$ grow continuously, SMON has to delete the rows that are no longer needed. SMON schedules this cleanup at instance startup and then every 12 hours thereafter (the cleanup is scheduled to run after startup and then every 12 hours).
The following demonstration shows how SMON cleans up obj$:
SQL> BEGIN
  2    FOR i IN 1 .. 5000 LOOP
  3      execute immediate ('create synonym gustav' || i || ' for
  4  perfstat.sometable');
  5      execute immediate ('drop synonym gustav' || i );
  6    END LOOP;
  7  END;
  8  /

PL/SQL procedure successfully completed.

SQL> startup force;
ORACLE instance started.

Total System Global Area 1065353216 bytes
Fixed Size                  2089336 bytes
Variable Size             486542984 bytes
Database Buffers          570425344 bytes
Redo Buffers                6295552 bytes
Database mounted.
Database opened.

SQL> select count(*) from user$ u, obj$ o
  2  where u.user# (+)=o.owner# and o.type#=10 and not exists
  3  (select p_obj# from dependency$ where p_obj# = o.obj#);

  COUNT(*)
----------
      5000

SQL> /

  COUNT(*)
----------
      5000

SQL> /

  COUNT(*)
----------
      4951

SQL> oradebug setospid 18457;
Oracle pid: 8, Unix process pid: 18457, image: oracle@rh2.oracle.com (SMON)
SQL> oradebug event 10046 trace name context forever, level 1;
Statement processed.
SQL> oradebug tracefile_name;
/s01/admin/G10R2/bdump/g10r2_smon_18457.trc

select o.owner#, o.obj#,
       decode(o.linkname, null, decode(u.name, null, 'SYS', u.name), o.remoteowner),
       o.name, o.linkname, o.namespace, o.subname
  from user$ u, obj$ o
 where u.user#(+) = o.owner# and o.type# = :1
   and not exists (select p_obj# from dependency$ where p_obj# = o.obj#)
 order by o.obj# for update

select null from obj$
 where obj# = :1 and type# = :2
   and obj# not in (select p_obj# from dependency$ where p_obj# = obj$.obj#)

delete from obj$ where obj# = :1

/* The deletion is actually fairly involved; rows may need to be removed from several dictionary base tables */
Symptoms
The following queries report the number of NON-EXISTENT object entries (type#=10) in obj$; if the total keeps decreasing, SMON is performing its cleanup:
select trunc(mtime), substr(name, 1, 3) name, count(*)
  from obj$
 where type# = 10
   and not exists (select * from dependency$ where obj# = p_obj#)
 group by trunc(mtime), substr(name, 1, 3);

select count(*)
  from user$ u, obj$ o
 where u.user#(+) = o.owner#
   and o.type# = 10
   and not exists (select p_obj# from dependency$ where p_obj# = o.obj#);
How to stop SMON from cleaning up obj$
Setting the diagnostic event 10052 (event='10052 trace name context forever') stops SMON from cleaning up obj$. This event can be set when further diagnosis is needed because SMON terminates unexpectedly or spins in the obj$ cleanup code. In an Oracle Parallel Server or RAC environment it can also be set so that only one particular node performs the cleanup.
10052, 00000, "don't clean up obj$"
alter system set events '10052 trace name context forever, level 65535';
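In a RAC environment the event can be placed in the spfile for every instance except one, so that only the remaining node performs the obj$ cleanup. A sketch (the instance name PROD2 is only a placeholder; the EVENT parameter is static, so the instance must be restarted for it to take effect):

alter system set event='10052 trace name context forever, level 65535' scope=spfile sid='PROD2';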
Problem Description: We are receiving the below warning during db startup:
WARNING: kqlclo() has detected the following :
Non-existent object 37336 NOT deleted because an object
of the same name exists already.
Object name: PUBLIC.USER$
This is caused by the SMON trying to cleanup the SYS.OBJ$.
SMON cleans all dropped objects which have a SYS.OBJ$.TYPE#=10.
This can happen very often when you create an object that has the same name as a public synonym.
SMON tries to remove the non-existent objects and fails because there are duplicates:
multiple non-existent objects with the same name.
This query will return many objects with the same name under the SYS schema:
select o.name,u.user# from user$ u, obj$ o where u.user# (+)=o.owner# and o.type#=10
and not exists (select p_obj# from dependency$ where p_obj# = o.obj#);
To cleanup this message:
Take a full backup of the database - this is crucial. If anything goes wrong during this procedure,
your only option would be to restore from backup, so make sure you have a good backup before proceeding.
We suggest a COLD backup. If you plan to use a HOT backup, you will have to do a point-in-time restore if any problem happens.
Normally DML against dictionary objects is unsupported,
but in this case we know exactly what the type of corruption is,
and you are instructed to do this under guidance from Support.
Data dictionary patching must be done by an experienced DBA.
This solution is unsupported.
It means that if there were problems after applying this solution, a database backup must be restored.
1. Set event 10052 at parameter file to disable cleanup of OBJ$ by SMON
EVENT="10052 trace name context forever, level 65535"
2. Startup database in restricted mode
3. Delete from OBJ$, COMMIT
SQL> delete from obj$ where (name,owner#) in ( select o.name,u.user# from user$ u, obj$ o
where u.user# (+)=o.owner# and o.type#=10 and not exists (select p_obj# from
dependency$ where p_obj# = o.obj#) );
SQL> commit;
SQL> Shutdown abort.
4. remove event 10052 from init.ora
5. Restart the database and monitor for the message in the ALERT LOG file
SMON functions you didn't know about (4): maintaining the col_usage$ base table
SMON is also responsible for maintaining col_usage$, the base table for column-usage monitoring statistics.
The col_usage$ dictionary base table was introduced in 9i to monitor how columns are used as predicates in SQL statements; its introduction rounded out the CBO's automatic histogram collection mechanism.
create table col_usage$
( obj#              number,  /* object number */
  intcol#           number,  /* internal column number */
  equality_preds    number,  /* equality predicates */
  equijoin_preds    number,  /* equijoin predicates */
  nonequijoin_preds number,  /* nonequijoin predicates */
  range_preds       number,  /* range predicates */
  like_preds        number,  /* (not) like predicates */
  null_preds        number,  /* (not) null predicates */
  timestamp         date     /* timestamp of last time this row was changed */
) storage (initial 200K next 100k maxextents unlimited pctincrease 0)
/
create unique index i_col_usage$ on col_usage$(obj#,intcol#)
  storage (maxextents unlimited)
/
In 10g the default histogram collection mode is 'FOR ALL COLUMNS SIZE AUTO', whereas in 9i the default was 'SIZE 1', i.e. no histograms are collected by default; this is why many applications that ran fine on 9i show abnormal CBO execution plans on 10g (see <Differences between dbms_stats collection modes on 9i and 10g>). 'SIZE AUTO' means Oracle itself decides whether to collect a histogram and how many buckets to use, and the basis for that decision is the col_usage$ base table: if a column has served as a predicate (loosely speaking, a condition after WHERE) in a hard-parsed SQL statement, the column is considered worth a histogram, and a record of that predicate usage is added to col_usage$. When the DBMS_STATS.GATHER_TABLE_STATS procedure runs in 'SIZE AUTO' mode, the gathering process checks col_usage$ to find out which columns have previously been used as predicates; those columns are deemed worth collecting histograms on.
SMON flushes the predicate-column data from the shared pool into the col_usage$ base table roughly every 15 minutes. In addition, at instance shutdown SMON scans col_usage$ for predicate-column records belonging to tables that have already been dropped and deletes these "orphaned" records.
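The predicate-usage information recorded for a given table can be read back from col_usage$ by joining it to obj$ and col$. A sketch, to be run as SYS (the table name MACLEAN matches the demo that follows):

SELECT o.name  table_name,
       c.name  column_name,
       u.equality_preds, u.equijoin_preds, u.nonequijoin_preds,
       u.range_preds, u.like_preds, u.null_preds, u.timestamp
  FROM sys.col_usage$ u, sys.obj$ o, sys.col$ c
 WHERE o.obj#    = u.obj#
   AND c.obj#    = u.obj#
   AND c.intcol# = u.intcol#
   AND o.name    = 'MACLEAN';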
Let us look at how col_usage$ gets populated:

SQL> select * from v$version;

BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bi
PL/SQL Release 10.2.0.4.0 - Production
CORE    10.2.0.4.0      Production
TNS for Linux: Version 10.2.0.4.0 - Production
NLSRTL Version 10.2.0.4.0 - Production

SQL> select * from global_name;

GLOBAL_NAME
--------------------------------------------------------------------------------
www.oracledatabase12g.com

SQL> create table maclean (t1 int);
Table created.

SQL> select object_id from dba_objects where object_name='MACLEAN';

 OBJECT_ID
----------
   1323013

SQL> select * from maclean where t1=1;
no rows selected

SQL> set linesize 200 pagesize 2000;

Note that, just like *_tab_modifications, the col_usage$ data lags behind the queries:
there is a delay before the information is flushed to col_usage$, so querying col_usage$
immediately returns no rows. The cached information can be flushed to the dictionary
manually with DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO.

SQL> select * from col_usage$ where obj#=1323013;
no rows selected

SQL> oradebug setmypid;
Statement processed.

Run a 10046 level 12 trace against the FLUSH_DATABASE_MONITORING_INFO operation:

SQL> oradebug event 10046 trace name context forever,level 12;
Statement processed.

SQL> exec DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO;
PL/SQL procedure successfully completed.

SQL> select * from col_usage$ where obj#=1323013;

      OBJ#    INTCOL# EQUALITY_PREDS EQUIJOIN_PREDS NONEQUIJOIN_PREDS RANGE_PREDS LIKE_PREDS NULL_PREDS TIMESTAMP
---------- ---------- -------------- -------------- ----------------- ----------- ---------- ---------- ---------
   1323013          1              1              0                 0           0          0          0 19-AUG-11

=============10046 trace content====================

lock table sys.col_usage$ in exclusive mode nowait

In this test on 10.2.0.4, DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO first tries to lock
col_usage$ with LOCK TABLE ... IN EXCLUSIVE MODE NOWAIT; if the lock attempt fails it
retries 1100 times, and if it still cannot lock col_usage$ it gives up updating the
col_usage$ data, so as to avoid lock waits and deadlocks.

Cksxm.c
Monitor Modification Hash Table Base
modification hash table entry
modification hash table chunk
monitoring column usage element
ksxmlock_1
lock table sys.col_usage$ in exclusive mode
lock table sys.col_usage$ in exclusive mode nowait

update sys.col_usage$
   set equality_preds    = equality_preds    + decode(bitand(:flag, 1), 0, 0, 1),
       equijoin_preds    = equijoin_preds    + decode(bitand(:flag, 2), 0, 0, 1),
       nonequijoin_preds = nonequijoin_preds + decode(bitand(:flag, 4), 0, 0, 1),
       range_preds       = range_preds       + decode(bitand(:flag, 8), 0, 0, 1),
       like_preds        = like_preds        + decode(bitand(:flag, 16), 0, 0, 1),
       null_preds        = null_preds        + decode(bitand(:flag, 32), 0, 0, 1),
       timestamp         = :time
 where obj# = :objn and intcol# = :coln

insert into sys.col_usage$
  (obj#, intcol#, equality_preds, equijoin_preds, nonequijoin_preds,
   range_preds, like_preds, null_preds, timestamp)
values
  (:objn, :coln,
   decode(bitand(:flag, 1), 0, 0, 1),
   decode(bitand(:flag, 2), 0, 0, 1),
   decode(bitand(:flag, 4), 0, 0, 1),
   decode(bitand(:flag, 8), 0, 0, 1),
   decode(bitand(:flag, 16), 0, 0, 1),
   decode(bitand(:flag, 32), 0, 0, 1),
   :time)
Gathering table statistics with dbms_stats in 'SIZE AUTO' mode first consults the predicate-column records in col_usage$:
SQL> begin
  2
  3    dbms_stats.gather_table_stats(ownname    => 'SYS',
  4                                  tabname    => 'MACLEAN',
  5                                  method_opt => 'FOR ALL COLUMNS SIZE AUTO');
  6  end;
  7  /

PL/SQL procedure successfully completed.

============10046 level 12 trace content======================

SELECT /*+ ordered use_nl(o c cu h) index(u i_user1) index(o i_obj2) index(c i_obj#)
           index(cu i_col_usage$) index(h i_hh_obj#_intcol#) */
       C.NAME COL_NAME, C.TYPE# COL_TYPE, C.CHARSETFORM COL_CSF, C.DEFAULT$ COL_DEF,
       C.NULL$ COL_NULL, C.PROPERTY COL_PROP, C.COL# COL_UNUM, C.INTCOL# COL_INUM,
       C.OBJ# COL_OBJ, C.SCALE COL_SCALE, H.BUCKET_CNT H_BCNT,
       (T.ROWCNT - H.NULL_CNT) / GREATEST(H.DISTCNT, 1) H_PFREQ, C.LENGTH COL_LEN,
       CU.TIMESTAMP CU_TIME, CU.EQUALITY_PREDS CU_EP, CU.EQUIJOIN_PREDS CU_EJP,
       CU.RANGE_PREDS CU_RP, CU.LIKE_PREDS CU_LP, CU.NONEQUIJOIN_PREDS CU_NEJP,
       CU.NULL_PREDS NP
  FROM SYS.USER$ U, SYS.OBJ$ O, SYS.TAB$ T, SYS.COL$ C, SYS.COL_USAGE$ CU, SYS.HIST_HEAD$ H
 WHERE :B3 = '0' AND U.NAME = :B2 AND O.OWNER# = U.USER# AND O.TYPE# = 2 AND O.NAME = :B1
   AND O.OBJ# = T.OBJ# AND O.OBJ# = C.OBJ# AND C.OBJ# = CU.OBJ#(+) AND C.INTCOL# = CU.INTCOL#(+)
   AND C.OBJ# = H.OBJ#(+) AND C.INTCOL# = H.INTCOL#(+)
UNION ALL
SELECT /*+ ordered use_nl(c) */
       C.KQFCONAM COL_NAME, C.KQFCODTY COL_TYPE, DECODE(C.KQFCODTY, 1, 1, 0) COL_CSF,
       NULL COL_DEF, 0 COL_NULL, 0 COL_PROP, C.KQFCOCNO COL_UNUM, C.KQFCOCNO COL_INUM,
       O.KQFTAOBJ COL_OBJ, DECODE(C.KQFCODTY, 2, -127, 0) COL_SCALE, H.BUCKET_CNT H_BCNT,
       (ST.ROWCNT - NULL_CNT) / GREATEST(H.DISTCNT, 1) H_PFREQ,
       DECODE(C.KQFCODTY, 2, 22, C.KQFCOSIZ) COL_LEN, CU.TIMESTAMP CU_TIME,
       CU.EQUALITY_PREDS CU_EP, CU.EQUIJOIN_PREDS CU_EJP, CU.RANGE_PREDS CU_RP,
       CU.LIKE_PREDS CU_LP, CU.NONEQUIJOIN_PREDS CU_NEJP, CU.NULL_PREDS NP
  FROM SYS.X$KQFTA O, SYS.TAB_STATS$ ST, SYS.X$KQFCO C, SYS.COL_USAGE$ CU, SYS.HIST_HEAD$ H
 WHERE :B3 != '0' AND :B2 = 'SYS' AND O.KQFTANAM = :B1 AND O.KQFTAOBJ = ST.OBJ#(+)
   AND O.KQFTAOBJ = C.KQFCOTOB AND C.KQFCOTOB = CU.OBJ#(+) AND C.KQFCOCNO = CU.INTCOL#(+)
   AND C.KQFCOTOB = H.OBJ#(+) AND C.KQFCOCNO = H.INTCOL#(+)
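Whether SIZE AUTO actually built a histogram on the predicate column can then be checked in dba_tab_col_statistics (a sketch):

SELECT column_name, num_distinct, num_buckets, histogram
  FROM dba_tab_col_statistics
 WHERE owner = 'SYS'
   AND table_name = 'MACLEAN';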
Symptoms
According to Metalink Note <Database Shutdown Immediate Takes Forever, Can Only Do Shutdown Abort [ID 332177.1]>:
Database Shutdown Immediate Takes Forever, Can Only Do Shutdown Abort [ID 332177.1]

Applies to:
Oracle Server - Enterprise Edition - Version: 9.2.0.4.0
This problem can occur on any platform.

Symptoms
The database is not shutting down for a considerable time when you issue the command:
  shutdown immediate
To shut it down in a reasonable time you have to issue the command
  shutdown abort

To collect some diagnostics before issuing the shutdown immediate command set a trace event as follows:
Connect as SYS (/ as sysdba)
SQL> alter session set events '10046 trace name context forever,level 12';
SQL> shutdown immediate;

In the resultant trace file (within the udump directory) you see something similar to the following:

PARSING IN CURSOR #n
delete from sys.col_usage$ c
where not exists (select 1 from sys.obj$ o where o.obj# = c.obj# )

...followed by loads of.....
WAIT #2: nam='db file sequential read' ela= 23424 p1=1 p2=4073 p3=1
....
WAIT #2: nam='db file scattered read' ela= 1558 p1=1 p2=44161 p3=8
etc

Then eventually
WAIT #2: nam='log file sync' ela= 32535 p1=4111 p2=0 p3=0

...some other SQL....then back to
WAIT #2: nam='db file sequential read' ela= 205 p1=1 p2=107925 p3=1
WAIT #2: nam='db file sequential read' ela= 1212 p1=1 p2=107926 p3=1
WAIT #2: nam='db file sequential read' ela= 212 p1=1 p2=107927 p3=1
WAIT #2: nam='db file scattered read' ela= 1861 p1=1 p2=102625 p3=8
etc....

To verify which objects are involved here you can use a couple of the P1 & P2 values from above:

a) a sequential read
SELECT owner,segment_name,segment_type
FROM dba_extents
WHERE file_id=1 AND 107927 BETWEEN block_id AND block_id + blocks

b) a scattered read
SELECT owner,segment_name,segment_type
FROM dba_extents
WHERE file_id=1 AND 102625 BETWEEN block_id AND block_id + blocks

The output confirms that the objects are SYS.I_COL_USAGE$ (INDEX) and SYS.COL_USAGE$ (TABLE)

Finally, issue
select count(*) from sys.col_usage$;

Cause
If the number of entries in sys.col_usage$ is large then you are very probably hitting the issue raised in
Bug: 3540022 9.2.0.4.0 RDBMS, Base Bug 3221945, Abstract: CLEAN-UP OF ENTRIES IN COL_USAGE$
Base Bug 3221945 9.2.0.3 RDBMS, Abstract: ORA-1631 ON COL_USAGE$, Closed as "Not a Bug"

However, when a table is dropped, the column usage statistics are not dropped. They are left as they are.
When the database is shutdown (in normal mode), then these "orphaned" column usage entries are deleted.
The code which does this gets called only during normal shutdown.
Unless and until the database is shutdown, the col_usage$ table will continue to grow.

Solution
To implement the workaround, please execute the following steps:
1. Periodically (eg once a day) run
   exec DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO;
DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO will clean out redundant col_usage$ entries,
and when you come to shutdown the database you should not have a huge number of entries left to clean up.
The note points out that at instance shutdown SMON sets about cleaning the "orphaned" predicate-column records in col_usage$ that belong to dropped tables. If a large number of intermediate tables were created and later dropped during the lifetime of the instance, col_usage$ will have accumulated many such "orphaned" records, and the time SMON needs for this cleanup makes shutdown slow. The note also points out that running DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO periodically cleans out the redundant records in col_usage$.
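One way to implement that workaround is to schedule the flush with dbms_scheduler, sketched below (the job name and schedule are placeholders of my own choosing):

begin
  dbms_scheduler.create_job(
    job_name        => 'FLUSH_MON_INFO_JOB',
    job_type        => 'PLSQL_BLOCK',
    job_action      => 'begin dbms_stats.flush_database_monitoring_info; end;',
    repeat_interval => 'FREQ=DAILY;BYHOUR=2',
    enabled         => TRUE,
    comments        => 'daily flush of column usage / DML monitoring info');
end;
/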
Let us observe SMON doing this cleanup:

begin
  for i in 1 .. 5000 loop
    execute immediate 'create table maclean1' || i ||
                      ' tablespace fragment as select 1 t1 from dual';
    execute immediate 'select * from maclean1' || i || ' where t1=1';
  end loop;
  DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO;
  for i in 1 .. 5000 loop
    execute immediate 'drop table maclean1' || i;
  end loop;
end;
/

SQL> purge dba_recyclebin;
DBA Recyclebin purged.

The following query reports the number of orphaned records in col_usage$, which is also the number of rows SMON will have to clean up at instance shutdown:

select count(*)
  from sys.col_usage$ c
 where not exists (select /*+ unnest */ 1 from sys.obj$ o where o.obj# = c.obj#);

  COUNT(*)
----------
     10224

Run a 10046 level 12 trace against SMON:

SQL> oradebug setospid 30225;
Oracle pid: 8, Unix process pid: 30225, image: oracle@rh2.oracle.com (SMON)
SQL> oradebug event 10046 trace name context forever,level 12;
Statement processed.

SQL> shutdown immediate;

=================10046 trace content==================

lock table sys.col_usage$ in exclusive mode nowait

delete from sys.col_usage$ where obj#= :1 and intcol#= :2

delete from sys.col_usage$ c
 where not exists (select /*+ unnest */ 1 from sys.obj$ o where o.obj# = c.obj#)
How to stop SMON from maintaining col_usage$
1. Set the hidden parameter _column_tracking_level (column usage tracking), which defaults to 1, i.e. column usage tracking enabled. Setting it to 0 disables column tracking; the parameter can be changed dynamically at both session and system level:
SQL> col name for a25
SQL> col DESCRIB for a25
SQL> SELECT x.ksppinm NAME, y.ksppstvl VALUE, x.ksppdesc describ
  2    FROM SYS.x$ksppi x, SYS.x$ksppcv y
  3   WHERE x.inst_id = USERENV ('Instance')
  4     AND y.inst_id = USERENV ('Instance')
  5     AND x.indx = y.indx
  6     AND x.ksppinm LIKE '%_column_tracking_level%';

NAME                      VALUE      DESCRIB
------------------------- ---------- -------------------------
_column_tracking_level    1          column usage tracking

SQL> alter session set "_column_tracking_level"=0 ;
Session altered.

SQL> alter system set "_column_tracking_level"=0 scope=both;
System altered.
2. Turn off DML monitoring by setting the hidden parameter _dml_monitoring_enabled (enable modification monitoring) to false. Disabling DML monitoring has a much bigger impact on the CBO, so the previous approach is generally the recommended one:
SQL> SELECT monitoring, count(*) from DBA_TABLES group by monitoring;
MON COUNT(*)
--- ----------
NO 79
YES 2206
SQL> alter system set "_dml_monitoring_enabled"=false;
System altered.
SQL> SELECT monitoring, count(*) from DBA_TABLES group by monitoring;
MON COUNT(*)
--- ----------
NO 2285
In fact the MONITORING column of dba_tables is derived from the internal parameter _dml_monitoring_enabled:
SQL> set long 99999
SQL> select text from dba_views where view_name='DBA_TABLES';
TEXT
--------------------------------------------------------------------------------
select u.name, o.name, decode(bitand(t.property,2151678048), 0, ts.name, null),
decode(bitand(t.property, 1024), 0, null, co.name),
decode((bitand(t.property, 512)+bitand(t.flags, 536870912)),
0, null, co.name),
decode(bitand(t.trigflag, 1073741824), 1073741824, 'UNUSABLE', 'VALID'),
decode(bitand(t.property, 32+64), 0, mod(t.pctfree$, 100), 64, 0, null),
decode(bitand(ts.flags, 32), 32, to_number(NULL),
decode(bitand(t.property, 32+64), 0, t.pctused$, 64, 0, null)),
decode(bitand(t.property, 32), 0, t.initrans, null),
decode(bitand(t.property, 32), 0, t.maxtrans, null),
s.iniexts * ts.blocksize,
decode(bitand(ts.flags, 3), 1, to_number(NULL),
s.extsize * ts.blocksize),
s.minexts, s.maxexts,
decode(bitand(ts.flags, 3), 1, to_number(NULL),
s.extpct),
decode(bitand(ts.flags, 32), 32, to_number(NULL),
decode(bitand(o.flags, 2), 2, 1, decode(s.lists, 0, 1, s.lists))),
decode(bitand(ts.flags, 32), 32, to_number(NULL),
decode(bitand(o.flags, 2), 2, 1, decode(s.groups, 0, 1, s.groups))),
decode(bitand(t.property, 32+64), 0,
decode(bitand(t.flags, 32), 0, 'YES', 'NO'), null),
decode(bitand(t.flags,1), 0, 'Y', 1, 'N', '?'),
t.rowcnt,
decode(bitand(t.property, 64), 0, t.blkcnt, null),
decode(bitand(t.property, 64), 0, t.empcnt, null),
t.avgspc, t.chncnt, t.avgrln, t.avgspc_flb,
decode(bitand(t.property, 64), 0, t.flbcnt, null),
lpad(decode(t.degree, 32767, 'DEFAULT', nvl(t.degree,1)),10),
lpad(decode(t.instances, 32767, 'DEFAULT', nvl(t.instances,1)),10),
lpad(decode(bitand(t.flags, 8), 8, 'Y', 'N'),5),
decode(bitand(t.flags, 6), 0, 'ENABLED', 'DISABLED'),
t.samplesize, t.analyzetime,
decode(bitand(t.property, 32), 32, 'YES', 'NO'),
decode(bitand(t.property, 64), 64, 'IOT',
decode(bitand(t.property, 512), 512, 'IOT_OVERFLOW',
decode(bitand(t.flags, 536870912), 536870912, 'IOT_MAPPING', null
))),
decode(bitand(o.flags, 2), 0, 'N', 2, 'Y', 'N'),
decode(bitand(o.flags, 16), 0, 'N', 16, 'Y', 'N'),
decode(bitand(t.property, 8192), 8192, 'YES',
decode(bitand(t.property, 1), 0, 'NO', 'YES')),
decode(bitand(o.flags, 2), 2, 'DEFAULT',
decode(s.cachehint, 0, 'DEFAULT', 1, 'KEEP', 2, 'RECYCLE', NULL)),
decode(bitand(t.flags, 131072), 131072, 'ENABLED', 'DISABLED'),
decode(bitand(t.flags, 512), 0, 'NO', 'YES'),
decode(bitand(t.flags, 256), 0, 'NO', 'YES'),
decode(bitand(o.flags, 2), 0, NULL,
decode(bitand(t.property, 8388608), 8388608,
'SYS$SESSION', 'SYS$TRANSACTION')),
decode(bitand(t.flags, 1024), 1024, 'ENABLED', 'DISABLED'),
decode(bitand(o.flags, 2), 2, 'NO',
decode(bitand(t.property, 2147483648), 2147483648, 'NO',
decode(ksppcv.ksppstvl, 'TRUE', 'YES', 'NO'))),
decode(bitand(t.property, 1024), 0, null, cu.name),
decode(bitand(t.flags, 8388608), 8388608, 'ENABLED', 'DISABLED'),
decode(bitand(t.property, 32), 32, null,
decode(bitand(s.spare1, 2048), 2048, 'ENABLED', 'DISABLED')),
decode(bitand(o.flags, 128), 128, 'YES', 'NO')
from sys.user$ u, sys.ts$ ts, sys.seg$ s, sys.obj$ co, sys.tab$ t, sys.obj$ o,
sys.obj$ cx, sys.user$ cu, x$ksppcv ksppcv, x$ksppi ksppi
where o.owner# = u.user#
and o.obj# = t.obj#
and bitand(t.property, 1) = 0
and bitand(o.flags, 128) = 0
and t.bobj# = co.obj# (+)
and t.ts# = ts.ts#
and t.file# = s.file# (+)
and t.block# = s.block# (+)
and t.ts# = s.ts# (+)
and t.dataobj# = cx.obj# (+)
and cx.owner# = cu.user# (+)
and ksppi.indx = ksppcv.indx
and ksppi.ksppinm = '_dml_monitoring_enabled'