一次数据库HANG处理-优快云博客

本文详细分析了一次Oracle 10g数据库出现的死锁情况，通过获取TRC文件并使用ass109.awk工具解析，发现了持有共享池闩锁的进程导致自我死锁的问题。文中还记录了受影响的SQL*Plus会话及处理建议。

摘要生成于 C知道，由 DeepSeek-R1 满血版支持，前往体验 >

还是帮别人处理的，不知道我为什么运气好，数据库就是不宕

他的数据库HANG住了，SQLPLUS无法登陆，一直HANG在那里

一般情况下，10g处理方法应该为sqlplus -prelim / as sysdba 连接数据库，注意这种方式连接数据库，是无法关闭数据库的，只能进行SYSTEMSTATE
SQL> oradebug setmypid
SQL> oradebug unlimit
SQL> oradebug dump systemstate 266
SQL> oradebug tracefile_name

获得TRC文件后，可以用ass109.awk来格式化下看看

System State 1
~~~~~~~~~~~~~~~~
1:
2: last wait for 'cursor: mutex X'
3: waiting for 'rdbms ipc message'     wait
4: waiting for 'rdbms ipc message'     wait
5: waiting for 'rdbms ipc message'     wait
6: waiting for 'rdbms ipc message'     wait
7: waiting for 'rdbms ipc message'     wait
8: waiting for 'latch: library cache' [Latch 7000001302e0170] wait
9: last wait for 'rdbms ipc message'
10: last wait for 'latch: library cache'[Latch 7000001302e0030]
11: waiting for 'latch: library cache' [Latch 7000001302e0170] wait
12: waiting for 'latch: library cache' [Latch 7000001302e0170] wait
13:
14:
15: waiting for 'Streams AQ: qmn slave idle wait' wait
16: waiting for 'rdbms ipc message'     wait
17: waiting for 'rdbms ipc message'     wait
18: last wait for 'latch: library cache'[Latch 7000001302e0350]
19: for 'Streams AQ: waiting for time management or cleanup tasks' wait
20: waiting for 'Streams AQ: qmn coordinator idle wait' wait
21: waiting for 'SQL*Net message from client' wait
22:                                    [Latch 7000001302e0030]
23: waiting for 'SQL*Net message from client' wait
24:                                    [Latch 7000001302e0030]
25: waiting for 'SQL*Net message from client' wait
26: waiting for 'SQL*Net message from client' wait
27:                                    [Latch 7000000100e5020]
28: waiting for 'latch: library cache' [Latch 7000001302e0030] wait
29: waiting for 'latch: library cache' [Latch 7000001302e0350] wait
30: waiting for 'latch: library cache' [Latch 7000001302e0170] wait
31: last wait for 'latch: library cache'[Latch 7000001302e03f0]
32: waiting for 'SQL*Net message from client' wait
33: last wait for 'latch: library cache'[Latch 7000001302e0030]
34: last wait for 'latch: library cache'[Latch 7000001302e0030]
35: waiting for 'latch: library cache' [Latch 7000001302e0170] wait
36: waiting for 'SQL*Net message from client' wait
................................
735:                                   [Latch 70000001000c240]
736:                                   [Latch 70000001000c240]
737:                                   [Latch 70000001000c240]
738:                                   [Latch 70000001000c240]
739:                                   [Latch 70000001000c240]
740:                                   [Latch 70000001000c240]
Blockers
~~~~~~~~

        Above is a list of all the processes. If they are waiting for a resource
        then it will be given in square brackets. Below is a summary of the
        waited upon resources, together with the holder of that resource.
        Notes:
        ~~~~~
         o A process id of '???' implies that the holder was not found in the
           systemstate.

                    Resource Holder State
       Latch 7000001302e0170    ??? Blocker
       Latch 7000001302e0030    ??? Blocker
       Latch 7000001302e0350    ??? Blocker
       Latch 7000000100e5020    ??? Blocker
       Latch 7000001302e03f0    ??? Blocker
       Latch 7000000100e50c0   302: Self-Deadlock
       Latch 700000010020f20    ??? Blocker
       Latch 70000001000c240   660: 660: is waiting for 660: 699:
       Latch 70000001000c240   699: 699: is waiting for 660: 699:

Object Names
~~~~~~~~~~~~
Latch 7000001302e0170   Child library cache
Latch 7000001302e0030   Child library cache
Latch 7000001302e0350   Child library cache
Latch 7000000100e5020   Child shared pool
Latch 7000001302e03f0   Child library cache
Latch 7000000100e50c0   Child shared pool
Latch 700000010020f20   user lock
Latch 70000001000c240         holding    (efd=3) 70000001000c240 OS proces

首先302的Self-Deadlock肯定是有问题的，其持有的LATCH无法释放，那么在SYSTEMSTATE中找 PROCESS 302 看看
PROCESS 302:
----------------------------------------
SO: 700000138541e28, type: 2, owner: 0, flag: INIT/-/-/0x00
(process) Oracle pid=302, calls cur/top: 0/7000000746b9fd8, flag: (0) -
            int error: 0, call error: 0, sess error: 0, txn error 0
(post info) last post received: 0 0 0
              last post received-location: No post
              last process to post me: none
              last post sent: 0 0 0
              last post sent-location: No post
              last process posted by me: none
    (latch info) wait_event=0 bits=0
        Location from where call was made: kghalo:
      waiting for 7000000100e50c0 Child shared pool level=7 child#=2
        Location from where latch is held: kgh: add extent to reserved list:
        Context saved from call: 0
        state=busy, wlstate=free
          waiters [orapid (seconds since: put on list, posted, alive check)]:
           97 (936, 1258509061, 2)
           46 (936, 1258509061, 2)
           82 (933, 1258509061, 2)
           114 (930, 1258509061, 2)
           108 (930, 1258509061, 2)
           127 (930, 1258509061, 2)
      ....(100的个，这里省略掉)
           636 (495, 1258509061, 2)
           633 (495, 1258509061, 2)
           628 (495, 1258509061, 2)
           634 (495, 1258509061, 2)
           632 (495, 1258509061, 2)
           640 (486, 1258509061, 2)
           641 (486, 1258509061, 2)
           642 (486, 1258509061, 2)
           waiter count=262
          gotten 728126853 times wait, failed first 151722 sleeps 66474
          gotten 0 times nowait, failed: 0
        possible holder pid = 92 spid=929882
      on wait list for 7000000100e50c0
      acquiring 7000000100e50c0

这里可以看到，其在等待7000000100e50c0，但是其却持有7000000100e50c0，自己把自己搞死掉了

而且，还有大量的SESSION在7000000100e50c0这个LATCH上等待，有100多个

找了个SQLPLUS进去的进程看，确认其也在等待该LATCH

找到了SQLPLUS HANG住的PROCESS 738，确定其在等待70000001000c240
PROCESS 738:
----------------------------------------
SO: 7000001385ad2e8, type: 2, owner: 0, flag: INIT/-/-/0x00
(process) Oracle pid=738, calls cur/top: 0/0, flag: (0) -
            int error: 0, call error: 0, sess error: 0, txn error 0
(post info) last post received: 0 0 0
              last post received-location: No post
              last process to post me: none
              last post sent: 0 0 0
              last post sent-location: No post
              last process posted by me: none
    (latch info) wait_event=0 bits=0
        Location from where call was made: kso_new_process:
      waiting for 70000001000c240 OS process: request allocation level=6
        Location from where latch is held: kso_new_process:
        Context saved from call: 0
        state=busy, wlstate=free
          waiters [orapid (seconds since: put on list, posted, alive check)]:
           643 (523, 1258509101, 3)
           644 (523, 1258509101, 0)
           645 (523, 1258509101, 3)

该LATCH的持有者为PROCESS 642
PROCESS 642:
----------------------------------------
SO: 7000001385958e8, type: 2, owner: 0, flag: INIT/-/-/0x00
(process) Oracle pid=642, calls cur/top: 0/0, flag: (0) -
            int error: 0, call error: 0, sess error: 0, txn error 0
(post info) last post received: 0 0 0
              last post received-location: No post
              last process to post me: none
              last post sent: 0 0 0
              last post sent-location: No post
              last process posted by me: none
    (latch info) wait_event=0 bits=40
        Location from where call was made: kghalo:
      waiting for 7000000100e50c0 Child shared pool level=7 child#=2
        Location from where latch is held: kgh: add extent to reserved list:
        Context saved from call: 0
        state=busy, wlstate=free
          waiters [orapid (seconds since: put on list, posted, alive check)]:
           97 (970, 1258509095, 3)
           46 (970, 1258509095, 0)
           82 (967, 1258509095, 3)
           114 (964, 1258509095, 3)
           108 (964, 1258509095, 3)
           ......
          gotten 728126853 times wait, failed first 151722 sleeps 66474
          gotten 0 times nowait, failed: 0
        possible holder pid = 92 spid=929882
      on wait list for 7000000100e50c0
      holding    (efd=3) 70000001000c240 OS process: request allocation level=6
        Location from where latch is held: kso_new_process:
        Context saved from call: 0
        state=busy, wlstate=free
          waiters [orapid (seconds since: put on list, posted, alive check)]:
           643 (517, 1258509095, 0)
           644 (517, 1258509095, 0)
           645 (517, 1258509095, 0)