还是帮别人处理的,不知道我为什么运气好,数据库就是不宕
他的数据库HANG住了,SQLPLUS无法登陆,一直HANG在那里
一般情况下,10g处理方法应该为sqlplus -prelim / as sysdba 连接数据库,注意这种方式连接数据库,是无法关闭数据库的,只能进行SYSTEMSTATE
SQL> oradebug setmypid
SQL> oradebug unlimit
SQL> oradebug dump systemstate 266
SQL> oradebug tracefile_name
获得TRC文件后,可以用ass109.awk来格式化下看看
System State 1
~~~~~~~~~~~~~~~~
1:
2: last wait for 'cursor: mutex X'
3: waiting for 'rdbms ipc message' wait
4: waiting for 'rdbms ipc message' wait
5: waiting for 'rdbms ipc message' wait
6: waiting for 'rdbms ipc message' wait
7: waiting for 'rdbms ipc message' wait
8: waiting for 'latch: library cache' [Latch 7000001302e0170] wait
9: last wait for 'rdbms ipc message'
10: last wait for 'latch: library cache'[Latch 7000001302e0030]
11: waiting for 'latch: library cache' [Latch 7000001302e0170] wait
12: waiting for 'latch: library cache' [Latch 7000001302e0170] wait
13:
14:
15: waiting for 'Streams AQ: qmn slave idle wait' wait
16: waiting for 'rdbms ipc message' wait
17: waiting for 'rdbms ipc message' wait
18: last wait for 'latch: library cache'[Latch 7000001302e0350]
19: for 'Streams AQ: waiting for time management or cleanup tasks' wait
20: waiting for 'Streams AQ: qmn coordinator idle wait' wait
21: waiting for 'SQL*Net message from client' wait
22: [Latch 7000001302e0030]
23: waiting for 'SQL*Net message from client' wait
24: [Latch 7000001302e0030]
25: waiting for 'SQL*Net message from client' wait
26: waiting for 'SQL*Net message from client' wait
27: [Latch 7000000100e5020]
28: waiting for 'latch: library cache' [Latch 7000001302e0030] wait
29: waiting for 'latch: library cache' [Latch 7000001302e0350] wait
30: waiting for 'latch: library cache' [Latch 7000001302e0170] wait
31: last wait for 'latch: library cache'[Latch 7000001302e03f0]
32: waiting for 'SQL*Net message from client' wait
33: last wait for 'latch: library cache'[Latch 7000001302e0030]
34: last wait for 'latch: library cache'[Latch 7000001302e0030]
35: waiting for 'latch: library cache' [Latch 7000001302e0170] wait
36: waiting for 'SQL*Net message from client' wait
................................
735: [Latch 70000001000c240]
736: [Latch 70000001000c240]
737: [Latch 70000001000c240]
738: [Latch 70000001000c240]
739: [Latch 70000001000c240]
740: [Latch 70000001000c240]
Blockers
~~~~~~~~
Above is a list of all the processes. If they are waiting for a resource
then it will be given in square brackets. Below is a summary of the
waited upon resources, together with the holder of that resource.
Notes:
~~~~~
o A process id of '???' implies that the holder was not found in the
systemstate.
Resource Holder State
Latch 7000001302e0170 ??? Blocker
Latch 7000001302e0030 ??? Blocker
Latch 7000001302e0350 ??? Blocker
Latch 7000000100e5020 ??? Blocker
Latch 7000001302e03f0 ??? Blocker
Latch 7000000100e50c0 302: Self-Deadlock
Latch 700000010020f20 ??? Blocker
Latch 70000001000c240 660: 660: is waiting for 660: 699:
Latch 70000001000c240 699: 699: is waiting for 660: 699:
Object Names
~~~~~~~~~~~~
Latch 7000001302e0170 Child library cache
Latch 7000001302e0030 Child library cache
Latch 7000001302e0350 Child library cache
Latch 7000000100e5020 Child shared pool
Latch 7000001302e03f0 Child library cache
Latch 7000000100e50c0 Child shared pool
Latch 700000010020f20 user lock
Latch 70000001000c240 holding (efd=3) 70000001000c240 OS proces
首先302的Self-Deadlock肯定是有问题的,其持有的LATCH无法释放,那么在SYSTEMSTATE中找 PROCESS 302 看看
PROCESS 302:
----------------------------------------
SO: 700000138541e28, type: 2, owner: 0, flag: INIT/-/-/0x00
(process) Oracle pid=302, calls cur/top: 0/7000000746b9fd8, flag: (0) -
int error: 0, call error: 0, sess error: 0, txn error 0
(post info) last post received: 0 0 0
last post received-location: No post
last process to post me: none
last post sent: 0 0 0
last post sent-location: No post
last process posted by me: none
(latch info) wait_event=0 bits=0
Location from where call was made: kghalo:
waiting for 7000000100e50c0 Child shared pool level=7 child#=2
Location from where latch is held: kgh: add extent to reserved list:
Context saved from call: 0
state=busy, wlstate=free
waiters [orapid (seconds since: put on list, posted, alive check)]:
97 (936, 1258509061, 2)
46 (936, 1258509061, 2)
82 (933, 1258509061, 2)
114 (930, 1258509061, 2)
108 (930, 1258509061, 2)
127 (930, 1258509061, 2)
....(100的个,这里省略掉)
636 (495, 1258509061, 2)
633 (495, 1258509061, 2)
628 (495, 1258509061, 2)
634 (495, 1258509061, 2)
632 (495, 1258509061, 2)
640 (486, 1258509061, 2)
641 (486, 1258509061, 2)
642 (486, 1258509061, 2)
waiter count=262
gotten 728126853 times wait, failed first 151722 sleeps 66474
gotten 0 times nowait, failed: 0
possible holder pid = 92 spid=929882
on wait list for 7000000100e50c0
acquiring 7000000100e50c0
这里可以看到,其在等待7000000100e50c0,但是其却持有7000000100e50c0,自己把自己搞死掉了
而且,还有大量的SESSION在7000000100e50c0这个LATCH上等待,有100多个
找了个SQLPLUS进去的进程看,确认其也在等待该LATCH
找到了SQLPLUS HANG住的PROCESS 738,确定其在等待70000001000c240
PROCESS 738:
----------------------------------------
SO: 7000001385ad2e8, type: 2, owner: 0, flag: INIT/-/-/0x00
(process) Oracle pid=738, calls cur/top: 0/0, flag: (0) -
int error: 0, call error: 0, sess error: 0, txn error 0
(post info) last post received: 0 0 0
last post received-location: No post
last process to post me: none
last post sent: 0 0 0
last post sent-location: No post
last process posted by me: none
(latch info) wait_event=0 bits=0
Location from where call was made: kso_new_process:
waiting for 70000001000c240 OS process: request allocation level=6
Location from where latch is held: kso_new_process:
Context saved from call: 0
state=busy, wlstate=free
waiters [orapid (seconds since: put on list, posted, alive check)]:
643 (523, 1258509101, 3)
644 (523, 1258509101, 0)
645 (523, 1258509101, 3)
该LATCH的持有者为PROCESS 642
PROCESS 642:
----------------------------------------
SO: 7000001385958e8, type: 2, owner: 0, flag: INIT/-/-/0x00
(process) Oracle pid=642, calls cur/top: 0/0, flag: (0) -
int error: 0, call error: 0, sess error: 0, txn error 0
(post info) last post received: 0 0 0
last post received-location: No post
last process to post me: none
last post sent: 0 0 0
last post sent-location: No post
last process posted by me: none
(latch info) wait_event=0 bits=40
Location from where call was made: kghalo:
waiting for 7000000100e50c0 Child shared pool level=7 child#=2
Location from where latch is held: kgh: add extent to reserved list:
Context saved from call: 0
state=busy, wlstate=free
waiters [orapid (seconds since: put on list, posted, alive check)]:
97 (970, 1258509095, 3)
46 (970, 1258509095, 0)
82 (967, 1258509095, 3)
114 (964, 1258509095, 3)
108 (964, 1258509095, 3)
......
gotten 728126853 times wait, failed first 151722 sleeps 66474
gotten 0 times nowait, failed: 0
possible holder pid = 92 spid=929882
on wait list for 7000000100e50c0
holding (efd=3) 70000001000c240 OS process: request allocation level=6
Location from where latch is held: kso_new_process:
Context saved from call: 0
state=busy, wlstate=free
waiters [orapid (seconds since: put on list, posted, alive check)]:
643 (517, 1258509095, 0)
644 (517, 1258509095, 0)
645 (517, 1258509095, 0)
该进程也在等待 7000000100e50c0
所以处理方法也很明显了,杀掉PROCESS 302,并附带杀掉660,那个进程也死锁了。还有那些Holder为???的进程
不过,做HANGANALYZE竟然没有发现这个LATCH上的死锁,真奇怪
来自 “ ITPUB博客 ” ,链接:http://blog.itpub.net/8242091/viewspace-619991/,如需转载,请注明出处,否则将追究法律责任。
转载于:http://blog.itpub.net/8242091/viewspace-619991/