Oracle 阻塞会话使用redo log排查
Oracle的会话被阻塞之后,虽然说可以通过脚本、trace跟踪、生成ash或者addm查看被阻塞的对象以及sql语句,但是无法获取作为阻塞源头会话的sql语句。其原因是阻塞其他会话的session,在v$session
中status为'INACTIVE'
,此状态意思是没有sql语句正在执行(在等待其他操作),所以sql_id为空,自然在v$sql
中查不到sql text。
下面构建2个会话,模拟事务被阻塞的情景
第一个会话事务执行update不要commit
SQL> col spid for a10
SQL> col tracefile for a80
SQL> set pagesize 200 linesize 200
SQL> select distinct(m.sid),s.serial#,p.spid,p.tracefile from v$mystat m,v$session s,v$process p where m.sid=s.sid and s.paddr=p.addr;
SID SERIAL# SPID TRACEFILE
---------- ---------- ---------- --------------------------------------------------------------------------------
858 247 83223 /u01/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_ora_83223.trc
SQL>
SQL> set time on
14:58:53 SQL> update test set name='aa' where id=1;
1 row updated.
15:02:18 SQL> 没有commit
第二个会话执行update
SQL> set time on
15:08:42 SQL> col spid for a10
15:08:45 SQL> col tracefile for a80
15:08:45 SQL> set pagesize 200 linesize 200
15:08:45 SQL> select distinct(m.sid),s.serial#,p.spid,p.tracefile from v$mystat m,v$session s,v$process p where m.sid=s.sid and s.paddr=p.addr;
SID SERIAL# SPID TRACEFILE
---------- ---------- ---------- --------------------------------------------------------------------------------
27 113 33109 /u01/app/oracle/diag/rdbms/orcl/orcl2/trace/orcl2_ora_33109.trc
15:08:45 SQL> update test set name='aaa' where id=1;
由于会话1没有commit,会话2被hang住了
脚本查下v$transaction
,v$session
会话信息与事务信息之间的联系
--下面SQL 查询数据库中正在执行大于秒的事务信息,以及sql text,一部分先前执行的sql,以及非对象锁事务
with ltr as (
select to_char(sysdate,'YYYYMMDDHH24MISS') TM,
s.inst_id,
s.sid,
s.serial#,
s.status,
s.sql_id,
s.sql_child_number,
s.prev_sql_id,
xid,
t.start_scn,
to_char(t.start_date,'YYYY-MM-DD HH24:MI:SS') start_time,
e.TYPE,e.block,
e.ctime/3600 runtime_Hour,
decode(e.CTIME, 0, (sysdate - t.start_date) * 3600*24, e.ctime) el_second
-- q.sql_text
from gv$transaction t, gv$session s,gv$transaction_enqueue e
where t.start_date <= sysdate - interval '1' second /*查询开始多少秒的事务*/
and t.addr = s.taddr
--and s.sql_child_number = q.CHILD_NUMBER(+)
--and s.sql_id = q.sql_id(+) and s.prev_sql_id = q.sql_id(+)
and t.addr = e.addr(+) )
select ltr.* , (select q1.sql_text from gv$sql q1 where ltr.prev_sql_id = q1.sql_id(+)
and rownum = 1) prev_sql_text ,
(select q1.sql_text from gv$sql q1 where ltr.sql_id = q1.sql_id(+)
and ltr.sql_child_number = q1.CHILD_NUMBER(+)) sql_text
from ltr ltr;
- 会话1 sid:858,serial#:247,事务开始的start_scn:8308839,start_scn很重要,在redo log里记录了事务起始执行的scn。
- 如上可以看到,sql_text虽然为空,但是数据库
v$session
会记录会话信息中最后一条执行成功的prev_sql_id
,由此可在v$sql
中查找sql_text。不过虽然知道prev_sql_text,但是此sql文本并不能保证就是阻塞事务的sql文本,因为此sql可能会是select或者其他对象DML语句,如select count(*) from dba_users;
- 另外可以发现,只有事务未完成的会话才会在v$transaction中有记录,会话2由于被hang住所以事务还没开始。
还可以查询会话与被锁对象之间的联系,其目的都是判断会话是否被阻塞,或者是因为其他非DML、DDL因素引起的表锁或者事务锁
--判断哪个SESSION执行了DML(Insert/Update/Delete) 但是未提交 (Commit ),引起的行锁
SELECT A.INST_ID,A.SID,A.SERIAL#,A.USERNAME,A.EVENT,A.WAIT_CLASS,A.SECONDS_IN_WAIT,A.PREV_EXEC_START,b.LOCKED_MODE,C.OWNER,C.OBJECT_NAME,C.OBJECT_TYPE
FROM gV$SESSION A
INNER JOIN gV$LOCKED_OBJECT B
ON A.SID=b.SESSION_ID
INNER JOIN DBA_OBJECTS C
ON B.OBJECT_ID=c.OBJECT_ID
WHERE A.WAIT_CLASS='Idle'
AND A.SECONDS_IN_WAIT>1
查询被阻塞的sql
--dml事务查询阻塞与被阻塞,ill阻塞者
select 'alter system kill session ' || chr(39) || bs.sid || ',' ||bs.SERIAL# || ',@' || bs.INST_ID || chr(39) || ';' as "kill blocker on command line",
bs.USERNAME blocking_user,
bs.MACHINE blocking_machine,
bs.program blocking_program,
ws.USERNAME blocked_user_Waiting,
ws.machine blockd_MACHINE_Waiting,
ws.program blockd_program_Waiting,
lw.CTIME Waiting_time_second,
'SID ' ||lh.sid || ',' || bs.SERIAL# || '@' || lh.INST_ID || ' is blocking SID' ||
lw.SID || ',' || ws.serial# || '@' || lw.INST_ID state,
ws.SQL_ID blockd_sql_id,
qw.SQL_TEXT blocked_sql_Wating,
qw.SQL_FULLTEXT blocked_sql_full_text
from gv$lock lh, gv$lock lw, gv$session ws, gv$session bs, gv$sql qw
where
lh.id1 = lw.id1
and lh.id2 = lw.id2
and lw.type in ('TM', 'TX')
and lh.kaddr <> lw.KADDR
and lh.TYPE = lw.TYPE
and lw.LMODE = 0
and lw.INST_ID = ws.INST_ID
and lw.sid = ws.SID
and lh.INST_ID = bs.INST_ID
and lh.SID = bs.SID
and ws.SQL_ID = qw.SQL_ID
and lw.CTIME > 2;
可以查询出阻塞与被阻塞的会话信息
生成addm报告如下:
ADDM Report for Task 'TASK_6684'
--------------------------------
Analysis Period
---------------
AWR snapshot range from 2189 to 2193.
Time period starts at 17-DEC-19 02.00.23 PM
Time period ends at 17-DEC-19 06.00.02 PM
Analysis Target
---------------
Database 'ORCL' with DB ID 1542083228.
Database version 11.2.0.4.0.
ADDM performed an analysis of instance orcl2, numbered 2 and hosted at rac2.
Activity During the Analysis Period
-----------------------------------
Total database time was 10278 seconds.
The average number of active sessions was .71.
Summary of Findings
-------------------
Description Active Sessions Recommendations
Percent of Activity
------------------ ------------------- ---------------
1 Row Lock Waits .71 | 99.83 1
2 Top SQL Statements .71 | 99.8 1
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
addm中的行锁等待,正是被阻塞的sql
Findings and Recommendations
----------------------------
Finding 1: Row Lock Waits
Impact is .71 active sessions, 99.83% of total activity.
--------------------------------------------------------
SQL statements were found waiting for row lock waits.
Recommendation 1: Application Analysis
Estimated benefit is .71 active sessions, 99.83% of total activity.
-------------------------------------------------------------------
Action
Significant row contention was detected in the TABLE "ADMIN.TEST" with
object ID 93460. Trace the cause of row contention in the application
logic using the given blocked SQL.
Related Object
Database object with ID 93460.
Rationale
The SQL statement with SQL_ID "gs5jbusmq794d" was blocked on row locks.
Related Object
SQL statement with SQL_ID gs5jbusmq794d.
update test set name='aaa' where id=1
Rationale
The session with ID 858 and serial number 247 in instance number 1 was
the blocking session responsible for 100% of this recommendation's
benefit.
----->addm同样能知道被阻塞与阻塞会话者的信息,与前文用脚本查出的阻塞session id一致
Symptoms That Led to the Finding:
---------------------------------
Wait class "Application" was consuming significant database time.
Impact is .71 active sessions, 99.83% of total activity.
Finding 2: Top SQL Statements
Impact is .71 active sessions, 99.8% of total activity.
-------------------------------------------------------
SQL statements consuming significant database time were found. These
statements offer a good opportunity for performance improvement.
Recommendation 1: SQL Tuning
Estimated benefit is .71 active sessions, 99.8% of total activity.
------------------------------------------------------------------
Action
Investigate the UPDATE statement with SQL_ID "gs5jbusmq794d" for
possible performance improvements. You can supplement the information
given here with an ASH report for this SQL_ID.
Related Object
SQL statement with SQL_ID gs5jbusmq794d.
update test set name='aaa' where id=1
Rationale
The SQL spent only 0% of its database time on CPU, I/O and Cluster
waits. Therefore, the SQL Tuning Advisor is not applicable in this case.
Look at performance data for the SQL to find potential improvements.
Rationale
Database time for this SQL was divided as follows: 100% for SQL
execution, 0% for parsing, 0% for PL/SQL execution and 0% for Java
execution.
Rationale
Waiting for event "enq: TX - row lock contention" in wait class
"Application" accounted for 100% of the database time spent in
processing the SQL statement with SQL_ID "gs5jbusmq794d".
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Additional Information
----------------------
Miscellaneous Information
-------------------------
Wait class "Cluster" was not consuming significant database time.
Wait class "Commit" was not consuming significant database time.
Wait class "Concurrency" was not consuming significant database time.
Wait class "Configuration" was not consuming significant database time.
CPU was not a bottleneck for the instance.
Wait class "Network" was not consuming significant database time.
Wait class "User I/O" was not consuming significant database time.
The network latency of the cluster interconnect was within acceptable limits
of 1 milliseconds.
Session connect and disconnect calls were not consuming significant database
time.
Hard parsing of SQL statements was not consuming significant database time.
目前ash视图可以查看历史被阻塞会话的sql_text
以上的脚本以及报告均不能查询出阻塞根源执行的sql语句,只能查出阻塞源session id,由此可以借助logmnr挖取online redo log或者archive log事务信息。
需要注意的是在logmnr挖取日志之前需要在数据库开启最小附加日志
,否则在v$logmnr_contents中session#,serail#,seesion_info
等字段不会有任何信息
由于已经知道了阻塞源头的sid
,serial#
,可以在logmnr视图中根据会话信息查找事务信息
可以看到logmnr挖取的日志与上面脚本查出的start_scn,xid相匹配
另一部分,查询被阻塞的sql在logmnr里并不存在
被阻塞的sql由于未执行成功,所以在redo里并不会记录
现在在会话1提交
18:09:51 SQL> commit;
Commit complete.
18:09:55 SQL>
会话1在sqlplus提交之后,redo log从set transaction read write到commit标志着一个事务的结束,如果redo里没有commit那说明事务还一直在持续。也只有挖取日志才能将事务分析清除。
总结来讲,如果要查出对象类阻塞(DML,DDL)的源头以及有关sql信息,只能通过挖取redo log事务日志的方式,而且需要保证追加日志要开启。在logmnr里根据session id以及事务start_scn可以查询阻塞源头的事务信息。一个事务不能完成往往就是3个原因,事务内的sql执行本身被hang住,被其他事务阻塞、其他事务执行慢