前不久一个客户有个问题咨询,通过微信指导其实处理是较为简单的。 我们先来看看现象。用户的监控告警,cpu持续升高,idle 已经不到10%了,大过年的确实影响玩耍的心情。
从event来看,有个SQL看来是异常的,产生了大量的latch:undo global data等待。
用户发了个awr报告,我们也简单看一眼。
后面确认发现是SQL执行计划存在异常,选择了一个错误的执行计划,执行效率低了1000倍,通过关掉如下几个参数之后,绑定执行计划后,观察很快就恢复了。
alter system set "_sql_plan_directive_mgmt_control"=0 scope=both sid='*';
alter system set "_optimizer_dsdir_usage_control"=0 scope=both sid='*';
alter system set "_optimizer_use_feedback"=false scope=both sid='*';
问题当然没有结束。
我们看用户提供的awr报告,可以看到每秒的逻辑读确实非常高,每秒大约520万。
为什么逻辑度会如此之高呢,实际上从客户后面提供的另外一个正常的awr报告来看,每秒也就200w不到。那么我们来看看awr报告中的instance active statistcs部分。
大家可以看到这里data blocks consistent reads - undo records applied 的指标非常的高。
这也难怪我们看到latch:undo global data这么高了。
那么这个数据是什么含义呢?这里我们简单测一把。
SQL> create table test0208 as select * from dba_objects where rownum <=10;
Table created.
SQL> SELECT dbms_rowid.rowid_object (ROWID) data_object_id,
2 dbms_rowid.rowid_relative_fno (ROWID) relative_fno,
dbms_rowid.rowid_block_number (ROWID) block_no,
dbms_rowid.rowid_row_number (ROWID) row_no,
owner,object_id,object_type
from test0208; 3 4 5 6
DATA_OBJECT_ID RELATIVE_FNO BLOCK_NO ROW_NO OWNER OBJECT_ID OBJECT_TYPE
-------------- ------------ ---------- ---------- -------- ---------- ------------
116368 8 70395 0 SYS 16 TABLE
116368 8 70395 1 SYS 20 TABLE
116368 8 70395 2 SYS 8 CLUSTER
116368 8 70395 3 SYS 37 INDEX
116368 8 70395 4 SYS 22 TABLE
116368 8 70395 5 SYS 33 INDEX
116368 8 70395 6 SYS 40 INDEX
116368 8 70395 7 SYS 31 TABLE
116368 8 70395 8 SYS 41 INDEX
116368 8 70395 9 SYS 3 INDEX
10 rows selected.
SQL> delete from test0208 where object_id=3;
1 row deleted.
SQL>
--session 2
SQL> @stats
SID NAME VALUE
------ ------------------------------------------------------------ ------
2842 consistent gets 54
2842 consistent gets from cache 54
2842 consistent gets pin 46
2842 consistent gets pin (fastpath) 46
2842 consistent gets examination 8
2842 consistent gets examination (fastpath) 8
2842 consistent gets direct 0
2842 fastpath consistent get quota limit 0
2842 consistent changes 0
2842 transaction tables consistent reads - undo records applied 0
2842 transaction tables consistent read rollbacks 0
2842 data blocks consistent reads - undo records applied 0
2842 no work - consistent read gets 42
2842 cleanouts only - consistent read gets 0
2842 rollbacks only - consistent read gets 0
2842 cleanouts and rollbacks - consistent read gets 0
16 rows selected.
SQL> select owner,object_type,count(1) from test0208 group by owner,object_type;
OWNER OBJECT_TYPE COUNT(1)
-------------------- ---------------------------------------------- ----------
SYS TABLE 4
SYS INDEX 5
SYS CLUSTER 1
SQL> @stats
SID NAME VALUE
----- ------------------------------------------------------------ -------
2842 consistent gets 58
2842 consistent gets from cache 58
2842 consistent gets pin 48
2842 consistent gets pin (fastpath) 48
2842 consistent gets examination 10
2842 consistent gets examination (fastpath) 10
2842 consistent gets direct 0
2842 fastpath consistent get quota limit 0
2842 consistent changes 1
2842 transaction tables consistent reads - undo records applied 0
2842 transaction tables consistent read rollbacks 0
2842 data blocks consistent reads - undo records applied 1
2842 no work - consistent read gets 42
2842 cleanouts only - consistent read gets 0
2842 rollbacks only - consistent read gets 0
2842 cleanouts and rollbacks - consistent read gets 1
16 rows selected.
SQL> declare
2 s number;
begin
for i in 1 .. 1000 loop
select count(*) into s from(select owner,object_type,count(1) from test0208 group by owner,object_type);
end loop;
end;
3 4 5 6 7 8 /
PL/SQL procedure successfully completed.
SQL> @stats
SID NAME VALUE
----- ------------------------------------------------------------ --------
2842 consistent gets 4,058
2842 consistent gets from cache 4,058
2842 consistent gets pin 2,048
2842 consistent gets pin (fastpath) 2,048
2842 consistent gets examination 2,010
2842 consistent gets examination (fastpath) 2,010
2842 consistent gets direct 0
2842 fastpath consistent get quota limit 0
2842 consistent changes 1,001
2842 transaction tables consistent reads - undo records applied 0
2842 transaction tables consistent read rollbacks 0
2842 data blocks consistent reads - undo records applied 1,001
2842 no work - consistent read gets 42
2842 cleanouts only - consistent read gets 0
2842 rollbacks only - consistent read gets 0
2842 cleanouts and rollbacks - consistent read gets 1,001
16 rows selected.
SQL>
大家可以看到,实际上我这里就是一个block而已,存在一个未提交事务【实际上就一条dml语句】,select 语句重复之下1000次之后,我们发现:data blocks consistent reads - undo records applied 这个统计数据增加了1000.
所以是实际上,这个数据看上去非常高,并不代表故障期间某个表存在大事务或者说存在未提交的大事务。也可能是个别sql重复执行多次扫描的结果。
今天忙完晚上利用业余时间简单测了下,算是加深一点认识。
最后给自己的来一个小广告!
2024年底开始做了一个数据库训练营【覆盖数据库底层原理、性能分析、故障分析、数据恢复,国产数据库如达梦、GaussDB等等相关内容,算是个人1年的学习记录总结吧】,大约有近30名学员加入其中,大家一起交流学习互动,收获满满!我也将其中的内容全部录制了下来,截至2025春节放假之前已经录制了20个视频,共计时长近50小时。【当然,系列视频仍然会不断录制,预计全部录制完毕大约会有80个视频】。
目前录制的视频接近20GB,估计全部录制完毕将达到80GB;另外配套的视频资料文档都存放在百度云盘,如下一部分:
有兴趣的小伙伴可以添加我的vx进行咨询!
---------------------------------------------------------------------------------------------------------------------------------
喜欢本文的朋友,欢迎关注公众号 Roger的数据库专栏,收看更多精彩内容