Cache Buffers Chains and Latch Spelunking

本文通过分析RAC环境中缓存缓冲区链锁的活动情况,揭示了繁忙等待背后的原因,并深入探讨了如何定位导致问题的具体表分区及块。

from:http://www.ardentperf.com/2007/09/13/cache-buffers-chains-and-latch-spelunking/

 

 

Last night I posted a case study where I used the AWR (a blessed new feature) to investigate “gc buffer busy” wait events in a RAC environment. I concluded the write-up by theorizing that the single freelist was pointing all nodes of the cluster to the same small group of blocks for inserts and thereby causing the blocks on the freelist to always be subject to unearthly contention across the cluster.

One common piece of advice for gc buffer busy waits is to treat them like regular buffer busy waits. Because essentially that’s what they are – a buffer busy wait on a remote instance. So another avenue of investigation is to look at what might be causing buffer busy waits across the cluster.

Some people may remember that back in the days before YAPP and the wait interface, latches were usually where the purported “experts” looked when you had performance problems. Particularly those two infamous latches cache buffers chains and library cache. And of course today these are still an important part of any in-depth investigation and V$LATCH even includes wait time so you can take a time-based approach to analysis. I spent some time yesterday having a look at the latching in this RAC system and it yielded some results that I thought might be interesting to post. So here goes…

Cache Buffers Chains Latch

Now it was apparent to me pretty quickly that the cache buffers chains latch was the busiest on the system. And of course it’s normal for there to be some contention on this latch. You’ll notice from the screenshot the enormous difference in gets, spin_gets and sleeps. So that gave me a bit of a head start – but you could certainly get the same information from the AWR (that blessed new feature) as well. In fact you can do all of this from the AWR – although look out since latch information is not gathered by default!

So these guys weren’t gathering latch info – and so I couldn’t use the AWR. Instead I just setup my own temporary table to hold the statistics.

create table jschneider.waits as
select 1 snap, systimestamp timestamp, inst_id
,      CHILD#
,      ADDR
,      GETS
,      MISSES
,      SLEEPS
from gv$latch_children
where name = 'cache buffers chains';

Table created.

At this point I worked on something else for awhile. After an hour or so I took a second snapshot.

insert into jschneider.waits
select 2, systimestamp, inst_id
,      CHILD#
,      ADDR
,      GETS
,      MISSES
,      SLEEPS
from gv$latch_children
where name = 'cache buffers chains';

393216 rows created.

commit;

Commit complete.

I guess I should briefly explain why I grabbed that information. The cache buffers chains latch is actually made up of a large number of child latches. When Oracle needs to access the buffer cache it hashes some of the block’s information to discover which child it needs to use. That way each child latch only has a short list of blocks that it is managing. What I aim to do is find out which child latches are the busiest and then see which segments have blocks protected by those latches. Not only that but we’ll be able to see which blocks are the busiest (in real time).

So the next step is to find the child latches suffering from the most contention. In order to do this we’ll look for latches that frequently cause processes to sleep (relinquish the processor) while waiting.

with subq as (
select t2.inst_id i, t2.child#, t2.addr,
  t2.gets-t1.gets gets,
  t2.misses-t1.misses misses,
  t2.sleeps-t1.sleeps sleeps
from jschneider.waits6 t1, jschneider.waits6 t2
where t1.child#=t2.child#
  and t1.inst_id=t2.inst_id
  and t1.snap=1 and t2.snap=2
order by sleeps desc
)
select * from subq
where rownum < 40;

         I     CHILD# ADDR                   GETS     MISSES     SLEEPS
---------- ---------- ---------------- ---------- ---------- ----------
         3      32664 00000005F9EA6780     349526       8642        168
         2      32664 00000005F9EA6780     352351       9850        167
         3      21135 00000005F7D9FC80     109896       1284        146
         3      39340 00000005FBA3AD10      58127       1100        142
         5      13536 00000005A8F41F40    2305516      63418        126
         3      57464 0000000608DD4200      45361        909        112
         3      59297 000000060799DF68      31056        690         94
         3      14549 00000005F9C30B08      46444        624         90
         2      53157 00000006078C8808       4727       1129         83
         3      27008 00000005F6E27820      64685       1148         82
         3      62193 00000005FAD17808      28596        550         78
         3      24183 00000005F6DC54B8      32420        505         77
         3      35290 00000005F7F8BE38      65142       1115         75
         3      46376 0000000606C9CA30      29474        653         74
         3       8428 00000005F6BA1900      67146        303         73
         3      15740 00000005F7CE4388      94209        338         72
         3      30375 00000005F7EE1040       4962        663         69
         2      57464 0000000608DD4200      59033        887         68
         3      21180 00000005F7DA1588      65043        743         68
         3      38233 00000005F6FADC08      31475        516         66
         3      25742 00000005F8DE1520      57828        928         65
         3      46198 00000005FAAEB6D0      61915        751         65
         3       6648 00000005F6B63AE0      32022        462         62
         5      18908 00000005A2E51A30    1480518      39246         62
         3      59340 00000005FBCF2210      77557        307         61
         6      63812 00000005A0CD70E8    1071392      63118         61
         2       2526 00000005F5B16B90     339617        303         60
         1      32664 00000005F9EA6780     234869       5319         57
         2      14849 00000005F9C3B1E8      30541        248         57
         3      11539 00000005F9BC80B8     141915        220         57
         2      16944 00000005F9C83F40       3931        961         56
         4      18475 00000005F7D434E0      87239        477         56
         3      33009 00000005F9EB2768      66883        793         56
         2      59438 00000005FACB7B90      27733        742         52
         2      63600 00000005FBD863B0      10316        188         51
         3      44988 00000005FAAC15C0      28087        208         51
         2      52350 00000005FBBFF1E0     570855       3252         50
         3       1388 00000005F6AACD00     203545        163         50
         3       8972 00000005F8B9A4D0      28815        205         49
Exploring the Buffer Cache

Notice that I’ve also queried all latches across the entire cluster. So this is giving me a system-wide picture of cache buffer chains latches. Interestingly, the top child is the same one on two different instances (2 and 3). So next let’s login locally to instance 3 and see what that latch child is protecting! Also, I’m going to grab the current SCN – but only the base – from dbms_flashback.

select mod(dbms_flashback.get_system_change_number,power(2,32)) cur_scn_base from dual;

CUR_SCN_BASE
------------
   279813816

col object format a55
col state format a5
select /*+rule*/ *
from (
select o.owner||'.'||o.object_name||decode(o.subobject_name,NULL,'','.')||
  o.subobject_name||' ['||o.object_type||']' object, sq.*
from (
select
  x.obj,
  x.file#,x.dbablk,
  x.tch,
  decode(x.state,0,'FREE',1,'XCUR',2,'SCUR',3,'CR',4,'READ',
    5,'MREC',6,'IREC',7,'WRITE',8,'PI',9,'MEMORY',10,'MWRITE',
    11,'DONATED',x.state) state,
  decode(x.state,3,cr_scn_bas,NULL) scn_bas
from
  sys.v$latch_children  l,
  sys.x$bh  x
where
  x.hladdr = l.addr and
  x.obj < power(2,22) and
  x.hladdr  = '00000005F9EA6780'
) sq, dba_objects o
where
  o.data_object_id=sq.obj
  order by sq.tch desc, file#, dbablk, scn_bas
) where rownum<40;

OBJECT                                                         OBJ      FILE#     DBABLK        TCH STATE    SCN_BAS
------------------------------------------------------- ---------- ---------- ---------- ---------- ----- ----------
JSCHDER.SPOT_ACTIVITY [TABLE]                               903892        214     110918          3 SCUR
JSCHDER.BIGTABLE_LOG.P_2007_09 [TABLE PARTITION]           3208618       1209      16393          2 CR     279622172
JSCHDER.BIGTABLE_ORDERS_MF_AND_PARTS [INDEX]                  3136        994     121460          1 SCUR
JSCHDER.BIGTABLE.P_PARTS_APPROVED [TABLE PARTITION]        3064309       1156      33703          1 SCUR
JSCHDER.BIGTABLE_LOG.P_2007_09 [TABLE PARTITION]           3208618       1209      16393          1 CR     279619002
JSCHDER.BIGTABLE_LOG.P_2007_09 [TABLE PARTITION]           3208618       1209      16393          1 CR     279619202
JSCHDER.BIGTABLE_LOG.P_2007_09 [TABLE PARTITION]           3208618       1209      16393          1 CR     279621063
JSCHDER.BIGTABLE_LOG.P_2007_09 [TABLE PARTITION]           3208618       1209      16393          1 CR     279618998
JSCHDER.BIGTABLE_LOG.P_2007_09 [TABLE PARTITION]           3208618       1209      16393          1 PI
JSCHDER.BIGTABLE_LOG.P_2007_09 [TABLE PARTITION]           3208618       1209      16393          1 PI
JSCHDER.BIGTABL_LG_X_CHANGE_DATE.P_2007_09 [INDEX PARTI    3208620       1542       4537          1 SCUR
TION]

JSCHDER.PERSISTENT_QUERY [TABLE]                           1455110         10      99668          0 SCUR
JSCHDER.GENERATED_REPORT [TABLE]                           1455107         13      22427          0 SCUR
JSCHDER.SPOT_ACTIVITY_URL_DETAIL [TABLE]                   2684382        287      51469          0 SCUR
JSCHDER.GENERATED_REPORT [TABLE]                           1455107        476     103892          0 SCUR
JSCHDER.BIGTABLE_TOTALS [TABLE]                            1746552        524     187199          0 SCUR
JSCHDER.PERSISTENT_QUERY [TABLE]                           1455110       1476      21915          0 SCUR

17 rows selected.

Now this query has done more than show me the blocks protected by this latch child. The “TCH” field is telling me the Touch Count for the buffer – an indication of how hot the block is. This counter is incremented most of the time that Oracle accesses the block and if I remember right it is decremented automatically every 3 seconds or so. So this will show me – in real time – what blocks are being accessed.

First of all I noticed that there are only 17 blocks on this latch – which should be great for concurrency! These blocks must have been hammered to have so many sleeps! Secondly, I noticed that there’s one block that has seven different versions in this instance alone. I was immediately suspicious about that block and decided to check something…

select header_file, header_block from dba_segments
where owner='JSCHDER' and segment_name='BIGTABLE_LOG' and PARTITION_NAME='P_2007_09';

HEADER_FILE HEADER_BLOCK
----------- ------------
       1209        16393

Looks like that’s the header block!! Interesting… so I wonder how many copies of this block are spread around the cluster? That’s pretty easy to check too.

select inst_id, status, dirty, stale from gv$bh
where file#=1209 and block#=16393 order by 1, 2;

   INST_ID STATUS  D S
---------- ------- - -
         1 cr      N Y
         1 cr      N N
         1 cr      N N
         1 cr      N N
         1 cr      N N
         1 pi      Y N
         2 cr      N N
         2 cr      N N
         2 cr      N Y
         2 cr      N N
         2 pi      Y N
         3 cr      N N
         3 cr      N N
         3 cr      N Y
         3 cr      N N
         3 cr      N N
         3 pi      Y N
         4 cr      N N
         4 cr      N N
         4 cr      N N
         4 cr      N N
         4 cr      N N
         4 pi      Y N
         4 xcur    N N

Quite a few. In fact I discovered that if I repeated that query even a few moments apart the “xcur” block (which is the one being updated) is constantly moving between instances. Normal operation for RAC… but interesting to watch nonetheless!

What Does it All Mean?

So what’s the point? Well we’ve shored up the theory that BIGTABLE_LOG is the main culprit in this system. The header block is where the master freelist is and that would be bounced around a lot without freelist groups. After switching to ASSM or adding freelist groups I would expect to see fewer copies of the header block in the global cache. Hopefully they’ll let me know what happens so I can post an update! (Sounds like they’re planning to move the partition to ASSM during the downtime window this weekend!)

Appendix: Quickly Mapping file#/block# to an Object

I’m sure that everyone, at some point, has had a block number and wanted to know the object. And if you’ve been in this situation with a large database then you realize that there’s just no good way to do it. You have to query the dreaded DBA_EXTENTS view… which can take years to finish.

If you didn’t notice, I actually just demonstrated a potential alternative – maybe. If the block you’re checking is in the buffer cache (on any instance in RAC) then you can just read from v$bh instead of waiting around for DBA_EXTENTS! Just use a query something like this:

col object_name format a20
col owner format a10
col subobject_name format a20
select distinct o.owner, o.object_name, o.subobject_name, o.object_type
from dba_objects o, gv$bh b
where o.data_object_id=b.objd and b.objd < power(2,22) and status != 'free'
  and b.file#=1209
  and b.block#=16393
;

OWNER      OBJECT_NAME          SUBOBJECT_NAME       OBJECT_TYPE
---------- -------------------- -------------------- -------------------
JSCHDER    BIGTABLE_LOG         P_2007_09            TABLE PARTITION

Very handy shortcut in my opinion. If you’re not on RAC then this will go even faster.

Happy hacking!

Update 9/14/07 – updated queries against dba_objects/v$bh to eliminate rollback and temp segs. also fixed last query in the post to use data_object_id.

内容概要:本文设计了一种基于PLC的全自动洗衣机控制系统内容概要:本文设计了一种,采用三菱FX基于PLC的全自动洗衣机控制系统,采用3U-32MT型PLC作为三菱FX3U核心控制器,替代传统继-32MT电器控制方式,提升了型PLC作为系统的稳定性与自动化核心控制器,替代水平。系统具备传统继电器控制方式高/低水,实现洗衣机工作位选择、柔和过程的自动化控制/标准洗衣模式切换。系统具备高、暂停加衣、低水位选择、手动脱水及和柔和、标准两种蜂鸣提示等功能洗衣模式,支持,通过GX Works2软件编写梯形图程序,实现进洗衣过程中暂停添加水、洗涤、排水衣物,并增加了手动脱水功能和、脱水等工序蜂鸣器提示的自动循环控制功能,提升了使用的,并引入MCGS组便捷性与灵活性态软件实现人机交互界面监控。控制系统通过GX。硬件设计包括 Works2软件进行主电路、PLC接梯形图编程线与关键元,完成了启动、进水器件选型,软件、正反转洗涤部分完成I/O分配、排水、脱、逻辑流程规划水等工序的逻辑及各功能模块梯设计,并实现了大形图编程。循环与小循环的嵌; 适合人群:自动化套控制流程。此外、电气工程及相关,还利用MCGS组态软件构建专业本科学生,具备PL了人机交互C基础知识和梯界面,实现对洗衣机形图编程能力的运行状态的监控与操作。整体设计涵盖了初级工程技术人员。硬件选型、; 使用场景及目标:I/O分配、电路接线、程序逻辑设计及组①掌握PLC在态监控等多个方面家电自动化控制中的应用方法;②学习,体现了PLC在工业自动化控制中的高效全自动洗衣机控制系统的性与可靠性。;软硬件设计流程 适合人群:电气;③实践工程、自动化及相关MCGS组态软件与PLC的专业的本科生、初级通信与联调工程技术人员以及从事;④完成PLC控制系统开发毕业设计或工业的学习者;具备控制类项目开发参考一定PLC基础知识。; 阅读和梯形图建议:建议结合三菱编程能力的人员GX Works2仿真更为适宜。; 使用场景及目标:①应用于环境与MCGS组态平台进行程序高校毕业设计或调试与运行验证课程项目,帮助学生掌握PLC控制系统的设计,重点关注I/O分配逻辑、梯形图与实现方法;②为工业自动化领域互锁机制及循环控制结构的设计中类似家电控制系统的开发提供参考方案;③思路,深入理解PL通过实际案例理解C在实际工程项目PLC在电机中的应用全过程。控制、时间循环、互锁保护、手动干预等方面的应用逻辑。; 阅读建议:建议结合三菱GX Works2编程软件和MCGS组态软件同步实践,重点理解梯形图程序中各环节的时序逻辑与互锁机制,关注I/O分配与硬件接线的对应关系,并尝试在仿真环境中调试程序以加深对全自动洗衣机控制流程的理解。
评论
成就一亿技术人!
拼手气红包6.0元
还能输入1000个字符
 
红包 添加红包
表情包 插入表情
 条评论被折叠 查看
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值