浅谈索引系列之聚簇因子(clustering_factor)

初次听说聚簇因子,我相信大部分人都是丈二和尚摸不着头脑,不知所云。然而当我们了解其真正含义后,也许会觉得这个高大上的名词也没有那么神秘。废话少说,上官方文档:
Index Clustering Factor
    The index clustering factor measures row order in relation to an indexed value such as employee last name. The more order that exists in row storage for this value, the lower the clustering factor.

The clustering factor is useful as a rough measure of the number of I/Os required to read an entire table by means of an index

  • If the clustering factor is high, then Oracle Database performs a relatively high number of I/Os during a large index range scan. The index entries point to random table blocks, so the database may have to read and reread the same blocks over and over again to retrieve the data pointed to by the index.
  • If the clustering factor is low, then Oracle Database performs a relatively low number of I/Os during a large index range scan. The index keys in a range tend to point to the same data block, so the database does not have to read and reread the same blocks over and over.
     简单来讲聚簇因子用来反映索引键值字段存放杂乱排序程度的一个度量。那么聚簇因子是怎么计算得到的呢?Oracle在计算其值的时候,会对每个索引键值查找对应表的数据,在查找过程中,会跟踪数据块间跳转的次数(当然,数据库不可能真的这么做,源代码只是简单的扫描索引,得到rowid,然后根据rowid就可以获得数据块的地址了),每一次跳转,计数器都会增加,计算完全部的索引键值得到的结果就是聚簇因子。因此表的某个字段越有序,若在此字段创建索引,索引对应的聚簇因子就越小。
    下面通过实验作一个直观的展现:

点击(此处)折叠或打开

  1. drop table colocated;
  2. create table colocated ( x int, y varchar2(2000));
  3. begin
  4.   for i in 1..100000 loop
  5.      insert into colocated values(i,dbms_random.string('a',5));
  6.   end loop;
  7.   commit;
  8. end;
  9. /
  10. alter table colocated add constraint colocated_pk primary key(x);

  11. drop table disorganized;
  12. create table disorganized as select x,y from colocated order by y;
  13. alter table disorganized add constraint disorganized_pk primary key(x);
  14. exec dbms_stats.gather_table_stats(USER,'COLOCATED',CASCADE=>TRUE);
  15. exec dbms_stats.gather_table_stats(USER,'DISORGANIZED',CASCADE=>TRUE);
  16. SELECT a.index_name,b.num_rows,b.blocks,a.clustering_factor FROM user_indexes a,user_tables b WHERE a.table_name=b.table_name AND a.table_name='COLOCATED';
  17. INDEX_NAME NUM_ROWS BLOCKS CLUSTERING_FACTOR
  18. ------------------------------ ---------- ---------- -----------------
  19. COLOCATED_PK 100000 244 219

  20. SELECT a.index_name,b.num_rows,b.blocks,a.clustering_factor FROM user_indexes a,user_tables b WHERE a.table_name=b.table_name AND a.table_name='DISORGANIZED';

  21. INDEX_NAME NUM_ROWS BLOCKS CLUSTERING_FACTOR
  22. ------------------------------ ---------- ---------- -----------------
  23. DISORGANIZED_PK 100000 232 99572
        ID 按照有序依次插入到 COLOCATED 表中,而 DISORGANIZED 表中的 ID 列则是随机分布的。 因此索引 COLOCATED_P K 的聚簇因子较小,基本等于表的 BLOCKS 数,索引 DISORGANIZED_PK 的聚簇因子 较大,基本等于表 NUM_ROWS
       了解了聚簇因子的概念,具体聚簇因子有什么用途呢?下面进一步说明。
     

点击(此处)折叠或打开

  1. set autotrace traceonly;
  2. SQL> select * from COLOCATED where x between 20000 and 20050;
  3. 51 rows selected.

  4. Execution Plan
  5. ----------------------------------------------------------
  6. Plan hash value: 1550765370

  7. --------------------------------------------------------------------------------------------
  8. | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
  9. --------------------------------------------------------------------------------------------
  10. | 0 | SELECT STATEMENT | | 52 | 572 | 3 (0)| 00:00:01 |
  11. | 1 | TABLE ACCESS BY INDEX ROWID| COLOCATED | 52 | 572 | 3 (0)| 00:00:01 |
  12. |* 2 | INDEX RANGE SCAN | COLOCATED_PK | 52 | | 2 (0)| 00:00:01 |
  13. --------------------------------------------------------------------------------------------

  14. Predicate Information (identified by operation id):
  15. ---------------------------------------------------

  16.    2 - access("X">=20000 AND "X"<=20050)


  17. Statistics
  18. ----------------------------------------------------------
  19.           0 recursive calls
  20.           0 db block gets
  21.          11 consistent gets
  22.           0 physical reads
  23.           0 redo size
  24.        2000 bytes sent via SQL*Net to client
  25.         556 bytes received via SQL*Net from client
  26.           5 SQL*Net roundtrips to/from client
  27.           0 sorts (memory)
  28.           0 sorts (disk)
  29.          51 rows processed

  30. SQL> select * from DISORGANIZED where x between 20000 and 20050;
  31. 51 rows selected.
  32. Execution Plan
  33. ----------------------------------------------------------
  34. Plan hash value: 2594580634

  35. -----------------------------------------------------------------------------------------------
  36. | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
  37. -----------------------------------------------------------------------------------------------
  38. | 0 | SELECT STATEMENT | | 52 | 572 | 54 (0)| 00:00:01 |
  39. | 1 | TABLE ACCESS BY INDEX ROWID| DISORGANIZED | 52 | 572 | 54 (0)| 00:00:01 |
  40. |* 2 | INDEX RANGE SCAN | DISORGANIZED_PK | 52 | | 2 (0)| 00:00:01 |
  41. -----------------------------------------------------------------------------------------------

  42. Predicate Information (identified by operation id):
  43. ---------------------------------------------------

  44.    2 - access("X">=20000 AND "X"<=20050)


  45. Statistics
  46. ----------------------------------------------------------
  47.           1 recursive calls
  48.           0 db block gets
  49.          56 consistent gets
  50.           0 physical reads
  51.           0 redo size
  52.        2000 bytes sent via SQL*Net to client
  53.         556 bytes received via SQL*Net from client
  54.           5 SQL*Net roundtrips to/from client
  55.           0 sorts (memory)
  56.           0 sorts (disk)
  57.          51 rows processed

       COLOCATEDDISORGANIZED两张表存储了同样的语句,相同的 SQL语句执行,聚簇因子小的表逻辑读为 11,聚簇因子大的逻辑读却为 56,大家都知道逻辑读和锁机制息息相关,锁会影响到了数据库的并发性,也会影响性能。
     小结:
   1.聚簇因子的高低会影响执行计划的选择。
  2.聚簇因子与表的存储有关,无法通过重建索引来改变聚簇因子的大小。

来自 “ ITPUB博客 ” ,链接:http://blog.itpub.net/29827284/viewspace-2134262/,如需转载,请注明出处,否则将追究法律责任。

转载于:http://blog.itpub.net/29827284/viewspace-2134262/

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值