有关位图连接索引(bitmap join index)的一些测试

本文探讨了在数据库中创建和使用位图连接索引的过程,并通过实例展示了位图连接索引的创建、使用及性能影响。在测试过程中,发现优化器在某些情况下不会选择使用位图连接索引,尽管它能降低逻辑IO。文章还讨论了如何通过调整统计信息和强制使用索引来影响执行计划,以及位图连接索引在特定场景下可能无法避免排序操作的问题。最后,文章指出位图连接索引在OLTP系统中可能存在的锁扩展会对性能造成影响。

摘要生成于 C知道 ,由 DeepSeek-R1 满血版支持, 前往体验 >

数据库版本:10.2.0.4

create table zsj_obj as select owner,object_id from dba_objects where object_id is not null;

create table zsj_obj_detail pctfree 90
as select object_id,object_name,object_type,last_ddl_time+rownum last_ddl_time,status
from dba_objects where object_id is not null;

alter table zsj_obj_detail add constraint pk_zsj_obj_detail primary key(object_id);

create bitmap index ind_zsj_obj_detail
on zsj_obj_detail(o.owner)
from zsj_obj o,zsj_obj_detail d
where o.object_id=d.object_id;

指向zsj_obj_detail的数据行.where子句中必须连接另一个表zsj_obj的主键或者是唯一键.如果zsj_obj是复合主键的话,where子句中必须包含zsj_obj的所有主键列.

create bitmap index ind_zsj_obj_detail
on zsj_obj_detail(o.owner)
from zsj_obj o,zsj_obj_detail d
where o.object_id=d.object_id
ORA-25954: missing primary key or unique constraint on dimension
这里报错了,因为zsj_obj的object_id不是主键也不是唯一键
,虽然说zsj_obj_detail的object_id是主键吧,但那不是我这里要求的.

alter table zsj_obj_detail drop constraint pk_zsj_obj_detail;  我把zsj_obj_detail上的主键给去掉了
alter table zsj_obj add constraint pk_zsj_obj primary key(object_id);  加上了zsj_obj的主键约束
SQL> desc zsj_obj;
Name      Type         Nullable Default Comments
--------- ------------ -------- ------- --------
OWNER     VARCHAR2(30) Y                        
OBJECT_ID NUMBER                                 
SQL> desc zsj_obj_detail;
Name          Type          Nullable Default Comments
------------- ------------- -------- ------- --------
OBJECT_ID     NUMBER        Y                        
OBJECT_NAME   VARCHAR2(128) Y                        
OBJECT_TYPE   VARCHAR2(19)  Y                        
LAST_DDL_TIME DATE          Y                        
STATUS        VARCHAR2(7)   Y      

这样就可以创建上面的位图连接索引了:
create bitmap index ind_zsj_obj_detail
on zsj_obj_detail(o.owner)
from zsj_obj o,zsj_obj_detail d
where o.object_id=d.object_id;

Index created
索引创建成功了.

SQL> SELECT INDEX_TYPE,join_index,TABLE_NAME FROM USER_INDEXES WHERE INDEX_NAME=UPPER('ind_zsj_obj_detail');
INDEX_TYPE                  JOIN_INDEX TABLE_NAME
--------------------------- ---------- ------------------------------
BITMAP                      YES        ZSJ_OBJ_DETAIL
这里显示是zsj_obj_detail表上的索引.

exec dbms_stats.gather_table_stats(user,'ZSJ_OBJ',cascade=>true,estimate_percent=>100,method_opt=>'for columnS OWNER SIZE 254,OBJECT_ID SIZE 1');
exec dbms_stats.gather_table_stats(user,'ZSJ_OBJ_DETAIL',cascade=>true,estimate_percent=>100,method_opt=>'for all columnS size 1');
分析表zsj_obj_detail连带它的索引也分析的时候,ind_zsj_obj_detail也被分析了,可以知道这个索引确实是zsj_obj_detail这个表的索引.

SQL> SELECT OWNER,COUNT(1) CNT FROM ZSJ_OBJ GROUP BY OWNER ORDER BY CNT DESC;
OWNER                                 CNT
------------------------------ ----------
SYS                                 23328
......
TSMSYS                                  3
 
21 rows selected
 
SQL> select TABLE_NAME,COLUMN_NAME,NUM_BUCKETS,HISTOGRAM from User_Tab_Columns where table_name in('ZSJ_OBJ','ZSJ_OBJ_DETAIL');
TABLE_NAME                     COLUMN_NAME                    NUM_BUCKETS HISTOGRAM
------------------------------ ------------------------------ ----------- ---------------
ZSJ_OBJ                        OWNER                                   21 FREQUENCY
ZSJ_OBJ                        OBJECT_ID                                1 NONE
ZSJ_OBJ_DETAIL                 OBJECT_ID                                1 NONE
ZSJ_OBJ_DETAIL                 OBJECT_NAME                              1 NONE
ZSJ_OBJ_DETAIL                 OBJECT_TYPE                              1 NONE
ZSJ_OBJ_DETAIL                 LAST_DDL_TIME                            1 NONE
ZSJ_OBJ_DETAIL                 STATUS                                   1 NONE
 
7 rows selected

select COUNT(1)
from zsj_obj a,zsj_obj_detail b
where a.object_id=b.object_id and a.owner='SYS';
--------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name               | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                    |     1 |     5 |     2   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE             |                    |     1 |     5 |            |          |
|   2 |   BITMAP CONVERSION COUNT   |                    |  2509 | 12545 |     2   (0)| 00:00:01 |
|*  3 |    BITMAP INDEX SINGLE VALUE| IND_ZSJ_OBJ_DETAIL |       |       |            |          |
--------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("B"."SYS_NC00006$"='SYS')
Statistics
----------------------------------------------------------
          0  db block gets
          4  consistent gets
          1  rows processed
只使用索引,不需要访问表数据.

select B.*
from zsj_obj a,zsj_obj_detail b
where a.object_id=b.object_id and a.owner='TSMSYS';
-------------------------------------------------------------------------------------
| Id  | Operation          | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |                |     3 |   183 |   710   (1)| 00:00:09 |
|*  1 |  HASH JOIN         |                |     3 |   183 |   710   (1)| 00:00:09 |
|*  2 |   TABLE ACCESS FULL| ZSJ_OBJ        |     3 |    30 |    23   (5)| 00:00:01 |
|   3 |   TABLE ACCESS FULL| ZSJ_OBJ_DETAIL | 52691 |  2624K|   687   (1)| 00:00:09 |
-------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("A"."OBJECT_ID"="B"."OBJECT_ID")
   2 - filter("A"."OWNER"='TSMSYS')
Statistics
----------------------------------------------------------
          0  db block gets
       4000  consistent gets
          3  rows processed
默认是不使用这个索引的,为什么呢?强制使用这个索引看看:
select /*+ INDEX(B ind_zsj_obj_detail)*/B.*
from zsj_obj a,zsj_obj_detail b
where a.object_id=b.object_id and a.owner='TSMSYS';
---------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name               | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                    |  2509 |   124K|  1524   (1)| 00:00:19 |
|   1 |  TABLE ACCESS BY INDEX ROWID | ZSJ_OBJ_DETAIL     |  2509 |   124K|  1524   (1)| 00:00:19 |
|   2 |   BITMAP CONVERSION TO ROWIDS|                    |       |       |            |          |
|*  3 |    BITMAP INDEX SINGLE VALUE | IND_ZSJ_OBJ_DETAIL |       |       |            |          |
---------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("B"."SYS_NC00006$"='TSMSYS')
Statistics
----------------------------------------------------------
          0  db block gets
          5  consistent gets
          3  rows processed
明显,这里的逻辑io要低得多(当然,cpu time时间之类的统计信息这里看不到),但为什么默认情况下,优化器没有使用这个执行计划呢?可以看到这里的cost是1524,比710要大,所以被优化器给抛弃了.但为什么这个代价这个高呢?和可能和这里估算出来的card=2509有关.但我在zsj_obj.owner上是收集了柱状图统计信息的呀,所以可以看到在默认的执行计划中估算最终的card=3是完全准确的呀,可这里为什么估算出了2509呢,这个数字是如何得来的呢?
SQL> select table_name,round(num_rows/21,2) from user_tables where table_name in('ZSJ_OBJ','ZSJ_OBJ_DETAIL');
TABLE_NAME                     ROUND(NUM_ROWS/21,2)
------------------------------ --------------------
ZSJ_OBJ_DETAIL                               2509.1
ZSJ_OBJ                                     2509.05
这里正好是2509,也就是说又认为这21个值是平均分布的了.
可以想象,这里的位图连接索引也会像FBI一样,对应于zsj_obj_detail上的一个虚拟列.
SQL> select column_name,num_distinct,num_buckets,histogram,virtual_column,hidden_column from user_tab_cols where table_name='ZSJ_OBJ_DETAIL';
COLUMN_NAME                    NUM_DISTINCT NUM_BUCKETS HISTOGRAM       VIRTUAL_COLUMN HIDDEN_COLUMN
------------------------------ ------------ ----------- --------------- -------------- -------------
OBJECT_ID                             52691           1 NONE            NO             NO
OBJECT_NAME                           32241           1 NONE            NO             NO
OBJECT_TYPE                              41           1 NONE            NO             NO
LAST_DDL_TIME                         52691           1 NONE            NO             NO
STATUS                                    2           1 NONE            NO             NO
SYS_NC00006$                                            NONE            YES            YES
这里的SYS_NC00006$正是上面的执行计划中出现的access filter: 3 - access("B"."SYS_NC00006$"='TSMSYS')
可以看到这个列上连基本的优化器统计信息都没有收集.我们收集一下它上的柱状图统计信息:
exec dbms_stats.gather_table_stats(user,'ZSJ_OBJ_DETAIL',estimate_percent=>100,method_opt=>'for columns SYS_NC00006$ size 254');
exec dbms_stats.gather_table_stats(user,'ZSJ_OBJ_DETAIL',estimate_percent=>100,method_opt=>'for columns "SYS_NC00006$" size 254');
exec dbms_stats.gather_table_stats(user,'ZSJ_OBJ_DETAIL',estimate_percent=>100,method_opt=>'for all hidden columns size 254');
exec dbms_stats.gather_table_stats(user,'ZSJ_OBJ_DETAIL',cascade=>true,estimate_percent=>100,method_opt=>'for all columns size 254');
可使用上面的任何一个选项,都不收集SYS_NC00006$这个虚拟列上的优化器统计信息,上面2个sql的执行计划,card也都没有任何的改变.
那就试试cardinality这个提示:
select /*+ cardinality(a 3)*/B.*
from zsj_obj a,zsj_obj_detail b
where a.object_id=b.object_id and a.owner='TSMSYS';
不管用.还是原来默认的执行计划.

select /*+ cardinality(b 3)*/B.*
from zsj_obj a,zsj_obj_detail b
where a.object_id=b.object_id and a.owner='TSMSYS';
也同样不是我要的结果.
-----------------------------------------------------------------------------------------------
| Id  | Operation                    | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                |     3 |   183 |   688   (1)| 00:00:09 |
|   1 |  NESTED LOOPS                |                |     3 |   183 |   688   (1)| 00:00:09 |
|   2 |   TABLE ACCESS FULL          | ZSJ_OBJ_DETAIL |     3 |   153 |   686   (1)| 00:00:09 |
|*  3 |   TABLE ACCESS BY INDEX ROWID| ZSJ_OBJ        |     1 |    10 |     1   (0)| 00:00:01 |
|*  4 |    INDEX UNIQUE SCAN         | PK_ZSJ_OBJ     |     1 |       |     1   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter("A"."OWNER"='TSMSYS')
   4 - access("A"."OBJECT_ID"="B"."OBJECT_ID")
Statistics
----------------------------------------------------------
     109267  consistent gets
          3  rows processed

伪造一下数据,让优化器认为这个虚拟里上的不同值很多,并且它原来就没有收集柱状图统计信息,按说优化器会认为a.owner='TSMSYS'返回很少的数据行了呀.这里就是10(52691/5000)行多点儿而已.
exec dbms_stats.set_column_stats(user,'ZSJ_OBJ_DETAIL','SYS_NC00006$',distcnt => 5000,density => 1/5000,nullcnt => 0);
SQL> select column_name,num_distinct,num_buckets,histogram,virtual_column,hidden_column,c.last_analyzed from user_tab_cols c where table_name='ZSJ_OBJ_DETAIL';
COLUMN_NAME                    NUM_DISTINCT NUM_BUCKETS HISTOGRAM       VIRTUAL_COLUMN HIDDEN_COLUMN LAST_ANALYZED
------------------------------ ------------ ----------- --------------- -------------- ------------- -------------
SYS_NC00006$                           5000           1 NONE            YES            YES           2010-12-21 20

但根本就不管用:
select B.*
from zsj_obj a,zsj_obj_detail b
where a.object_id=b.object_id and a.owner='TSMSYS';
还是使用原来的执行计划.

select /*+ INDEX(B ind_zsj_obj_detail)*/B.*
from zsj_obj a,zsj_obj_detail b
where a.object_id=b.object_id and a.owner='TSMSYS';
card,cost根本就没有改变.
这个先到这里吧,继续其它的实验

*************************
SQL> exec dbms_stats.delete_table_stats(user,'ZSJ_OBJ_DETAIL',cascade_columns => TRUE,cascade_indexes => TRUE);
SQL> exec dbms_stats.gather_table_stats(user,'ZSJ_OBJ_DETAIL',cascade=>true,estimate_percent=>100,method_opt=>'for all columnS size 1');
继续实验吧!
create bitmap index ind2_zsj_obj_detail
on zsj_obj_detail(o.owner,d.object_type)
from zsj_obj o,zsj_obj_detail d
where o.object_id=d.object_id;

Index created
新建一个索引.
select COUNT(1)
from zsj_obj a,zsj_obj_detail b
where a.object_id=b.object_id and a.owner='SYS' and b.object_type='INDEX';

| Id  | Operation                     | Name                | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |                     |     1 |    14 |     1   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE               |                     |     1 |    14 |            |          |
|*  2 |   TABLE ACCESS BY INDEX ROWID | ZSJ_OBJ_DETAIL      |   235 |  3290 |     1   (0)| 00:00:01 |
|   3 |    BITMAP CONVERSION TO ROWIDS|                     |       |       |            |          |
|*  4 |     BITMAP INDEX SINGLE VALUE | IND2_ZSJ_OBJ_DETAIL |       |       |            |          |
-----------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("B"."OBJECT_TYPE"='INDEX')
   4 - access("B"."SYS_NC00007$"='SYS' AND "B"."SYS_NC00008$"='INDEX')
这个执行计划为什么还需要回访表呢?

create bitmap index ind3_zsj_obj_detail
on zsj_obj_detail(o.owner,d.last_ddl_time desc)
from zsj_obj o,zsj_obj_detail d
where o.object_id=d.object_id;

Index created
是可以创建成功的

select * from
(
select b.*
from zsj_obj a,zsj_obj_detail b
where a.object_id=b.object_id and a.owner='SYS' order by b.last_ddl_time desc
)
where rownum<=10;
默认,是不使用这个索引的

select * from
(
select /*+ index(b ind3_zsj_obj_detail)*/b.*
from zsj_obj a,zsj_obj_detail b
where a.object_id=b.object_id and a.owner='SYS' order by b.last_ddl_time desc
)
where rownum<=10;
强制使用这个索引,但还是有排序操作.
--------------------------------------------------------------------------------------------------------
| Id  | Operation                        | Name                | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                 |                     |    10 |  1040 | 12874   (1)| 00:02:35 |
|*  1 |  COUNT STOPKEY                   |                     |       |       |            |          |
|   2 |   VIEW                           |                     | 10328 |  1048K| 12874   (1)| 00:02:35 |
|*  3 |    SORT ORDER BY STOPKEY         |                     | 10328 |   615K| 12874   (1)| 00:02:35 |
|*  4 |     HASH JOIN                    |                     | 10328 |   615K| 12872   (1)| 00:02:35 |
|*  5 |      TABLE ACCESS FULL           | ZSJ_OBJ             | 23328 |   227K|    23   (5)| 00:00:01 |
|   6 |      TABLE ACCESS BY INDEX ROWID | ZSJ_OBJ_DETAIL      | 23328 |  1161K| 12849   (1)| 00:02:35 |
|   7 |       BITMAP CONVERSION TO ROWIDS|                     |       |       |            |          |
|*  8 |        BITMAP INDEX RANGE SCAN   | IND3_ZSJ_OBJ_DETAIL |       |       |            |          |
--------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(ROWNUM<=10)
   3 - filter(ROWNUM<=10)
   4 - access("A"."OBJECT_ID"="B"."OBJECT_ID")
   5 - filter("A"."OWNER"='SYS')
   8 - access("B"."SYS_NC00009$"='SYS')
       filter("B"."SYS_NC00009$"='SYS')
Statistics
----------------------------------------------------------
       3364  consistent gets
         10  rows processed

SQL> alter table zsj_obj_detail modify(last_ddl_time not null);
last_ddl_time列上加上非空的约束条件.

select * from
(
select /*+ index(b ind3_zsj_obj_detail)*/b.*
from zsj_obj a,zsj_obj_detail b
where a.object_id=b.object_id and a.owner='SYS' order by b.last_ddl_time desc
)
where rownum<=10;
仍然需要排序.而无法像普通索引那样避免排序操作.
当然,这里一系列的疑问,包括索引里的字段都包括了,为什么还需要回访表;这里为什么不能避免排序操作,应该来说都和位图连接索引的内部结构和实现相关,毕竟它不是常规的平衡树索引,无法实现常规的平衡树索引的功能.

我这里做这些实验,其实是因为我想通过位图连接索引实现一些sql性能上的优化(假如它具有平衡树索引的一些特性的话):
有一个产品表product,里面有一些通用的产品信息,包括创建时间,pageview等非空字段,经常需要某个小类或者是品牌的top10新品,top10热品信息,如果这个表有catalog_id,brand_id字段记录这个产品所属的小类,品牌的话,创建类似index(catalog_id,pageview desc)这样的索引,那么实现某个小类或者是品牌的top10新品,top10热品信息的检索完全可以使用这些索引来高效的完成,根本就不需要排序的(它的资源消耗几乎是确定的,只和索引的blevel有关,和这个小类,品牌下的产品数量多少没有关系).但因为一些历史的原因:原来一个产品是可以属于多个小类的,原来一个产品要在多个网站上展现,而这多个网站下相同的小类,品牌的id是不相同的,导致catalog_id,brand_id不能作为product的一个属性记录在product表中了,所以新加了一个表catalogrelateproduct记录这种对应关系(catalogid,productid),其中catalogid可以是小类id,也可以是品牌id.
SELECT *
          FROM (SELECT P.*
                  FROM PRODUCT P, CATALOGRELATEPRODUCT CATAP
                 WHERE CATAP.CATALOGID = 122
                   AND P.ID = CATAP.PRODUCTID
                   AND P.PAGEVIEW IS NOT NULL
                 ORDER BY P.PAGEVIEW DESC)
         WHERE ROWNUM <= 10;
对于这样的语句来说,性能上的损失是不可避免的了.其实我是想尝试一下,看看位图连接索引是否可以有所帮助,能够像普通的平衡树索引一样建立index(CATAP.CATALOGID,P.PAGEVIEW desc) 而快速的找到前10行数据.这样的话,现在的表结构不用修改,业务代码不用修改,sql也不需要修改.看来是不可能了.
因为后来确定一个产品只能属于一个小类了,并且现在只有一个站点在使用这批数据了,所以是可以使用product上的字段来记录它所属的小类,品牌了.是可以高效实现这样的功能,不需要排序的.但修改这批相关的sql而提高性能看来是不可避免的了.

当然了,即使确实能够如我所想的那样实现,在OLTP这样频繁的进行dml操作的数据库系统上,它的使用也是一个问题:
普通的位图索引,在修改一行数据的时候,锁定的并不只是修改的这行数据,而是相关的一组数据行,也就是说存在着锁扩容的问题,但至少在表上放置的还是TM3的锁.位图连接索引就不是这样了:

create index ind4_zsj_obj_detail on zsj_obj_detail(object_id);

523> insert into zsj_obj(Owner,OBJECT_ID) VALUES('ZHAOSJ',100000);
这需要先在zsj_obj上放置tm3的锁,然后在zsj_obj_detail上放置tm4的锁.
524> INSERT INTO ZSJ_OBJ_DETAIL(OBJECT_ID,OBJECT_NAME,OBJECT_TYPE,LAST_DDL_TIME,STATUS) VALUES(100002,'TEST1','TABLE',SYSDATE,'VALID');
这需要先在zsj_obj上先放置一个tm4的锁,然后在zsj_obj_detail上放置一个tm3的锁.
若在同一个事务中同时操作这两个表,最终会在zsj_obj,zsj_obj_detail上都持有tm5的锁.

都需要tm4,甚至tm5的表级锁了,显然这是OLTP系统所不能承受的.

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值