前言:从10开始默认优化器就是CBO模式,RBO已经靠边站了,CBO是根据什么来计算成本的,当然就是信息了,这个信息就是所谓统计信息,可见统计信息的重要性;
版本:11.2
先来看张表T1,未做统计前可以看到统计信息都是空的(空时CBO要获取信息采用动态采样),信息是通过脚本SOSI.SQL(群里上载文件中有)显示;
exec dbms_stats.gather_table_stats('AIKI','T1',cascade=>true);
统计完后的信息显示
--执行一条语句看下计划,是范围扫描,与实际相符;
SQL> @allstat
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID ccm5vanq8y709, child number 0
-------------------------------------
select status from t1 where object_id=88
Plan hash value: 190799060
---------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
---------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.01 | 3 |
| 1 | TABLE ACCESS BY INDEX ROWID| T1 | 1 | 1 | 1 |00:00:00.01 | 3 |
|* 2 | INDEX RANGE SCAN | IDX_T1_ID | 1 | 1 | 1 |00:00:00.01 | 2 |
---------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("OBJECT_ID"=88)
19 rows selected.
--更改数据造成数据倾斜;100条,88的99条,99的值1条;
SQL> update t1 set object_id=88 where rownum<100;
99 rows updated.
SQL> commit;
Commit complete.
SQL> update t1 set object_id=99 where object_id<>88;
1 row updated.
SQL> commit;
Commit complete.
--再次收集必变后的表信息:exec dbms_stats.gather_table_stats('AIKI','T1',cascade=>true);
--这个计划是对的,因为88的值的行数占了99%;
select /*1*/status from t1 where object_id=88;
SQL> @allstat
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID 4natz4s7h1uda, child number 0
-------------------------------------
select /*1*/status from t1 where object_id=88
Plan hash value: 3617692013
------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 99 |00:00:00.01 | 10 |
|* 1 | TABLE ACCESS FULL| T1 | 1 | 99 | 99 |00:00:00.01 | 10 |
------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("OBJECT_ID"=88)
18 rows selected.
--重新做次收集,目的关闭直方图(for..size 1);
exec dbms_stats.gather_table_stats('AIKI','T1',cascade=>true,method_opt=>'for all columns size 1');
--看下执行计划,可以看到计划并非最优,它选择了范围扫描,而本来是全表扫描会略好;
select /*3*/status from t1 where object_id=88;
SQL> @allstat
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID cn417ch2unkp7, child number 0
-------------------------------------
select /*3*/status from t1 where object_id=88
Plan hash value: 190799060
---------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
---------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 99 |00:00:00.01 | 17 |
| 1 | TABLE ACCESS BY INDEX ROWID| T1 | 1 | 50 | 99 |00:00:00.01 | 17 |
|* 2 | INDEX RANGE SCAN | IDX_T1_ID | 1 | 50 | 99 |00:00:00.01 | 8 |
---------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("OBJECT_ID"=88)
19 rows selected.
--看下此时列的直方图情况,可以看到直方图确实都关闭了;
--重新收集列OBJECT_ID上的直方图信息
exec dbms_stats.gather_table_stats('AIKI','T1',cascade=>true,method_opt=>'for columns object_id size auto',estimate_percent=>100);
select /*5*/status from t1 where object_id=88;
--计划已经回归到最优了;
SQL> @allstat
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID gts3wuc1sy8dn, child number 0
-------------------------------------
select /*5*/status from t1 where object_id=88
Plan hash value: 3617692013
------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 99 |00:00:00.01 | 10 |
|* 1 | TABLE ACCESS FULL| T1 | 1 | 99 | 99 |00:00:00.01 | 10 |
------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("OBJECT_ID"=88)
18 rows selected.
--直方图是什么 ?直方图就是列的统计信息,看下面的图你就明白了,88的值有99个,99的值有100-99个,100是累加值;
select * from dba_tab_histograms where owner='AIKI' and table_name='T1'
--直方图分为:等频和等高;怎么区别,很简单,列的唯一值>桶数:等高(HEIGHT BALANCED);等频:唯一值<=桶数,如下图所示,(其中11G之前的最大桶数不能超过254,软件局限),12C可超过;
select owner,table_name,column_name, histogram from dba_tab_col_statistics
where owner='AIKI'
and table_name='T1'
分区表的统计收集命令,操练下就明白了:
exec dbms_stats.gather_table_stats('AIKI','PT',partname=>'T2',CASCADE=>TRUE,METHOD_OPT=>'FOR ALL COLUMNS SIZE 1',granularity=>'ALL'); --全局表+分区表的信息都做统计
exec dbms_stats.gather_table_stats('AIKI','PT',partname=>'T2',CASCADE=>TRUE,METHOD_OPT=>'FOR ALL COLUMNS SIZE 1',granularity=>'PARTITION'); --只分区表做统计
统计信息与应用关系:
1.通过上面测试可以看到,当数据出现倾斜时,则直方图就有存在的意义,否则就无意义了;并且此列在做收集前必须存在于运行SQL的WHERE条件后,这个很好理解(都没用,做直方图也无意义了);
2.比如这条语句:select status from t1 where object_id=66 and status='VALID';当出现执行慢时你会怎么思考?
很简单 :先SOSI下看下列object_id,status的列的唯一值数(选择率),表T1的统计信息,这样心中便可大致确定跟此列相关的索引情况,看下统计时间,是否分区表,索引分布,数据采样比例等;