Oracle默认情况下认为表中列的数据时平均分布的,但实际上并不是如此,下面做个简单的简单的实验来看看Freqency直方图对可选择率的影响。Freqency类型的直方图时指收集表中列的统计信息时指定的Bucket大于等于表中列的DISTINCT值,否则收集到的是BANLACED 类型的直方图。
1、环境准备
**创建实验使用的表T1**
SCOTT@TNS_PDB01>create table t1(id number);
Table created.
SCOTT@TNS_PDB01>desc t1;
Name Null? Type
----------------------------------------------------------- -------- -----------------------------------------
ID NUMBER
declare
i number;
begin
for i in 1 .. 3296 loop
insert into t1 values(1);
end loop;
for i in 1 .. 100 loop
insert into t1 values(3);
end loop;
for i in 1 .. 798 loop
insert into t1 values(5);
end loop;
for i in 1 .. 3970 loop
insert into t1 values(7);
end loop;
for i in 1 .. 16293 loop
insert into t1 values(10);
end loop;
for i in 1 .. 3399 loop
insert into t1 values(16);
end loop;
for i in 1 .. 3651 loop
insert into t1 values(27);
end loop;
for i in 1 .. 3892 loop
insert into t1 values(32);
end loop;
for i in 1 .. 3521 loop
insert into t1 values(39);
end loop;
for i in 1 .. 1080 loop
insert into t1 values(49);
end loop;
commit;
exception
when others then
dbms_output.put_line(sqlerrm);
end;
/
**可以看出表中数据不是均匀分布的**
SCOTT@TNS_PDB01>select id,count(id) from t1 group by id order by id;
ID COUNT(ID)
--- ----------
1 3296
3 100
5 798
7 3970
10 16293
16 3399
27 3651
32 3892
39 3521
49 1080
10 rows selected.
2、不收集Freqency直方图信息的情况下收集统计信息
```sql
SCOTT@TNS_PDB01>exec dbms_stats.gather_table_stats(ownname => 'SCOTT',tabname => 'T1',estimate_percent => 100,method_opt => 'FOR COLUMNS ID SIZE 1',no_invalidate => false);
PL/SQL procedure successfully completed.
SCOTT@TNS_PDB01>select table_name,column_name,num_distinct,num_nulls,density,histogram from dba_tab_col_statistics where table_name = 'T1' and owner = 'SCOTT';
TABLE_NAME COLUMN_NAM NUM_DISTINCT NUM_NULLS DENSITY HISTOGRAM
--------------- ---------- ------------ ---------- ---------- ---------------
T1 ID 10 0 .1 NONE
SCOTT@TNS_PDB01>select count(*) from t1 where id = 10;
COUNT(*)
----------
16293
SCOTT@TNS_PDB01>select * from table(dbms_xplan.display_cursor(sql_id => null,cursor_child_no => null,format => 'advanced'));
PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------------
SQL_ID 5gz7rn8s8kfyy, child number 0
-------------------------------------
select count(*) from t1 where id = 10
Plan hash value: 3724264953
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 19 (100)| |
| 1 | SORT AGGREGATE | | 1 | 3 | | |
|* 2 | TABLE ACCESS FULL| T1 | 4000 | 12000 | 19 (0)| 00:00:01 |
---------------------------------------------------------------------------
Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
1 - SEL$1
2 - SEL$1 / T1@SEL$1
Outline Data
-------------
/*+
BEGIN_OUTLINE_DATA
IGNORE_OPTIM_EMBEDDED_HINTS
OPTIMIZER_FEATURES_ENABLE('19.1.0')
DB_VERSION('19.1.0')
ALL_ROWS
OUTLINE_LEAF(@"SEL$1")
FULL(@"SEL$1" "T1"@"SEL$1")
END_OUTLINE_DATA
*/
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("ID"=10)
Column Projection Information (identified by operation id):
-----------------------------------------------------------
1 - (#keys=0) COUNT(*)[22]
2 - (rowset=1019)
Query Block Registry:
---------------------
<q o="2" f="y"><n><![CDATA[SEL$1]]></n><f><h><t><![CDATA[T1]]></t><s><![
CDATA[SEL$1]]></s></h></f></q>
52 rows selected.
SCOTT@TNS_PDB01>select num_rows from dba_tab_statistics where table_name = 'T1' and owner = 'SCOTT';
NUM_ROWS
----------
40000
SCOTT@TNS_PDB01>select num_distinct,num_nulls from dba_tab_col_statistics where table_name = 'T1' and owner = 'SCOTT';
NUM_DISTINCT NUM_NULLS
------------ ----------
10 0
SCOTT@TNS_PDB01>select 1/10*40000 from dual;
1/10*40000
----------
4000
**可以看出执行计划中的Cardinality和实际的值不相等,是不准确的。**
3、收集表T1列ID的直方图信息
SCOTT@TNS_PDB01>exec dbms_stats.gather_table_stats(ownname => 'SCOTT',tabname => 'T1',estimate_percent => 100,method_opt => 'FOR COLUMNS ID SIZE AUTO',cascade => true,no_invalidate => false);
PL/SQL procedure successfully completed.
SCOTT@TNS_PDB01>select table_name,column_name,num_distinct,num_nulls,histogram from dba_tab_col_statistics where table_name = 'T1' and owner = 'SCOTT';
TABLE_NAME COLUMN_NAM NUM_DISTINCT NUM_NULLS HISTOGRAM
--------------- ---------- ------------ ---------- ---------------
T1 ID 10 0 FREQUENCY
SCOTT@TNS_PDB01>select count(*) from t1 where id = 10;
COUNT(*)
----------
16293
SCOTT@TNS_PDB01>select * from table(dbms_xplan.display_cursor(sql_id => null,cursor_child_no => null,format => 'advanced'));
PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------------
SQL_ID 5gz7rn8s8kfyy, child number 1
-------------------------------------
select count(*) from t1 where id = 10
Plan hash value: 3724264953
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 19 (100)| |
| 1 | SORT AGGREGATE | | 1 | 3 | | |
|* 2 | TABLE ACCESS FULL| T1 | 16293 | 48879 | 19 (0)| 00:00:01 |
---------------------------------------------------------------------------
Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
1 - SEL$1
2 - SEL$1 / T1@SEL$1
Outline Data
-------------
/*+
BEGIN_OUTLINE_DATA
IGNORE_OPTIM_EMBEDDED_HINTS
OPTIMIZER_FEATURES_ENABLE('19.1.0')
DB_VERSION('19.1.0')
ALL_ROWS
OUTLINE_LEAF(@"SEL$1")
FULL(@"SEL$1" "T1"@"SEL$1")
END_OUTLINE_DATA
*/
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("ID"=10)
Column Projection Information (identified by operation id):
-----------------------------------------------------------
1 - (#keys=0) COUNT(*)[22]
2 - (rowset=1019)
Query Block Registry:
---------------------
<q o="2" f="y"><n><![CDATA[SEL$1]]></n><f><h><t><![CDATA[T1]]></t><s><![
CDATA[SEL$1]]></s></h></f></q>
52 rows selected.
SCOTT@TNS_PDB01>select table_name,column_name,endpoint_number,endpoint_value from dba_tab_histograms where table_name = 'T1' and owner = 'SCOTT';
TABLE_NAME COLUMN_NAM ENDPOINT_NUMBER ENDPOINT_VALUE
--------------- ---------- --------------- --------------
T1 ID 3296 1
T1 ID 3396 3
T1 ID 4194 5
T1 ID 8164 7
T1 ID 24457 10
T1 ID 27856 16
T1 ID 31507 27
T1 ID 35399 32
T1 ID 38920 39
T1 ID 40000 49
10 rows selected.
**表DBA_TAB_HISTOGRAM中ENDPOINT_VALUE是表T1中DISTINCT(ID)的值,而ENDPINT_NUMBER是ENDPOINT_VALUE数量的累加值**
SCOTT@TNS_PDB01>select id,count(*) from t1 group by id order by id;
ID COUNT(*)
--- ----------
1 3296
3 100
5 798
7 3970
10 16293
16 3399
27 3651
32 3892
39 3521
49 1080
10 rows selected.
**有直方图的情况下可选择率的计算 (我们这里谓词条件 = ENDPOITN_VALUE)**
CARDINALITY=NUM_ROWS*SELECTIVITY
SELECTIVITY=BUCKETSIZE/NUM_ROWS
BUCKETSIZE=CURRENT_ENDPOINT_NUMBER – PREVIOUS_ENDPOINT_NUMBER
SCOTT@TNS_PDB01>select (24457-8164)/40000 from dual;
(24457-8164)/40000
------------------
.407325
SCOTT@TNS_PDB01>select 40000*.407325 from dual;
40000*.407325
-------------
16293
总结:可以看出实际计算出来的值和查询出来值以及执行计划中的Cardinality相同,在表中列数据分布不均匀的情况下一定要收集直方图信息来保证优化器走出正确的执行计划。
本文通过实验展示了在Oracle数据库中,当表列数据分布不均匀时,收集直方图信息对于提高查询优化器的可选择率及生成更精确执行计划的重要性。
583

被折叠的 条评论
为什么被折叠?



