Oracle's three tricky groups of hints: no_unnest/unnest, push_subq, push_pred

This article takes a close look at three complex Oracle SQL hints, no_unnest/unnest, push_subq, and push_pred, explaining what each one does and when it can be used to improve query performance.

2012-02-24



People often mix these three hints up, mainly because the three underlying query rewrites are not well understood. Here is a summary. (The test environment is 10.2.0.4.)
1. no_unnest, unnest
unnest means unnesting the subquery; as the name suggests, it keeps the subquery from sitting nested inside on its own and instead merges it into the outer query.
So no_unnest, as a double negative, means the opposite: do not unnest the subquery, leave it nested inside.
Here is a simple experiment:
create table hao1 as select * from dba_objects;
create table hao2 as select * from dba_objects;

analyze table hao1 compute statistics;
analyze table hao2 compute statistics;
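(The analyze command is enough for this demo; on 10g the supported way to gather optimizer statistics is dbms_stats. An equivalent sketch:

exec dbms_stats.gather_table_stats(user, 'HAO1');
exec dbms_stats.gather_table_stats(user, 'HAO2');
)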
SQL> select hao1.object_id from hao1 where exists
  2  (select 1 from hao2 where hao1.object_id=hao2.object_id*10);

1038 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 2662903432

---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 8 | 80 (3)| 00:00:01 |
|* 1 | HASH JOIN SEMI | | 1 | 8 | 80 (3)| 00:00:01 |
| 2 | TABLE ACCESS FULL| HAO1 | 10662 | 42648 | 40 (3)| 00:00:01 |
| 3 | TABLE ACCESS FULL| HAO2 | 10663 | 42652 | 40 (3)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - access("HAO1"."OBJECT_ID"="HAO2"."OBJECT_ID"*10)

Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
352 consistent gets
0 physical reads
0 redo size
18715 bytes sent via SQL*Net to client
1251 bytes received via SQL*Net from client
71 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1038 rows processed
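(The plan and statistics output above looks like SQL*Plus autotrace; presumably something along these lines was enabled before each query:

set autotrace traceonly explain statistics
)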


Here the subquery is automatically unnested, i.e. HAO2 and HAO1 are hash-joined together.
Now suppose we do not want HAO2 to be unnested; instead we want the subquery to run on its own and then be combined with the outer query through an operation called FILTER.
To do that, we add the no_unnest hint:
SQL> select hao1.object_id from hao1 where exists
  2  (select /*+no_unnest*/ 1 from hao2 where hao1.object_id=hao2.object_id*10);

1038 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 2565749733

---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 4 | 10750 (1)| 00:01:48 |
|* 1 | FILTER | | | | | |
| 2 | TABLE ACCESS FULL| HAO1 | 10662 | 42648 | 40 (3)| 00:00:01 |
|* 3 | TABLE ACCESS FULL| HAO2 | 1 | 4 | 2 (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter( EXISTS (SELECT /*+ NO_UNNEST */ 0 FROM "HAO2" "HAO2"
WHERE "HAO2"."OBJECT_ID"*10=:B1))
3 - filter("HAO2"."OBJECT_ID"*10=:B1)


Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
1369157 consistent gets
0 physical reads
0 redo size
18715 bytes sent via SQL*Net to client
1251 bytes received via SQL*Net from client
71 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1038 rows processed

Here HAO1 and HAO2 are combined by a FILTER operation, which is described in chapter 9 of Cost-Based Oracle Fundamentals. It looks a lot like the familiar nested loop, but its distinctive feature is that it maintains a hash table.
For example, if object_id=1 is fetched from HAO1, the probe against HAO2 becomes select 1 from hao2 where hao2.object_id*10=1. If the condition is satisfied, the subquery has produced an input/output pair, namely (1 (HAO1.object_id), 1 (the constant)).
That pair is stored in the hash table, and because the condition is satisfied, HAO1.object_id=1 goes into the result set.
Next, object_id=2 is fetched from HAO1. If the subquery condition is again satisfied, the subquery produces another input/output pair, (2,1), which is put into the hash table, and HAO1.object_id=2 goes into the result set.
Now suppose HAO1 contains duplicate object_id values, and the third row fetched from HAO1 is again object_id=2. Since the pair (2,1) is already in the hash table, there is no need to full-scan HAO2 again: Oracle is smart enough to know that object_id=2 belongs in the result set. Here the FILTER, compared with a nested loop, saves one full scan of HAO2.
The hash table has a size limit; once it fills up, the FILTER behaves just like a nested loop for subsequent new HAO1.object_id values.
So in terms of buffer gets, FILTER should be better than a nested loop, and it is especially attractive when the input that the outer query passes to the subquery (HAO1.object_id in this example) has very few distinct values.
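To make the caching idea concrete, here is a rough PL/SQL sketch of the logic described above. It is only an illustration of the caching behaviour, not Oracle's actual implementation (the real FILTER cache is internal and bounded in size):

declare
  -- cache: subquery input value -> whether the EXISTS condition held for it
  type t_cache is table of boolean index by pls_integer;
  l_cache   t_cache;
  l_cnt     pls_integer;
  l_matches pls_integer := 0;
begin
  for r in (select object_id from hao1) loop
    if not l_cache.exists(r.object_id) then
      -- cache miss: probe HAO2 once for this input value
      select count(*) into l_cnt
        from hao2
       where hao2.object_id * 10 = r.object_id
         and rownum = 1;
      l_cache(r.object_id) := (l_cnt > 0);
    end if;
    if l_cache(r.object_id) then
      l_matches := l_matches + 1;  -- this HAO1 row would enter the result set
    end if;
  end loop;
  dbms_output.put_line('matching rows: ' || l_matches);
end;
/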
Even in my example, where HAO1.object_id has tens of thousands of distinct values, FILTER still comes out slightly ahead when compared with a nested loop:
SQL> select /*+use_nl(hao1 hao2)*/ hao1.object_id from hao1,hao2 where hao1.object_id=hao2.object_id*10;

1038 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 251947914

---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10663 | 85304 | 404K (2)| 01:07:23 |
| 1 | NESTED LOOPS | | 10663 | 85304 | 404K (2)| 01:07:23 |
| 2 | TABLE ACCESS FULL| HAO1 | 10662 | 42648 | 40 (3)| 00:00:01 |
|* 3 | TABLE ACCESS FULL| HAO2 | 1 | 4 | 38 (3)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
3 - filter("HAO1"."OBJECT_ID"="HAO2"."OBJECT_ID"*10)

Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
1503621 consistent gets
0 physical reads
0 redo size
18715 bytes sent via SQL*Net to client
1251 bytes received via SQL*Net from client
71 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1038 rows processed

The FILTER takes 1369157 consistent gets, the nested loop 1503621.
To verify the claim above, we can run a similar comparison on object_type, a column with far fewer distinct values.
SQL> select hao1.object_id from hao1 where exists
  2  (select /*+no_unnest*/ 1 from hao2 where hao1.object_type=hao2.object_type);

10662 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 2565749733

---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 288 | 3168 | 114 (1)| 00:00:02 |
|* 1 | FILTER | | | | | |
| 2 | TABLE ACCESS FULL| HAO1 | 10662 | 114K| 40 (3)| 00:00:01 |
|* 3 | TABLE ACCESS FULL| HAO2 | 2 | 14 | 2 (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter( EXISTS (SELECT /*+ NO_UNNEST */ 0 FROM "HAO2" "HAO2"
WHERE "HAO2"."OBJECT_TYPE"=:B1))
3 - filter("HAO2"."OBJECT_TYPE"=:B1)


Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
17012 consistent gets
0 physical reads
0 redo size
187491 bytes sent via SQL*Net to client
8302 bytes received via SQL*Net from client
712 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
10662 rows processed

As you can see, both cases are a FILTER over full scans of HAO1 and HAO2, yet the difference in the number of distinct values fed into the subquery makes the consistent gets differ enormously; this is completely unlike a nested loop.
Of course, for two full-scan result sets like these, a hash join is the best approach:
SQL> select hao1.object_id from hao1 where exists
  2  (select 1 from hao2 where hao1.object_type=hao2.object_type);

10662 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 3371915275

-----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10662 | 187K| 81 (4)| 00:00:01 |
|* 1 | HASH JOIN RIGHT SEMI| | 10662 | 187K| 81 (4)| 00:00:01 |
| 2 | TABLE ACCESS FULL | HAO2 | 10663 | 74641 | 40 (3)| 00:00:01 |
| 3 | TABLE ACCESS FULL | HAO1 | 10662 | 114K| 40 (3)| 00:00:01 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - access("HAO1"."OBJECT_TYPE"="HAO2"."OBJECT_TYPE")

Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
985 consistent gets
0 physical reads
0 redo size
187491 bytes sent via SQL*Net to client
8302 bytes received via SQL*Net from client
712 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
10662 rows processed

So when should no_unnest be used, so that the subquery runs to completion on its own and is then combined with the outer query via FILTER?
First, the subquery should return a small result set; second, the input passed in from the outer query should have few distinct values (object_type, for example).
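Conversely, if the optimizer picks a FILTER on its own but a semi-join would be cheaper, the opposite hint can be placed inside the subquery to force unnesting. A sketch using the same tables (not a test case from the original experiments):

select hao1.object_id from hao1 where exists
(select /*+unnest*/ 1 from hao2 where hao1.object_type=hao2.object_type);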

2. push_subq
If no_unnest keeps a subquery from being unnested so that it completes on its own, push_subq makes the subquery join in as early as possible.
In other words, this hint really controls the join order.
For example, here is a SQL statement I once hit in a production database, simplified and reproduced:
create table hao1 as select * from dba_objects;
create table hao2 as select * from dba_objects;
create table hao3 as select * from dba_objects;
create table hao4 as select * from dba_objects;

create index hao3idx on hao3(object_id);
(analyze all tables.)
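For example, using the same approach as before:

analyze table hao1 compute statistics;
analyze table hao2 compute statistics;
analyze table hao3 compute statistics;
analyze table hao4 compute statistics;

The problematic SQL: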
select hao1.object_name
from hao1,hao2,hao4
where hao1.object_name like '%a%'
and hao1.object_id+hao2.object_id>50
and hao4.object_type=hao1.object_type
and 11 in
(SELECT hao3.object_id FROM hao3 WHERE hao1.object_id = hao3.object_id);

In this SQL, hao3 is joined to hao1 inside the subquery.
Clearly, if hao1 and hao3 are joined first, the resulting row set is expected to contain at most one row, possibly none.
However, the execution plan the CBO produces is:
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 89077 | 3131K| 2070M (1)|999:59:59 |
|* 1 | FILTER | | | | | |
|* 2 | HASH JOIN | | 3234M| 108G| 289K (24)| 00:48:17 |
| 3 | TABLE ACCESS FULL | HAO4 | 36309 | 212K| 126 (3)| 00:00:02 |
| 4 | NESTED LOOPS | | 3296K| 94M| 224K (2)| 00:37:28 |
|* 5 | TABLE ACCESS FULL| HAO1 | 1816 | 47216 | 126 (3)| 00:00:02 |
|* 6 | TABLE ACCESS FULL| HAO2 | 1815 | 7260 | 124 (2)| 00:00:02 |
|* 7 | FILTER | | | | | |
|* 8 | INDEX RANGE SCAN | HAO3IDX | 1 | 4 | 1 (0)| 00:00:01 |
--------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter( EXISTS (SELECT /*+ */ 0 FROM "HAO3" "HAO3" WHERE 11=:B1
AND "HAO3"."OBJECT_ID"=11))
2 - access("HAO4"."OBJECT_TYPE"="HAO1"."OBJECT_TYPE")
5 - filter("HAO1"."OBJECT_NAME" LIKE '%a%')
6 - filter("HAO1"."OBJECT_ID"+"HAO2"."OBJECT_ID">50)
7 - filter(11=:B1)
8 - access("HAO3"."OBJECT_ID"=11)

As the plan shows, hao1 is first joined to hao2 and hao4, producing an enormous intermediate result, and only at the very end is hao3 applied; this is a terrible plan.
To make hao1 and hao3 join first inside the subquery, we can use push_subq:
select /*+push_subq(@tmp)*/ hao1.object_name
from hao1,hao2,hao4
where hao1.object_name like '%a%'
and hao1.object_id+hao2.object_id>50
and hao4.object_type=hao1.object_type
and 11 in
(SELECT /*+QB_NAME(tmp)*/ hao3.object_id FROM hao3 WHERE hao1.object_id = hao3.object_id);

--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 161M| 5552M| 14749 (24)| 00:02:28 |
|* 1 | HASH JOIN | | 161M| 5552M| 14748 (24)| 00:02:28 |
| 2 | TABLE ACCESS FULL | HAO4 | 36309 | 212K| 126 (3)| 00:00:02 |
| 3 | NESTED LOOPS | | 164K| 4828K| 11386 (2)| 00:01:54 |
|* 4 | TABLE ACCESS FULL | HAO1 | 91 | 2366 | 126 (3)| 00:00:02 |
|* 5 | FILTER | | | | | |
|* 6 | INDEX RANGE SCAN| HAO3IDX | 1 | 4 | 1 (0)| 00:00:01 |
|* 7 | TABLE ACCESS FULL | HAO2 | 1815 | 7260 | 124 (2)| 00:00:02 |
--------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - access("HAO4"."OBJECT_TYPE"="HAO1"."OBJECT_TYPE")
4 - filter("HAO1"."OBJECT_NAME" LIKE '%a%' AND EXISTS (SELECT /*+
PUSH_SUBQ QB_NAME ("TMP") */ 0 FROM "HAO3" "HAO3" WHERE 11=:B1 AND
"HAO3"."OBJECT_ID"=11))
5 - filter(11=:B1)
6 - access("HAO3"."OBJECT_ID"=11)
7 - filter("HAO1"."OBJECT_ID"+"HAO2"."OBJECT_ID">50)

With the hint in place, the SQL completes in under one second.
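A note on placement: instead of referencing the query block from the outer query with push_subq(@tmp), the hint can also be written inside the subquery itself, which should have the same effect on 10g (a sketch, not retested here):

select hao1.object_name
from hao1,hao2,hao4
where hao1.object_name like '%a%'
and hao1.object_id+hao2.object_id>50
and hao4.object_type=hao1.object_type
and 11 in
(SELECT /*+push_subq*/ hao3.object_id FROM hao3 WHERE hao1.object_id = hao3.object_id);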

3. push_pred
Before discussing the push_pred hint, we first need to be clear about the difference between a mergeable view and an unmergeable view.
The documentation explains this clearly:
Mergeable and Unmergeable Views

The optimizer can merge a view into a referencing query block when the view has one or more base tables, provided the view does not contain:

    • set operators (UNION, UNION ALL, INTERSECT, MINUS)
    • a CONNECT BY clause
    • a ROWNUM pseudocolumn
    • aggregate functions (AVG, COUNT, MAX, MIN, SUM) in the select list

When a view contains one of the following structures, it can be merged into a referencing query block only if complex view merging is enabled (as described below):

    • a GROUP BY clause
    • a DISTINCT operator in the select list

View merging is not possible for a view that has multiple base tables if it is on the right side of an outer join. If a view on the right side of an outer join has only one base table, however, the optimizer can use complex view merging even if an expression in the view can return a non-null value for a NULL. See "Views in Outer Joins" for more information.
At the very end of this passage we find that one case of an unmergeable view is a view sitting on the right side of an outer join.
In that situation, the familiar merge hint is also ineffective.
For example:
create or replace view haoview as
select hao1.* from hao1,hao2
where hao1.object_id=hao2.object_id;
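(The plans below also reference an index HAO2IDX on HAO2 that was never shown being created; presumably something like the following exists:

create index hao2idx on hao2(object_id);
)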

For a simple query like the following, you can see that the view is merged and the predicate hao3.object_name=haoview.object_name is applied directly against the view's base table:
select hao3.object_name
from hao3,haoview
where hao3.object_name=haoview.object_name
and hao3.object_id=999;

-----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 44 | 129 (3)| 00:00:02 |
| 1 | NESTED LOOPS | | 1 | 44 | 129 (3)| 00:00:02 |
|* 2 | HASH JOIN | | 1 | 40 | 128 (3)| 00:00:02 |
| 3 | TABLE ACCESS BY INDEX ROWID| HAO3 | 1 | 20 | 2 (0)| 00:00:01 |
|* 4 | INDEX RANGE SCAN | HAO3IDX | 1 | | 1 (0)| 00:00:01 |
| 5 | TABLE ACCESS FULL | HAO1 | 36311 | 709K| 125 (2)| 00:00:02 |
|* 6 | INDEX RANGE SCAN | HAO2IDX | 1 | 4 | 1 (0)| 00:00:01 |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("HAO3"."OBJECT_NAME"="HAO1"."OBJECT_NAME")
4 - access("HAO3"."OBJECT_ID"=999)
6 - access("HAO1"."OBJECT_ID"="HAO2"."OBJECT_ID")

Next, I put haoview on the right side of an outer join. Now haoview is an unmergeable view: by default the optimizer cannot merge the predicate into it, and you can see haoview being executed on its own first:
select hao3.object_name
from hao3,haoview
where hao3.object_name=haoview.object_name(+)
and hao3.object_id=999;

----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 86 | 153 (5)| 00:00:02 |
|* 1 | HASH JOIN OUTER | | 1 | 86 | 153 (5)| 00:00:02 |
| 2 | TABLE ACCESS BY INDEX ROWID| HAO3 | 1 | 20 | 2 (0)| 00:00:01 |
|* 3 | INDEX RANGE SCAN | HAO3IDX | 1 | | 1 (0)| 00:00:01 |
| 4 | VIEW | HAOVIEW | 36309 | 2340K| 150 (4)| 00:00:02 |
|* 5 | HASH JOIN | | 36309 | 850K| 150 (4)| 00:00:02 |
| 6 | INDEX FAST FULL SCAN | HAO2IDX | 36309 | 141K| 22 (5)| 00:00:01 |
| 7 | TABLE ACCESS FULL | HAO1 | 36311 | 709K| 125 (2)| 00:00:02 |
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - access("HAO3"."OBJECT_NAME"="HAOVIEW"."OBJECT_NAME"(+))
3 - access("HAO3"."OBJECT_ID"=999)
5 - access("HAO1"."OBJECT_ID"="HAO2"."OBJECT_ID")

Now we use the push_pred hint to force the optimizer to push the join predicate into the view; note the "VIEW PUSHED PREDICATE" operation:
select /*+push_pred(haoview)*/ hao3.object_name
from hao3,haoview
where hao3.object_name=haoview.object_name(+)
and hao3.object_id=999;

----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 40 | 128 (2)| 00:00:02 |
| 1 | NESTED LOOPS OUTER | | 1 | 40 | 128 (2)| 00:00:02 |
| 2 | TABLE ACCESS BY INDEX ROWID| HAO3 | 1 | 36 | 2 (0)| 00:00:01 |
|* 3 | INDEX RANGE SCAN | HAO3IDX | 1 | | 1 (0)| 00:00:01 |
| 4 | VIEW PUSHED PREDICATE | HAOVIEW | 1 | 4 | 126 (2)| 00:00:02 |
| 5 | NESTED LOOPS | | 1 | 24 | 126 (2)| 00:00:02 |
|* 6 | TABLE ACCESS FULL | HAO1 | 1 | 20 | 125 (2)| 00:00:02 |
|* 7 | INDEX RANGE SCAN | HAO2IDX | 1 | 4 | 1 (0)| 00:00:01 |
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

3 - access("HAO3"."OBJECT_ID"=999)
6 - filter("HAO1"."OBJECT_NAME"="HAO3"."OBJECT_NAME")
7 - access("HAO1"."OBJECT_ID"="HAO2"."OBJECT_ID")

At this point someone will ask: would the merge hint have the same effect? The answer is no; for this kind of unmergeable view, the merge hint does nothing.
select /*+merge(haoview)*/ hao3.object_name
from hao3,haoview
where hao3.object_name=haoview.object_name(+)
and hao3.object_id=999;


----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 86 | 153 (5)| 00:00:02 |
|* 1 | HASH JOIN OUTER | | 1 | 86 | 153 (5)| 00:00:02 |
| 2 | TABLE ACCESS BY INDEX ROWID| HAO3 | 1 | 20 | 2 (0)| 00:00:01 |
|* 3 | INDEX RANGE SCAN | HAO3IDX | 1 | | 1 (0)| 00:00:01 |
| 4 | VIEW | HAOVIEW | 36309 | 2340K| 150 (4)| 00:00:02 |
|* 5 | HASH JOIN | | 36309 | 850K| 150 (4)| 00:00:02 |
| 6 | INDEX FAST FULL SCAN | HAO2IDX | 36309 | 141K| 22 (5)| 00:00:01 |
| 7 | TABLE ACCESS FULL | HAO1 | 36311 | 709K| 125 (2)| 00:00:02 |
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - access("HAO3"."OBJECT_NAME"="HAOVIEW"."OBJECT_NAME"(+))
3 - access("HAO3"."OBJECT_ID"=999)
5 - access("HAO1"."OBJECT_ID"="HAO2"."OBJECT_ID")

As you can see, for a view sitting on the right side of an outer join, the merge hint is powerless.

To sum up these three easily confused hints:
no_unnest/unnest controls whether a subquery is unnested; push_subq controls where the subquery sits in the join order (it pushes subquery evaluation as early as possible); push_pred pushes the outer query's join predicate into an unmergeable view.
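One practical tip when writing hints such as push_subq(@tmp) or push_pred(haoview): the query block names and object aliases that a hint can reference are listed in the 'ALL' format of dbms_xplan. The query below is just one of the examples from this article, shown as an illustration:

explain plan for
select hao1.object_id from hao1 where exists
(select 1 from hao2 where hao1.object_type=hao2.object_type);

select * from table(dbms_xplan.display(null, null, 'ALL'));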
