How to Optimize SQL with OR Conditions


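This article looks at how different indexes affect the performance of a query with OR conditions in an Oracle database, and compares how the query executes when it is rewritten with UNION and with UNION ALL.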

The following is reposted content:

================================================================================================

Today I came across a forum post asking the following:

select * from cc
where ((a1 ='ffff' and z1='mmmm') or (b1='sss' and z2='nnnn'))
and c1 ='ggggg'
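The table has 300,000 rows and the query returns about 10 rows. Which indexes should be created to make this access as fast as possible?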

I tested the suggestions other people made; the steps were as follows:

create table CC
(
A1 VARCHAR2(5),
Z1 VARCHAR2(5),
B1 VARCHAR2(5),
Z2 VARCHAR2(5),
C1 VARCHAR2(5)
);

insert into cc values('dffd','dfsd','fdf','fdsfs','sfds'); -- inserted repeatedly, 2,097,152 rows in total; this may affect query times
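To reproduce the setup at a smaller scale, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for Oracle, so execution plans and timings will not match the session below. Because the original does not show these steps, it assumes that the filler rows were produced by repeatedly doubling the table (2^21 = 2,097,152) and that the 8 rows the query finds were inserted separately (2,097,160 - 2,097,152 = 8), using the values shown in the query output. The helper name build_cc is hypothetical.

```python
import sqlite3

def build_cc(doublings: int = 21, matching_rows: int = 8) -> sqlite3.Connection:
    """Hypothetical helper: rebuild the CC test table in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    # SQLite accepts VARCHAR(5) but does not enforce the length limit.
    conn.execute(
        "CREATE TABLE cc (a1 VARCHAR(5), z1 VARCHAR(5), b1 VARCHAR(5),"
        " z2 VARCHAR(5), c1 VARCHAR(5))"
    )
    # Seed row copied from the original INSERT statement.
    conn.execute("INSERT INTO cc VALUES ('dffd', 'dfsd', 'fdf', 'fdsfs', 'sfds')")
    # Assumption: each INSERT ... SELECT doubles the table, 1 -> 2 -> 4 -> ... -> 2**doublings rows.
    for _ in range(doublings):
        conn.execute("INSERT INTO cc SELECT * FROM cc")
    # Rows the test query is meant to find, with the values shown in the query output.
    conn.executemany(
        "INSERT INTO cc VALUES (?, ?, ?, ?, ?)",
        [("ffff", "mmmmm", "sss", "nnnn", "ggggg")] * matching_rows,
    )
    conn.commit()
    return conn

conn = build_cc(doublings=10)  # 1,024 filler rows keep the demo fast; 21 gives 2,097,152
print(conn.execute("SELECT COUNT(*) FROM cc").fetchone()[0])  # 1032
```

Calling build_cc() with the default of 21 doublings yields the 2,097,160 rows counted below, at the cost of a noticeably slower build.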

SQL> select count(*) from cc;

COUNT(*)
----------
2097160

SQL> set timing on
SQL> edit
Wrote file afiedt.buf

1 select * from cc
2 where ((a1='ffff'and z1='mmmm') or (b1='sss' and z2='nnnn'))
3* and c1='ggggg' -- OR query, no index
SQL> /

A1    Z1    B1    Z2    C1
----- ----- ----- ----- -----
ffff mmmmm sss   nnnn ggggg
ffff mmmmm sss   nnnn ggggg
ffff mmmmm sss   nnnn ggggg
ffff mmmmm sss   nnnn ggggg
ffff mmmmm sss   nnnn ggggg
ffff mmmmm sss   nnnn ggggg
ffff mmmmm sss   nnnn ggggg
ffff mmmmm sss   nnnn ggggg

8 rows selected.

Elapsed: 00:00:00.21
SQL> edit
Wrote file afiedt.buf

1 select * from cc
2 where (a1='ffff'and z1='mmmm')
3 and c1='ggggg'
4 union
5 select * from cc
6 where (b1='sss' and z2='nnnn')
7* and c1='ggggg' -- UNION query, no index; note how its result differs from UNION ALL
SQL> /

A1 Z1 B1 Z2 C1
----- ----- ----- ----- -----
ffff mmmmm sss nnnn ggggg

Elapsed: 00:00:00.33
SQL> edit
Wrote file afiedt.buf

1 select * from cc
2 where (a1='ffff'and z1='mmmm')
3 and c1='ggggg'
4 union all
5 select * from cc
6 where (b1='sss' and z2='nnnn')
7* and c1='ggggg' -- UNION ALL query, no index
SQL> /

8 rows selected.

Elapsed: 00:00:00.35
SQL> create index cc_idx on cc(c1);

Index created.

Elapsed: 00:00:11.14
SQL> edit
Wrote file afiedt.buf

1 select * from cc
2 where ((a1='ffff'and z1='mmmm') or (b1='sss' and z2='nnnn'))
3* and c1='ggggg' -- OR query with the c1 index; the execution plan is not listed, but the index was certainly used

SQL> /

8 rows selected.

Elapsed: 00:00:00.01
SQL> edit
Wrote file afiedt.buf

1 select * from cc
2 where (a1='ffff'and z1='mmmm')
3 and c1='ggggg'
4 union
5 select * from cc
6 where (b1='sss' and z2='nnnn')
7* and c1='ggggg' -- UNION query with the c1 index; note how its result differs from UNION ALL
SQL> /

A1 Z1 B1 Z2 C1
----- ----- ----- ----- -----
ffff mmmmm sss nnnn ggggg

Elapsed: 00:00:00.00
SQL> edit
Wrote file afiedt.buf

1 select * from cc
2 where (a1='ffff'and z1='mmmm')
3 and c1='ggggg'
4 union all
5 select * from cc
6 where (b1='sss' and z2='nnnn')
7* and c1='ggggg' -- UNION ALL query with the c1 index
SQL> /

8 rows selected.

Elapsed: 00:00:00.01
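The differing row counts in the runs above (UNION returns 1 row, while the OR query and UNION ALL return 8) can be checked with a minimal sketch, again using SQLite as a stand-in for Oracle; the 100 filler rows are an arbitrary choice. The 8 matching rows are identical, so UNION collapses them into one row. UNION ALL agrees with the OR query here only because no row satisfies both branches: the data holds z1 = 'mmmmm' (five m's), so the a1/z1 branch matches nothing. UNION is therefore not a safe rewrite of OR whenever the table can contain duplicate rows.

```python
import sqlite3

OR_QUERY = """
    SELECT * FROM cc
    WHERE ((a1 = 'ffff' AND z1 = 'mmmm') OR (b1 = 'sss' AND z2 = 'nnnn'))
      AND c1 = 'ggggg'"""

BRANCH_1 = "SELECT * FROM cc WHERE a1 = 'ffff' AND z1 = 'mmmm' AND c1 = 'ggggg'"
BRANCH_2 = "SELECT * FROM cc WHERE b1 = 'sss' AND z2 = 'nnnn' AND c1 = 'ggggg'"

def row_counts(conn: sqlite3.Connection) -> dict:
    """Return how many rows each formulation of the query returns."""
    queries = {
        "or": OR_QUERY,
        "union": f"{BRANCH_1} UNION {BRANCH_2}",
        "union_all": f"{BRANCH_1} UNION ALL {BRANCH_2}",
    }
    return {name: len(conn.execute(sql).fetchall()) for name, sql in queries.items()}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cc (a1 TEXT, z1 TEXT, b1 TEXT, z2 TEXT, c1 TEXT)")
conn.executemany(
    "INSERT INTO cc VALUES (?, ?, ?, ?, ?)",
    [("dffd", "dfsd", "fdf", "fdsfs", "sfds")] * 100      # filler rows
    + [("ffff", "mmmmm", "sss", "nnnn", "ggggg")] * 8,    # 8 identical matching rows
)
print(row_counts(conn))  # {'or': 8, 'union': 1, 'union_all': 8}
```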

SQL> create index CC_IDX2 on CC (A1, Z1);

SQL> create index CC_IDX3 on CC (B1, Z2);

SQL> set autot on

SQL> edit
Wrote file afiedt.buf

1 select * from cc
2 where ((a1='ffff'and z1='mmmm') or (b1='sss' and z2='nnnn'))
3* and c1='ggggg' -- OR query with all three indexes

SQL> /

8 rows selected.

Elapsed: 00:00:00.60 -- noticeably slower than the single-index run

Execution Plan
----------------------------------------------------------
Plan hash value: 1540710700

----------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |     2 |    40 |     4   (0)| 00:00:01 |
|   1 |  CONCATENATION      |      |       |       |            |          |
|*  2 |   TABLE ACCESS FULL | CC   |     1 |    20 |     2   (0)| 00:00:01 |
|*  3 |   TABLE ACCESS FULL | CC   |     1 |    20 |     2   (0)| 00:00:01 |
----------------------------------------------------------------------------
-- Note: the plan shows that none of the indexes is used; both branches
-- of the OR expansion (CONCATENATION) are full table scans.

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("B1"='sss' AND "Z2"='nnnn' AND "C1"='ggggg')
   3 - filter("C1"='ggggg' AND "A1"='ffff' AND "Z1"='mmmm' AND
              (LNNVL("B1"='sss') OR LNNVL("Z2"='nnnn')))

Note
-----
- dynamic sampling used for this statement

Statistics
----------------------------------------------------------
          0 recursive calls
          0 db block gets
      17673 consistent gets
        405 physical reads
          0 redo size
        703 bytes sent via SQL*Net to client
        400 bytes received via SQL*Net from client
          2 SQL*Net roundtrips to/from client
          0 sorts (memory)
          0 sorts (disk)
          8 rows processed

SQL> edit
Wrote file afiedt.buf

1 select * from cc
2 where (a1='ffff'and z1='mmmm')
3 and c1='ggggg'
4 union
5 select * from cc
6 where (b1='sss' and z2='nnnn')
7* and c1='ggggg' -- UNION query with all three indexes; note how its result differs from UNION ALL
SQL> /

A1 Z1 B1 Z2 C1
----- ----- ----- ----- -----
ffff mmmmm sss nnnn ggggg

Elapsed: 00:00:00.10 -- noticeably slower than the single-index run

Execution Plan
----------------------------------------------------------
Plan hash value: 1185376162

------------------------------------------------------------------------------------------
| Id  | Operation                      | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |         |     4 |    80 |    10  (60)| 00:00:01 |
|   1 |  SORT UNIQUE                   |         |     4 |    80 |    10  (60)| 00:00:01 |
|   2 |   UNION-ALL                    |         |       |       |            |          |
|*  3 |    TABLE ACCESS BY INDEX ROWID | CC      |     2 |    40 |     4   (0)| 00:00:01 |
|*  4 |     INDEX RANGE SCAN           | CC_IDX2 |    34 |       |     3   (0)| 00:00:01 |
|*  5 |    TABLE ACCESS BY INDEX ROWID | CC      |     2 |    40 |     4   (0)| 00:00:01 |
|*  6 |     INDEX RANGE SCAN           | CC_IDX3 |    34 |       |     3   (0)| 00:00:01 |
------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter("C1"='ggggg')
   4 - access("A1"='ffff' AND "Z1"='mmmm')
   5 - filter("C1"='ggggg')
   6 - access("B1"='sss' AND "Z2"='nnnn')

Note
-----
- dynamic sampling used for this statement

Statistics
----------------------------------------------------------
          9 recursive calls
          0 db block gets
        174 consistent gets
          7 physical reads
          0 redo size
        637 bytes sent via SQL*Net to client
        400 bytes received via SQL*Net from client
          2 SQL*Net roundtrips to/from client
          1 sorts (memory) -- note: a sort was performed
          0 sorts (disk)
          1 rows processed

SQL> edit
Wrote file afiedt.buf

1 select * from cc
2 where (a1='ffff'and z1='mmmm')
3 and c1='ggggg'
4 union all
5 select * from cc
6 where (b1='sss' and z2='nnnn')
7* and c1='ggggg' -- UNION ALL query with all three indexes
SQL> /

8 rows selected.

Elapsed: 00:00:00.06 -- noticeably slower than the single-index run

Execution Plan
----------------------------------------------------------
Plan hash value: 198920981

------------------------------------------------------------------------------------------
| Id  | Operation                      | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |         |     4 |    80 |     8  (50)| 00:00:01 |
|   1 |  UNION-ALL                     |         |       |       |            |          |
|*  2 |   TABLE ACCESS BY INDEX ROWID  | CC      |     2 |    40 |     4   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN            | CC_IDX2 |    34 |       |     3   (0)| 00:00:01 |
|*  4 |   TABLE ACCESS BY INDEX ROWID  | CC      |     2 |    40 |     4   (0)| 00:00:01 |
|*  5 |    INDEX RANGE SCAN            | CC_IDX3 |    34 |       |     3   (0)| 00:00:01 |
------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("C1"='ggggg')
   3 - access("A1"='ffff' AND "Z1"='mmmm')
   4 - filter("C1"='ggggg')
   5 - access("B1"='sss' AND "Z2"='nnnn')

Note
-----
- dynamic sampling used for this statement

Statistics
----------------------------------------------------------
          7 recursive calls
          0 db block gets
        175 consistent gets
          0 physical reads
          0 redo size
        703 bytes sent via SQL*Net to client
        400 bytes received via SQL*Net from client
          2 SQL*Net roundtrips to/from client
          0 sorts (memory)
          0 sorts (disk)
          8 rows processed

SQL>
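The predicate section of the first three-index plan (operation 3) shows how Oracle's OR expansion (CONCATENATION) keeps its two branches disjoint: the A1/Z1 branch adds (LNNVL("B1"='sss') OR LNNVL("Z2"='nnnn')), which excludes rows the B1/Z2 branch already returned. LNNVL(cond) is TRUE when cond is FALSE or UNKNOWN. A hand-written UNION ALL rewrite needs the same guard to be equivalent to the OR query in general; in the session it matched only because no row satisfied both branches. The sketch below checks this in SQLite, which has no LNNVL, so COALESCE(cond, 0) = 0 is used as an assumed stand-in; the sample rows are invented for illustration.

```python
import sqlite3
from collections import Counter

OR_QUERY = """
    SELECT * FROM cc
    WHERE ((a1 = 'ffff' AND z1 = 'mmmm') OR (b1 = 'sss' AND z2 = 'nnnn'))
      AND c1 = 'ggggg'"""

def union_all_rewrite(exclusion: str) -> str:
    """B1/Z2 branch first, then the A1/Z1 branch plus an optional exclusion predicate."""
    return f"""
    SELECT * FROM cc WHERE b1 = 'sss' AND z2 = 'nnnn' AND c1 = 'ggggg'
    UNION ALL
    SELECT * FROM cc WHERE a1 = 'ffff' AND z1 = 'mmmm' AND c1 = 'ggggg'{exclusion}"""

NAIVE = union_all_rewrite("")
# Stand-in for Oracle's LNNVL: COALESCE(cond, 0) = 0 is true when cond is false OR NULL.
EXACT = union_all_rewrite(" AND COALESCE(b1 = 'sss' AND z2 = 'nnnn', 0) = 0")
# Plain NOT looks equivalent but is not: NOT (NULL) is NULL, so the row is dropped.
WRONG_NOT = union_all_rewrite(" AND NOT (b1 = 'sss' AND z2 = 'nnnn')")

def rows(conn: sqlite3.Connection, sql: str) -> Counter:
    """Result as a multiset: order is ignored, duplicates are counted."""
    return Counter(conn.execute(sql).fetchall())

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cc (a1 TEXT, z1 TEXT, b1 TEXT, z2 TEXT, c1 TEXT)")
conn.executemany("INSERT INTO cc VALUES (?, ?, ?, ?, ?)", [
    ("ffff", "mmmm", "sss", "nnnn", "ggggg"),   # matches both branches
    ("ffff", "mmmm", None, "nnnn", "ggggg"),    # A1/Z1 branch only; b1 is NULL
    ("xxxx", "yyyy", "sss", "nnnn", "ggggg"),   # B1/Z2 branch only
    ("dffd", "dfsd", "fdf", "fdsfs", "sfds"),   # matches neither
])

expected = rows(conn, OR_QUERY)
for name, sql in [("naive", NAIVE), ("exact", EXACT), ("not", WRONG_NOT)]:
    got = rows(conn, sql)
    print(name, sum(got.values()), got == expected)
# naive 4 False
# exact 3 True
# not 2 False
```

The naive rewrite returns the row that matches both branches twice, and the plain-NOT version silently loses the row with a NULL b1; only the NULL-aware exclusion reproduces the OR result.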

In a database, the UNION and UNION ALL keywords both combine two result sets into one, but they differ in both semantics and efficiency.

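UNION filters out duplicate rows after combining the two result sets, so it has to sort the combined result, remove the duplicates, and only then return the rows.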

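In most real applications the combined sets contain no duplicates anyway. The most common case is a UNION of an in-process (current) table and a history table, for example: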

select * from gc_dfys
union
select * from ls_jg_dfys

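When this statement runs, it first fetches the rows of both tables, then sorts them in the sort area to remove duplicates, and finally returns the result set. If the tables are large, the sort may spill to disk.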
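As a simplified conceptual sketch only (it does not model Oracle's actual SORT UNIQUE implementation or its sort area), UNION behaves like concatenation followed by a sort and a duplicate-removal pass, whereas UNION ALL stops after the concatenation. The sort is the extra work, and the extra memory or temporary disk space, described above. The function names and sample rows are illustrative.

```python
from typing import Iterable, List, Tuple

Row = Tuple[str, ...]

def union_all(left: Iterable[Row], right: Iterable[Row]) -> List[Row]:
    """UNION ALL: concatenate the two inputs; duplicates are kept and nothing is sorted."""
    return list(left) + list(right)

def union(left: Iterable[Row], right: Iterable[Row]) -> List[Row]:
    """UNION modeled as: concatenate, sort, then drop adjacent duplicates."""
    combined = sorted(union_all(left, right))  # the extra sort that UNION ALL avoids
    result: List[Row] = []
    for row in combined:
        # After sorting, equal rows are adjacent, so comparing with the last kept row suffices.
        if not result or result[-1] != row:
            result.append(row)
    return result

# Hypothetical rows standing in for a current table and a history table.
current = [("1001", "open"), ("1002", "open")]
history = [("0999", "closed"), ("1001", "open")]
print(len(union_all(current, history)))  # 4
print(union(current, history))  # [('0999', 'closed'), ('1001', 'open'), ('1002', 'open')]
```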

UNION ALL, by contrast, simply concatenates the two results and returns them, so if the two result sets contain duplicate rows, the final result contains them too.

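In terms of efficiency, UNION ALL skips the deduplicating sort and can therefore be much faster than UNION. If you can confirm that the two result sets being combined contain no duplicate rows, use UNION ALL.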

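The following is taken from a forum: suppose the emp table is large and the condition deptno = 10 selects a large share of its rows, say 50%. If the table has 40 million rows stored in 500,000 blocks of 8 KB each, it occupies about 4 GB, far too much to hold entirely in memory, so most of it must be read from disk. If the query goes through an index in this situation, your nightmare begins. Let db_file_multiblock_read_count be 200. A full table scan then needs 500000 / db_file_multiblock_read_count = 500000 / 200 = 2500 I/Os.

With an index scan, assume the index on deptno is already fully cached in memory, so the cost of reading the index itself can be ignored. The query still has to read 40 million × 50% = 20 million rows; even with a 99.9% buffer cache hit ratio, the remaining 0.1% still requires 20,000 single-block I/Os, far more than the 2,500 needed by the full table scan. In this case, therefore, an index scan performs much worse. The cost of the full table scan is fixed, whereas the cost of the index scan grows with the number of rows selected.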
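The forum arithmetic can be written out as a small model. It is a back-of-the-envelope sketch under the post's own assumptions (index fully cached, one single-block read per row that misses the buffer cache, one multiblock read per db_file_multiblock_read_count blocks) and it ignores factors such as the index clustering factor; the function names are hypothetical.

```python
import math

def full_scan_ios(table_blocks: int, multiblock_read_count: int) -> int:
    """Multiblock reads needed to read every table block once."""
    return math.ceil(table_blocks / multiblock_read_count)

def index_scan_ios(total_rows: int, selectivity: float, cache_hit_ratio: float) -> int:
    """Single-block reads for table rows fetched through the index.

    Assumes, as the text does, that the index itself is fully cached (cost ignored)
    and that every row missing the buffer cache costs one I/O.
    """
    rows_fetched = total_rows * selectivity
    return round(rows_fetched * (1 - cache_hit_ratio))

full = full_scan_ios(table_blocks=500_000, multiblock_read_count=200)
via_index = index_scan_ios(total_rows=40_000_000, selectivity=0.5, cache_hit_ratio=0.999)
print(full, via_index)  # 2500 20000

# Under the same model, the index path costs more I/Os than the full scan once
# selectivity exceeds full / (total_rows * miss_ratio) = 2500 / 40,000 = 6.25%.
break_even = full / (40_000_000 * (1 - 0.999))
print(f"{break_even:.2%}")  # 6.25%
```

The break-even line makes the last point of the paragraph concrete: the full-scan cost stays at 2,500 I/Os regardless of selectivity, while the index-path cost rises linearly with the fraction of rows selected.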

Only appropriate indexes combined with well-written statements give optimal performance.