In和exist,not in和not exists,其中可能还有null,一直是个很纠结的问题,直到现在自己也不能完全说出两种语句的优劣,使用情况等。
先从查询结果上来讲,上个星期五pub的一篇帖子说到了not in和not exists查询结果不同,记得曾经碰到过这种问题,null的影响,因为not in和not exists两种从根本上”算法”是不同的。
下面自己做个了测试的例子可以很清晰的看到结果不同
SQL> select * from jj_one;
ID NAME
------ ----------------------------------------
1 as
2 ad
kj
1 as
SQL> select * from jj_two;
COL1 COL2
---------- ----------
1 as
3 df
SQL> select * from jj_one where id not in (select col1 from jj_two);
ID NAME
------ ----------------------------------------
2 ad
SQL> select * from jj_one a where not exists (select 1 from jj_two b where a.id=b.col1);
ID NAME
------ ----------------------------------------
kj
2 ad
Null的运算只能是无法用于等值判断的,null和任何值比较包括null都是false,一般null的判断是is null或is not null。需要明确下in于null值判断是返回false的。
in与not in也就是跟一系列范围的值做等值判断,所以会略去相应的null值的一行,而not exists是一种逻辑判断是否存在,其实这个not exists可以理解为:JJ_one和JJ_two等值连接然后由于存在null不符合要求不会返回结果集,而外部存在一个not exists判断是否真的不存在这种做等值连接不返回结果集的数据,存在即返回结果相应数据行!
还有一个测试例子:
SQL> select * from jj_one;
ID NAME
------ ----------------------------------------
1 as
2 ad
kj
1 as
SQL> select * from jj_two;
COL1 COL2
---------- ----------
1 as
3 df
ok
SQL> select * from jj_one where id in (select col1 from jj_two);
ID NAME
------ ----------------------------------------
1 as
1 as
SQL> select * from jj_one a where exists (select 1 from jj_two b where a.id=b.col1);
ID NAME
------ ----------------------------------------
1 as
1 as
SQL> select * from jj_one a where not exists (select 1 from jj_two b where a.id=b.col1);
ID NAME
------ ----------------------------------------
kj
2 ad
In和exists性能上也有一定区别,但是并不能简单的认为exists一定优于exists,还是要具体情况具体分析。下面有个not in和not exists的执行计划。
SQL> select id from jj_one where not exists(select 1 from jj_two where id=col1);
执行计划
----------------------------------------------------------
Plan hash value: 2332573458
-----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 3 | 48 | 6 (17)| 00:00:01 |
|* 1 | HASH JOIN ANTI | | 3 | 48 | 6 (17)| 00:00:01 |
| 2 | TABLE ACCESS FULL| JJ_ONE | 4 | 12 | 3 (0)| 00:00:01 |
| 3 | TABLE ACCESS FULL| JJ_TWO | 1 | 13 | 2 (0)| 00:00:01 |
-----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("ID"="COL1")
统计信息
----------------------------------------------------------
1 recursive calls
0 db block gets
14 consistent gets
0 physical reads
0 redo size
439 bytes sent via SQL*Net to client
385 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
2 rows processed
SQL> select id from jj_one where id not in (select col1 from jj_two);
未选定行
执行计划
----------------------------------------------------------
Plan hash value: 2272190419
-----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2 | 6 | 7 (0)| 00:00:01 |
|* 1 | FILTER | | | | | |
| 2 | TABLE ACCESS FULL| JJ_ONE | 4 | 12 | 3 (0)| 00:00:01 |
|* 3 | TABLE ACCESS FULL| JJ_TWO | 1 | 13 | 2 (0)| 00:00:01 |
-----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter( NOT EXISTS (SELECT /*+ */ 0 FROM "JJ_TWO" "JJ_TWO" WHERE
LNNVL("COL1"<>:B1)))
3 - filter(LNNVL("COL1"<>:B1))
统计信息
----------------------------------------------------------
1 recursive calls
0 db block gets
25 consistent gets
0 physical reads
0 redo size
268 bytes sent via SQL*Net to client
374 bytes received via SQL*Net from client
1 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
0 rows processed
可以看出来无论是逻辑读还是cost至少在not exists上是优于not in的,cbo模式下not in 采用的filter对外层表做全表扫描再去filter内层查询结果集,not exists采用内存查询结果集作hash连接,而hash连接显而易见会取得更好的结果!
这里自己也想起来以前的工作的一个拉数据的例子,两张表inf_apply的表大概有20w以上数据主键no,inf_apply_test大概有3w数据,想实现把inf_apply表中更新no不在inf_apply_test表中no记录的inf_apply表中的相应数据,刚开始同事一个not in十几分钟还没有反应。最好改成not exists只要一分钟左右的样子就可以了!