in，exists和not exists ，not in与null的一些关系记载

最新推荐文章于 2023-10-08 02:06:34 发布

转载最新推荐文章于 2023-10-08 02:06:34 发布 · 3.8k 阅读

文章标签：

#null #filter #access #sql #table #disk

本文通过测试实例详细解释了SQL查询中IN和EXISTS的区别，特别是它们如何处理NULL值，并且提供了关于两种操作符在性能上的对比分析。

In和exist，not in和not exists，其中可能还有null，一直是个很纠结的问题，直到现在自己也不能完全说出两种语句的优劣，使用情况等。

先从查询结果上来讲，上个星期五pub的一篇帖子说到了not in和not exists查询结果不同，记得曾经碰到过这种问题，null的影响，因为not in和not exists两种从根本上”算法”是不同的。

下面自己做个了测试的例子可以很清晰的看到结果不同

SQL> select * from jj_one;

ID NAME

------ ----------------------------------------

1 as

2 ad

1 as

SQL> select * from jj_two;

COL1 COL2

---------- ----------

1 as

3 df

SQL> select * from jj_one where id not in (select col1 from jj_two);

ID NAME

------ ----------------------------------------

2 ad

SQL> select * from jj_one a where not exists (select 1 from jj_two b where a.id=b.col1);

ID NAME

------ ----------------------------------------

2 ad

Null的运算只能是无法用于等值判断的，null和任何值比较包括null都是false，一般null的判断是is null或is not null。需要明确下in于null值判断是返回false的。

in与not in也就是跟一系列范围的值做等值判断，所以会略去相应的null值的一行，而not exists是一种逻辑判断是否存在，其实这个not exists可以理解为：JJ_one和JJ_two等值连接然后由于存在null不符合要求不会返回结果集，而外部存在一个not exists判断是否真的不存在这种做等值连接不返回结果集的数据，存在即返回结果相应数据行！

还有一个测试例子：

SQL> select * from jj_one;

ID NAME

------ ----------------------------------------

1 as

2 ad

1 as

SQL> select * from jj_two;

COL1 COL2

---------- ----------

1 as

3 df

SQL> select * from jj_one where id in (select col1 from jj_two);

ID NAME

------ ----------------------------------------

1 as

SQL> select * from jj_one a where exists (select 1 from jj_two b where a.id=b.col1);

ID NAME

------ ----------------------------------------

1 as

SQL> select * from jj_one a where not exists (select 1 from jj_two b where a.id=b.col1);

ID NAME

------ ----------------------------------------

2 ad

In和exists性能上也有一定区别，但是并不能简单的认为exists一定优于exists，还是要具体情况具体分析。下面有个not in和not exists的执行计划。

SQL> select id from jj_one where not exists(select 1 from jj_two where id=col1);

执行计划

----------------------------------------------------------

Plan hash value: 2332573458

-----------------------------------------------------------------------------

-----------------------------------------------------------------------------

| 0 | SELECT STATEMENT | | 3 | 48 | 6 (17)| 00:00:01 |

|* 1 | HASH JOIN ANTI | | 3 | 48 | 6 (17)| 00:00:01 |

| 2 | TABLE ACCESS FULL| JJ_ONE | 4 | 12 | 3 (0)| 00:00:01 |

| 3 | TABLE ACCESS FULL| JJ_TWO | 1 | 13 | 2 (0)| 00:00:01 |

-----------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

1 - access("ID"="COL1")

统计信息

----------------------------------------------------------

1 recursive calls

0 db block gets

14 consistent gets

0 physical reads

0 redo size

439 bytes sent via SQL*Net to client

385 bytes received via SQL*Net from client

2 SQL*Net roundtrips to/from client

0 sorts (memory)

0 sorts (disk)

2 rows processed

SQL> select id from jj_one where id not in (select col1 from jj_two);

未选定行

执行计划

----------------------------------------------------------

Plan hash value: 2272190419

-----------------------------------------------------------------------------

-----------------------------------------------------------------------------

| 0 | SELECT STATEMENT | | 2 | 6 | 7 (0)| 00:00:01 |

|* 1 | FILTER | | | | | |

| 2 | TABLE ACCESS FULL| JJ_ONE | 4 | 12 | 3 (0)| 00:00:01 |

|* 3 | TABLE ACCESS FULL| JJ_TWO | 1 | 13 | 2 (0)| 00:00:01 |

-----------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

1 - filter( NOT EXISTS (SELECT /*+ */ 0 FROM "JJ_TWO" "JJ_TWO" WHERE

LNNVL("COL1"<>:B1)))

3 - filter(LNNVL("COL1"<>:B1))

统计信息

----------------------------------------------------------

1 recursive calls

0 db block gets

25 consistent gets

0 physical reads

0 redo size

268 bytes sent via SQL*Net to client

374 bytes received via SQL*Net from client

1 SQL*Net roundtrips to/from client

0 sorts (memory)

0 sorts (disk)

0 rows processed

可以看出来无论是逻辑读还是cost至少在not exists上是优于not in的，cbo模式下not in 采用的filter对外层表做全表扫描再去filter内层查询结果集，not exists采用内存查询结果集作hash连接，而hash连接显而易见会取得更好的结果！

这里自己也想起来以前的工作的一个拉数据的例子，两张表inf_apply的表大概有20w以上数据主键no，inf_apply_test大概有3w数据，想实现把inf_apply表中更新no不在inf_apply_test表中no记录的inf_apply表中的相应数据，刚开始同事一个not in十几分钟还没有反应。最好改成not exists只要一分钟左右的样子就可以了！