半联结是在两个数据集(表)之间的联结,其中第一个数据集中的数据行在决定是否返回时会根据在另一个数据集中出现或不出现至少一个相匹配的数据行来确定。“不出先”匹配行——这是半联结的一种特殊形式,称为反联结。
标准的内联结与半联结之间最主要的区别在于在半联结中,第1个数据集中的每一条记录至返回一次,而不管在第二个数据集中有几条匹配的数据。这个定义表明这个查询的实际处理过程可以通过在找到第1个匹配以后马上停止第二个查询来进行优化。
从本质上来说,这就是半联结:可以通过在查询2执行完成之前停止处理该查询来进行优化。这种联结技术在oracle基于成本的优化器中查询中又包含在IN或EXISTS句中的子查询时是一种可选方案。
1、半联结
/*+ SEMIJOIN */ 进行半联结(优化器选用使用哪种类型)。
/*+ NO_SEMIJOIN */ 显示的意味着不行进半联结。
SQL> select * from emp order by job;
EMPNO ENAME JOB MGR HIREDATE SAL COMM DEPTNO
---------- ---------- --------- ---------- -------------- ---------- ---------- ----------
7788 SCOTT ANALYST 7566 19-4月 -87 3000 20
7902 FORD ANALYST 7566 03-12月-81 3000 20
7934 MILLER CLERK 7782 23-1月 -82 1300 10
7900 JAMES CLERK 7698 03-12月-81 950 30
7369 SMITH CLERK 7902 17-12月-80 800 20
7876 ADAMS CLERK 7788 23-5月 -87 1100 20
7698 BLAKE MANAGER 7839 01-5月 -81 2850 30
7566 JONES MANAGER 7839 02-4月 -81 2975 20
7782 CLARK MANAGER 7839 09-6月 -81 2450 10
7839 KING PRESIDENT 17-11月-81 5000 10
7844 TURNER SALESMAN 7698 08-9月 -81 1500 0 30
EMPNO ENAME JOB MGR HIREDATE SAL COMM DEPTNO
---------- ---------- --------- ---------- -------------- ---------- ---------- ----------
7654 MARTIN SALESMAN 7698 28-9月 -81 1250 1400 30
7521 WARD SALESMAN 7698 22-2月 -81 1250 500 30
7499 ALLEN SALESMAN 7698 20-2月 -81 1600 300 30
已选择14行。
分析上面的数据可以看出JOB='SALESMAN'的DEPTNO有4条重复的记录,值都是30。那么如下的子查询返回的记录就有重复的记录!
select e.ename, e.job, e.sal
from emp e
where e.deptno in (select d.deptno from emp d where d.job = 'SALESMAN');
select e.ename, e.job, e.sal
from emp e
where exists (select null
from emp d
where e.deptno = d.deptno
and d.job = 'SALESMAN');
看执行计划:
SQL> select e.ename, e.job, e.sal
2 from emp e
3 where e.deptno in (select d.deptno from emp d where d.job = 'SALESMAN');
ENAME JOB SAL
---------- --------- ----------
JAMES CLERK 950
TURNER SALESMAN 1500
BLAKE MANAGER 2850
MARTIN SALESMAN 1250
WARD SALESMAN 1250
ALLEN SALESMAN 1600
已选择6行。
执行计划
----------------------------------------------------------
Plan hash value: 977554918
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 14 | 448 | 7 (15)| 00:00:01 |
|* 1 | HASH JOIN SEMI | | 14 | 448 | 7 (15)| 00:00:01 |
| 2 | TABLE ACCESS FULL| EMP | 14 | 294 | 3 (0)| 00:00:01 |
|* 3 | TABLE ACCESS FULL| EMP | 3 | 33 | 3 (0)| 00:00:01 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("E"."DEPTNO"="D"."DEPTNO")
3 - filter("D"."JOB"='SALESMAN')
SQL>
SQL>
SQL> select e.ename, e.job, e.sal
2 from emp e
3 where exists (select null
4 from emp d
5 where e.deptno = d.deptno
6 and d.job = 'SALESMAN');
ENAME JOB SAL
---------- --------- ----------
JAMES CLERK 950
TURNER SALESMAN 1500
BLAKE MANAGER 2850
MARTIN SALESMAN 1250
WARD SALESMAN 1250
ALLEN SALESMAN 1600
已选择6行。
执行计划
----------------------------------------------------------
Plan hash value: 977554918
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 14 | 448 | 7 (15)| 00:00:01 |
|* 1 | HASH JOIN SEMI | | 14 | 448 | 7 (15)| 00:00:01 |
| 2 | TABLE ACCESS FULL| EMP | 14 | 294 | 3 (0)| 00:00:01 |
|* 3 | TABLE ACCESS FULL| EMP | 3 | 33 | 3 (0)| 00:00:01 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("E"."DEPTNO"="D"."DEPTNO")
3 - filter("D"."JOB"='SALESMAN')
使用IN和EXISTS的执行计划是一致的,这一点就为了消除长期以来人们认为的使用EXISTS查询与使用IN查询的处理方法很不相同这种观点。注意执行计划中的“SEMI”就代表使用了半联结!
SQL> select e.ename, e.job, e.sal
2 from emp e
3 where e.deptno in (select /*+ no_semijoin */
4 d.deptno
5 from emp d
6 where d.job = 'SALESMAN');
ENAME JOB SAL
---------- --------- ----------
ALLEN SALESMAN 1600
WARD SALESMAN 1250
MARTIN SALESMAN 1250
BLAKE MANAGER 2850
TURNER SALESMAN 1500
JAMES CLERK 950
已选择6行。
执行计划
----------------------------------------------------------
Plan hash value: 2389097100
---------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 9 | 306 | 8 (25)| 00:00:01 |
|* 1 | HASH JOIN | | 9 | 306 | 8 (25)| 00:00:01 |
| 2 | VIEW | VW_NSO_1 | 2 | 26 | 4 (25)| 00:00:01 |
| 3 | HASH UNIQUE | | 2 | 22 | 4 (25)| 00:00:01 |
|* 4 | TABLE ACCESS FULL| EMP | 3 | 33 | 3 (0)| 00:00:01 |
| 5 | TABLE ACCESS FULL | EMP | 14 | 294 | 3 (0)| 00:00:01 |
---------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("E"."DEPTNO"="DEPTNO")
4 - filter("D"."JOB"='SALESMAN')
SQL>
SQL>
SQL> select e.ename, e.job, e.sal
2 from emp e
3 where exists (select /*+ no_semijoin */
4 null
5 from emp d
6 where e.deptno = d.deptno
7 and d.job = 'SALESMAN');
ENAME JOB SAL
---------- --------- ----------
ALLEN SALESMAN 1600
WARD SALESMAN 1250
MARTIN SALESMAN 1250
BLAKE MANAGER 2850
TURNER SALESMAN 1500
JAMES CLERK 950
已选择6行。
执行计划
----------------------------------------------------------
Plan hash value: 2561671593
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 5 | 105 | 12 (0)| 00:00:01 |
|* 1 | FILTER | | | | | |
| 2 | TABLE ACCESS FULL| EMP | 14 | 294 | 3 (0)| 00:00:01 |
|* 3 | TABLE ACCESS FULL| EMP | 1 | 11 | 3 (0)| 00:00:01 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter( EXISTS (SELECT 0 FROM "EMP" "D" WHERE
"D"."JOB"='SALESMAN' AND "D"."DEPTNO"=:B1))
3 - filter("D"."JOB"='SALESMAN' AND "D"."DEPTNO"=:B1)
使用NO_SEMIJOIN提示关闭了优化器使用半联结的能力。可以看见使用使用IN和EXISTS的执行计划还不一样了。
2、反联结
/*+ ANTIJOIN */ 进行反联接(优化器决定具体类型)
/*+ USE_ANTI */ ANTIJOIN提示的旧版本
SQL> select e.ename, e.job, e.sal
2 from emp e
3 where e.deptno not in
4 (select d.deptno from emp d where d.job = 'PRESIDENT');
ENAME JOB SAL
---------- --------- ----------
JAMES CLERK 950
TURNER SALESMAN 1500
BLAKE MANAGER 2850
MARTIN SALESMAN 1250
WARD SALESMAN 1250
ALLEN SALESMAN 1600
FORD ANALYST 3000
ADAMS CLERK 1100
SCOTT ANALYST 3000
JONES MANAGER 2975
SMITH CLERK 800
已选择11行。
执行计划
----------------------------------------------------------
Plan hash value: 4002838083
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 5 | 160 | 7 (15)| 00:00:01 |
|* 1 | HASH JOIN ANTI NA | | 5 | 160 | 7 (15)| 00:00:01 |
| 2 | TABLE ACCESS FULL| EMP | 14 | 294 | 3 (0)| 00:00:01 |
|* 3 | TABLE ACCESS FULL| EMP | 3 | 33 | 3 (0)| 00:00:01 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("E"."DEPTNO"="D"."DEPTNO")
3 - filter("D"."JOB"='PRESIDENT')
SQL>
SQL>
SQL> select e.ename, e.job, e.sal
2 from emp e
3 where not exists (select null
4 from emp d
5 where e.deptno = d.deptno
6 and d.job = 'PRESIDENT');
ENAME JOB SAL
---------- --------- ----------
JAMES CLERK 950
TURNER SALESMAN 1500
BLAKE MANAGER 2850
MARTIN SALESMAN 1250
WARD SALESMAN 1250
ALLEN SALESMAN 1600
FORD ANALYST 3000
ADAMS CLERK 1100
SCOTT ANALYST 3000
JONES MANAGER 2975
SMITH CLERK 800
已选择11行。
执行计划
----------------------------------------------------------
Plan hash value: 3353202012
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 5 | 160 | 7 (15)| 00:00:01 |
|* 1 | HASH JOIN ANTI | | 5 | 160 | 7 (15)| 00:00:01 |
| 2 | TABLE ACCESS FULL| EMP | 14 | 294 | 3 (0)| 00:00:01 |
|* 3 | TABLE ACCESS FULL| EMP | 3 | 33 | 3 (0)| 00:00:01 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("E"."DEPTNO"="D"."DEPTNO")
3 - filter("D"."JOB"='PRESIDENT')
执行计划中的“ANTI” 就是反联结的意思,而在NOT IN的执行计划中还有“NA” 表示考虑NULL值的意思。