1 问题思考:
在DQL中,语句的语义对于SQL的优化,存在什么样的影响?
或者,换句话说,SQL优化的一些技术,是否存在前后次序?
如果存在,遵照的是什么次序?(这个是个大话题,以后展开...)
2 示例:
CREATE TABLE t1 (id1 INT, a1 INT);
CREATE TABLE t2 (id2 INT, a2 INT);
INSERT INTO t1 VALUES (1,1), (2,2);
INSERT INTO t2 VALUES (3,3);
Q1:SELECT * FROM t1 LEFT JOIN t2 ON a1=a2 WHERE a1=1;
Q2:SELECT * FROM t1 LEFT JOIN t2 ON a1=a2 WHERE a2=1;
3 示例引发的问题:
1 Q1和Q2的执行结果有差别吗?---后面有答案,先思考再看答案更有收获。
2 有朋友看过《数据库查询优化器的艺术》一书,或通过学习掌握了:SQL优化技术中逻辑优化方式“有个常量传递”。
认为Q1和Q2的结果是相同的,原因在于,因为Q1可以因为“a1=a2”和“a1=1”推知“a1=a2=1”;而Q2可以因为“a1=a2”和“a2=1”推知“a1=a2=1”,所以Q1和Q2等价(另外一个等价的原因是 Q1和Q2都是 左外连接,语义相同;连接的表对象相同)。
4 示例的执行结果:
在MySQL 5.7.9中执行:
Q1和Q2的执行结果不同,和预想的不一样?难道是MySQL有bug?mysql> SELECT * FROM t1 LEFT JOIN t2 ON a1=a2 WHERE a1=1;
+------+------+------+------+
| id1 | a1 | id2 | a2 |
+------+------+------+------+
| 1 | 1 | NULL | NULL |
+------+------+------+------+
1 row in set (0.00 sec)
mysql> SELECT * FROM t1 LEFT JOIN t2 ON a1=a2 WHERE a2=1;
Empty set (0.00 sec)
在PostgreSQL 9.4.1中执行:
Q1和Q2的执行结果不同。难道是PG也有bug?postgres=# SELECT * FROM t1 LEFT JOIN t2 ON a1=a2 WHERE a1=1;
1 | 1 | |
postgres=# SELECT * FROM t1 LEFT JOIN t2 ON a1=a2 WHERE a2=1;<NULL> //<NULL>表示输出是空行,非PG的显示样式
5 通过查看执行计划进行分析
在MySQL 5.7.9中执行:
mysql> EXPLAIN SELECT * FROM t1 LEFT JOIN t2 ON a1=a2 WHERE a1=1;Q2的执行计划显示,原先语义中的“LEFT JOIN”变为了“JOIN”,而Q1没有变“LEFT JOIN”依旧是“LEFT JOIN”。
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+----------------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+----------------------------------------------------+
| 1 | SIMPLE | t1 | NULL | ALL | NULL | NULL | NULL | NULL | 2 | 50.00 | Using where |
| 1 | SIMPLE | t2 | NULL | ALL | NULL | NULL | NULL | NULL | 1 | 100.00 | Using where; Using join buffer (Block Nested Loop) |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+----------------------------------------------------+
2 rows in set, 1 warning (0.00 sec)
mysql> SHOW WARNINGS;
+-------+------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------
----------------------------------+
| Level | Code | Message
|
+-------+------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------
----------------------------------+
| Note | 1003 | /* select#1 */ select `d`.`t1`.`id1` AS `id1`,`d`.`t1`.`a1` AS `a1`,`d`.`t2`.`id2` AS `id2`,`d`.`t2`.`a2` AS `a2`
from `d`.`t1` left join `d`.`t2` on((`d`.`t2`.`a2
` = 1)) where (`d`.`t1`.`a1` = 1) |
+-------+------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------
----------------------------------+
1 row in set (0.00 sec)
mysql> EXPLAIN SELECT * FROM t1 LEFT JOIN t2 ON a1=a2 WHERE a2=1;
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+----------------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+----------------------------------------------------+
| 1 | SIMPLE | t2 | NULL | ALL | NULL | NULL | NULL | NULL | 1 | 100.00 | Using where |
| 1 | SIMPLE | t1 | NULL | ALL | NULL | NULL | NULL | NULL | 2 | 50.00 | Using where; Using join buffer (Block Nested Loop) |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+----------------------------------------------------+
2 rows in set, 1 warning (0.00 sec)
mysql> SHOW WARNINGS;
+-------+------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------+
| Level | Code | Message
|
+-------+------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------+
| Note | 1003 | /* select#1 */ select `d`.`t1`.`id1` AS `id1`,`d`.`t1`.`a1` AS `a1`,`d`.`t2`.`id2` AS `id2`,`d`.`t2`.`a2` AS `a2`
from `d`.`t1` join `d`.`t2` where ((`d`.`t2`.`a2`
= 1) and (`d`.`t1`.`a1` = 1)) |
+-------+------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------+
1 row in set (0.00 sec)
对于PG,执行计划如下:
Q2的执行计划显示,原先语义中的“LEFT JOIN”变为了“Nested Loop JOIN”,而Q1没有变“LEFT JOIN”依旧是“LEFT JOIN”。这类似MySQL。说明无论是PG还是MySQL,此种情况,他们的处理方式是一样的。postgres=# EXPLAIN SELECT * FROM t1 LEFT JOIN t2 ON a1=a2 WHERE a1=1;
Nested Loop Left Join (cost=0.00..-1.#J rows=-1 width=16)
Join Filter: (t1.a1 = t2.a2)
-> Seq Scan on t1 (cost=0.00..22.93 rows=-1 width=8)
Filter: (a1 = 1)
-> Seq Scan on t2 (cost=0.00..22.93 rows=-1 width=8)
Filter: (a2 = 1)
postgres=# EXPLAIN SELECT * FROM t1 LEFT JOIN t2 ON a1=a2 WHERE a2=1;
Nested Loop (cost=0.00..-1.#J rows=-1 width=16)
-> Seq Scan on t1 (cost=0.00..22.93 rows=-1 width=8)
Filter: (a1 = 1)
-> Seq Scan on t2 (cost=0.00..22.93 rows=-1 width=8)
Filter: (a2 = 1)
现在,我们可以得出一个结论:Q1和Q2不等价。那么,为什么Q1和Q2不等价呢?或者说,为什么不能使用常量传递做推导以致我们可以认为他们等价呢?
6 揭开面纱
6.1 SQL语义存在语义。那么,什么是语义?直观地看,外连接就是一种语义,有特定的含义(内表存在不满足外表连接条件的可能,这种可能致使连接后的结果保留外表的元组)。
6.2 SQL的形式,也有语义存在,有前后执行的顺序的语义。这句话说得是什么含义?举例来说,对于Q1,先应执行外连接,然后在外连接的结果集上,执行WHERE子句的条件过滤,这就是子句间执行顺序的语义。所以,对于Q1来讲,外连接的结果集是如下内容:
然后,在上面的结果集上,再执行"WHERE a1=1",所以结果集中只有一行元组。+------+------+------+------+
| id1 | a1 | id2 | a2 |
+------+------+------+------+
| 1 | 1 | NULL | NULL |
| 2 | 2 | NULL | NULL |
+------+------+------+------+
而Q2,其实也是如此,在上面的结果集上,执行“WHERE a2=1”,结果没有满足条件的元组。
所以,至此,疑惑解除,答案已经显而易见了。
7 进阶讨论
有朋友已经掌握了外连接的消除技术--“空值拒绝”,看似Q1和Q2都满足空值拒绝。是这样的吗?
7.1 Q1的"WHERE a1=1"条件,是在外表t1上的条件,不满足空值拒绝。
7.2 Q2的"WHERE a2=1"条件,是在内表t2上的确保t2不提供NULL元组的条件,满足空值拒绝。所以外连接被优化掉变为了内连接。所以执行计划上可以看到,外连接被消除了。
8 重要结论
对于内核的开发人员而言,尤其要注意不同的优化技术,要注意其施加的阶段,是由语义决定有前后顺序的,不是可以任意使用的。而发生“常量传递”,需要限于“WHERE”子句范围内,不能超越到外面(如与ON子句合并在推导常量传递是不可以的)。