1.1.1 PostgreSQL
1.1.2.1 S1语句
查看查询执行计划,子查询被优化(采用半连接)。
postgres=# EXPLAIN SELECT * FROM t3 WHERE EXISTS (SELECT b1 FROM t1 WHERE a3=a1);
QUERY PLAN
------------------------------------------------------------------
Hash Semi Join (cost=55.90..114.35 rows=2040 width=12)
Hash Cond: (t3.a3 = t1.a1)
-> Seq Scan on t3 (cost=0.00..30.40 rows=2040 width=12)
-> Hash (cost=30.40..30.40 rows=2040 width=4)
-> Seq Scan on t1 (cost=0.00..30.40 rows=2040 width=4)
(5 行记录)
1.1.2.2 S2语句
查看查询执行计划,子查询被优化(采用半连接)。
postgres=# EXPLAIN SELECT * FROM t3 WHERE EXISTS (SELECT a1 FROM t1 WHERE a3=a1);
QUERY PLAN
------------------------------------------------------------------
Hash Semi Join (cost=55.90..114.35 rows=2040 width=12)
Hash Cond: (t3.a3 = t1.a1)
-> Seq Scan on t3 (cost=0.00..30.40 rows=2040 width=12)
-> Hash (cost=30.40..30.40 rows=2040 width=4)
-> Seq Scan on t1 (cost=0.00..30.40 rows=2040 width=4)
(5 行记录)
1.1.2.3 S3语句
查看查询执行计划,子查询被优化(采用半连接)。
postgres=# EXPLAIN SELECT * FROM t3 WHERE EXISTS (SELECT id1 FROM t1 WHERE a3=a1)
QUERY PLAN
------------------------------------------------------------------
Hash Semi Join (cost=55.90..114.35 rows=2040 width=12)
Hash Cond: (t3.a3 = t1.a1)
-> Seq Scan on t3 (cost=0.00..30.40 rows=2040 width=12)
-> Hash (cost=30.40..30.40 rows=2040 width=4)
-> Seq Scan on t1 (cost=0.00..30.40 rows=2040 width=4)
(5 行记录)
1.1.2.4 S4语句
查看查询执行计划,子查询被优化。
postgres=# EXPLAIN SELECT * FROM t3 WHERE EXISTS (SELECT b1 FROM t1 WHERE a1>2);
QUERY PLAN
-------------------------------------------------------------
Result (cost=0.05..30.45 rows=2040 width=12)
On
InitPlan 1 (returns $0) //子查询被优化,只被执行一次,类似其他数据库如MySQL对子查询的物化优化
-> Seq Scan on t1 (cost=0.00..35.50 rows=680 width=0)
Filter: (a1 > 2)
-> Seq Scan on t3 (cost=0.00..30.40 rows=2040 width=12)
(6 行记录)
1.1.2.5 S5语句
查看查询执行计划,子查询被优化。
postgres=# EXPLAIN SELECT * FROM t3 WHERE EXISTS (SELECT a1 FROM t1 WHERE b3=b1);
QUERY PLAN
------------------------------------------------------------------------
Hash Join (cost=40.00..98.45 rows=1020 width=12)
Hash Cond: (t3.b3 = t1.b1)
-> Seq Scan on t3 (cost=0.00..30.40 rows=2040 width=12)
-> Hash (cost=37.50..37.50 rows=200 width=4)
-> HashAggregate (cost=35.50..37.50 rows=200 width=4) //在t1表上执行了一个hash聚集操作
Group Key: t1.b1
-> Seq Scan on t1 (cost=0.00..30.40 rows=2040 width=4)
(7 行记录)
1.1.2 MySQL
1.1.3.1 S1语句
查看查询执行计划,子查询没有被优化。
mysql> EXPLAIN SELECT * FROM t3 WHERE EXISTS (SELECT b1 FROM t1 WHERE a3=a1);
+----+--------------------+-------+------+------+-------------+
| id | select_type | table | type | key | Extra |
+----+--------------------+-------+------+------+-------------+
| 1 | PRIMARY | t3 | ALL | NULL | Using where |
| 2 | DEPENDENT SUBQUERY | t1 | ref | a1 | NULL |
+----+--------------------+-------+------+------+-------------+
2 rows in set (0.02 sec)
1.1.3.2 S2语句
查看查询执行计划,子查询没有被优化。
mysql> EXPLAIN SELECT * FROM t3 WHERE EXISTS (SELECT a1 FROM t1 WHERE a3=a1);
+----+--------------------+-------+------+------+-------------+
| id | select_type | table | type | key | Extra |
+----+--------------------+-------+------+------+-------------+
| 1 | PRIMARY | t3 | ALL | NULL | Using where |
| 2 | DEPENDENT SUBQUERY | t1 | ref | a1 | Using index |
+----+--------------------+-------+------+------+-------------+
2 rows in set (0.00 sec)
1.1.3.3 S3语句
查看查询执行计划,子查询没有被优化。
mysql> EXPLAIN SELECT * FROM t3 WHERE EXISTS (SELECT id1 FROM t1 WHERE a3=a1);
+----+--------------------+-------+------+------+-------------+
| id | select_type | table | type | key | Extra |
+----+--------------------+-------+------+------+-------------+
| 1 | PRIMARY | t3 | ALL | NULL | Using where |
| 2 | DEPENDENT SUBQUERY | t1 | ref | a1 | Using index |
+----+--------------------+-------+------+------+-------------+
2 rows in set (0.00 sec)
1.1.3.4 S4语句
查看查询执行计划,子查询没有被优化。
mysql> EXPLAIN SELECT * FROM t3 WHERE EXISTS (SELECT b1 FROM t1 WHERE a1>2);
+----+-------------+-------+-------+------+-----------------------+
| id | select_type | table | type | key | Extra |
+----+-------------+-------+-------+------+-----------------------+
| 1 | PRIMARY | t3 | ALL | NULL | NULL |
| 2 | SUBQUERY | t1 | range | a1 | Using index condition |
+----+-------------+-------+-------+------+-----------------------+
2 rows in set (0.01 sec)
1.1.3.5 S5语句
查看查询执行计划,子查询没有被优化。
mysql> EXPLAIN SELECT * FROM t3 WHERE EXISTS (SELECT a1 FROM t1 WHERE b3=b1);
+----+--------------------+-------+------+------+-------------+
| id | select_type | table | type | key | Extra |
+----+--------------------+-------+------+------+-------------+
| 1 | PRIMARY | t3 | ALL | NULL | Using where |
| 2 | DEPENDENT SUBQUERY | t1 | ALL | NULL | Using where |
+----+--------------------+-------+------+------+-------------+
2 rows in set (0.00 sec)
1.1.4 对比
我们把原始的SQL语句复制一份,目的是查阅方便。
S1: SELECT * FROM t3 WHERE b3 EXISTS (SELECT b1 FROM t1 WHERE a3=a1);
S2: SELECT * FROM t3 WHERE b3 EXISTS (SELECT a1 FROM t1 WHERE a3=a1);
S3: SELECT * FROM t3 WHERE b3 EXISTS (SELECT id1 FROM t1 WHERE a3=a1);
S4: SELECT * FROM t3 WHERE id3 EXISTS (SELECT b1 FROM t1 WHERE a1>2);
S5: SELECT * FROM t3 WHERE EXISTS (SELECT a1 FROM t1 WHERE b3=b1);
然后对比如下:
SQL语句 |
语句特点 |
ToprowDB |
PostgreSQL |
MySQL |
S1 |
相关子查询,目标列为普通列,条件为索引键 |
Join优化 |
Semi Join优化 |
没有被优化 |
S2 |
相关子查询,目标列为唯一键,条件为索引键 |
Join优化 |
Semi Join优化 |
没有被优化 |
S3 |
相关子查询,目标列为主键列,条件为索引键 |
Join优化 |
Semi Join优化 |
没有被优化 |
S4 |
非相关子查询,目标列为普通列,条件为索引键 |
没有被优化 |
类似物化优化 |
没有被优化 |
S5 |
相关子查询,目标列为唯一键,条件为普通列 |
Semi Join优化 |
Join优化 |
没有被优化 |
分析:
q 从整体上看,对于EXISTS类型的子查询,PostgreSQL的优化能力最强,MySQL最差,ToprowDB接近PostgreSQL
q 对于EXISTS类型的子查询,PostgreSQL基本上是利用Semi Join来进行优化的;多数数据库的优化器都会有类似的方式来对EXISTS类型的子查询进行优化
q 但对于ToprowDB却采取的是普通Join,这是为什么?其实,ToprowDB的优化器也是先进行了Semi Join式优化,然后根据唯一性(唯一索引如S1中的a3=a1)进一步判断,在明知不会产生重复元组的情况下,把Semi Join进一步优化为Join。这点体现了ToprowDB优化器的先进性
q 对比S1和S5(关键在于WHERE条件选元组),PostgreSQL把S5优化为Join操作,这在PostgreSQL中属于特例,PostgreSQL在经过逻辑优化(转为Semi Join属于逻辑优化)之后,利用了代价估算模型来进一步判断是使用排序还是Hash聚集操作来优化EXISTS语义,经过选择,认定Hash聚集代价小于排序,因此选定了HashAggregate 作为其查询执行计划