我们经常听到避免使用MySQL的Join连接查询,原因是性能低,实际情况是这样的吗?我们来实验一下。
1,环境
MySQL:5.7.30
存储引擎:InnoDB
索引类型:默认的经过优化后的B+树
2,准备
2.1 表准备 和 数据准备
我们打算创建5个表,并往里面插入10W条数据,如下:
2.1.1 01_test
CREATE TABLE IF NOT EXISTS `01_test` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT COMMENT '主键ID',
`phone_number` bigint(11) NOT NULL DEFAULT 0 COMMENT '手机号码',
PRIMARY KEY (`id`),
UNIQUE KEY `uidx_pn` (`phone_number`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='测试表1';
通过存储过程插入数据
DELIMITER $$
CREATE PROCEDURE test_01_insert()
BEGIN
declare i int;
set i = 0 ;
WHILE (i < 100000) DO
INSERT INTO `01_test` (`phone_number`) VALUES (CONCAT('130', LPAD(i, 8, '0')));
set i = i + 1;
END WHILE;
commit;
END$$
DELIMITER ;
CALL test_01_insert();
Query OK, 0 rows affected (17.24 sec)
2.1.2 02_test
CREATE TABLE IF NOT EXISTS `02_test` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT COMMENT '主键ID',
`first_test_id` int(11) unsigned NOT NULL DEFAULT 0 COMMENT '测试表1主键ID',
PRIMARY KEY (`id`),
KEY `idx_fti` (`first_test_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='测试表2';
03_test 到 05_test表,表结构都和02_test 差不多,只是字段名改成前面一个表的,索引名称也做相应的变更,这里就不赘述了。
插入数据
DELIMITER $$
CREATE PROCEDURE test_insert()
BEGIN
declare i int;
set i = 1;
WHILE (i < 100001) DO
INSERT INTO `02_test` (`first_test_id`) VALUES (i);
set i = i + 1;
END WHILE;
commit;
END$$
DELIMITER ;
CALL test_insert;
DROP PROCEDURE test_insert;
Query OK, 0 rows affected (17.17 sec)
基本上17秒左右插入10W条数据,还比较快。
3,实验
我们直接执行以下语句
SELECT T1.* from 01_test AS T1
INNER JOIN 02_test AS T2
INNER JOIN 03_test AS T3
INNER JOIN 04_test AS T4
INNER JOIN 05_test AS T5
ON T1.id = T2.first_test_id
AND T2.id = T3.second_test_id
AND T3.id = T4.third_test_id
AND T4.id = T5.forth_test_id
ORDER BY T1.id DESC LIMIT 10;
这里有5个表连接查询,结果:
+--------+--------------+
| id | phone_number |
+--------+--------------+
| 100000 | 13000099999 |
| 99999 | 13000099998 |
| 99998 | 13000099997 |
| 99997 | 13000099996 |
| 99996 | 13000099995 |
| 99995 | 13000099994 |
| 99994 | 13000099993 |
| 99993 | 13000099992 |
| 99992 | 13000099991 |
| 99991 | 13000099990 |
+--------+--------------+
10 rows in set (0.00 sec)
秒出结果。
查看执行计划
EXPLAIN
-> SELECT T1.* from 01_test AS T1
-> INNER JOIN 02_test AS T2
-> INNER JOIN 03_test AS T3
-> INNER JOIN 04_test AS T4
-> INNER JOIN 05_test AS T5
-> ON T1.id = T2.first_test_id
-> AND T2.id = T3.second_test_id
-> AND T3.id = T4.third_test_id
-> AND T4.id = T5.forth_test_id
-> ORDER BY T1.id DESC LIMIT 10;
+----+-------------+-------+------------+-------+------------------+----------+---------+------------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+------------------+----------+---------+------------+------+----------+-------------+
| 1 | SIMPLE | T1 | NULL | index | PRIMARY | PRIMARY | 4 | NULL | 10 | 100.00 | NULL |
| 1 | SIMPLE | T2 | NULL | ref | PRIMARY,uidx_fti | uidx_fti | 4 | test.T1.id | 1 | 100.00 | Using index |
| 1 | SIMPLE | T3 | NULL | ref | PRIMARY,uidx_sti | uidx_sti | 4 | test.T2.id | 1 | 100.00 | Using index |
| 1 | SIMPLE | T4 | NULL | ref | PRIMARY,uidx_tti | uidx_tti | 4 | test.T3.id | 1 | 100.00 | Using index |
| 1 | SIMPLE | T5 | NULL | ref | uidx_fti | uidx_fti | 4 | test.T4.id | 1 | 100.00 | Using index |
+----+-------------+-------+------------+-------+------------------+----------+---------+------------+------+----------+-------------+
5 rows in set, 1 warning (0.00 sec)
可以看到,每个连接查询都用到了所以,过滤率都达到了100。
4,总结
从上面的实验可以看出,如果查询语句合适,ON语句两边的字段类型完全一致,每个ON语句都用到表索引,JOIN查询速度还是挺快的。
那么,为什么我们经常看到要尽量避免JOIN查询?甚至在阿里公布的数据库操作规范中,明文提到:
4.1 原因
在实际场景中,上面所说的条件可能不满足,导致查询慢;
DB承担的业务压力大,能减少负担就减少;
分布式的分库分表,目前MySQL的分布式中间件,跨库JOIN表现不好,不建议跨库JOIN;
修改表的schema,单表查询的修改比较容易,JOIN写的sql语句要修改,不容易发现,成本比较大,当系统比较大时,不好维护。
4.2 替代方案
用应用程序逻辑处理替代MySQL JOIN查询,例如先查出表A数据,再用 WHERE B.column IN (array) 的方式处理;
在表中进行反范式设计,将字段冗余到同一张表中,在本表中就可以一次查出来;