Copyright © 2023 PawSQL
本篇属于高级SQL优化系列专题中的一篇,该专题介绍PawSQL引擎优化算法原理及优化案例,欢迎大家订阅。
问题定义
如果使用OR连接两个查询条件,数据库优化器有可能无法使用索引来完成查询。譬如对于以下的SQL语句,
select * from customer where c_phone like '139%' or c_name = 'Ray'
如果这两个字段上都有索引,可以把他们重写为UNION查询,以便使用索引提升查询性能。
select * from customer where c_phone like '139%'
union
select * from customer where c_name = 'Ray'
但是这种转换并不总是能够提升查询性能,它需要一定的适用条件,并需要经过基于代价的估算。
如果数据库支持
index merging(请参考如和创建高效的索引),也可以调整数据库相关参数启用index merging优化策略来提升数据库性能。
适用条件
OR连接的条件必须是可以利用索引的- 重写后的
UNION语句估算代价比原SQL小 - 如果
OR分支的条件是互斥的,那么重写为UNION ALL
案例分析
案例1. 条件分支无法利用索引,不进行重写
select * from customer
where c_phone = '1' or c_phone like '%139%'
解析: 其中一个条件c_phone like '%139%'无法利用索引,重写后仍然需要全表扫描,PawSQL不进行重写
案例2. 过滤条件选择率足够低,不进行重写
select * from customer
where custkey = 1
and (c_phone = '1' or c_phone like '%139%')
解析:custkey是主键,custkey = 1唯一定位一条记录,所以无需进行重写
案例3. 满足重写条件,进行重写
select distinct * from customer
where c_phone like '139%' or c_name = 'Ray'
解析: 两个条件都可以利用索引,且选择率低于10%,可以进行重写,由于union可以去重,所以原SQL中的distinct在重写后可以去除。
select * from customer where c_phone = '1'
union
select * from customer where c_phone like '139%'
案例4. OR条件分支互斥,重写为UNION ALL
select * from customer where custkey = 1 or (custkey = 2 and c_phone like '139%')
解析:由于两个条件分支c_custkey = 2 and c_phone like '139%' 和 c_custkey = 1 互斥,因此重写为UNION ALL
select * from customer where c_custkey = 2 and c_phone like '139%'
union all
select * from customer where c_custkey = 1
案例5. 包含ORDER BY子句和LIMIT子句,满足重写条件
select * from orders o
where O_ORDERDATE>='2021-01-01'
and (O_ORDERPRIORITY = 1 or O_SHIPPRIORITY = 1)
order by O_ORDERDATE desc limit 10
**解析:**虽然O_ORDERPRIORITY = 1 和 O_SHIPPRIORITY = 1的选择率较高,但是由于通过索引可以避免排序,总体代价较低,PawSQL进行了重写优化.
select *
from (
(select /*QB_2*/ *
from orders as o
where o.O_ORDERDATE >= '2021-01-01'
and o.O_SHIPPRIORITY = 1
order by o.O_ORDERDATE desc limit 10)
union
(select /*QB_1*/ *
from orders as o
where o.O_ORDERDATE >= '2021-01-01'
and o.O_ORDERPRIORITY = '1'
order by o.O_ORDERDATE desc limit 10
)
) as PawDT_1702555889039
order by PawDT_1702555889039.O_ORDERDATE desc limit 10
案例6. 包含分组和聚集,满足重写条件
select O_ORDERDATE, count(1) from orders o
where O_ORDERDATE>='2021-01-01' and (O_ORDERPRIORITY = 1 or O_SHIPPRIORITY = 1)
group by O_ORDERDATE
**解析:**虽然O_ORDERPRIORITY = 1 和O_SHIPPRIORITY = 1的选择率较高,但是由于通过索引可以避免排序,总体代价较低,PawSQL进行了重写优化.
select PawDT_1702884016144.O_ORDERDATE, count(1)
from (
select /*QB_2*/ o.O_ORDERDATE, o.O_ORDERKEY
from tpch_pkfk.orders as o
where o.O_ORDERDATE >= '2021-01-01'
and o.O_SHIPPRIORITY = 1
union
select /*QB_1*/ o.O_ORDERDATE, o.O_ORDERKEY
from tpch_pkfk.orders as o
where o.O_ORDERDATE >= '2021-01-01'
and o.O_ORDERPRIORITY = '1'
) as PawDT_1702884016144
group by PawDT_1702884016144.O_ORDERDATE
性能验证
案例5性能提升900倍
- 优化前执行计划(执行时间432.322ms)

- 优化后执行计划(执行时间0.189ms)

案例6性能提升20倍
-
优化前执行计划(2.816ms)

-
优化后执行计划(0.131ms)

🌐关于PawSQL
PawSQL专注于数据库性能优化自动化和智能化,提供的解决方案覆盖SQL开发、测试、运维的整个流程,广泛支持包括TDSQL在内的多种主流商用和开源数据库,为开发者和企业提供一站式的创新SQL优化解决方案。提升数据库系统的稳定性、应用性能和基础设施利用率,为企业节省了大量的运维成本和时间投入。


1396

被折叠的 条评论
为什么被折叠?



