When the subselect runs faster (zt)

本文探讨了一种改进SQL查询性能的方法,通过将原始查询重构为使用子查询与索引,显著提升了查询速度。具体案例展示了如何通过调整查询逻辑和使用FORCEINDEX提示来优化查询性能,特别关注了WHERE子句不具高选择性的情况。这种方法不仅加速了查询过程,还揭示了在特定场景下,子查询并非总是降低性能的原因。

摘要生成于 C知道 ,由 DeepSeek-R1 满血版支持, 前往体验 >

A few weeks ago, we had a query optimization request from one of our customer.

The query was very simple like:

CODE:
  1. SELECT * FROM `table` WHERE  (col1= 'A'||col1= 'B' ) ORDER BY id DESC LIMIT  20 OFFSET  0

 

This column in the table is looks like this:

CODE:
  1. `col1` enum ( 'A', 'B', 'C', 'CD', 'DE', 'F', 'G', 'HI' ) default NULL

 

The table have 549252 rows and of course, there is an index on the col1. MySQL estimated the cardinality of that index as 87, though what was of course misleading as index cardinality in this case can't be over 9, as there is only 8(+ NULL) different possible values for this column.

CODE:
  1. +----+-------------+-------+-------+---------------+------+---------+------+--------+-----------------------------+
  2. | id | select_type | table | type  | possible_keys | key  | key_len | ref  | rows   | Extra                       |
  3. +----+-------------+-------+-------+---------------+------+---------+------+--------+-----------------------------+
  4. |   1 | SIMPLE      | table  | range | col1         | col1 |  2       | NULL |  549252 | Using where; Using filesort |
  5. +----+-------------+-------+-------+---------------+------+---------+------+--------+-----------------------------+

 

This query took more than 5 minutes (the rows are large and table does not fit in cache well)

When you want to run this query mysql first will try to find each row where col1 is A or B using index. Then its going to order by the ID using file sort and then send first 20 rows ignoring the rest.

In this case MySQL has 2 indexes where one is usable to find rows, while other is usable to return them in the right order. MySQL can chose only one of them to execute the query - use index to find rows. This is sensible strategy if there is no LIMIT, however it is poor chose if there is one - it is often a lot faster to retrieve rows in order checking WHERE clause for them until required number of rows were returned. Especially in the cases when WHERE clause is not very selective.

So I tried this:

CODE:
  1. select * from table where id in  (SELECT id FROM `table` WHERE  (col1= 'A'||col1= 'B' ) ) ORDER BY id DESC LIMIT  20 OFFSET  0;

 

In this case we forcing MySQL to do retrieve rows in sorted order and checking if it matches our original WHERE clause with subselects. It looks scary if we look at EXPLAIN but in reality the dependent subquery is only executed enough times to produce 20 rows in result set.

CODE:
  1. +----+--------------------+-------+-----------------+---------------+---------+---------+------+--------+-------------+
  2. | id | select_type        | table | type            | possible_keys | key     | key_len | ref  | rows   | Extra       |
  3. +----+--------------------+-------+-----------------+---------------+---------+---------+------+--------+-------------+
  4. |   1 | PRIMARY            | table  | index           | NULL          | PRIMARY |  4       | NULL |  765105 | Using where |
  5. |   2 | DEPENDENT SUBQUERY | table  | unique_subquery | PRIMARY,col1  | PRIMARY |  4       | func |       1 | Using where |
  6. +----+--------------------+-------+-----------------+---------------+---------+---------+------+--------+-------------+

 

The result is a lot better result time:

CODE:
  1. ( 20 rows in set  ( 0. 01 sec ) )

 

So by rewriting query using subqueries we actually improved it performance 100 times. So subqueries are
not always slowing things down.

Even though proving subqueries are not always slow this way is not the most optimal. We do not really need separate subselect to make MySQL check WHERE clause while scanning table in index order. We can just use FORCE INDEX hint to override MySQL index choice:

CODE:
  1. mysql> explain select * from table FORCE INDEX (PRIMARY ) where  (col1= 'A'||col1= 'B' ) order by id desc limit  20 offset  0;
  2. +----+-------------+-------+-------+---------------+---------+---------+------+--------+-------------+
  3. | id | select_type | table | type  | possible_keys | key     | key_len | ref  | rows   | Extra       |
  4. +----+-------------+-------+-------+---------------+---------+---------+------+--------+-------------+
  5. |   1 | SIMPLE      | table  | index | NULL          | PRIMARY |  4       | NULL |  549117 | Using where |
  6. +----+-------------+-------+-------+---------------+---------+---------+------+--------+-------------+
  7.  
  8. mysql> select * from table FORCE INDEX (PRIMARY ) where  (col1= 'A'||col1= 'B' ) order by id desc limit  20 offset  0;
  9. ...
  10. 20 rows in set  ( 0. 00 sec )

 

This approach works well if WHERE clause is not very selective, otherwise MySQL may need to scan very many rows to find enough matching rows. You can use another trick Peter
wrote. about couple of years ago.

 

内容概要:该论文探讨了一种基于粒子群优化(PSO)的STAR-RIS辅助NOMA无线通信网络优化方法。STAR-RIS作为一种新型可重构智能表面,能同时反射和传输信号,与传统仅能反射的RIS不同。结合NOMA技术,STAR-RIS可以提升覆盖范围、用户容量和频谱效率。针对STAR-RIS元素众多导致获取完整信道状态信息(CSI)开销大的问题,作者提出一种在不依赖完整CSI的情况下,联合优化功率分配、基站波束成形以及STAR-RIS的传输和反射波束成形向量的方法,以最大化总可实现速率并确保每个用户的最低速率要求。仿真结果显示,该方案优于STAR-RIS辅助的OMA系统。 适合人群:具备一定无线通信理论基础、对智能反射面技术和非正交多址接入技术感兴趣的科研人员和工程师。 使用场景及目标:①适用于希望深入了解STAR-RIS与NOMA结合的研究者;②为解决无线通信中频谱资源紧张、提高系统性能提供新的思路和技术手段;③帮助理解PSO算法在无线通信优化问题中的应用。 其他说明:文中提供了详细的Python代码实现,涵盖系统参数设置、信道建模、速率计算、目标函数定义、约束条件设定、主优化函数设计及结果可视化等环节,便于读者理解和复现实验结果。此外,文章还对比了PSO与其他优化算法(如DDPG)的区别,强调了PSO在不需要显式CSI估计方面的优势。
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值