The SQL Tuning Formula T = S/V (repost)

This post describes an approach to SQL tuning: shorten a statement's execution time (T) by reducing the total amount of resources the SQL must access (S) and increasing the amount of resources it can access per unit of time (V). It walks through concrete examples: adding an index and pre-aggregating inactive data to reduce S, and buffer-cache hits plus parallel execution to increase V.

http://www.itpub.net/viewthread.php?tid=931723&extra=&page=2

Here S is the total amount of resources the SQL must access, V is the amount of resources the SQL can access per unit of time, and T is, naturally, the time the SQL takes to execute.

The formula is simple. The goal of tuning is to reduce T, and since T = S/V, the only things to work on are S and V: every tuning technique boils down to reducing S or increasing V.
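
A quick worked example using the figures from the first trace below: S = 70655 blocks and T = 10.01 s, so the effective V was roughly 70655 / 10.01 ≈ 7,000 blocks per second. Halving S, or doubling V, would each halve T.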

Increasing V (the resources accessible per unit of time): with the hardware fixed, all you can do is squeeze more out of what is already there, for example sizing the SGA to make full use of physical memory, using parallel execution to make full use of the CPUs, and redistributing I/O to make full use of disk throughput. Beyond that... upgrade the hardware?
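
A minimal sketch of what these levers look like in practice (the SGA size and degree of parallelism are illustrative assumptions, not values from the original post; ALTER SYSTEM requires DBA privileges and an spfile):

-- give the buffer cache more physical memory to work with (value illustrative)
alter system set sga_target = 4G scope = both;

-- spread one statement's work across several CPUs
select /*+ parallel(test 4) */ count(*) from test;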


Reducing S (the total resources the SQL must access) is usually the heart of tuning work; most SQL tuning revolves around reducing S. In Oracle, the resources accessed are counted in blocks, and the number of blocks a statement reads and writes directly determines its execution time. A simple way to see how many blocks a statement touches:
SQL> set timing on
SQL> set autotrace traceonly
SQL> select count(*) from test;

Elapsed: 00:00:10.01

Execution Plan
----------------------------------------------------------
Plan hash value: 1950795681

-------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Cost (%CPU)| Time     |
-------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 | 19342   (1)| 00:03:53 |
|   1 |  SORT AGGREGATE    |      |     1 |            |          |
|   2 |   TABLE ACCESS FULL| TEST | 5285K | 19342   (1)| 00:03:53 |
-------------------------------------------------------------------


Statistics
----------------------------------------------------------
204 recursive calls
0 db block gets
70655 consistent gets
70616 physical reads
0 redo size
422 bytes sent via SQL*Net to client
416 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
6 sorts (memory)
0 sorts (disk)
1 rows processed

The output shows that this SQL accessed 70655 blocks (the consistent gets), almost all of them physical reads, and that its elapsed time was 00:00:10.01.

You probably spotted the problem at a glance: the SQL does a full table scan, and adding an index will fix it. Exactly so: index access is one of the main ways to reduce the resources a SQL statement needs to touch.
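
The original post doesn't show the index DDL; a minimal sketch consistent with the plans below (the index name I_TEST1 and the column OBJECT_ID come from the plan output; answering COUNT(*) from the index alone also assumes OBJECT_ID is declared NOT NULL) would be:

create index i_test1 on test(object_id);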
The effect is obvious:
Elapsed: 00:00:01.89

Execution Plan
----------------------------------------------------------
Plan hash value: 826211483

-------------------------------------------------------------------------
| Id  | Operation             | Name    | Rows  | Cost (%CPU)| Time     |
-------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |         |     1 |  3089   (2)| 00:00:38 |
|   1 |  SORT AGGREGATE       |         |     1 |            |          |
|   2 |   INDEX FAST FULL SCAN| I_TEST1 | 5285K |  3089   (2)| 00:00:38 |
-------------------------------------------------------------------------


Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
11218 consistent gets
11197 physical reads
0 redo size
422 bytes sent via SQL*Net to client
416 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed

The number of blocks accessed drops to 11218, and the elapsed time drops with it, to 00:00:01.89.

By reducing S, T shrinks markedly.

The comparison above was made with V unchanged (both runs were essentially all physical reads).

Now let's look at what maximizing V gives us: first the same query rerun with every block already in the buffer cache (note the 0 physical reads in the statistics below), and then a parallel run on top of that.

SQL> select count(*) from test;

Execution Plan
----------------------------------------------------------
Plan hash value: 826211483

-------------------------------------------------------------------------
| Id  | Operation             | Name    | Rows  | Cost (%CPU)| Time     |
-------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |         |     1 |  3089   (2)| 00:00:38 |
|   1 |  SORT AGGREGATE       |         |     1 |            |          |
|   2 |   INDEX FAST FULL SCAN| I_TEST1 | 5285K |  3089   (2)| 00:00:38 |
-------------------------------------------------------------------------

Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
11218 consistent gets
0 physical reads
0 redo size
422 bytes sent via SQL*Net to client
416 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed

Elapsed: 00:00:00.79

SQL> select /*+parallel(test 2)*/ count(*) from test;



Execution Plan
----------------------------------------------------------
Plan hash value: 826211483

-------------------------------------------------------------------------
| Id  | Operation             | Name    | Rows  | Cost (%CPU)| Time     |
-------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |         |     1 |  3089   (2)| 00:00:38 |
|   1 |  SORT AGGREGATE       |         |     1 |            |          |
|   2 |   INDEX FAST FULL SCAN| I_TEST1 | 5285K |  3089   (2)| 00:00:38 |
-------------------------------------------------------------------------


Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
11218 consistent gets
0 physical reads
0 redo size
422 bytes sent via SQL*Net to client
416 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed

Elapsed: 00:00:00.68

T has now been pushed below one second.

Still, is a resource requirement of 11218 blocks really reasonable for this piece of business? If the operation runs frequently, every execution rescans the same data, and that data accounts for the vast majority of the blocks touched, then clearly it is not.
Since the goal is reducing S, eliminating the bulk of that repeated access should yield even better results.
Take the business above as an example: the requirement is a real-time count of the rows in the table. The data has a life cycle, and at any given time most rows in a business table are inactive. So if we aggregate the inactive portion ahead of time and store the result, then read that stored result whenever the count is needed, the resources the operation must access will drop dramatically.
In this example, rows with object_id <= 13000 are known to be inactive.
Build the pre-aggregated result table:
create table test_analyzed as select count(*) o_count,13000 as o_data from test where object_id<=13000;

SQL> select o_count from test_analyzed;

O_COUNT
----------
5242624

Elapsed: 00:00:00.00

Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
3 consistent gets
0 physical reads
0 redo size
421 bytes sent via SQL*Net to client
416 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed

SQL> select count(*) from test where object_id>13000;

COUNT(*)
----------
42624

Elapsed: 00:00:00.01


Execution Plan
----------------------------------------------------------
Plan hash value: 3544821501

-----------------------------------------------------------------------------
| Id  | Operation         | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |         |     1 |     5 |   159   (1)| 00:00:02 |
|   1 |  SORT AGGREGATE   |         |     1 |     5 |            |          |
|*  2 |   INDEX RANGE SCAN| I_TEST1 | 73774 |  360K |   159   (1)| 00:00:02 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("OBJECT_ID">13000)


Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
98 consistent gets
0 physical reads
0 redo size
421 bytes sent via SQL*Net to client
416 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed

S has become 3 + 98 = 101 blocks, and T accordingly drops below ten milliseconds.
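
For completeness, the two reads can be combined into a single statement (a sketch; it simply adds the stored count of inactive rows to a live count of the active ones):

select (select o_count from test_analyzed)
     + (select count(*) from test where object_id > 13000) as total_count
  from dual;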

Of course, real-world business requirements can rarely be decomposed this neatly; this is only meant to offer a line of thinking.
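
Oracle's built-in form of this precompute-and-reuse idea is the materialized view. A sketch, not from the original post (REFRESH FAST ON COMMIT has further prerequisites, including a materialized view log on the base table, so treat the DDL below as illustrative):

-- log of changes to TEST, needed for fast refresh of an aggregate MV
create materialized view log on test with rowid including new values;

-- precomputed count, kept current on commit and usable by query rewrite
create materialized view mv_test_count
  refresh fast on commit
  enable query rewrite
  as select count(*) o_count from test;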

Performance tuning is work that spans the whole product life cycle; tuning a single SQL statement may well trace back to requirements definition and product design.
Within tuning itself, gains in V are usually not sustainable: past a certain point, further improvement costs real money. Reductions in S, by contrast, still have plenty of potential left to mine.

Source: ITPUB blog, http://blog.itpub.net/35489/viewspace-998319/ (please credit the source when reposting).
