问题说明:
数据库迁移到新服务器后,某一张视图查询缓慢,原库查询耗时3分钟,新库查询5小时以上。
环境说明:
新库:
DB:Oracle 11.2.0.4.0 RAC
OS:Redhat 7.6
原库:
DB:Oracle 11.2.0.4.0 单机
OS:AIX 5.1
问题分析:
处理此类问题,常见思路如下:
1 分别对比原库和新库该SQL对应的执行计划。
2 对比两个执行计划,找到差异部分或最耗时的部分。
3 分析执行计划差异原因,根据原因尝试解决问题。
本次案例视图引用的SQL使用了28个union all组成,关联了30多张表。
在执行如下SQL时,新库会卡住很久,甚至无法返回结果。
select count(*) from v_cjc_01;
视图内SQL类似如下:
create or replace view v_cjc_01 as
select col1,col2,col3,...col10
from t1
where xxx=yyy
union all
select col1,col2,col3,...col10
from t2
where zzz=yyy
union all
......
union all
select col1,col2,col3,...col10
from t3
where xxx=yyy and col2 not in (select col2 from t3 a where col1=aaa col3=bbb)
union all
select col1,col2,col3,...col10
from t5
where ttt=yyy;
查看执行计划,发现主要有两处不同
一:
原执行计划
|* 40 | HASH JOIN | | 72408 | 2616K| 2264K| 28026 (1)| 00:05:37 |
| 41 | VIEW | VW_JF_SET$5EF83EA7 | 72408 | 1414K| | 3028 (1)| 00:00:37 |
| 42 | UNION-ALL | | | | | | |
|* 43 | TABLE ACCESS FULL | CJC_T1_00000000000000000 | 72407 | 2050K| | 1514 (1)| 00:00:19 |
|* 44 | TABLE ACCESS FULL | CJC_T1_00000000000000000 | 1 | 29 | | 1514 (1)| 00:00:19 |
| 45 | INDEX FAST FULL SCAN | IDX_MB_ACCINF_ACCNO | 9179K| 148M| | 12212 (1)| 00:02:27 |
----------------------------------------------------------------------------------------------------------------------
新执行计划
| 13 | NESTED LOOPS | | 236 | 9676 | 2003 (1)| 00:00:25 |
|* 14 | TABLE ACCESS FULL | CJC_T1_00000000000000000 | 235 | 5640 | 1533 (1)| 00:00:19 |
|* 15 | INDEX RANGE SCAN | IDX_MB_ACCINF_ACCNO | 1 | 17 | 2 (0)| 00:00:01 |
| 16 | NESTED LOOPS | | 1 | 41 | 1535 (1)| 00:00:19 |
|* 17 | TABLE ACCESS FULL | CJC_T1_00000000000000000 | 1 | 24 | 1533 (1)| 00:00:19 |
|* 18 | INDEX RANGE SCAN | IDX_MB_ACCINF_ACCNO | 1 | 17 | 2 (0)| 00:00:01 |
二:
原执行计划
|* 16 | HASH JOIN RIGHT ANTI NA | | 6647 | 1174K| 12M| 26593 (1)| 00:05:20 |
|* 17 | TABLE ACCESS FULL | CJC_T2_00000000000000000 | 130K| 10M| | 11309 (1)| 00:02:16 |
|* 18 | TABLE ACCESS FULL | CJC_T2_00000000000000000 | 664K| 60M| | 11309 (1)| 00:02:16 |
新执行计划
|* 21 | FILTER | | | | | |
|* 22 | TABLE ACCESS FULL | CJC_T2_00000000000000000 | 4928 | 202K| 11126 (1)| 00:02:14 |
|* 23 | TABLE ACCESS FULL | CJC_T2_00000000000000000 | 1 | 27 | 11124 (1)| 00:02:14 |
根据第一处不同,可以看到:
原执行计划,出现了VW_JF_SET$5EF83EA7,说明原库使用到了Join Factorization特性,
在该特性下,union all连接各个分支查询SQL时,会将各个分支部分中公共部分单独提出作为一个单独的结果集,
然后在和原union all剩下的部分做关联,从而减少了公共部分表重复扫描的问题,性能会有提升。
为什么新库没有使用到Join Factorization特性呢?
对比两个库的_OPTIMIZER_JOIN_FACTORIZATION参数,都是使用的默认值true。
说明优化器认为新库不适合使用Join Factorization特性。
但是从执行计划看两种表数据量都比较大,结果集也很大,新库使用的NESTED LOOPS明显比原库HASH JOIN性能差。
可以通过10053事件对比查看原库和新库对同一SQL是如何进行改写,如何计算cost的?
原库和新库分别执行:
alter session set tracefile_identifier='10053';
ALTER SESSION SET EVENTS='10053 trace name context forever, level 1';
select count(*) from v_cjc_01;
ALTER SESSION SET EVENTS '10053 trace name context off';
select value from v$diag_info where name='Default Trace File';
查看生成的trace文件:
搜索关键字 Final query after transformations
新库SQL改写部分:
UNION ALL
(SELECT '0' "KERNAL_INTER4"
FROM "CHEN"."CJC_T1_00000000000000000" "A",
CHENPMB."MB_ACCINF" "B"
WHERE "B"."AIF_ACCNO" = "A"."CJC_T3_000000"
AND "A"."MPB_BUSINESSTYPE" = '10'
AND "A"."MPB_TRANTIME" IS NOT NULL
AND LENGTH("A"."MPB_TRANTIME") = 14
AND "A"."MPB_TRANTIME" > '20200501000000') UNION ALL
(SELECT '0' "KERNAL_INTER4"
FROM "CHEN"."CJC_T1_00000000000000000" "A",
CHENPMB."MB_ACCINF" "B"
WHERE "B"."AIF_ACCNO" = "A"."CJC_T3_000000"
AND "A"."MPB_BUSINESSTYPE" = '11'
AND "A"."MPB_TRANTIME" IS NOT NULL
AND LENGTH("A"."MPB_TRANTIME") = 14
AND "A"."MPB_TRANTIME" > '20200501000000')
原库SQL改写部分:
UNION ALL
(SELECT '0' "KERNAL_INTER4"
FROM CHENPMB."EA_EACCT_INFO_S" "EA_EACCT_INFO_S"
WHERE "EA_EACCT_INFO_S"."EA_ACCOUNT_OPEN_RESULT" = '1') UNION ALL
(SELECT "VW_JF_SET$5EF83EA7"."ITEM_2" "KERNAL_INTER4"
FROM ((SELECT "A"."CJC_T3_000000" "ITEM_1", '0' "ITEM_2"
FROM "CHEN"."CJC_T1_00000000000000000" "A"
WHERE "A"."MPB_TRANTIME" > '20200501000000'
AND LENGTH("A"."MPB_TRANTIME") = 14
AND "A"."MPB_TRANTIME" IS NOT NULL
AND "A"."MPB_BUSINESSTYPE" = '10') UNION ALL
(SELECT "A"."CJC_T3_000000" "ITEM_1", '0' "ITEM_2"
FROM "CHEN"."CJC_T1_00000000000000000" "A"
WHERE "A"."MPB_TRANTIME" > '20200501000000'
AND LENGTH("A"."MPB_TRANTIME") = 14
AND "A"."MPB_TRANTIME" IS NOT NULL
AND "A"."MPB_BUSINESSTYPE" = '11')) "VW_JF_SET$5EF83EA7",
CHENPMB."MB_ACCINF" "B"
WHERE "B"."AIF_ACCNO" = "VW_JF_SET$5EF83EA7"."ITEM_1")) "MB_YINGXIAO_COUNT1"
尝试按照原库10053转换的SQL改写新库SQL后,执行速度恢复正常,但是SQL改动量较大。
根据第二处不同,可以看到:
改写前原SQL部分:
select CUSTOMERID as CSTNO,
TRANTIME as TRANTIME,
'10' as TRANTYPE,
to_date(TRANTIME, 'yyyymmddhh24miss') as TRANTIME1,
createtime,
1 as cnt,
0 as amt,
'0' as kernal_inter1,
'0' as kernal_inter2,
'0' as kernal_inter3,
'0' as kernal_inter4
from CJC_T2_00000000000000000 c
where c.tranname = 'XX绑定'
and c.retcode = '0000'
and c.TRANTIME is not null
and length(c.TRANTIME) = 14
and c.TRANTIME > '20200501000000'
and c.medicalcardno not in
(select a.medicalcardno
from CJC_T2_00000000000000000 a
where a.tranname = 'XX解绑'
and a.retcode = '0000')
新库SQL改写部分:
SELECT '0' "KERNAL_INTER4"
FROM CHENPMB."CJC_T2_00000000000000000" "A",
CHENPMB."CJC_T2_00000000000000000" "C"
WHERE "C"."TRANNAME" = 'XX绑定'
AND "C"."RETCODE" = '0000'
AND "C"."TRANTIME" IS NOT NULL
AND LENGTH("C"."TRANTIME") = 14
AND "C"."TRANTIME" > '20200501000000'
AND "C"."MEDICALCARDNO" = "A"."MEDICALCARDNO"
AND "A"."TRANNAME" = 'XX解绑'
AND "A"."RETCODE" = '0000'
原库SQL改写部分:
(SELECT '0' "KERNAL_INTER4"
FROM CHENPMB."CJC_T2_00000000000000000" "C"
WHERE "C"."TRANNAME" = 'XX绑定'
AND "C"."RETCODE" = '0000'
AND "C"."TRANTIME" IS NOT NULL
AND LENGTH("C"."TRANTIME") = 14
AND "C"."TRANTIME" > '20200501000000'
AND NOT EXISTS
(SELECT 0
FROM CHENPMB."CJC_T2_00000000000000000" "A"
WHERE LNNVL("A"."MEDICALCARDNO" <> "C"."MEDICALCARDNO")
AND "A"."TRANNAME" = 'XX解绑'
AND "A"."RETCODE" = '0000'))
根据执行计划可知,
原库将not in改写成NOT EXISTS,表关联走的 HASH JOIN RIGHT ANTI NA,效率较高。
尝试将新库SQL中not in改写成NOT EXISTS后,效率有明显提升,不到5分钟可以执行完成。
SQL是否还有优化空间呢?
可以看看Oracle给出的建议sql tuning advisor:
SQL>
DECLARE
my_task_name VARCHAR2(30);
BEGIN
my_task_name := DBMS_SQLTUNE.CREATE_TUNING_TASK(sql_id => 'gzjy5nh7g0sfs',
scope => 'COMPREHENSIVE',
time_limit => 3600,
task_name => 'cjc_sql_tuing_task',
description => 'SQL TUNE TEST');
DBMS_SQLTUNE.EXECUTE_TUNING_TASK(task_name => 'cjc_sql_tuing_task');
END;
/
SELECT status FROM DBA_ADVISOR_TASKS WHERE task_name = 'cjc_sql_tuing_task';
SET LONG 10000
SET LONGCHUNKSIZE 1000
SET LINESIZE 300
SELECT DBMS_SQLTUNE.REPORT_TUNING_TASK('cjc_sql_tuing_task') FROM DUAL;
建议如下:
1- SQL Profile Finding (see explain plans section below)
--------------------------------------------------------
为此语句找到了性能更好的执行计划 2。选择以下 SQL 概要文件之一进行实施。
Recommendation (estimated benefit: 95.14%)
------------------------------------------
- 考虑接受推荐的 SQL 概要文件。
execute dbms_sqltune.accept_sql_profile(task_name =>
'cjc_sql_tuing_task', task_owner => 'CHENPMB', replace => TRUE);
Recommendation (estimated benefit: 99.95%)
------------------------------------------
- 考虑接受建议的 SQL 概要文件, 以便对此语句使用并行执行。
execute dbms_sqltune.accept_sql_profile(task_name =>
'cjc_sql_tuing_task', task_owner => 'CHENPMB', replace => TRUE,
profile_type => DBMS_SQLTUNE.PX_PROFILE);
与 DOP 128 并行执行此查询会使 SQL 概要文件计划上的响应时间缩短 99.13%。但是, 启用并行执行时要付出一些代价。它将增加语句的资源消耗
(预计为 11.07%), 这会导致系统吞吐量降低。此外, 由于在非常短的持续时间内消耗了这些资源, 因此如果没有足够可用的硬件容量,
并发语句的响应时间将受到负面影响。
2- Index Finding (see explain plans section below)
--------------------------------------------------
通过创建一个或多个索引可以改进此语句的执行计划。
Recommendation (estimated benefit: 99.7%)
-----------------------------------------
- 考虑运行可以改进物理方案设计的访问指导或者创建推荐的索引。
create index CHENPMB.IDX$$_04000001 on
CHENPMB.MB_PAY_LOG("MPL_TRANCODE","MPL_STT",LENGTH("MPL_TIME"),"MPL_TIME");
- 考虑运行可以改进物理方案设计的访问指导或者创建推荐的索引。
create index CHENPMB.IDX$$_04000002 on
CHENPMB.MB_FINANCE_TRANS_INCREMENT("MFT_TRAN_STATUS",LENGTH("MFT_TRAN_TIME"
),"MFT_TRAN_CODE","MFT_TRAN_TIME");
- 考虑运行可以改进物理方案设计的访问指导或者创建推荐的索引。
create index CHEN.IDX$$_04000003 on CHEN.CJC_T1_00000000000000000("MPB_BUSI
NESSTYPE",LENGTH("MPB_TRANTIME"),"MPB_TRANTIME","CJC_T3_000000");
- 考虑运行可以改进物理方案设计的访问指导或者创建推荐的索引。
create index CHENPMB.IDX$$_04000004 on
CHENPMB.MB_LOANBYFINANCE_RECORD("MLR_TRANCODE","MLR_LOANTYPEFLAG","MLR_STAT
US",LENGTH("MLR_TRANTIME"),"MLR_TRANTIME");
- 考虑运行可以改进物理方案设计的访问指导或者创建推荐的索引。
create index CHENPMB.IDX$$_04000006 on
CHENPMB.MB_TRANFLOW_INCREMENT("TRF_BSNCODE","TRF_STT",LENGTH("TRF_SUBTIME")
,"TRF_SUBTIME","TRF_FLOWNO");
Rationale
---------
创建推荐的索引可以显著地改进此语句的执行计划。但是, 使用典型的 SQL 工作量运行 "访问指导"
可能比单个语句更可取。通过这种方法可以获得全面的索引建议案, 包括计算索引维护的开销和附加的空间消耗。
最终建议:
1 将SQL中not in部分改写成not exists。
2 创建组合索引。
优化效果:
SQL由原来5小时查不出结果,优化到12秒完成查询。
###2021-03-31 22:00 chenjuchao###
欢迎关注我的公众号《IT小Chen》