sql优化实操

以下是同事人员发过来的sql,让我帮他优化一下,clickhouse的语句,需要替换成在自己的数据库上执行,dth_diagnosis_all 这个表的数据量90217505 ,千万级的数据量,说是内存不足,sql执行不了。


select sum(case when re.diagnosis_code = 'E10' then 1 else 0 end) as E10Count,
   sum(case when re.diagnosis_code = 'E11' then 1 else 0 end) as E11Count,
   sum(case when re.diagnosis_code != 'E10' and re.diagnosis_code != 'E11' then 1 else 0 end) as otherCount
from (select t1.id_no as id_no, left(t1.diagnosis_code,3) as diagnosis_code
                          from (select t2.id_no, t2.org_code, t2.event_type_code, t2.org_treatment_no,
                                       max(t2.register_date) as register_date,t2.diagnosis_code
                                from dth.dth_diagnosis_all t2
                                where t2.register_date <= #{initYear}
                                      and left(t2.diagnosis_code,3) in('E10','E11','E12','E13','E14')
                group by t2.id_no,t2.org_code,t2.event_type_code,t2.org_treatment_no,t2.diagnosis_code)t3,
          dth.dth_diagnosis_all t1
          where t1.id_no = t3.id_no
                and t1.org_code = t3.org_code and t1.diagnosis_code = t3.diagnosis_code
                and t1.event_type_code = t3.event_type_code
                and t1.org_treatment_no = t3.org_treatment_no
                and t1.register_date = t3.register_date
                and t1.org_province = #{province}
                and t1.org_city = #{city}
            )re left join (select * from dth_ehr_pir_all
        where birth_date  <=  #{initYear}
        and register_province = #{province}
            and register_city = #{city}      
        and diabetes_flag != '0'
        and (death_date = ''
                        or death_date > #{initYear})) t
                        on t.id_no = re.id_no
where t.birth_date  <=  #{initYear}
and t.register_province = #{province}
    and t.register_city = #{city}
and t.diabetes_flag != '0'
and (t.death_date = '' or t.death_date > #{initYear})
参数:initYear(查询年份12月31,如2029-12-31), #{province},查询省份:(山东省), #{city}  :查询城市

两个子查询的数据量

 以下为优化后的sql,看这个sql子查询t1和t3,其实是同一张表,然后对同一张表做连接查询,对t3表做了where条件过滤(这个不用改),之后连接查询之后又对t1表进行了where条件过滤,如下:       

t1.org_province = #{province}
and t1.org_city = #{city}

为啥不提前提到t1子查询里面去呢,

再看一下join条件中

and t1.diagnosis_code = t3.diagnosis_code

and t1.register_date = t3.register_date,

说明t3和t1连接条件是一样的,又是同一张表,t3中也对这两个列做了where限制,那么子查询t1中也可以加入这两个条件,于是把一下四个条件提到t1的子查询中去,大大降低了临时表占用的内存空间,

t1.org_province = #{province}
and t1.org_city = #{city}

and register_date <= #{initYear}
and left(diagnosis_code,3) in('E10','E11','E12','E13','E14')

 (
        select *
        from dth.dth_diagnosis_all 
        where
            org_province = #{province} and org_city = #{city}
            and register_date <= #{initYear}
            and left(diagnosis_code,3) in('E10','E11','E12','E13','E14')
    ) t1

然后就是针对 t 表做优化,子查询中已经对 t 表做了where条件过滤,连接查询后面又对 t 表进行单独的过滤,这个完全没必要再加同样的条件了。

之后再看最外层的

   sum(case when re.diagnosis_code = 'E10' then 1 else 0 end) as E10Count,
   sum(case when re.diagnosis_code = 'E11' then 1 else 0 end) as E11Count,
   sum(case when re.diagnosis_code != 'E10' and re.diagnosis_code != 'E11' then 1 else 0 end) as otherCount

这个一看就是对条件进行统计个数,可以改成效率更高的count

改下如下:

    count(case when re.diagnosis_code = 'E10' then '1' else null end) as E10Count,
    count(case when re.diagnosis_code = 'E11' then '1' else null end) as E11Count,
    count(case when re.diagnosis_code != 'E10' and re.diagnosis_code != 'E11' then '1' else null end) as otherCount

至此改完之后的结果

select 
    count(case when re.diagnosis_code = 'E10' then '1' else null end) as E10Count,
    count(case when re.diagnosis_code = 'E11' then '1' else null end) as E11Count,
    count(case when re.diagnosis_code != 'E10' and re.diagnosis_code != 'E11' then '1' else null end) as otherCount
from 
(
    select 
        t1.id_no as id_no, 
        left(t1.diagnosis_code,3) as diagnosis_code
    from
    (
        select 
            t2.id_no, t2.org_code, t2.event_type_code, t2.org_treatment_no,
            max(t2.register_date) as register_date,
            t2.diagnosis_code
        from dth.dth_diagnosis_all t2
        where
            t2.register_date <= #{initYear}
            and left(t2.diagnosis_code,3) in('E10','E11','E12','E13','E14')
        group by 
            t2.id_no,t2.org_code,t2.event_type_code,t2.org_treatment_no,t2.diagnosis_code
    ) t3
    inner join
    (
        select *
        from dth.dth_diagnosis_all 
        where
            org_province = #{province} and org_city = #{city}
            and register_date <= #{initYear}
            and left(diagnosis_code,3) in('E10','E11','E12','E13','E14')
    ) t1
    on 
        t1.id_no = t3.id_no
        and t1.org_code = t3.org_code 
        and t1.diagnosis_code = t3.diagnosis_code
        and t1.event_type_code = t3.event_type_code
        and t1.org_treatment_no = t3.org_treatment_no
        and t1.register_date = t3.register_date
) re 
left join 
(
    select * 
    from dth_ehr_pir_all
    where 
        birth_date  <=  #{initYear}
        and register_province = #{province}
        and register_city = #{city}      
        and diabetes_flag != '0'
        and (death_date = '' or death_date > #{initYear})
) t
on t.id_no = re.id_no;

 

sql能跑出来了,不会报出内存不足了

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值