Oracle merge into 基于大数据量的优化方向

最新推荐文章于 2024-10-18 15:38:05 发布

大树的困惑

最新推荐文章于 2024-10-18 15:38:05 发布

阅读量5.8k

点赞数 7

分类专栏： Oracle 文章标签： oracle sql 数据库

本文链接：https://blog.youkuaiyun.com/weixin_44445168/article/details/115112429

版权

Oracle 专栏收录该内容

7 篇文章

订阅专栏

本文探讨了Oracle数据库中针对大数据量更新操作的优化方法，主要聚焦于MergeInto语句的使用。通过存储优化，如利用临时表减少内存压力；减少对比数据量，例如根据业务需求筛选条件；以及调整执行顺序，将InsertInto操作独立以提高性能。这些策略旨在提升MergeInto的执行效率和系统资源利用率。

摘要生成于 C知道，由 DeepSeek-R1 满血版支持，前往体验 >

Oracle merge into 基于大数据量的优化方向

针对Oracle数据库的更新操作我们会使用

update tableName set col1=col2 where condition...

这样子的事务提交效率比较低下，所以大部分情况下需要批量更新的话我们会使用merge into 进行替换

merge into target_tableName t
using (
	select id , col1, col2 ... from source_tableName
) a
on (t.id= a.id)
when matched then 
update set
t.col1=a.col1 ,
t.col2=a.col2 ,
...
when not matched then 
 insert
        (t.col1,
         t.col2,
         t.col3,
         ...
         )
      values
        (a.col1,
         a.col2,
         a.col3,
         ...);

常规的merge into 的应用如上，对于日常的数据量是基本符合要求的

这里要讨论面对大数据量时的优化方向

1.存储优化

merge into 的执行过程会load 两部分数据，一部分是需要更新的目标表，还有用于比对数据的using中的临时表，而using 中载入的表数据集需要等到merge into 执行完之后才会释放，而如果你using中包含了多张表的关联逻辑

如：

merge into target_TableName
using(
	select a.col1 from tablea a 
    	left join tableb b on a.id=b.id
   		left join tablec c on a.id=c.id
)....

如上逻辑，会导致using过程需要加载tablea,tableb,tablec 三张表的数据到一块临时表空间中，这样会导致内存的压力上升（针对大数据量下，小数据量可以忽略）

针对这种情况可以使用oracle提供的临时表进行优化

如：

drop table TMP_table_a;
    create global temporary table TMP_table_a (
    	col1 varchar2(64),
        col2 varchar2(64),
        col3 varchar2(64),
        col4 varchar2(64),
        col5 varchar2(64)
)
on commit delete rows;
--此临时表的功效是在事务进行提交之后，会删除掉表中所有的行

借助临时表的功能，可以提前将关联逻辑后的结果插入到临时表，merge into 之取临时表就可以了

insert into TMP_table_a values(col1 , col2 ...)
select col1,col2 .. from tablea a left join tableb b on a.id=b.id
....conditions

--在进行merge into
merge into target_TableName t
using(
	select col1... from TMP_table_a ..
) a
on (a.col1=t.col1)
when matched then 
update set 
t.col1 = a.col1 
when not matched then 
 insert
        (t.col1,
         t.col2,
         t.col3,
         ...
         )
      values
        (a.col1,
         a.col2,
         a.col3,
         ...);

2.减少对比数据量

减少对比数据量对于业务需求的理解能力要求相对高点，需要找到能够过过滤的条件，

比如对于每天增量的表就可以通过控制日期

（结合上面的存储优化，可以选择在插入临时表的时候就进行优化）

insert into TMP_table_a values(col1 , col2 ...)
select col1,col2 .. from tablea a left join tableb b on a.id=b.id
....where lastupdatedate>=end_date and conditions..
--lastupdatedate>=end_date

3.优化执行顺序

正常的使用merge into 的话，通常会使用

when matched then ...
when not matched then ...
--用于区分采取insert into 操作还是 update 操作

这样做有个比较大的弊端，merge into 它是通过对比数据来选择操作的，会从目标表中每一条数据和你的逻辑结果集（using 中的结果）中的数据进行比对，如果你的结果集过大，那么就会在

on (…)

这块数据比对逻辑这里耗费比较多的时间，而update set 和 insert into 是分开的，也就是说它会比对两次数据，所以在实际中，为了提高性能支持，会将insert into 从merge into 中抽离出来，merge into 只做更新操作

--merge into 只做更新
merge into target_tableName 
using(select col1... from TMP_table_a ..
) a
on (a.col1=t.col1)
when matched then 
update set 
t.col1 = a.col1  ;

--后续插入操作
insert into target_tableName (col1..) 
select col1 from TMP_table_a 
where conditions ...