标签
PostgreSQL , Greenplum , merge insert , insert on conflict , 合并插入 , 有则更新 , 无则插入
背景
PostgreSQL insert on conflict语法非常强大,支持合并写入(当违反某唯一约束时,冲突则更新,不冲突则写入),同时支持流式计算。
流计算例子链接:
《PostgreSQL 流式统计 - insert on conflict 实现 流式 UV(distinct), min, max, avg, sum, count ...》
《HTAP数据库 PostgreSQL 场景与性能测试之 22 - (OLTP) merge insert|upsert|insert on conflict|合并写入》
《PostgreSQL upsert功能(insert on conflict do)的用法》
《PostgreSQL 10.0 preview 功能增强 - 支持分区表ON CONFLICT .. DO NOTHING》
PostgreSQL insert on conflict语法如下:
Command: INSERT
Description: create new rows in a table
Syntax:
[ WITH [ RECURSIVE ] with_query [, ...] ]
INSERT INTO table_name [ AS alias ] [ ( column_name [, ...] ) ]
[ OVERRIDING { SYSTEM | USER} VALUE ]
{ DEFAULT VALUES | VALUES ( { expression | DEFAULT } [, ...] ) [, ...] | query }
[ ON CONFLICT [ conflict_target ] conflict_action ]
[ RETURNING * | output_expression [ [ AS ] output_name ] [, ...] ]
where conflict_target can be one of:
( { index_column_name | ( index_expression ) } [ COLLATE collation ] [ opclass ] [, ...] ) [ WHERE index_predicate ]
ON CONSTRAINT constraint_name
and conflict_action is one of:
DO NOTHING
DO UPDATE SET { column_name = { expression | DEFAULT } |
( column_name [, ...] ) = [ ROW ] ( { expression | DEFAULT } [, ...] ) |
( column_name [, ...] ) = ( sub-SELECT )
} [, ...]
[ WHERE condition ]
Greenplum的版本较低,还不支持insert on conflict的语法。
如果需要在Greenplum中实现类似的功能该如何操作?
DEMO
ID为PK,以它为合并列,举例。
1、目标表,也就是需要合并写入的目标:
create table t( id int primary key, c1 int , c2 int, c3 int, c4 int, c5 int, crt_time timestamp);
2、中间表,也就是用户只管插入的表:
create table t_tmp(like t);
写入一些中间记录。
insert into t_tmp values(1,1,2,3,null,null,now());
insert into t_tmp values(1,1,2,4,null,null,now());
insert into t_tmp values(1,1,2,3,null,7,now());
insert into t_tmp values(1,1,null,3,5,6,now());
postgres=# select * from t_tmp;
id | c1 | c2 | c3 | c4 | c5 | crt_time
----+----+----+----+----+----+----------------------------
1 | 1 | 2 | 3 | | | 2017-12-13 17:03:16.28482
1 | 1 | 2 | 4 | | | 2017-12-13 17:03:16.286302
1 | 1 | 2 | 3 | | 7 | 2017-12-13 17:03:16.635121
1 | 1 | | 3 | 5 | 6 | 2017-12-13 17:03:25.434191
(4 rows)
3、窗口合并,按唯一值约束,仅提取一条(可能存在窗口内合并的需求,例如按时间取最新,比如以最后一条为准,又或者以有值,且最新的为准)。
我有几张阿里云幸运券分享给你,用券购买或者升级阿里云相应产品会有特惠惊喜哦!把想要买的产品的幸运券都领走吧!快下手,马上就要抢光了。
以有值切最新为准例子:
select distinct on (id)
id,
first_value(c1) over (partition by id order by (case when c1 is null then null else crt_time end) desc nulls last) as c1,
first_value(c2) over (partition by id order by (case when c2 is null then null else crt_time end) desc nulls last) as c2,
first_value(c3) over (partition by id order by (case when c3 is null then null else crt_time end) desc nulls last) as c3,
first_value(c4) over (partition by id order by (case when c4 is null then null else crt_time end) desc nulls last) as c4,
first_value(c5) over (partition by id order by (case when c5 is null then null else crt_time end) desc nulls last) as c5,
first_value(crt_time) over (partition by id order by crt_time desc) as crt_time
from t_tmp ;
id | c1 | c2 | c3 | c4 | c5 | crt_time
----+----+----+----+----+----+----------------------------
1 | 1 | 2 | 3 | 5 | 6 | 2017-12-13 17:03:25.434191
(1 row)
存储中间结果:
create table t_tmp1 (like t) ;
insert into t_tmp1
select distinct on (id)
id,
first_value(c1) over (partition by id order by (case when c1 is null then null else crt_time end) desc nulls last) as c1,
first_value(c2) over (partition by id order by (case when c2 is null then null else crt_time end) desc nulls last) as c2,
first_value(c3) over (partition by id order by (case when c3 is null then null else crt_time end) desc nulls last) as c3,
first_value(c4) over (partition by id order by (case when c4 is null then null else crt_time end) desc nulls last) as c4,
first_value(c5) over (partition by id order by (case when c5 is null then null else crt_time end) desc nulls last) as c5,
first_value(crt_time) over (partition by id order by crt_time desc) as crt_time
from t_tmp ;
4、合并写入:
将窗口提取的结果,合并写入目标表。
4.1、INNER JOIN,覆盖旧记录,同时补齐旧的字段(以NULL为判断条件。如果新的记录没有值,则取旧记录的值。)提取。
create table t_tmp2 (like t);
insert into t_tmp2
select
t_tmp.id,
coalesce(t_tmp.c1, t.c1),
coalesce(t_tmp.c2, t.c2),
coalesce(t_tmp.c3, t.c3),
coalesce(t_tmp.c4, t.c4),
coalesce(t_tmp.c5, t.c5),
coalesce(t_tmp.crt_time, t.crt_time)
from
t_tmp1 as t_tmp
inner join
t
using (id);
4.2、DELETE USING,删除全量表的符合条件的记录。
delete from t using t_tmp2 where t.id=t_tmp2.id;
4.3、INSERT
insert into t
select t_tmp1.* from t_tmp1 left join t_tmp2 using (id) where t_tmp2.* is null
union all
select * from t_tmp2;
原文链接