ClickHouse根据STATE表及DEL表实现数据的更新与删除

考虑到alter table 对clickhouse的性能和数据一致性有一定的影响,所有尽量使用创建新表来替换旧表以实现数据的变更于删除,下面我将使用状态表和删除表来实现表的更新与删除

1.创建测试表
CREATE TABLE uk_price_paid
(
    `sid` UUID,
    `price` UInt32,
    `date` Date,
    `addr1` String,
    `addr2` String,
    `state` String,
    `key` String
)
ENGINE = MergeTree
ORDER BY key

这里的sid 不作为主键,key为唯一主键,每个key做对于的删除与修改时 state发生变化 如 0 新增 1 修改 2 删除 sid 使用UUID随机生成

2.添加测试数据 (随机生成10条key)
INSERT INTO uk_price_paid
SETTINGS max_http_get_redirects = 10
WITH splitByChar(' ', postcode) AS p
SELECT
    generateUUIDv4() AS sid,
    toUInt32(price_string) AS price,
    now() - 1 AS date,
    addr1,
    addr2,
    '0' AS state,
    rand() AS key
FROM url('http://prod.publicdata.landregistry.gov.uk.s3-website-eu-west-1.amazonaws.com/pp-complete.csv', 'CSV', 'uuid_string String,\n    price_string String,\n    time String,\n    postcode String,\n    a String,\n    b String,\n    c String,\n    addr1 String,\n    addr2 String,\n    street String,\n    locality String,\n    town String,\n    district String,\n    county String,\n    d String,\n    e String')
LIMIT 10
SETTINGS max_http_get_redirects = 10

3.添加修改数据和删除
//修改以9结尾key的数据,更改时间为29号做测试
INSERT INTO uk_price_paid SELECT
    generateUUIDv4(),
    price,
    '20240729' AS date,
    addr1,
    addr2,
    '1' AS state,
    key
FROM uk_price_paid
WHERE substring(key, length(key), 1) = '9'

//删除以3结尾的key的数据,更改时间为30号做测试

INSERT INTO uk_price_paid SELECT
    generateUUIDv4(),
    price,
    '20240730' AS date,
    addr1,
    addr2,
    '2' AS state,
    key
FROM uk_price_paid
WHERE substring(key, length(key), 1) = '3'

//现在有3条9结尾的key需要修改,3结尾的key都需要删除

 4.创建STATE状态表
CREATE TABLE default.state_tb
(
    `sid` UUID,
    `date` Date,
    `state` String,
    `key` String
)
ENGINE = MergeTree
ORDER BY key
SETTINGS index_granularity = 8192
AS SELECT
    sid,
    date,
    state,
    key
FROM uk_price_paid

5.创建DEL删除表
CREATE TABLE del_sid
ENGINE = MergeTree
ORDER BY key
AS SELECT
    key,
    sid,
    state
FROM uk_price_paid AS T1
INNER JOIN
(
    SELECT
        key,
        argMax(state, date) AS state
    FROM state_tb
    GROUP BY key HAVING state = '2'
) AS T2 ON T1.key = T2.key

因为状态为2的key的相关数据都需要删除,所有这里有状态为0,以3结尾的key

6.创建数据更新表(这里我将不改变修改后的key的原始id,只修改内容,及对应的29号的date数据)
CREATE TABLE uk_price_paid_new
ENGINE = MergeTree
ORDER BY key
AS SELECT
    T1.*,
    T2.key AS key,
    T2.osid AS osid
FROM
(
    SELECT *
    FROM uk_price_paid 
    WHERE (key, sid) IN (
        SELECT
            key,
            argMax(sid, date)
        FROM state_tb
        GROUP BY key
    )
) AS T1
INNER JOIN
(
    SELECT
        key,
        argMin(sid, date) AS osid
    FROM state_tb
    GROUP BY key
) AS T2 ON T1.key = T2.key
7.现在这里包含了所有最新的数据,但是它包含了需要删除的数据,再次创建表,来实现id变更为对应的原始id及osid字段,插入对于的新表时,将要删除的数据排除在外
drop table if EXISTS uk_price_paid_tmp 
create table uk_price_paid_tmp as uk_price_paid
INSERT INTO uk_price_paid_tmp SELECT
    osid AS sid,
    price,
    date,
    addr1,
    addr2,
    state,
    key
FROM uk_price_paid_new
WHERE sid NOT IN (
    SELECT sid
    FROM del_sid
)

现有的将是更新后的表,如果有增量时间需要的时候,更新相关环节的STATE和DEL表即可 

验证一下,根据DEL表中应该有2条3结尾的key状态为0的需要删除,并且9结尾的key时间都应该是29号,如下图

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值