Flink SQL: data read from Kafka is not written to Hudi

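This post covers writing data from Kafka into Hudi with Flink SQL. With the usual script, no data reached Hudi, although inserting literal values with insert into worked. The fix turned out to be enabling checkpointing, after which the data was written to Hudi successfully. Example code is included.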


Flink SQL reads from Kafka, but the data does not get written to Hudi. The test script is below.

CREATE TABLE IF NOT EXISTS dwd_dm_day_kafka (
  `day_id` int,
  `month_id` int,
  `qn_id` int,
  `year_id` int,
  `phycial_delete` int
) WITH (
  'connector' = 'kafka',
  'topic' = 'dwd_dm_day_hudi_01',
  'properties.bootstrap.servers' = 'master1.betacdh.com:9092,work1.betacdh.com:9092,work2.betacdh.com:9092',
  'scan.startup.mode' = 'earliest-offset',
  'format' = 'json'
);

CREATE TABLE dwd_dm_day_hudi (
  `day_id` int,
  `month_id` int,
  `qn_id` int,
  `year_id` int,
  `phycial_delete` int,
  PRIMARY KEY (`day_id`) NOT ENFORCED
) COMMENT 'Order table'
WITH (
  'connector' = 'hudi',
  'path' = 'hdfs://work1.betacdh.com:8020/test_sdc/dwd_dm_day_hudi',
  'table.type' = 'MERGE_ON_READ',
  'write.operation' = 'insert',
  'hive_sync.enable' = 'true',
  'hive_sync.mode' = 'hms',
  'hive_sync.metastore.uris' = 'thrift://work1.betacdh.com:9083',
  'hive_sync.table' = 'dwd_dm_day_hudi',
  'hive_sync.db' = 'test_sdc',
  'write.tasks' = '1',
  'compaction.tasks' = '1',
  'compaction.async.enable' = 'true',
  'compaction.trigger.strategy' = 'num_commits',
  'compaction.delta_commits' = '1'
);

insert into dwd_dm_day_hudi select day_id,month_id,qn_id,year_id,phycial_delete from dwd_dm_day_kafka;

With the script above, no data is written to Hudi.
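Before looking at the sink, it can help to rule out the source. The query below is a minimal diagnostic sketch, assuming the dwd_dm_day_kafka table from the script above is registered in the same Flink SQL client session; it is not part of the original test.

```sql
-- Diagnostic sketch: check that the Kafka source table actually yields rows.
-- The query is unbounded, so cancel it in the SQL client once rows show up.
SELECT day_id, month_id, qn_id, year_id, phycial_delete
FROM dwd_dm_day_kafka;
```

If rows appear here while the Hudi path still receives no commits, the problem is more likely on the sink side than in the Kafka topic or the JSON format.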

However, a plain insert into with literal values does work:

CREATE TABLE dwd_dm_day_hudi (
  `day_id` int,
  `month_id` int,
  `qn_id` int,
  `year_id` int,
  `phycial_delete` int,
  PRIMARY KEY (`day_id`) NOT ENFORCED
) COMMENT 'Order table'
WITH (
  'connector' = 'hudi',
  'path' = 'hdfs://work1.betacdh.com:8020/test_sdc/dwd_dm_day_hudi',
  'table.type' = 'MERGE_ON_READ',
  'write.operation' = 'insert',
  'hive_sync.enable' = 'true',
  'hive_sync.mode' = 'hms',
  'hive_sync.metastore.uris' = 'thrift://work1.betacdh.com:9083',
  'hive_sync.table' = 'dwd_dm_day_hudi',
  'hive_sync.db' = 'test_sdc',
  'write.tasks' = '1',
  'compaction.tasks' = '1',
  'compaction.async.enable' = 'true',
  'compaction.trigger.strategy' = 'num_commits',
  'compaction.delta_commits' = '1'
);

insert into dwd_dm_day_hudi values(3333,3333,3333,3333,3333);

Inserting literal values directly with insert into does write the data to Hudi.
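One way to confirm that the row really landed is to read it back from the same table. This is only a sketch, assuming the dwd_dm_day_hudi definition above is still registered in the current session and the insert job has finished.

```sql
-- Read-back sketch: look up the row written by the INSERT ... VALUES statement.
SELECT day_id, month_id, qn_id, year_id, phycial_delete
FROM dwd_dm_day_hudi
WHERE day_id = 3333;
```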

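This problem had me stuck for a whole day. This morning I found, by accident, that once checkpointing is enabled the data is written to Hudi successfully. This is consistent with how the Hudi Flink writer behaves: it commits buffered data only when a checkpoint completes, so an unbounded Kafka job without checkpointing never commits, while a bounded insert into ... values job commits when its input ends. The working script is as follows: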

-- Trigger a checkpoint every 20 seconds (value in milliseconds)
set execution.checkpointing.interval=20000;
-- Directory where checkpoint files are stored
set state.checkpoints.dir=hdfs:///flink/flink-checkpoints;
-- Keep checkpoints after the job is cancelled. Default: NO_EXTERNALIZED_CHECKPOINTS.
-- Allowed values: NO_EXTERNALIZED_CHECKPOINTS, DELETE_ON_CANCELLATION, RETAIN_ON_CANCELLATION
set execution.checkpointing.externalized-checkpoint-retention=RETAIN_ON_CANCELLATION;
-- Checkpointing mode. Default: EXACTLY_ONCE. Allowed values: EXACTLY_ONCE, AT_LEAST_ONCE
-- End-to-end EXACTLY_ONCE also requires a sink that supports transactions
set execution.checkpointing.mode=EXACTLY_ONCE;
-- Checkpoint timeout in milliseconds (default: 10 minutes)
set execution.checkpointing.timeout=600000;
-- Number of completed checkpoints to retain (default: 1)
set state.checkpoints.num-retained=3;

CREATE TABLE IF NOT EXISTS dwd_dm_day_kafka (
  `day_id` int,
  `month_id` int,
  `qn_id` int,
  `year_id` int,
  `phycial_delete` int
) WITH (
  'connector' = 'kafka',
  'topic' = 'dwd_dm_day_hudi_01',
  'properties.bootstrap.servers' = 'master1.betacdh.com:9092,work1.betacdh.com:9092,work2.betacdh.com:9092',
  'scan.startup.mode' = 'earliest-offset',
  'format' = 'json'
);

CREATE TABLE dwd_dm_day_hudi (
  `day_id` int,
  `month_id` int,
  `qn_id` int,
  `year_id` int,
  `phycial_delete` int,
  PRIMARY KEY (`day_id`) NOT ENFORCED
) COMMENT 'Order table'
WITH (
  'connector' = 'hudi',
  'path' = 'hdfs://work1.betacdh.com:8020/test_sdc/dwd_dm_day_hudi',
  'table.type' = 'MERGE_ON_READ',
  'write.operation' = 'insert',
  'hive_sync.enable' = 'true',
  'hive_sync.mode' = 'hms',
  'hive_sync.metastore.uris' = 'thrift://work1.betacdh.com:9083',
  'hive_sync.table' = 'dwd_dm_day_hudi',
  'hive_sync.db' = 'test_sdc',
  'write.tasks' = '1',
  'compaction.tasks' = '1',
  'compaction.async.enable' = 'true',
  'compaction.trigger.strategy' = 'num_commits',
  'compaction.delta_commits' = '1'
);

insert into dwd_dm_day_hudi select day_id,month_id,qn_id,year_id,phycial_delete from dwd_dm_day_kafka;
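Of the settings above, the checkpoint interval is the one that switches checkpointing on; the others only tune where checkpoints go and how long they are kept. As a sketch, assuming a newer Flink SQL client (1.13 or later) where SET expects quoted keys and values, the minimal change might look like this. The 20-second interval simply mirrors the value used above.

```sql
-- Minimal sketch (assumes the quoted SET syntax of newer Flink SQL clients).
-- Enable periodic checkpoints so the Hudi sink gets a chance to commit.
SET 'execution.checkpointing.interval' = '20s';

-- Same streaming insert as in the script above, unchanged.
INSERT INTO dwd_dm_day_hudi
SELECT day_id, month_id, qn_id, year_id, phycial_delete
FROM dwd_dm_day_kafka;
```

With this interval, newly consumed Kafka records should become visible in Hudi only after the next checkpoint completes, so a delay of up to roughly one interval is expected.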
