Dinky in Practice: Whole-Database Synchronization from MySQL CDC to Paimon

计蕴斯Lowell

Published 2025-06-11 09:17:17

Licensed under CC 4.0 BY-SA

Article link: https://blog.youkuaiyun.com/gitblog_00095/article/details/148578434

dinky: Dinky is an out-of-the-box, one-stop, real-time computing platform dedicated to the construction and practice of Unified Streaming & Batch and Unified Data Lake & Data Warehouse. Based on Apache Flink, Dinky provides the ability to connect many big data frameworks including OLAP and Data Lake. Project repository: https://gitcode.com/gh_mirrors/di/dinky

Overview

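This article walks through whole-database synchronization from MySQL to Paimon using Dinky. CDC (Change Data Capture) captures change events from a database, and Paimon, a storage system that unifies streaming and batch processing, is well suited to serve as the storage layer of a data lake. With the CDCSOURCE feature that Dinky provides, this synchronization can be set up with little effort.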

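Environment Preparation

Before you start, make sure the following components are configured correctly.

Dependency jars:
- Place the Paimon Flink connector jar in both the Flink/lib and Dinky/extends directories
- Place the MySQL CDC connector jar in both the Flink/lib and Dinky/extends directories
- If you submit jobs in Application or Per-Job mode, also make sure these jars have been uploaded to HDFS

Notes:
- If the jars are added after Flink and Dinky have already started, restart both services
- Alternatively, use Dinky's ADD CUSTOMJAR feature to load the jars dynamically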
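
As a rough illustration of the dynamic-loading option, the statements below show what loading the two connector jars at the top of a Dinky task might look like. This is only a sketch: the jar file names and the HDFS directory are hypothetical placeholders, and the path schemes accepted by ADD CUSTOMJAR depend on your Dinky version, so check the Dinky documentation for your release before relying on it.

```sql
-- Hypothetical jar locations; replace with the real paths and versions in your environment.
-- Load the Paimon Flink connector for this task without restarting Flink or Dinky.
ADD CUSTOMJAR 'hdfs:///dinky/jars/paimon-flink-connector.jar';

-- Load the MySQL CDC connector the same way.
ADD CUSTOMJAR 'hdfs:///dinky/jars/flink-sql-connector-mysql-cdc.jar';
```

These statements would go before the EXECUTE CDCSOURCE statement in the same task, so that the connectors are available when the job is compiled.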

Implementation

Dinky offers two ways to synchronize a whole MySQL database to Paimon:

Option 1: Using the SQL Catalog sink

EXECUTE CDCSOURCE demo WITH (
  'connector' = 'mysql-cdc',
  'hostname' = '127.0.0.1',
  'port' = '3306',
  'username' = 'root',
  'password' = '123456',
  'checkpoint' = '10000',
  'scan.startup.mode' = 'initial',
  'parallelism' = '1',
  'table-name' = 'test\..*',
  'sink.connector' = 'sql-catalog',
  'sink.catalog.name' = 'fts',
  'sink.catalog.type' = 'table-store',
  'sink.catalog.warehouse'='file:/tmp/table_store'
);

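Parameter description:

- connector: selects the mysql-cdc connector
- hostname/port: address and port of the MySQL server
- username/password: database credentials
- checkpoint: checkpoint interval in milliseconds
- scan.startup.mode: startup mode; initial takes a full snapshot first and then reads incremental changes
- table-name: regular expression that matches the tables to synchronize
- sink.connector: selects the sql-catalog sink
- sink.catalog.*: configuration of the Paimon catalog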
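
To check what the job has written, one option is to register the same catalog in a separate Flink SQL session and browse it. The sketch below reuses the catalog name, type, and warehouse path from the statement above; which database the synchronized tables land in is not stated here, so the example only lists what exists. Note that `table-store` is the catalog type identifier of Flink Table Store, the predecessor of Paimon; if your connector jar is a Paimon release, the identifier is `paimon`, and the value in both the CDCSOURCE statement and this query would presumably need to match the jar you installed.

```sql
-- Register a catalog that points at the same warehouse as 'sink.catalog.warehouse'.
CREATE CATALOG fts WITH (
  'type' = 'table-store',            -- use 'paimon' with a Paimon connector jar
  'warehouse' = 'file:/tmp/table_store'
);

USE CATALOG fts;

-- List the databases and tables created by the synchronization job.
SHOW DATABASES;
SHOW TABLES;
```

Once a table name is known, an ordinary SELECT against it in the same session shows the synchronized rows.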

Option 2: Letting the Paimon sink create tables automatically

EXECUTE CDCSOURCE dinky_paimon_test
WITH
  (
    'connector' = 'mysql-cdc',
    'hostname' = '',
    'port' = '',
    'username' = '',
    'password' = '',
    'checkpoint' = '10000',
    'parallelism' = '1',
    'scan.startup.mode' = 'initial',
    'database-name' = 'dinky',
    'sink.connector' = 'paimon',
    'sink.path' = 'hdfs:/tmp/paimon/#{schemaName}.db/#{tableName}',
    'sink.auto-create' = 'true'
  );

Characteristics of this option: