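Hands-on: Ingesting MySQL Data into Doris with Flink CDC

1. Create the MySQL database and table, and load demo data

  1. Log in to the test MySQL instance
mysql -u root -pnew_password
  2. Create the database and table, then insert the demo rows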
CREATE DATABASE emp_1;
 USE emp_1;
CREATE TABLE employees_1 (
    emp_no      INT             NOT NULL,
    birth_date  DATE            NOT NULL,
    first_name  VARCHAR(14)     NOT NULL,
    last_name   VARCHAR(16)     NOT NULL,
    gender      ENUM ('M','F')  NOT NULL,    
    hire_date   DATE            NOT NULL,
    PRIMARY KEY (emp_no)
);

INSERT INTO `employees_1` VALUES (10001,'1953-09-02','Georgi','Facello','M','1986-06-26'),
(10002,'1964-06-02','Bezalel','Simmel','F','1985-11-21'),
(10036,'1959-08-10','Adamantios','Portugali','M','1992-01-03');

Note: MySQL binary logging (binlog) must be enabled, with the following settings in the MySQL server configuration:

  • log_bin=mysql_bin
  • binlog-format=Row
  • server-id=1
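
After restarting MySQL, the settings can be checked from any client session. This is a minimal check using standard MySQL statements; the expected values in the comment assume the configuration listed above.

```sql
-- With the settings above, expect log_bin = ON, binlog_format = ROW, server_id = 1.
SHOW VARIABLES LIKE 'log_bin';
SHOW VARIABLES LIKE 'binlog_format';
SHOW VARIABLES LIKE 'server_id';
```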

2. Create the Doris database and table

  1. Create the Doris table
mysql -uroot -P9030 -h127.0.0.1
create database demo;
use demo;
CREATE TABLE all_employees_info (
    emp_no       int NOT NULL,
    birth_date   date,
    first_name   varchar(20),
    last_name    varchar(20),
    gender       char(2),
    hire_date    date
)
UNIQUE KEY(`emp_no`, `birth_date`)
DISTRIBUTED BY HASH(`birth_date`) BUCKETS 1
PROPERTIES (
"replication_allocation" = "tag.location.default: 1"
);
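
The table above declares UNIQUE KEY(emp_no, birth_date), while the MySQL primary key is emp_no alone. In Doris's Unique Key model, rows are deduplicated on the full key, so an upstream change to birth_date may leave two rows for the same employee. If that matters for your data, one option is to key and distribute the table on emp_no only. The sketch below is an assumption and not part of the original walkthrough; the table name all_employees_info_by_emp_no is hypothetical.

```sql
-- Hypothetical variant: same columns, but keyed like the MySQL primary key.
CREATE TABLE all_employees_info_by_emp_no (
    emp_no       int NOT NULL,
    birth_date   date,
    first_name   varchar(20),
    last_name    varchar(20),
    gender       char(2),
    hire_date    date
)
UNIQUE KEY(`emp_no`)
DISTRIBUTED BY HASH(`emp_no`) BUCKETS 1
PROPERTIES (
"replication_allocation" = "tag.location.default: 1"
);
```

If this variant is used, the 'table.identifier' of the Flink sink has to point at it instead.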

3. Start Flink

  1. Start Flink
cd /mnt/apps/flink-1.15.3/
# Start the Flink cluster (in this environment the service was already running)
bin/start-cluster.sh
# Open the SQL client
bin/sql-client.sh embedded
  2. Create the Flink job:
SET 'execution.checkpointing.interval' = '10s';

CREATE TABLE employees_source (
    database_name STRING METADATA VIRTUAL,
    table_name STRING METADATA VIRTUAL,
    emp_no int NOT NULL,
    birth_date date,
    first_name STRING,
    last_name STRING,
    gender STRING,
    hire_date date,
    PRIMARY KEY (`emp_no`) NOT ENFORCED
  ) WITH (
    'connector' = 'mysql-cdc',
    'hostname' = 'localhost',
    'port' = '3306',
    'username' = 'root',
    'password' = 'new_password',
    'database-name' = 'emp_1',
    'table-name' = 'employees_1'
  );

CREATE TABLE cdc_doris_sink (
    emp_no       int ,
    birth_date   STRING,
    first_name   STRING,
    last_name    STRING,
    gender       STRING,
    hire_date    STRING
) 
WITH (
  'connector' = 'doris',
  'fenodes' = '172.16.64.9:8030',
  'table.identifier' = 'demo.all_employees_info',
  'username' = 'root',
  'password' = '',
  'sink.properties.two_phase_commit'='true',
  'sink.label-prefix'='doris_demo_emp_002'
);

insert into cdc_doris_sink (emp_no,birth_date,first_name,last_name,gender,hire_date) 
select emp_no,cast(birth_date as string) as birth_date ,first_name,last_name,gender,cast(hire_date as string) as hire_date  from employees_source;
  3. Open the following address to view the running Flink job
    http://localhost:8081/#/job/running

  4. Verify the data: once the job is running, rows arrive in Doris in real time

mysql -uroot -P9030 -h127.0.0.1
mysql> select * from all_employees_info;
+--------+------------+------------+-----------+--------+------------+
| emp_no | birth_date | first_name | last_name | gender | hire_date  |
+--------+------------+------------+-----------+--------+------------+
|  10001 | 1953-09-02 | Georgi     | Facello   | M      | 1986-06-26 |
|  10002 | 1964-06-02 | Bezalel    | Simmel    | F      | 1985-11-21 |
|  10036 | 1959-08-10 | Adamantios | Portugali | M      | 1992-01-03 |
|  20001 | 1953-09-02 | Georgi     | Facello   | M      | 1986-06-26 |
+--------+------------+------------+-----------+--------+------------+
4 rows in set (0.02 sec)
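
The result above contains a row with emp_no 20001 that is not part of the initial INSERT, which suggests it was written to MySQL while the job was running. A minimal sketch of such a check follows, assuming the Flink job is still running and reusing the values shown in the result. Run the first part in MySQL and the second in Doris.

```sql
-- In MySQL (source): add a row after the Flink job has started.
USE emp_1;
INSERT INTO employees_1 VALUES (20001,'1953-09-02','Georgi','Facello','M','1986-06-26');

-- In Doris (sink): the row should become visible after the next Flink checkpoint.
USE demo;
SELECT * FROM all_employees_info WHERE emp_no = 20001;
```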
Link
  • https://zhuanlan.zhihu.com/p/532913664
  • https://www.runoob.com/mysql/mysql-install.html
  • https://repo.maven.apache.org/maven2/org/apache/doris/flink-doris-connector-1.15/1.2.1/
  • https://repo1.maven.org/maven2/com/ververica/flink-sql-connector-mysql-cdc/2.2.1/
Jar locations:

Flink environment: 1.15.3

  • https://dlcdn.apache.org/flink/flink-1.15.3/flink-1.15.3-bin-scala_2.12.tgz
    Extract the archive and place the jars below in Flink's lib directory.
  • flink-doris-connector (for Flink 1.15): https://repo.maven.apache.org/maven2/org/apache/doris/flink-doris-connector-1.15/1.2.1/flink-doris-connector-1.15-1.2.1.jar
  • MySQL CDC connector (flink-sql-connector-mysql-cdc-2.2.1.jar): https://repo1.maven.org/maven2/com/ververica/flink-sql-connector-mysql-cdc/2.2.1/flink-sql-connector-mysql-cdc-2.2.1.jar
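
### How to sync data from MySQL to Doris with Flink CDC

#### Preparing the environment

To sync data from MySQL to Doris with Flink CDC, first prepare the required packages and dependencies. This includes installing and configuring an Apache Flink cluster and making sure the deployed JDBC driver version supports CDC[^1].

#### Creating the Flink job

Create a new Flink streaming application and define a source table in it that points to the specific MySQL table whose changes should be monitored. The connector uses `Debezium` as the capture tool; it reads and parses binary log (binlog) events to obtain incremental updates[^2].

```sql
CREATE TABLE mysql_source (
    id BIGINT,
    name STRING,
    description STRING,
    PRIMARY KEY (id) NOT ENFORCED
) WITH (
    'connector' = 'mysql-cdc',
    'hostname' = 'localhost',
    'port' = '3306',
    'username' = 'root',
    'password' = 'your_password',
    'database-name' = 'test_db',
    'table-name' = 'source_table'
);
```

The SQL above sets up a source table based on the MySQL CDC connector. It specifies the host name, port, user name, and the other parameters needed to connect to the MySQL server.

#### Defining the sink

Next, define the sink, which specifies where the final results are stored. Here the goal is to write the data coming from upstream into the corresponding target table in the Doris data warehouse:

```sql
CREATE TABLE doris_sink (
    id BIGINT,
    name STRING,
    description STRING
) WITH (
    'connector' = 'doris',
    'fenodes' = 'doris_fe_host:doris_fe_http_port',  -- Doris FE node address
    'table.identifier' = 'db_name.table_name',       -- fully qualified target table
    'username' = 'root',
    'password' = '',                                  -- password, if one is set
    'sink.label-prefix' = 'doris_label'               -- assumed value; keep it unique per job
);
```

This snippet declares the destination table for Doris and sets the Doris frontend (FE) node address and the authentication details needed for inserts to succeed.

#### Starting the data flow

The last step is to start the pipeline so that it continuously reads changed rows from the source and sends them to the destination. A simple query connects the two tables into a complete pipeline:

```sql
INSERT INTO doris_sink
SELECT * FROM mysql_source;
```

This statement tells the Flink engine to keep capturing the latest changes from the MySQL binlog and forward them to the destination, Doris[^1].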
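
The generic example above does not set a checkpoint interval, whereas the walkthrough earlier sets it to 10s before creating any tables. The Doris connector appears to tie its commits to Flink checkpoints (an assumption based on its two-phase commit mode), so a job submitted without checkpointing may show no data in Doris. The sketch below assumes the same SQL client session and the table names from the example above.

```sql
-- Enable periodic checkpoints before submitting the INSERT job.
-- The 10s interval mirrors the walkthrough and is only a starting point.
SET 'execution.checkpointing.interval' = '10s';

-- Listing columns explicitly keeps the mapping stable if either table changes.
INSERT INTO doris_sink
SELECT id, name, description FROM mysql_source;
```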