Syncing Data to Doris with Flink CDC

Overview

Flink CDC builds on Debezium and makes full use of the change-capture capabilities it provides.

Supported connectors

Flink CDC / Flink version compatibility

Features:

  1. Supports reading a database snapshot and then continuing to read the binlog with exactly-once processing, even if failures occur.
  2. DataStream API: consume changes from multiple databases and tables in a single job, without deploying Debezium and Kafka.
  3. Table/SQL API: create a CDC source with SQL DDL to monitor changes on a single table.

I. Install Doris

Install Doris

II. Install Flink

Choose one of the following two installation options:
Flink Standalone-HA high-availability cluster mode
Dlink

III. Flink-Doris Connector

1. Download

Download - Apache Doris


Or download the JAR from the Aliyun Maven repository:
仓库服务
Search for flink-doris

IV. Flink CDC

1. Download

Central Repository: com/ververica/flink-sql-connector-mysql-cdc


Place the two JARs (the Doris connector and the MySQL CDC connector) into flink/lib on all three servers: 172, 166, and 168.

Restart the Flink cluster:
bash /opt/software/flink/flink-1.17.0/bin/stop-cluster.sh
bash /opt/software/flink/flink-1.17.0/bin/start-cluster.sh

V. Data

1. MySQL data preparation

CREATE DATABASE emp_1;
USE emp_1;
CREATE TABLE employees_1 (
  emp_no      INT             NOT NULL,
  birth_date  DATE            NOT NULL,
  first_name  VARCHAR(14)     NOT NULL,
  last_name   VARCHAR(16)     NOT NULL,
  gender      ENUM ('M','F')  NOT NULL,    
  hire_date   DATE            NOT NULL,
  PRIMARY KEY (emp_no)
);

INSERT INTO `employees_1` VALUES (10001,'1953-09-02','Georgi','Facello','M','1986-06-26'),
(10002,'1964-06-02','Bezalel','Simmel','F','1985-11-21'),
(10003,'1959-12-03','Parto','Bamford','M','1986-08-28'),
(10004,'1954-05-01','Chirstian','Koblick','M','1986-12-01'),
(10005,'1955-01-21','Kyoichi','Maliniak','M','1989-09-12'),
(10006,'1953-04-20','Anneke','Preusig','F','1989-06-02'),
(10007,'1957-05-23','Tzvetan','Zielinski','F','1989-02-10'),
(10008,'1958-02-19','Saniya','Kalloufi','M','1994-09-15'),
(10009,'1952-04-19','Sumant','Peac','F','1985-02-18'),
(10010,'1963-06-01','Duangkaew','Piveteau','F','1989-08-24'),
(10011,'1953-11-07','Mary','Sluis','F','1990-01-22'),
(10012,'1960-10-04','Patricio','Bridgland','M','1992-12-18'),
(10013,'1963-06-07','Eberhardt','Terkki','M','1985-10-20'),
(10014,'1956-02-12','Berni','Genin','M','1987-03-11'),
(10015,'1959-08-19','Guoxiang','Nooteboom','M','1987-07-02'),
(10016,'1961-05-02','Kazuhito','Cappelletti','M','1995-01-27'),
(10017,'1958-07-06','Cristinel','Bouloucos','F','1993-08-03'),
(10018,'1954-06-19','Kazuhide','Peha','F','1987-04-03'),
(10019,'1953-01-23','Lillian','Haddadi','M','1999-04-30'),
(10020,'1952-12-24','Mayuko','Warwick','M','1991-01-26'),
(10021,'1960-02-20','Ramzi','Erde','M','1988-02-10'),
(10022,'1952-07-08','Shahaf','Famili','M','1995-08-22'),
(10023,'1953-09-29','Bojan','Montemayor','F','1989-12-17'),
(10024,'1958-09-05','Suzette','Pettey','F','1997-05-19'),
(10025,'1958-10-31','Prasadram','Heyers','M','1987-08-17'),
(10026,'1953-04-03','Yongqiao','Berztiss','M','1995-03-20'),
(10027,'1962-07-10','Divier','Reistad','F','1989-07-07'),
(10028,'1963-11-26','Domenick','Tempesti','M','1991-10-22'),
(10029,'1956-12-13','Otmar','Herbst','M','1985-11-20'),
(10030,'1958-07-14','Elvis','Demeyer','M','1994-02-17'),
(10031,'1959-01-27','Karsten','Joslin','M','1991-09-01'),
(10032,'1960-08-09','Jeong','Reistad','F','1990-06-20'),
(10033,'1956-11-14','Arif','Merlo','M','1987-03-18'),
(10034,'1962-12-29','Bader','Swan','M','1988-09-21'),
(10035,'1953-02-08','Alain','Chappelet','M','1988-09-05'),
(10036,'1959-08-10','Adamantios','Portugali','M','1992-01-03');

CREATE TABLE employees_2 (
  emp_no      INT             NOT NULL,
  birth_date  DATE            NOT NULL,
  first_name  VARCHAR(14)     NOT NULL,
  last_name   VARCHAR(16)     NOT NULL,
  gender      ENUM ('M','F')  NOT NULL,    
  hire_date   DATE            NOT NULL,
  PRIMARY KEY (emp_no)
);

INSERT INTO `employees_2` VALUES (10037,'1963-07-22','Pradeep','Makrucki','M','1990-12-05'),
(10038,'1960-07-20','Huan','Lortz','M','1989-09-20'),
(10039,'1959-10-01','Alejandro','Brender','M','1988-01-19'),
(10040,'1959-09-13','Weiyi','Meriste','F','1993-02-14'),
(10041,'1959-08-27','Uri','Lenart','F','1989-11-12'),
(10042,'1956-02-26','Magy','Stamatiou','F','1993-03-21'),
(10043,'1960-09-19','Yishay','Tzvieli','M','1990-10-20'),
(10044,'1961-09-21','Mingsen','Casley','F','1994-05-21'),
(10045,'1957-08-14','Moss','Shanbhogue','M','1989-09-02'),
(10046,'1960-07-23','Lucien','Rosenbaum','M','1992-06-20'),
(10047,'1952-06-29','Zvonko','Nyanchama','M','1989-03-31'),
(10048,'1963-07-11','Florian','Syrotiuk','M','1985-02-24'),
(10049,'1961-04-24','Basil','Tramer','F','1992-05-04'),
(10050,'1958-05-21','Yinghua','Dredge','M','1990-12-25'),
(10051,'1953-07-28','Hidefumi','Caine','M','1992-10-15'),
(10052,'1961-02-26','Heping','Nitsch','M','1988-05-21'),
(10053,'1954-09-13','Sanjiv','Zschoche','F','1986-02-04'),
(10054,'1957-04-04','Mayumi','Schueller','M','1995-03-13');

CREATE DATABASE emp_2;

USE emp_2;

CREATE TABLE employees_1 (
  emp_no      INT             NOT NULL,
  birth_date  DATE            NOT NULL,
  first_name  VARCHAR(14)     NOT NULL,
  last_name   VARCHAR(16)     NOT NULL,
  gender      ENUM ('M','F')  NOT NULL,    
  hire_date   DATE            NOT NULL,
  PRIMARY KEY (emp_no)
);

INSERT INTO `employees_1` VALUES  (10055,'1956-06-06','Georgy','Dredge','M','1992-04-27'),
(10056,'1961-09-01','Brendon','Bernini','F','1990-02-01'),
(10057,'1954-05-30','Ebbe','Callaway','F','1992-01-15'),
(10058,'1954-10-01','Berhard','McFarlin','M','1987-04-13'),
(10059,'1953-09-19','Alejandro','McAlpine','F','1991-06-26'),
(10060,'1961-10-15','Breannda','Billingsley','M','1987-11-02'),
(10061,'1962-10-19','Tse','Herber','M','1985-09-17'),
(10062,'1961-11-02','Anoosh','Peyn','M','1991-08-30'),
(10063,'1952-08-06','Gino','Leonhardt','F','1989-04-08'),
(10064,'1959-04-07','Udi','Jansch','M','1985-11-20'),
(10065,'1963-04-14','Satosi','Awdeh','M','1988-05-18'),
(10066,'1952-11-13','Kwee','Schusler','M','1986-02-26'),
(10067,'1953-01-07','Claudi','Stavenow','M','1987-03-04'),
(10068,'1962-11-26','Charlene','Brattka','M','1987-08-07'),
(10069,'1960-09-06','Margareta','Bierman','F','1989-11-05'),
(10070,'1955-08-20','Reuven','Garigliano','M','1985-10-14'),
(10071,'1958-01-21','Hisao','Lipner','M','1987-10-01'),
(10072,'1952-05-15','Hironoby','Sidou','F','1988-07-21'),
(10073,'1954-02-23','Shir','McClurg','M','1991-12-01'),
(10074,'1955-08-28','Mokhtar','Bernatsky','F','1990-08-13'),
(10075,'1960-03-09','Gao','Dolinsky','F','1987-03-19'),
(10076,'1952-06-13','Erez','Ritzmann','F','1985-07-09'),
(10077,'1964-04-18','Mona','Azuma','M','1990-03-02'),
(10078,'1959-12-25','Danel','Mondadori','F','1987-05-26'),
(10079,'1961-10-05','Kshitij','Gils','F','1986-03-27'),
(10080,'1957-12-03','Premal','Baek','M','1985-11-19'),
(10081,'1960-12-17','Zhongwei','Rosen','M','1986-10-30'),
(10082,'1963-09-09','Parviz','Lortz','M','1990-01-03'),
(10083,'1959-07-23','Vishv','Zockler','M','1987-03-31'),
(10084,'1960-05-25','Tuval','Kalloufi','M','1995-12-15');


CREATE TABLE employees_2(
  emp_no      INT             NOT NULL,
  birth_date  DATE            NOT NULL,
  first_name  VARCHAR(14)     NOT NULL,
  last_name   VARCHAR(16)     NOT NULL,
  gender      ENUM ('M','F')  NOT NULL,    
  hire_date   DATE            NOT NULL,
  PRIMARY KEY (emp_no)
);

INSERT INTO `employees_2` VALUES (10085,'1962-11-07','Kenroku','Malabarba','M','1994-04-09'),
(10086,'1962-11-19','Somnath','Foote','M','1990-02-16'),
(10087,'1959-07-23','Xinglin','Eugenio','F','1986-09-08'),
(10088,'1954-02-25','Jungsoon','Syrzycki','F','1988-09-02'),
(10089,'1963-03-21','Sudharsan','Flasterstein','F','1986-08-12'),
(10090,'1961-05-30','Kendra','Hofting','M','1986-03-14'),
(10091,'1955-10-04','Amabile','Gomatam','M','1992-11-18'),
(10092,'1964-10-18','Valdiodio','Niizuma','F','1989-09-22'),
(10093,'1964-06-11','Sailaja','Desikan','M','1996-11-05'),
(10094,'1957-05-25','Arumugam','Ossenbruggen','F','1987-04-18'),
(10095,'1965-01-03','Hilari','Morton','M','1986-07-15'),
(10096,'1954-09-16','Jayson','Mandell','M','1990-01-14'),
(10097,'1952-02-27','Remzi','Waschkowski','M','1990-09-15'),
(10098,'1961-09-23','Sreekrishna','Servieres','F','1985-05-13'),
(10099,'1956-05-25','Valter','Sullins','F','1988-10-18'),
(10100,'1953-04-21','Hironobu','Haraldson','F','1987-09-21'),
(10101,'1952-04-15','Perla','Heyers','F','1992-12-28'),
(10102,'1959-11-04','Paraskevi','Luby','F','1994-01-26'),
(10103,'1953-11-26','Akemi','Birch','M','1986-12-02'),
(10104,'1961-11-19','Xinyu','Warwick','M','1987-04-16'),
(10105,'1962-02-05','Hironoby','Piveteau','M','1999-03-23'),
(10106,'1952-08-29','Eben','Aingworth','M','1990-12-19'),
(10107,'1956-06-13','Dung','Baca','F','1994-03-22'),
(10108,'1952-04-07','Lunjin','Giveon','M','1986-10-02'),
(10109,'1958-11-25','Mariusz','Prampolini','F','1993-06-16'),
(10110,'1957-03-07','Xuejia','Ullian','F','1986-08-22'),
(10111,'1963-08-29','Hugo','Rosis','F','1988-06-19'),
(10112,'1963-08-13','Yuichiro','Swick','F','1985-10-08'),
(10113,'1963-11-13','Jaewon','Syrzycki','M','1989-12-24'),
(10114,'1957-02-16','Munir','Demeyer','F','1992-07-17'),
(10115,'1964-12-25','Chikara','Rissland','M','1986-01-23'),
(10116,'1955-08-26','Dayanand','Czap','F','1985-05-28');

2. Create the Doris database and table

create database demo;
use demo;
CREATE TABLE all_employees_info (
    emp_no       int NOT NULL,
    birth_date   date,
    first_name   varchar(20),
    last_name    varchar(20),
    gender       char(2),
    hire_date    date,
    database_name varchar(50),
    table_name    varchar(200)
)
UNIQUE KEY(`emp_no`, `birth_date`)
DISTRIBUTED BY HASH(`birth_date`) BUCKETS 1
PROPERTIES (
"replication_allocation" = "tag.location.default: 1"
);
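
The UNIQUE KEY model is what allows the connector to apply CDC updates and deletes as upserts on (emp_no, birth_date). To double-check the table definition after creation:

SHOW CREATE TABLE demo.all_employees_info;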

3. Start the Flink SQL Client

/opt/software/flink/flink-1.17.0/bin/sql-client.sh embedded

Enable checkpointing, with a checkpoint every 10 seconds:
SET 'execution.checkpointing.interval' = '10s';
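
Checkpointing is not just a tuning knob here: the Doris sink's two-phase commit is bound to Flink checkpoints, so rows only become visible in Doris after a checkpoint completes. If you want to make the delivery semantics explicit, the standard Flink option below can also be set (EXACTLY_ONCE is already the default; shown for clarity):

SET 'execution.checkpointing.mode' = 'EXACTLY_ONCE';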

4. Create the MySQL CDC source table

Run the following SQL in the Flink SQL Client:

CREATE TABLE employees_source (
    database_name STRING METADATA VIRTUAL,
    table_name STRING METADATA VIRTUAL,
    emp_no int NOT NULL,
    birth_date date,
    first_name STRING,
    last_name STRING,
    gender STRING,
    hire_date date,
    PRIMARY KEY (`emp_no`) NOT ENFORCED
  ) WITH (
    'connector' = 'mysql-cdc',
    'hostname' = '10.9.70.172',
    'port' = '3306',
    'username' = 'root',
    'password' = 'Stwc.2wsx',
    'database-name' = 'emp_[0-9]+',
    'table-name' = 'employees_[0-9]+'
  );

Options supported by the mysql-cdc connector (see the official documentation for their semantics):
chunk-key.even-distribution.factor.lower-bound
chunk-key.even-distribution.factor.upper-bound
chunk-meta.group.size
connect.max-retries
connect.timeout
connection.pool.size
connector
database-name
heartbeat.interval
hostname
password
port
property-version
scan.incremental.snapshot.chunk.key-column
scan.incremental.snapshot.chunk.size
scan.incremental.snapshot.enabled
scan.newly-added-table.enabled
scan.snapshot.fetch.size
scan.startup.mode
scan.startup.specific-offset.file
scan.startup.specific-offset.gtid-set
scan.startup.specific-offset.pos
scan.startup.specific-offset.skip-events
scan.startup.specific-offset.skip-rows
scan.startup.timestamp-millis
server-id
server-time-zone
split-key.even-distribution.factor.lower-bound
split-key.even-distribution.factor.upper-bound
table-name
username
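
Of these options, server-id and server-time-zone are the ones most often needed in practice. A sketch of how they would slot into the WITH clause above — the values are placeholders; use a server-id range at least as large as the source parallelism, and your MySQL server's actual time zone:

-- optional additions to the WITH clause of employees_source:
'server-id' = '5400-5404',
'server-time-zone' = 'Asia/Shanghai'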

'database-name' = 'emp_[0-9]+': a regular expression, so multiple databases are matched at once.
'table-name' = 'employees_[0-9]+': a regular expression, so multiple tables are matched at once.

Query the CDC table; if you see data like the following, everything is working:
select * from employees_source limit 10;

[ERROR] Could not execute SQL statement. Reason:
java.io.StreamCorruptedException: unexpected block data


This is a class-loading order problem: Flink defaults to child-first. Add classloader.resolve-order: parent-first to Flink's flink-conf.yaml to switch to parent-first, then restart the cluster:
vim /opt/software/flink/flink-1.17.0/conf/flink-conf.yaml
classloader.resolve-order: parent-first

Make the same change on 166 and 168, then restart:
bash /opt/software/flink/flink-1.17.0/bin/stop-cluster.sh
bash /opt/software/flink/flink-1.17.0/bin/start-cluster.sh

Run the query again; this time it returns results successfully.

5. Create the Doris sink table

CREATE TABLE cdc_doris_sink (
    emp_no       int ,
    birth_date   STRING,
    first_name   STRING,
    last_name    STRING,
    gender       STRING,
    hire_date    STRING,
    database_name STRING,
    table_name    STRING
) 
WITH (
  'connector' = 'doris',
  'fenodes' = '10.9.70.87:8030',
  'table.identifier' = 'demo.all_employees_info',
  'username' = 'root',
  'password' = 'Stwc.2wsx',
  'sink.properties.two_phase_commit'='true',
  'sink.label-prefix'='doris_demo_emp_001'
);

Parameter notes

  • connector: use the doris connector
  • fenodes: Doris FE node IP address and HTTP port
  • table.identifier: the target Doris database and table name
  • username: Doris username
  • password: Doris password
  • sink.properties.two_phase_commit: enables two-phase commit; during Stream Load this adds the two_phase_commit:true HTTP header, without which the load fails
  • sink.label-prefix: required when two-phase commit is enabled in order to guarantee consistency between the two systems; a label prefix must not be reused across submissions
  • For other parameters, see the official documentation

Flink Doris Connector - Apache Doris
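
One practical consequence of the label rule above: if you cancel and resubmit the INSERT job without restoring from a checkpoint, change the label prefix, for example:

'sink.label-prefix' = 'doris_demo_emp_002'   -- bump the suffix for each fresh submission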

At this point, querying the Doris sink table returns no data yet:

select * from cdc_doris_sink;

6. Insert the data into the Doris table

insert into cdc_doris_sink (emp_no,birth_date,first_name,last_name,gender,hire_date,database_name,table_name) 
select emp_no,cast(birth_date as string) as birth_date ,first_name,last_name,gender,cast(hire_date as string) as hire_date ,database_name,table_name from employees_source;
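
After submitting the INSERT, you can confirm the streaming job is running without leaving the SQL client — the job statements were added in Flink 1.17; if your client does not support them, use the Web UI instead:

SHOW JOBS;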


You can also check the job's runtime information on the Flink Web UI.

7. Query the data in Doris

select count(1) from all_employees_info ;

select * from all_employees_info limit 20;
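
Since the source regex fans in two databases and four tables, a quick way to verify that every shard arrived is to group by the metadata columns (an illustrative query against the schema created above):

select database_name, table_name, count(1) as cnt
from all_employees_info
group by database_name, table_name;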

8. Test deletes

Run the following in the MySQL database emp_2:

delete from employees_2 where emp_no in (10085,10086,10087);


Verify that the rows were deleted in Doris:

select count(1) from all_employees_info;
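
Updates propagate the same way as deletes. As a quick check, update a row known to exist in the sample data, then re-query Doris (the new name value is arbitrary):

-- in MySQL, database emp_1:
update employees_1 set first_name = 'Georgi2' where emp_no = 10001;

-- then in Doris:
select emp_no, first_name from all_employees_info where emp_no = 10001;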

9. Running the Flink SQL Client on a YARN cluster

cd /opt/software/flink/flink-1.17.0/bin/
./sql-client.sh embedded -s yarn-session

To attach the SQL client to an already-running YARN session:

./sql-client.sh embedded -s f9cb5ba978ae4bc195c6e5dd2aa08a6c

f9cb5ba978ae4bc195c6e5dd2aa08a6c is the job ID.


When restarting the job, you may see:

Caused by: org.apache.flink.runtime.client.JobSubmissionException: Failed to submit JobGraph

Fix: delete the job's files from ZooKeeper's data directory:

/opt/module/zookeeper-3.8.1/zkData/

Delete the version-2 folder there.

VI. Installing a Flink management platform

Install and set up the Flink SQL management platform flink-streaming-platform-web.

References

Flink CDC documentation:
Overview — CDC Connectors for Apache Flink® documentation
Real-time ingestion of sharded MySQL databases and tables into Doris, using Flink CDC and the Doris Connector
Setting up time synchronization on CentOS:
Setting up time synchronization on CentOS - 马永猛 - 博客园
Implementing NTP time synchronization on CentOS 7 and other systems - 码农教程
Ansible:
Ansible batch deployment platform - weixin_33737774's blog - CSDN

  
