Building a Real-Time Dashboard with Flink CDC and Superset, Part 2

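This article describes how to configure an SSL-encrypted Kafka Connect service in an EMR environment, deploy the Debezium MySQL Connector, synchronize MySQL data to Kafka in real time, process it with Flink SQL, and finally load it into Superset for real-time dashboard display. The steps covered are starting Kafka Connect, configuring the Debezium MySQL Source Connector, and running the Flink SQL jobs.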

Environment:

1) EMR-Flink-Cluster 3.36.1 (HDFS 2.8.5, YARN 2.8.5, Flink 1.12-vvr-3.0.2)

2) Rds-Mysql 5.7.26

3) EMR-Kafka-Cluster 4.9.0 (Kafka_2.12-2.4.1-1.0.0, Zookeeper 3.6.2)

4) Debezium-Mysql-Connector 1.2.0

5) EMR-Hadoop-Cluster 4.9.0 (SuperSet 0.36.0)

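Rationale and problems addressed:

1. Flink CDC versus Debezium:

Flink CDC supports MySQL 5.7 and later, and PostgreSQL 9.6 and later.
Debezium supports MySQL 5.5 and later, PostgreSQL, MongoDB, Oracle, SQL Server, and other data sources, and the downstream Flink job only needs to read from a single intermediate source, Kafka.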
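To illustrate why a single Kafka-based format is enough downstream, here is a minimal Python sketch that decodes one Debezium change event. It assumes the standard Debezium envelope (`before`, `after`, `source`, `op`) wrapped in the `schema`/`payload` structure that JsonConverter produces when schemas are enabled. The sample event, the `demo.orders` table, and the function name are hypothetical and not taken from the original setup.

```python
import json

# Hypothetical change event, shaped like the JSON that JsonConverter writes
# when schemas.enable=true: the row change sits under "payload".
SAMPLE_EVENT = json.dumps({
    "schema": {"type": "struct", "name": "demo.Envelope"},
    "payload": {
        "before": None,
        "after": {"id": 1, "amount": 9.5},
        "source": {"connector": "mysql", "db": "demo", "table": "orders"},
        "op": "c",
        "ts_ms": 1620000000000,
    },
})

# Debezium operation codes mapped to readable names.
OP_NAMES = {"c": "insert", "u": "update", "d": "delete", "r": "snapshot-read"}


def decode_change_event(raw):
    """Return the kind of change, the source table, and the affected row."""
    event = json.loads(raw)
    # With schemas disabled the envelope is the top-level object itself.
    payload = event.get("payload", event)
    op = payload["op"]
    # A delete carries the old row in "before"; other operations use "after".
    row = payload["before"] if op == "d" else payload["after"]
    return {
        "kind": OP_NAMES.get(op, "unknown"),
        "table": "{db}.{table}".format(**payload["source"]),
        "row": row,
    }


print(decode_change_event(SAMPLE_EVENT))
```

Because every Debezium connector emits this same envelope, the consumer logic would not need to change when the upstream database does, which is the point of putting Kafka in the middle.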

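2. Main problems addressed:

1) Manually adding a distributed kafka-connect service to an EMR-Kafka cluster that has SSL configured

2) A hands-on walkthrough of the real-time dashboard pipeline: multiple data sources (only MySQL is verified in this article) -> debezium -> kafka -> flinksql -> mysql-superset

3) A brief description of the startup process for kafka-connector-mysql-source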
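As a rough sketch of what registering the MySQL source could look like, the Python below builds a Debezium MySQL connector definition and posts it to the Kafka Connect REST API (`POST /connectors`). The property names are those of the Debezium 1.2 MySQL connector. Every value (host name, account, server id, database name, topic name) and both function names are placeholders assumed for illustration, not the configuration used in this article.

```python
import json
import urllib.request


def build_mysql_source_config(name, mysql_host, bootstrap_servers):
    """Return the JSON body for POST /connectors (all values are placeholders)."""
    return {
        "name": name,
        "config": {
            "connector.class": "io.debezium.connector.mysql.MySqlConnector",
            "tasks.max": "1",
            "database.hostname": mysql_host,
            "database.port": "3306",
            "database.user": "debezium",       # hypothetical account
            "database.password": "change-me",  # placeholder
            "database.server.id": "184054",    # must be unique among replication clients
            "database.server.name": "demo",    # logical name, used as topic prefix
            "database.whitelist": "demo",      # property name as of Debezium 1.2
            "database.history.kafka.bootstrap.servers": bootstrap_servers,
            "database.history.kafka.topic": "dbhistory.demo",
        },
    }


def register_connector(connect_url, body):
    """POST the connector definition to a running Kafka Connect worker."""
    request = urllib.request.Request(
        connect_url.rstrip("/") + "/connectors",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


body = build_mysql_source_config(
    "mysql-source-demo",
    "rds.example.internal",  # hypothetical RDS endpoint
    "emr-header-1.cluster-231710:9092",
)
print(json.dumps(body, indent=2))
```

Calling `register_connector("http://<connect-host>:8083", body)` would submit the definition, 8083 being the default Kafka Connect REST port. On an SSL-enabled cluster the schema-history producer and consumer would likely also need SSL settings passed through the `database.history.producer.` and `database.history.consumer.` prefixes. Those are omitted here.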

Note:
Flink CDC official site: About Flink CDC (Flink CDC 2.0.0 documentation)

Architecture:

 

Starting the kafka-connect service in the EMR-Kafka cluster:

1. Deploy kafka-connect in distributed mode, using this configuration file:

/var/lib/ecm-agent/cache/ecm/service/KAFKA/2.12-2.4.1.1.1/package/templates/connect-distributed.properties

2. Edit connect-distributed.properties as follows:

##
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
##

# This file contains some of the configurations for the Kafka Connect distributed worker. This file is intended
# to be used with the examples, and some settings may differ from those used in a production system, especially
# the `bootstrap.servers` and those specifying replication factors.

# A list of host/port pairs to use for establishing the initial connection to the Kafka cluster.
#bootstrap.servers={ {bootstrap_servers}}
bootstrap.servers=emr-header-1.cluster-231710:9092,emr-worker-1.cluster-231710:9092,emr-worker-2.cluster-231710:9092

# unique name for the cluster, used in forming the Connect cluster group. Note that this must not conflict with consumer group IDs
group.id=connect-cluster-stage

# The converters specify the format of data in Kafka and how to translate it into Connect data. Every Connect user will
# need to configure these based on the format they want their data in when loaded from or stored into Kafka
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
# Converter-specific settings can be passed in by prefixing the Converter's setting with the converter we want to apply
# it to
key.converter.schemas.enable=true
value.converter.schemas.enable=true

# Topic to use for storing offsets. This topic should have many partitions and be replicated and compacted.
# Kafka Connect will attempt to create the topic automatically when needed, but you can always manually create
# the topic before starting Kafka Connect if a specific topic configuration is needed.
# Most users will want to use the built-in default replication factor of 3 or in some cases even specify a larger value.
# Since this means there must be at least as many brokers as the maximum replication factor used, we'd like to be able
# to run this example on a single-broker cluster and so here we instead set the replication factor to 1.
offset.storage.topic=connect-offsets-stage
offset.storage.replication.factor=2
offset.storage.partitions=5

# Topic to use for storing connector and task configurations; note that this should be a single partition, highly replicated,
# and compacted topic. Kafka Connect will attempt to create the topic automatically when needed, but you can always manually create
# the topic before starting Kafka Connect if a specific topic configuration is needed.
# Most users will want to use the built-in default replication factor of 3 or in some cases even specify a larger value.
# Since this means there must be at least as many brokers as the maximum replication factor used, we'd like to b
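The comments in the listing above point out that a topic's replication factor cannot exceed the number of brokers. A small Python sketch can sanity-check a worker configuration against that rule before the service is started. It assumes, as a simplification, that `bootstrap.servers` lists every broker in the cluster, which Kafka does not require. The function names and the trimmed `WORKER_CONFIG` excerpt are illustrative only.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def check_replication(props):
    """List every *.replication.factor that exceeds the listed broker count."""
    # Assumption: bootstrap.servers names all brokers of the cluster.
    brokers = [b for b in props.get("bootstrap.servers", "").split(",") if b.strip()]
    problems = []
    for key, value in props.items():
        if key.endswith(".replication.factor") and int(value) > len(brokers):
            problems.append(f"{key}={value} exceeds {len(brokers)} listed brokers")
    return problems


# Excerpt of the settings shown in the listing above.
WORKER_CONFIG = """
bootstrap.servers=emr-header-1.cluster-231710:9092,emr-worker-1.cluster-231710:9092,emr-worker-2.cluster-231710:9092
group.id=connect-cluster-stage
offset.storage.topic=connect-offsets-stage
offset.storage.replication.factor=2
offset.storage.partitions=5
"""

props = parse_properties(WORKER_CONFIG)
print(check_replication(props) or "replication factors fit the listed brokers")
```

With three brokers listed and a replication factor of 2, the check reports no problems. The same function would flag any other `*.replication.factor` setting in the file that is set too high.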
