Real-Time Dashboard Practice with Flink CDC and Superset, Part 2

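This article describes how to configure an SSL-encrypted Kafka Connect service in an EMR environment, deploy the Debezium MySQL Connector to sync MySQL data to Kafka in real time, process that data with Flink SQL, and finally load it into Superset for a real-time dashboard. The steps cover starting Kafka Connect, configuring the Debezium MySQL Source Connector, and the full set of Flink SQL operations.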

Environment:

1) EMR-Flink-Cluster 3.36.1 (HDFS 2.8.5, YARN 2.8.5, Flink 1.12-vvr-3.0.2)

2) RDS MySQL 5.7.26

3) EMR-Kafka-Cluster 4.9.0 (Kafka_2.12-2.4.1-1.0.0, ZooKeeper 3.6.2)

4) Debezium MySQL Connector 1.2.0

5) EMR-Hadoop-Cluster 4.9.0 (Superset 0.36.0)

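Rationale and problems addressed:

1. Flink CDC compared with Debezium:

Flink CDC supports: MySQL 5.7 and above, PostgreSQL 9.6 and above
Debezium supports: MySQL 5.5 and above, PostgreSQL, MongoDB, Oracle, SQL Server, and other data sources. Downstream, Flink only needs to read from Kafka as the single intermediate source.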
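Because every source database ends up as Debezium change events in Kafka, the downstream consumer only has to understand one message shape. The following is a minimal sketch of decoding such an event, assuming the JSON envelope Debezium emits through the Kafka Connect JsonConverter (`before`, `after`, `source`, `op`, `ts_ms`). The function name `parse_change_event` and the sample values (`demo_db`, `orders`, the row fields) are hypothetical and only serve as illustration.

```python
import json


def parse_change_event(raw_value: str) -> dict:
    """Flatten one Debezium-style change event into op, db, table and row."""
    message = json.loads(raw_value)
    # With schemas.enable=true the JsonConverter wraps each record as
    # {"schema": ..., "payload": ...}; otherwise the payload is the top level.
    payload = message.get("payload", message)
    op = payload["op"]  # "c" create, "u" update, "d" delete, "r" snapshot read
    # Deletes carry the last row image in "before"; other operations use "after".
    row = payload["before"] if op == "d" else payload["after"]
    source = payload.get("source", {})
    return {"op": op, "db": source.get("db"), "table": source.get("table"), "row": row}


# Hypothetical sample event; all field values are invented for illustration.
sample = json.dumps({
    "schema": {"type": "struct", "name": "example.Envelope"},
    "payload": {
        "before": None,
        "after": {"id": 1, "amount": 9.5},
        "source": {"connector": "mysql", "db": "demo_db", "table": "orders"},
        "op": "c",
        "ts_ms": 1620000000000,
    },
})

if __name__ == "__main__":
    print(parse_change_event(sample))
```

In this article the decoding is done by Flink SQL rather than by hand; the sketch only shows what a consumer sees on the topic.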

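2. Main problems addressed:

1) Manually adding a distributed kafka-connect service to an EMR-Kafka cluster that has SSL configured

2) A hands-on real-time dashboard pipeline: multiple data sources (only MySQL is verified in this article) -> Debezium -> Kafka -> Flink SQL -> MySQL -> Superset

3) A brief description of the startup flow of kafka-connector-mysql-source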

Note:
Flink CDC official site: About Flink CDC (Flink CDC 2.0.0 documentation)

Architecture:

Starting the kafka-connect service in the EMR-Kafka cluster:

1. Deploy kafka-connect in distributed mode, using this configuration file:

/var/lib/ecm-agent/cache/ecm/service/KAFKA/2.12-2.4.1.1.1/package/templates/connect-distributed.properties
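The worker file is a plain key=value properties file, so its settings can be sanity-checked before the Connect worker is started. The sketch below is one possible check, assuming simple `key=value` lines with no line continuations. The helpers `parse_properties` and `check_replication` are hypothetical names, and the sample values are copied from the settings used in this article. Note that `bootstrap.servers` may list only a subset of the brokers, so a warning here is a hint to verify rather than a definite error.

```python
def parse_properties(text: str) -> dict:
    """Parse simple key=value lines, skipping blanks and '#' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def check_replication(props: dict) -> list:
    """Warn when a replication factor exceeds the number of listed brokers."""
    brokers = [b for b in props.get("bootstrap.servers", "").split(",") if b.strip()]
    warnings = []
    for key, value in props.items():
        if key.endswith(".replication.factor") and int(value) > len(brokers):
            warnings.append(f"{key}={value} exceeds the {len(brokers)} listed brokers")
    return warnings


# Sample values taken from the worker settings shown in this article.
sample = """
bootstrap.servers=emr-header-1.cluster-231710:9092,emr-worker-1.cluster-231710:9092,emr-worker-2.cluster-231710:9092
group.id=connect-cluster-stage
offset.storage.topic=connect-offsets-stage
offset.storage.replication.factor=2
offset.storage.partitions=5
"""

if __name__ == "__main__":
    print(check_replication(parse_properties(sample)))
```

With three brokers listed and a replication factor of 2, the check returns no warnings.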

2. Modify the connect-distributed.properties configuration:

##

# Licensed to the Apache Software Foundation (ASF) under one or more

# contributor license agreements. See the NOTICE file distributed with

# this work for additional information regarding copyright ownership.

# The ASF licenses this file to You under the Apache License, Version 2.0

# (the "License"); you may not use this file except in compliance with

# the License. You may obtain a copy of the License at

#

# http://www.apache.org/licenses/LICENSE-2.0

#

# Unless required by applicable law or agreed to in writing, software

# distributed under the License is distributed on an "AS IS" BASIS,

# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.

# See the License for the specific language governing permissions and

# limitations under the License.

##

# This file contains some of the configurations for the Kafka Connect distributed worker. This file is intended

# to be used with the examples, and some settings may differ from those used in a production system, especially

# the `bootstrap.servers` and those specifying replication factors.

# A list of host/port pairs to use for establishing the initial connection to the Kafka cluster.

#bootstrap.servers={{bootstrap_servers}}

bootstrap.servers=emr-header-1.cluster-231710:9092,emr-worker-1.cluster-231710:9092,emr-worker-2.cluster-231710:9092

# unique name for the cluster, used in forming the Connect cluster group. Note that this must not conflict with consumer group IDs

group.id=connect-cluster-stage

# The converters specify the format of data in Kafka and how to translate it into Connect data. Every Connect user will

# need to configure these based on the format they want their data in when loaded from or stored into Kafka

key.converter=org.apache.kafka.connect.json.JsonConverter

value.converter=org.apache.kafka.connect.json.JsonConverter

# Converter-specific settings can be passed in by prefixing the Converter's setting with the converter we want to apply

# it to

key.converter.schemas.enable=true

value.converter.schemas.enable=true

# Topic to use for storing offsets. This topic should have many partitions and be replicated and compacted.

# Kafka Connect will attempt to create the topic automatically when needed, but you can always manually create

# the topic before starting Kafka Connect if a specific topic configuration is needed.

# Most users will want to use the built-in default replication factor of 3 or in some cases even specify a larger value.

# Since this means there must be at least as many brokers as the maximum replication factor used, we'd like to be able

# to run this example on a single-broker cluster and so here we instead set the replication factor to 1.

offset.storage.topic=connect-offsets-stage

offset.storage.replication.factor=2

offset.storage.partitions=5

# Topic to use for storing connector and task configurations; note that this should be a single partition, highly replicated,

# and compacted topic. Kafka Connect will attempt to create the topic automatically when needed, but you can always manually create

# the topic before starting Kafka Connect if a specific topic configuration is needed.

# Most users will want to use the built-in default replication factor of 3 or in some cases even specify a larger value.

# Since this means there must be at least as many brokers as the maximum replication factor used, we'd like to b