Environment:
1) EMR-Flink-Cluster 3.36.1 (HDFS 2.8.5, YARN 2.8.5, Flink 1.12-vvr-3.0.2)
2) RDS MySQL 5.7.26
3) EMR-Kafka-Cluster 4.9.0 (Kafka_2.12-2.4.1-1.0.0, ZooKeeper 3.6.2)
4) Debezium MySQL Connector 1.2.0
5) EMR-Hadoop-Cluster 4.9.0 (Superset 0.36.0)
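Rationale and problems addressed:
1. Flink CDC versus Debezium:
Flink CDC supports MySQL 5.7 and later and PostgreSQL 9.6 and later.
Debezium supports MySQL 5.5 and later, PostgreSQL, MongoDB, Oracle, SQL Server, and other data sources, and the downstream Flink job only needs Kafka as its single intermediate data source.
2. Main problems addressed:
1) Manually adding a distributed kafka-connect service to an EMR-Kafka cluster that is configured with SSL
2) A hands-on real-time dashboard pipeline: multiple data sources (this article only verifies MySQL) -> debezium -> kafka -> flinksql -> mysql-superset
3) A brief description of the startup flow of kafka-connector-mysql-source
Note:
Flink CDC official site: "About Flink CDC" (Flink CDC 2.0.0 documentation)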
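The reason a single Kafka source is enough downstream is that Debezium wraps every change in the same envelope, whichever database produced it. The following is a minimal sketch of reading that envelope, assuming the JSON converter with schemas enabled (as in the worker properties used in this article), so that each record value has the shape `{"schema": ..., "payload": ...}`. The function name and the sample event are hypothetical and serve only as illustration.

```python
import json
from typing import Any, Dict, Optional, Tuple

# Debezium operation codes: c = create, u = update, d = delete, r = snapshot read.
OPERATIONS = {"c": "insert", "u": "update", "d": "delete", "r": "snapshot"}

Row = Optional[Dict[str, Any]]


def parse_change_event(raw: Optional[bytes]) -> Optional[Tuple[str, Row, Row]]:
    """Return (operation, before, after) for one Kafka record value.

    Returns None for a tombstone record (a null value emitted after a delete).
    """
    if raw is None:
        return None
    message = json.loads(raw)
    # With schemas.enable=true the envelope is nested under "payload";
    # with schemas.enable=false the envelope is the message itself.
    if "schema" in message and "payload" in message:
        payload = message["payload"]
    else:
        payload = message
    if payload is None:
        return None
    operation = OPERATIONS.get(payload.get("op"), "unknown")
    return operation, payload.get("before"), payload.get("after")


# Hypothetical update event for a table with columns id and amount.
sample = json.dumps({
    "schema": {"type": "struct"},
    "payload": {
        "before": {"id": 1, "amount": 10},
        "after": {"id": 1, "amount": 25},
        "op": "u",
        "ts_ms": 0,
    },
}).encode("utf-8")

print(parse_change_event(sample))
```

A consumer written this way does not need to know whether the row came from MySQL or another supported database; only the `before` and `after` column sets differ.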
Solution architecture:
Starting the kafka-connect service in the EMR-Kafka cluster:
1. Deploy kafka-connect in distributed mode, using this configuration file:
/var/lib/ecm-agent/cache/ecm/service/KAFKA/2.12-2.4.1.1.1/package/templates/connect-distributed.properties
2. Edit the connect-distributed.properties configuration:
##
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
##
# This file contains some of the configurations for the Kafka Connect distributed worker. This file is intended
# to be used with the examples, and some settings may differ from those used in a production system, especially
# the `bootstrap.servers` and those specifying replication factors.
# A list of host/port pairs to use for establishing the initial connection to the Kafka cluster.
#bootstrap.servers={{bootstrap_servers}}
bootstrap.servers=emr-header-1.cluster-231710:9092,emr-worker-1.cluster-231710:9092,emr-worker-2.cluster-231710:9092
# unique name for the cluster, used in forming the Connect cluster group. Note that this must not conflict with consumer group IDs
group.id=connect-cluster-stage
# The converters specify the format of data in Kafka and how to translate it into Connect data. Every Connect user will
# need to configure these based on the format they want their data in when loaded from or stored into Kafka
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
# Converter-specific settings can be passed in by prefixing the Converter's setting with the converter we want to apply
# it to
key.converter.schemas.enable=true
value.converter.schemas.enable=true
# Topic to use for storing offsets. This topic should have many partitions and be replicated and compacted.
# Kafka Connect will attempt to create the topic automatically when needed, but you can always manually create
# the topic before starting Kafka Connect if a specific topic configuration is needed.
# Most users will want to use the built-in default replication factor of 3 or in some cases even specify a larger value.
# Since this means there must be at least as many brokers as the maximum replication factor used, we'd like to be able
# to run this example on a single-broker cluster and so here we instead set the replication factor to 1.
offset.storage.topic=connect-offsets-stage
offset.storage.replication.factor=2
offset.storage.partitions=5
# Topic to use for storing connector and task configurations; note that this should be a single partition, highly replicated,
# and compacted topic. Kafka Connect will attempt to create the topic automatically when needed, but you can always manually create
# the topic before starting Kafka Connect if a specific topic configuration is needed.
# Most users will want to use the built-in default replication factor of 3 or in some cases even specify a larger value.
# Since this means there must be at least as many brokers as the maximum replication factor used, we'd like to b

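This article describes how to configure an SSL-encrypted Kafka Connect service in an EMR environment, deploy the Debezium MySQL Connector, synchronize MySQL data to Kafka in real time, process it with Flink SQL, and finally load it into Superset for a real-time dashboard. The detailed steps cover starting Kafka Connect, configuring the Debezium MySQL Source Connector, and the full Flink SQL workflow.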
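As an illustration of the connector configuration step, the sketch below builds the request that registers a Debezium MySQL source connector through the Kafka Connect REST API. It is not the author's configuration. The property keys are standard Debezium MySQL connector settings. The assumptions are: the REST endpoint (a worker on `emr-header-1.cluster-231710` listening on the default port 8083), the `database.server.id` value, the topic naming, and every argument passed in the usage lines. On an SSL-enabled cluster the database history producer and consumer would presumably also need security settings, which are omitted here.

```python
import json
import urllib.request

# Assumed endpoint: a Connect worker on the header node, default REST port 8083.
CONNECT_URL = "http://emr-header-1.cluster-231710:8083/connectors"


def build_connector_request(name, mysql_host, mysql_user, mysql_password,
                            server_name, bootstrap_servers,
                            connect_url=CONNECT_URL):
    """Build (but do not send) the POST request that registers the connector."""
    config = {
        "connector.class": "io.debezium.connector.mysql.MySqlConnector",
        "tasks.max": "1",
        "database.hostname": mysql_host,
        "database.port": "3306",
        "database.user": mysql_user,
        "database.password": mysql_password,
        # Must be unique among all clients replicating from this MySQL server
        # (the value here is an arbitrary placeholder).
        "database.server.id": "184054",
        # Logical name; Debezium uses it as the prefix of the topic names.
        "database.server.name": server_name,
        "database.history.kafka.bootstrap.servers": bootstrap_servers,
        "database.history.kafka.topic": f"schema-changes.{server_name}",
    }
    body = json.dumps({"name": name, "config": config}).encode("utf-8")
    return urllib.request.Request(
        connect_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical values, for illustration only.
request = build_connector_request(
    name="mysql-source-stage",
    mysql_host="rds.example.internal",
    mysql_user="debezium",
    mysql_password="change-me",
    server_name="stage",
    bootstrap_servers="emr-header-1.cluster-231710:9092",
)
print(request.get_method(), request.full_url)
```

Passing the returned object to `urllib.request.urlopen` would submit it to the worker. Building the request separately from sending it keeps the payload easy to inspect before anything is registered.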



