大纲
项目需求
把AWS上的表实时同步到GCP的BigQuery里,要求同步时间延迟小于1s,同时实现高可用。
架构

设计思路
1. AWS设计读写分离,Debezium连接从库,实现不影响主库性能的作用。
2. Debezium部署在K8S上,单节点部署,由K8S监测Debezium的docker运行状态
3. Debezium的offset和schema history 都存在Redis的Cluster里实现高可用,Redis保证Debezium
一旦出现故障也不会丢失数据同步信息。
4.利用GCP可以直接把Pub/Sub的message写入BigQuery的功能实现实时数据同步
5. 使用Debezium的增量快照功能最大利用节点资源
AWS配置
从库parameter_group配置
主库和从库需要进行一定配置来保证从库的binlog可以被debezium获取
Ref : Debezium Official Doc
从库配置:
gtid_mode=ON
enforce_gtid_consistency=ON
interactive_timeout=800000
wait_timeout=800000
binlog_rows_query_log_events=ON
binlog_annotate_row_events=ON
binlog_row_value_options="";
log_bin= mysql-bin
binlog_format= ROW
binlog_row_image=FULL
binlog_expire_logs_seconds=864000
注意主库和从库的有关binlog format的兼容性问题
https://debezium.io/documentation/reference/stable/connectors/mysql.html#enable-mysql-binlog
从库的binlog_format=ROW (必须)
主库的binlog_format=MIXED 或者ROW
从库User设置
CREATE USER 'YOUR_NAME'@'localhost' IDENTIFIED BY 'XXXXX';
GRANT SELECT, RELOAD, SHOW DATABASES, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'YOUR_NAME' IDENTIFIED BY 'XXXXX';
FLUSH PRIVILEGES;
主库创建Debezium signal表(增量快照准备工作)
CREATE TABLE your_database.debezium_signal (
id VARCHAR(42) PRIMARY KEY,
type VARCHAR(32) NOT NULL,
data VARCHAR(2048)
);
GCP配置
BigQuery表配置
这里创建的是Pub/Sub直接写入的表
CREATE TABLE IF NOT EXISTS
`bq_database.your_table_name` (
primary_key STRING,
....
__op STRING,
__table STRING,
__source_ts_ms NUMERIC,
__deleted STRING );
DELETE FROM `bq_database.your_table_name` WHERE 1=1;
CREATE OR REPLACE VIEW
`bq_view_database.your_view_name` AS (
SELECT
primary_key,
....
FROM (
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY primary_key ORDER BY __source_ts_ms DESC) AS row_num
FROM
`bq_database.your_table_name`)
WHERE
row_num = 1
AND __op <> 'd');
这里创建的view会保证和AWS的表数据完全一致
PubSub 配置
1. Schema Topic (只需要一个topic就能catch全部table的schema信息)
Topic ID : project_id
(GCP的project id)
这个可以按需加订阅
2. Table Topic (一个table一个主题)
project_id.mysql_database_name.mysql_table_name
准备service_account.json
IAM->Service Account->Admin->Key->Add Keys
下载json文件
Debezium配置
准备conf文件
apiVersion: v1
kind: ConfigMap
metadata:
name: debezium-mysql-wheel-fdap-stg-config
namespace: fpda-debezium-container
data:
application.properties: |
debezium.sink.type=pubsub
debezium.sink.pubsub.project.id=your_project_id
debezium.source.connector.class=io.debezium.connector.mysql.MySqlConnector
debezium.source.offset.flush.interval.ms=0
debezium.source.offset.flush.timeout.ms=180000
debezium.source.offset.storage=io.debezium.storage.redis.offset.RedisOffsetBackingStore
debezium.source.offset.storage.redis.address=your_redis_ip:your_redis_port
debezium.source.offset.storage.redis.password=your_redis_password
debezium.source.offset.storage.redis.key=metadata:debezium:offsets_mysql_wheel_stg
debezium.source.schema.history.internal=io.debezium.storage.redis.history.RedisSchemaHistory
debezium.source.schema.history.internal.redis.address=your_redis_ip:your_redis_port
debezium.source.schema.history.internal.redis.password=your_redis_password
debezium.source.schema.history.internal.redis.key=metadata:debezium:schema_history_mysql_wheel_stg
debezium.source.decimal.handling.mode=double
debezium.source.database.history=io.debezium.relational.history.FileDatabaseHistory
debezium.source.database.history.file.filename=tmp/history.dat
debezium.source.database.hostname=your_mysql_ip
debezium.source.database.port=3306
debezium.source.database.user=your_mysql_user
debezium.source.database.password=your_mysql_password
debezium.source.database.include.list=your_database_name
debezium.source.database.server.id= your_server_id
debezium.source.topic.prefix=your_gcp_project_id
debezium.source.snapshot.include.collection.list=your_mysql_database.debezium_signal
debezium.source.signal.data.collection=your_mysql_database.debezium_signal
debezium.source.read.only=true
debezium.source.snapshot.mode=initial
debezium.source.snapshot.locking.mode=none
debezium.source.transforms=unwrap
debezium.source.transforms.unwrap.type=io.debezium.transforms.ExtractNewRecordState
debezium.source.transforms.unwrap.add.fields=op,table,source.ts_ms
debezium.source.transforms.unwrap.delete.handling.mode=rewrite
debezium.source.key.converter.schemas.enable=false
debezium.source.value.converter.schemas.enable=false
debezium.incremental.snapshot.chunk.size=409600
注意:
1.Redis至少有两个master节点,一个写入schema信息,一个写入offset信息。需要判断key对应哪个节点然后hardcode写入ip地址
2. debezium.source.snapshot.include.collection=your_mysql_database.debezium_signal
保证在debezium初始化的时候只会同步debezium_signal这一张表。这样保证我们将来想要哪张表的加入实时同步传输的时候发送signal就可以了
INSERT INTO your_mysql_database.debezium_signal(id, type, data) VALUES ('ad-hoc-1', 'execute-snapshot', '{"data-collections": ["your_mysql_database.order_status"], "type":"incremental"}');
3. 其他的配置都建议不要动。
详细原因可以查看文档 Debezium
Docker打包
Dockerfile 地址: Dockerfile
修改Dockerfile
# Stage 1: Build stage
FROM registry.access.redhat.com/ubi8/openjdk-21 AS builder
LABEL maintainer="Debezium Community"
#
# Set the version, home directory, and MD5 hash.
#
ENV DEBEZIUM_VERSION=3.0.4.Final \
SERVER_HOME=/debezium \
MAVEN_REPO_CENTRAL="https://repo1.maven.org/maven2"
ENV SERVER_URL_PATH=io/debezium/debezium-server-dist/$DEBEZIUM_VERSION/debezium-server-dist-$DEBEZIUM_VERSION.tar.gz \
SERVER_MD5=2a4c1f404707566077510ea516aa3e60
#
# Create a directory for Debezium Server
#
USER root
RUN microdnf -y install gzip && \
microdnf clean all && \
mkdir $SERVER_HOME && \
chmod 755 $SERVER_HOME
#
# Change ownership and switch user
#
RUN chown -R 1000 $SERVER_HOME && \
chgrp -R 1000 $SERVER_HOME
USER 1000
RUN mkdir $SERVER_HOME/conf && \
mkdir $SERVER_HOME/data
#
# Download and install Debezium Server
#
RUN curl -fSL -o /tmp/debezium.tar.gz "$MAVEN_REPO_CENTRAL/$SERVER_URL_PATH"
#
# Verify the contents and then install ...
#
RUN echo "$SERVER_MD5 /tmp/debezium.tar.gz" | md5sum -c - &&\
tar xzf /tmp/debezium.tar.gz -C $SERVER_HOME --strip-components 1 &&\
rm -f /tmp/debezium.tar.gz
#
# Allow random UID to use Debezium Server
#
RUN chmod -R g+w,o+w $SERVER_HOME
# Stage 2: Final image
FROM registry.access.redhat.com/ubi8/openjdk-21
LABEL maintainer="Debezium Community"
ENV DEBEZIUM_VERSION=3.0.4.Final \
SERVER_HOME=/debezium \
MAVEN_REPO_CENTRAL="https://repo1.maven.org/maven2"
USER root
RUN microdnf clean all
USER 1000
COPY --from=builder $SERVER_HOME $SERVER_HOME
# Set the working directory to the Debezium Server home directory
WORKDIR $SERVER_HOME
ENV GOOGLE_APPLICATION_CREDENTIALS=/tmp/credentials/service-account.json
COPY ./your/path/to/service-account.json /tmp/credentials/service-account.json
#
# Expose the ports and set up volumes for the data, transaction log, and configuration
#
EXPOSE 8080
VOLUME ["/debezium/conf","/debezium/data"]
CMD ["/debezium/run.sh"]
K8S部署
apiVersion: apps/v1
kind: Deployment
metadata:
name: debezium-mysql-stg
namespace: fpda-debezium-container
labels:
fpda-egress: all
fpda-ingress: all
spec:
replicas: 1
selector:
matchLabels:
app: debezium-mysql-stg
template:
metadata:
labels:
app: debezium-mysql-stg
fpda-egress: all
fpda-ingress: all
annotations:
config-version: "v1"
spec:
containers:
- name: debezium-mysql-stg
image: 'debezium:3.0.4'
securityContext:
runAsUser: 1000
ports:
- containerPort: 8080
resources:
requests:
cpu: '4'
memory: 8Gi
limits:
cpu: '8'
memory: 16Gi
volumeMounts:
- name: debezium-config
mountPath: /debezium/config
volumes:
- name: debezium-config
configMap:
name: yourname
items:
- key: application.properties
path: application.properties