AWS->K8S(Debezium)->BigQuery高可用实践

大纲

项目需求

把AWS上的表实时同步到GCP的BigQuery里,要求同步时间延迟小于1s,同时实现高可用。

架构

设计思路

1. AWS设计读写分离,Debezium连接从库,实现不影响主库性能的作用。

2. Debezium部署在K8S上,单节点部署,由K8S监测Debezium的docker运行状态

3. Debezium的offset和schema history 都存在Redis的Cluster里实现高可用,Redis保证Debezium
一旦出现故障也不会丢失数据同步信息。

4.利用GCP可以直接把Pub/Sub的message写入BigQuery的功能实现实时数据同步

5. 使用Debezium的增量快照功能最大利用节点资源

AWS配置

从库parameter_group配置

主库和从库需要进行一定配置来保证从库的binlog可以被debezium获取

Ref : Debezium Official Doc

从库配置:

gtid_mode=ON
enforce_gtid_consistency=ON
interactive_timeout=800000
wait_timeout=800000
binlog_rows_query_log_events=ON
binlog_annotate_row_events=ON
binlog_row_value_options="";
log_bin= mysql-bin
binlog_format= ROW
binlog_row_image=FULL
binlog_expire_logs_seconds=864000

注意主库和从库的有关binlog format的兼容性问题

https://debezium.io/documentation/reference/stable/connectors/mysql.html#enable-mysql-binlog

从库的binlog_format=ROW (必须)

主库的binlog_format=MIXED 或者ROW

从库User设置

CREATE USER 'YOUR_NAME'@'localhost' IDENTIFIED BY 'XXXXX';
GRANT SELECT, RELOAD, SHOW DATABASES, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'YOUR_NAME' IDENTIFIED BY 'XXXXX';
FLUSH PRIVILEGES;

主库创建Debezium signal表(增量快照准备工作)

CREATE TABLE your_database.debezium_signal (
    id VARCHAR(42) PRIMARY KEY,
    type VARCHAR(32) NOT NULL,
    data VARCHAR(2048)
);

GCP配置

BigQuery表配置

这里创建的是Pub/Sub直接写入的表

CREATE TABLE IF NOT EXISTS
  `bq_database.your_table_name` ( 
    primary_key STRING,
    ....
    __op STRING,
    __table STRING,
    __source_ts_ms NUMERIC,
    __deleted STRING );

DELETE FROM `bq_database.your_table_name` WHERE 1=1;

CREATE OR REPLACE VIEW
  `bq_view_database.your_view_name` AS (
  SELECT
    primary_key,
    ....
  FROM (
    SELECT
      *,
      ROW_NUMBER() OVER (PARTITION BY primary_key ORDER BY __source_ts_ms DESC) AS row_num
    FROM
      `bq_database.your_table_name`)
  WHERE
    row_num = 1
    AND __op <> 'd');

这里创建的view会保证和AWS的表数据完全一致

PubSub 配置

1. Schema Topic (只需要一个topic就能catch全部table的schema信息)

Topic ID : project_id

(GCP的project id)

这个可以按需加订阅

2. Table Topic (一个table一个主题)

project_id.mysql_database_name.mysql_table_name

准备service_account.json

IAM->Service Account->Admin->Key->Add Keys

下载json文件

Debezium配置

准备conf文件

apiVersion: v1
kind: ConfigMap
metadata:
  name: debezium-mysql-wheel-fdap-stg-config
  namespace: fpda-debezium-container
data:
  application.properties: |
    debezium.sink.type=pubsub
    debezium.sink.pubsub.project.id=your_project_id
    debezium.source.connector.class=io.debezium.connector.mysql.MySqlConnector
    debezium.source.offset.flush.interval.ms=0
    debezium.source.offset.flush.timeout.ms=180000
    debezium.source.offset.storage=io.debezium.storage.redis.offset.RedisOffsetBackingStore
    debezium.source.offset.storage.redis.address=your_redis_ip:your_redis_port
    debezium.source.offset.storage.redis.password=your_redis_password
    debezium.source.offset.storage.redis.key=metadata:debezium:offsets_mysql_wheel_stg
    debezium.source.schema.history.internal=io.debezium.storage.redis.history.RedisSchemaHistory
    debezium.source.schema.history.internal.redis.address=your_redis_ip:your_redis_port
    debezium.source.schema.history.internal.redis.password=your_redis_password
    debezium.source.schema.history.internal.redis.key=metadata:debezium:schema_history_mysql_wheel_stg
    debezium.source.decimal.handling.mode=double
    debezium.source.database.history=io.debezium.relational.history.FileDatabaseHistory
    debezium.source.database.history.file.filename=tmp/history.dat
    debezium.source.database.hostname=your_mysql_ip
    debezium.source.database.port=3306
    debezium.source.database.user=your_mysql_user
    debezium.source.database.password=your_mysql_password
    debezium.source.database.include.list=your_database_name
    debezium.source.database.server.id= your_server_id
    debezium.source.topic.prefix=your_gcp_project_id
    debezium.source.snapshot.include.collection.list=your_mysql_database.debezium_signal
    debezium.source.signal.data.collection=your_mysql_database.debezium_signal
    debezium.source.read.only=true
    debezium.source.snapshot.mode=initial
    debezium.source.snapshot.locking.mode=none
    debezium.source.transforms=unwrap
    debezium.source.transforms.unwrap.type=io.debezium.transforms.ExtractNewRecordState
    debezium.source.transforms.unwrap.add.fields=op,table,source.ts_ms
    debezium.source.transforms.unwrap.delete.handling.mode=rewrite
    debezium.source.key.converter.schemas.enable=false
    debezium.source.value.converter.schemas.enable=false
    debezium.incremental.snapshot.chunk.size=409600

注意:

1.Redis至少有两个master节点,一个写入schema信息,一个写入offset信息。需要判断key对应哪个节点然后hardcode写入ip地址

2. debezium.source.snapshot.include.collection=your_mysql_database.debezium_signal

保证在debezium初始化的时候只会同步debezium_signal这一张表。这样保证我们将来想要哪张表的加入实时同步传输的时候发送signal就可以了

INSERT INTO your_mysql_database.debezium_signal(id, type, data) VALUES ('ad-hoc-1', 'execute-snapshot', '{"data-collections": ["your_mysql_database.order_status"], "type":"incremental"}');

3. 其他的配置都建议不要动。

详细原因可以查看文档 Debezium

Docker打包

Dockerfile 地址: Dockerfile

修改Dockerfile

# Stage 1: Build stage
FROM registry.access.redhat.com/ubi8/openjdk-21 AS builder

LABEL maintainer="Debezium Community"

#
# Set the version, home directory, and MD5 hash.
#
ENV DEBEZIUM_VERSION=3.0.4.Final \
    SERVER_HOME=/debezium \
    MAVEN_REPO_CENTRAL="https://repo1.maven.org/maven2"
ENV SERVER_URL_PATH=io/debezium/debezium-server-dist/$DEBEZIUM_VERSION/debezium-server-dist-$DEBEZIUM_VERSION.tar.gz \
    SERVER_MD5=2a4c1f404707566077510ea516aa3e60

#
# Create a directory for Debezium Server
#
USER root
RUN microdnf -y install gzip && \
    microdnf clean all && \
    mkdir $SERVER_HOME && \
    chmod 755 $SERVER_HOME

#
# Change ownership and switch user
#
RUN chown -R 1000 $SERVER_HOME && \
    chgrp -R 1000 $SERVER_HOME
USER 1000

RUN mkdir $SERVER_HOME/conf && \
    mkdir $SERVER_HOME/data

#
# Download and install Debezium Server
#
RUN curl -fSL -o /tmp/debezium.tar.gz "$MAVEN_REPO_CENTRAL/$SERVER_URL_PATH"

#
# Verify the contents and then install ...
#
RUN echo "$SERVER_MD5 /tmp/debezium.tar.gz" | md5sum -c - &&\
    tar xzf /tmp/debezium.tar.gz -C $SERVER_HOME --strip-components 1 &&\
    rm -f /tmp/debezium.tar.gz

#
# Allow random UID to use Debezium Server
#
RUN chmod -R g+w,o+w $SERVER_HOME

# Stage 2: Final image
FROM registry.access.redhat.com/ubi8/openjdk-21

LABEL maintainer="Debezium Community"

ENV DEBEZIUM_VERSION=3.0.4.Final \
    SERVER_HOME=/debezium \
    MAVEN_REPO_CENTRAL="https://repo1.maven.org/maven2"

USER root
RUN microdnf clean all

USER 1000

COPY --from=builder $SERVER_HOME $SERVER_HOME

# Set the working directory to the Debezium Server home directory
WORKDIR $SERVER_HOME

ENV GOOGLE_APPLICATION_CREDENTIALS=/tmp/credentials/service-account.json

COPY ./your/path/to/service-account.json /tmp/credentials/service-account.json


#
# Expose the ports and set up volumes for the data, transaction log, and configuration
#
EXPOSE 8080
VOLUME ["/debezium/conf","/debezium/data"]

CMD ["/debezium/run.sh"]

K8S部署

apiVersion: apps/v1
kind: Deployment
metadata:
  name: debezium-mysql-stg
  namespace: fpda-debezium-container
  labels:
    fpda-egress: all
    fpda-ingress: all
spec:
  replicas: 1
  selector:
    matchLabels:
      app: debezium-mysql-stg
  template:
    metadata:
      labels:
        app: debezium-mysql-stg
        fpda-egress: all
        fpda-ingress: all
      annotations:
        config-version: "v1"
    spec:
      containers:
        - name: debezium-mysql-stg
          image: 'debezium:3.0.4'
          securityContext:
            runAsUser: 1000
          ports:
            - containerPort: 8080
          resources:
            requests:
              cpu: '4'
              memory: 8Gi
            limits:
              cpu: '8'
              memory: 16Gi
          volumeMounts:
            - name: debezium-config
              mountPath: /debezium/config
      volumes:
        - name: debezium-config
          configMap:
            name: yourname
            items:
              - key: application.properties
                path: application.properties

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值