Apache Pulsar IO之CDC Debezium Connector (SQLServer)

本文介绍如何通过Apache Pulsar与Debezium集成实现SQL Server变更数据捕获(CDC),包括安装配置Pulsar、创建NAR文件、配置连接器及启动源等步骤。

启动Pulsar

  1. 从官网下载tar.gz安装包

下载地址:https://archive.apache.org/dist/pulsar/pulsar-2.4.0/apache-pulsar-2.4.0-bin.tar.gz

$ wget https://archive.apache.org/dist/pulsar/pulsar-2.4.0/apache-pulsar-2.4.0-bin.tar.gz
  1. 解压
$ tar zxvf apache-pulsar-2.4.0-bin.tar.gz
  1. 创建connectors文件夹
$ cd apache-pulsar-2.4.0
$ mkdir connectors
  1. 启动pulsar
$ bin/pulsar standalone

OR

$ bin/pulsar standalone -nss

创建pulsar-io-debezium-sqlserver-1.0.nar文件

创建maven项目

添加依赖

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <parent>
        <groupId>org.apache.pulsar</groupId>
        <artifactId>pulsar-io</artifactId>
        <version>2.4.0</version>
    </parent>

    <artifactId>pulsar-sqlserver</artifactId>
    <name>Pulsar</name>

    <dependencies>
        <dependency>
            <groupId>${project.groupId}</groupId>
            <artifactId>pulsar-io-debezium-core</artifactId>
            <version>${project.version}</version>
        </dependency>

        <dependency>
            <groupId>io.debezium</groupId>
            <artifactId>debezium-connector-sqlserver</artifactId>
            <version>${debezium.version}</version>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.nifi</groupId>
                <artifactId>nifi-nar-maven-plugin</artifactId>
            </plugin>
        </plugins>
    </build>
</project>

编写DebeziumSqlServerSource类

public class DebeziumSqlServerSource extends DebeziumSource {

    static private final String DEFAULT_TASK = "io.debezium.connector.sqlserver.SqlServerConnectorTask";


    public void setDbConnectorTask(Map<String, Object> config) throws Exception {
        throwExceptionIfConfigNotMatch(config, TaskConfig.TASK_CLASS_CONFIG, DEFAULT_TASK);
    }
}

编写debezium-sqlserver-source-config.yaml文件

  • 文件内容
tenant: "test"
namespace: "test-namespace"
name: "debezium-sqlserver-source"
topicName: "debezium-sqlserver-topic"
archive: "connectors/pulsar-io-debezium-sqlserver-1.0-SNAPSHOT.nar"

parallelism: 1

configs:
  database.hostname: "localhost"
  database.port: "1433"
  database.user: "sa"
  database.password: "Password!"
  database.dbname: "testDB"
  "database.server.name": "fullfillment"
  table.whitelist: "dbo.customers"

  pulsar.service.url: "pulsar://127.0.0.1:6650"
  database.history.pulsar.topic: "sqlserver-history-topic"
  offset.storage.topic: "sqlserver-offset-topic"
  • 文件存放路径
    在这里插入图片描述

编写pulsar-io.yaml文件

  • 文件内容
name: debezium-sqlserver
description: Debezium SqlServer Source
sourceClass: com.pulsar.io.debezium.sqlserver.DebeziumSqlServerSource
  • 文件存放路径
    在这里插入图片描述

创建pulsar-io-debezium-sqlserver-1.0.nar文件

  • 输入maven打包命令:
mvn clean install

文件打包完成后存放在target目录下。

上传文件至服务器

使用远程连接工具(Xftp或WinSCP)将打包好的nar文件上传至Apache Pulsar安装目录下的connectors目录下。

启动Debezium Connector

编写debezium-sqlserver-source-config.yaml文件

  • 文件内容
tenant: "public"
namespace: "default"
name: "debezium-sqlserver-source"
topicName: "debezium-sqlserver-topic"
archive: "connectors/pulsar-io-debezium-sqlserver-1.0-SNAPSHOT.nar" # nar文件路径

parallelism: 1

configs:
  database.hostname: "127.0.0.1" # 数据库地址
  database.port: "1433" # 数据库端口
  database.user: "username" # 用户名
  database.password: "password" # 密码
  database.server.name: "servername" # 服务名称,可以任意命名
  database.dbname: "dbname" # 需要捕获的数据库名城
  table.whitelist: "dbo.cm_logs" # 需要捕获的表名称,schemaName.tableName

  database.history: "org.apache.pulsar.io.debezium.PulsarDatabaseHistory"
  database.history.pulsar.topic: "mssqlserver6-history-topic"
  database.history.pulsar.service.url: "pulsar://127.0.0.1:6650"
  ## KEY_CONVERTER_CLASS_CONFIG, VALUE_CONVERTER_CLASS_CONFIG
  key.converter: "org.apache.kafka.connect.json.JsonConverter"
  value.converter: "org.apache.kafka.connect.json.JsonConverter"
  ## PULSAR_SERVICE_URL_CONFIG
  pulsar.service.url: "pulsar://127.0.0.1:6650"
  ## OFFSET_STORAGE_TOPIC_CONFIG
  offset.storage.topic: "offset-topic"

configs详细配置信息请参考:Debezium Connector for SQL Server

  • 文件存放路劲
    在这里插入图片描述

启动SQLServer Source 并订阅Topic

启动SQLServer Source

在服务器中输入:

$ bin/pulsar-admin source localrun  --source-config-file debezium-mysql-source-config.yaml

订阅Topic

PulsarClient client = PulsarClient.builder().serviceUrl("pulsar://127.0.0.1:6650").build();
        Consumer consumer = client.newConsumer(Schema.KeyValue(Schema.BYTES, Schema.BYTES))
                .topic("public/default/bigdata.dbo.cm_logs")
                .subscriptionName("sub-products")
                .subscribe();
        while (true) {
            try {
                Message msg = consumer.receive();
                KeyValue<byte[], byte[]> keyValues =  Schema.KeyValue(Schema.BYTES, Schema.BYTES).decode(msg.getData());
                JSONSchema<Map> jsonSchema = JSONSchema.of(SchemaDefinition.<Map>builder().withPojo(Map.class).build());
                Map keyResult = jsonSchema.decode(keyValues.getKey());
                Map valueResult = jsonSchema.decode(keyValues.getValue());
                // TODO("get data")
                System.out.println(keyResult);
                System.out.println(valueResult);
                consumer.acknowledge(msg);
            } catch (Exception e) {
                consumer.close();
                e.printStackTrace();
            }
        }

问题回顾

  • 问题一
    在这里插入图片描述
    解决方法
    开启SQLServer的CDC。(2008及之后的版本需要企业版)
    SQL代码如下:
USE [databaseName]
GO
EXEC sys.sp_cdc_enable_db
GO
-- 以下操作为需要捕获哪张表的数据就将表添加到CDC的作业任务中
USE [databaseName]
GO
EXEC sys.sp_cdc_enable_table
@source_schema = N'dbo',
@source_name = N'tableName',
@role_name = N'roleName'
GO

详细配置信息请参考:sys.sp_cdc_enable_table (Transact-SQL)

参考文档

Apache Pulsar:http://pulsar.apache.org/docs/en/io-cdc-debezium/
Apache Pulsar Git: https://github.com/apache/pulsar/tree/master/pulsar-io/debezium/mysql
debezium:https://debezium.io/docs/connectors/sqlserver/#setting-up-sqlserver
Microsoft:https://docs.microsoft.com/zh-cn/sql/relational-databases/system-stored-procedures/sys-sp-cdc-enable-table-transact-sql?view=sql-server-2017

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值