Running a Pulsar Connector in Docker

This article describes how to set up a data flow from Pulsar to Cassandra, covering environment setup, sink creation, and verification.

It is based on the example given in the official Pulsar documentation.

Cassandra environment setup

1. Install Cassandra

docker pull cassandra

2. Start Cassandra

docker run -d --rm --name=cassandra -p 9042:9042 cassandra

3. Connect to Cassandra with cqlsh and create the keyspace and table

docker exec -ti cassandra cqlsh localhost
Connected to Test Cluster at localhost:9042.
[cqlsh 5.0.1 | Cassandra 3.11.2 | CQL spec 3.4.4 | Native protocol v4]
Use HELP for help.
cqlsh>

Create the keyspace pulsar_test_keyspace

CREATE KEYSPACE pulsar_test_keyspace WITH replication = {'class':'SimpleStrategy', 'replication_factor':1};

Create the table pulsar_test_table

USE pulsar_test_keyspace;
CREATE TABLE pulsar_test_table (key text PRIMARY KEY, col text);

Pulsar environment setup

1. Install Pulsar

docker pull apachepulsar/pulsar:2.4.2

Alternatively, use apachepulsar/pulsar-all:2.4.2, which bundles everything Pulsar ships and therefore takes up more space. I used the former.

2. Start Pulsar

docker run -it -p 6650:6650 -p 8080:8080 apachepulsar/pulsar:2.4.2 bin/pulsar standalone

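Because I am not using the all image, the Cassandra connector's nar package has to be downloaded separately. Next, enter the container so that the downloaded nar package can be placed inside Pulsar.

docker exec -it <container-id> bash

Pulsar's default connector directory is connectors.

  • Create the directory
mkdir connectors
  • Copy the downloaded nar package into that directory (run this on the host), then restart the Pulsar container
docker cp e:/pulsar/pulsar-io-cassandra-2.4.2.nar <container-id>:/pulsar/connectors
  • Create cassandra-sink.yml with the following content. Because Pulsar runs inside Docker, do not write localhost or 127.0.0.1 here; use the host machine's IP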
configs:
    roots: "192.168.1.58:9042"
    keyspace: "pulsar_test_keyspace"
    columnFamily: "pulsar_test_table"
    keyname: "key"
    columnName: "col"
  • Copy the file into the Pulsar container
docker cp e:/pulsar/examples/cassandra-sink.yml <container-id>:/pulsar/examples
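
The config file can also be generated rather than edited by hand, which makes the "no localhost" rule harder to forget. The following is a minimal sketch, not part of the official example: write_sink_config is a hypothetical helper name, and port 9042 is assumed because that is the port published when Cassandra was started above.

```shell
#!/usr/bin/env bash
# write_sink_config HOST OUTFILE
# Hypothetical helper (not part of Pulsar): writes a Cassandra sink config
# that points at HOST:9042.
write_sink_config() {
  local host="$1" out="$2"
  # Reject loopback addresses: inside the Pulsar container they refer to
  # the container itself, not to the machine that publishes port 9042.
  case "$host" in
    localhost|127.*)
      echo "use the host machine's IP, not $host" >&2
      return 1
      ;;
  esac
  cat > "$out" <<EOF
configs:
    roots: "${host}:9042"
    keyspace: "pulsar_test_keyspace"
    columnFamily: "pulsar_test_table"
    keyname: "key"
    columnName: "col"
EOF
}
```

For example, write_sink_config 192.168.1.58 cassandra-sink.yml produces the same file as shown above, which can then be copied into the container with docker cp.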

Create the Cassandra sink

Once the steps above are complete, create the sink.

  • Create the sink
./bin/pulsar-admin sinks create \
    --tenant public \
    --namespace default \
    --name cassandra-test-sink \
    --sink-type cassandra \
    --sink-config-file examples/cassandra-sink.yml \
    --inputs test_cassandra
  • Verify the sink
./bin/pulsar-admin sinks status \
   --tenant public \
   --namespace default \
   --name cassandra-test-sink

The output is as follows:

{
  "numInstances" : 1,
  "numRunning" : 1,
  "instances" : [ {
    "instanceId" : 0,
    "status" : {
      "running" : true,
      "error" : "",
      "numRestarts" : 0,
      "numReadFromPulsar" : 0,
      "numSystemExceptions" : 0,
      "latestSystemExceptions" : [ ],
      "numSinkExceptions" : 0,
      "latestSinkExceptions" : [ ],
      "numWrittenToSink" : 0,
      "lastReceivedTime" : 0,
      "workerId" : "c-standalone-fw-localhost-8080"
    }
  } ]
}

Check whether running is true. If it is, the sink was created successfully; if it is false, look in /pulsar/logs/functions/public/default/cassandra-test-sink/cassandra-test-sink-0.log for the cause of the failure.
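
The running check can be scripted instead of read by eye. This is a rough sketch under the assumption that the status JSON has the shape shown above; sink_is_running is a hypothetical helper, and it uses plain grep rather than a JSON parser, so it only looks for the "running" fields.

```shell
#!/usr/bin/env bash
# sink_is_running: hypothetical helper (not a pulsar-admin feature).
# Reads the JSON printed by `pulsar-admin sinks status` on stdin and
# succeeds only if some instance reports running=true and none reports
# running=false.
sink_is_running() {
  local status
  status="$(cat)"
  printf '%s\n' "$status" | grep -Eq '"running"[[:space:]]*:[[:space:]]*true' &&
    ! printf '%s\n' "$status" | grep -Eq '"running"[[:space:]]*:[[:space:]]*false'
}
```

Inside the Pulsar container it could be used by piping the output of the sinks status command shown above into sink_is_running, and printing the log file when the check fails.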

Send messages to verify Cassandra

for i in {0..9}; do bin/pulsar-client produce -m "key-$i" -n 1 test_cassandra; done
  • Check the sink status
./bin/pulsar-admin sinks status \
   --tenant public \
   --namespace default \
   --name cassandra-test-sink

The output is as follows:

{
  "numInstances" : 1,
  "numRunning" : 1,
  "instances" : [ {
    "instanceId" : 0,
    "status" : {
      "running" : true,
      "error" : "",
      "numRestarts" : 0,
      "numReadFromPulsar" : 10,
      "numSystemExceptions" : 0,
      "latestSystemExceptions" : [ ],
      "numSinkExceptions" : 0,
      "latestSinkExceptions" : [ ],
      "numWrittenToSink" : 10,
      "lastReceivedTime" : 1551685489136,
      "workerId" : "c-standalone-fw-localhost-8080"
    }
  } ]
}

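The numReadFromPulsar field shows that 10 messages were read from Pulsar, and numWrittenToSink shows that all 10 were written to the sink.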
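
Comparing the two counters can be automated as well. The sketch below assumes a single sink instance and the pretty-printed layout shown above (one field per line); status_field and all_messages_written are hypothetical helper names, not pulsar-admin commands.

```shell
#!/usr/bin/env bash
# status_field NAME: hypothetical helper. Prints the first numeric value
# of field NAME found in the status JSON read from stdin.
status_field() {
  sed -n "s/.*\"$1\"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p" | head -n 1
}

# all_messages_written: hypothetical helper. Succeeds when the number of
# messages read from Pulsar equals the number written to the sink.
all_messages_written() {
  local status n_read n_written
  status="$(cat)"
  n_read="$(printf '%s\n' "$status" | status_field numReadFromPulsar)"
  n_written="$(printf '%s\n' "$status" | status_field numWrittenToSink)"
  # Fail if either counter is missing from the input.
  [ -n "$n_read" ] && [ -n "$n_written" ] && [ "$n_read" -eq "$n_written" ]
}
```

A mismatch between the two counters would suggest looking at numSinkExceptions and latestSinkExceptions in the same output.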

  • Verify Cassandra
docker exec -ti cassandra cqlsh localhost

Query the records in the table pulsar_test_table

cqlsh> use pulsar_test_keyspace;
cqlsh:pulsar_test_keyspace> select * from pulsar_test_table;

 key    | col
--------+--------
  key-5 |  key-5
  key-0 |  key-0
  key-9 |  key-9
  key-2 |  key-2
  key-1 |  key-1
  key-3 |  key-3
  key-6 |  key-6
  key-7 |  key-7
  key-4 |  key-4
  key-8 |  key-8

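This output confirms that the sink works end to end.

Notes

  • About the --sink-type parameter used when creating the sink.
    The official documentation describes it as "The sink's connector provider."
    Reading the source code shows that the value to supply here is the connector's name attribute. The sinks Pulsar currently provides are: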
  • cassandra
@Connector(
    name = "cassandra",
    type = IOType.SINK,
    help = "The CassandraStringSink is used for moving messages from Pulsar to Cassandra.",
    configClass = CassandraSinkConfig.class)
  • kafka
@Connector(
    name = "kafka",
    type = IOType.SINK,
    help = "The KafkaBytesSink is used for moving messages from Pulsar to Kafka.",
    configClass = KafkaSinkConfig.class
)
  • redis
@Connector(
    name = "redis",
    type = IOType.SINK,
    help = "A sink connector is used for moving messages from Pulsar to Redis.",
    configClass = RedisSinkConfig.class
)
  • RabbitMQ
@Connector(
    name = "rabbitmq",
    type = IOType.SINK,
    help = "A sink connector is used for moving messages from Pulsar to RabbitMQ.",
    configClass = RabbitMQSinkConfig.class
)
  • solr
@Connector(
    name = "solr",
    type = IOType.SINK,
    help = "The SolrGenericRecordSink is used for moving messages from Pulsar to Solr.",
    configClass = SolrSinkConfig.class
)
  • mongodb
@Connector(
    name = "mongo",
    type = IOType.SINK,
    help = "A sink connector that sends pulsar messages to mongodb",
    configClass = MongoConfig.class
)
  • Hbase
@Connector(
    name = "hbase",
    type = IOType.SINK,
    help = "The HbaseGenericRecordSink is used for moving messages from Pulsar to Hbase.",
    configClass = HbaseSinkConfig.class
)
  • ElasticSearch
@Connector(
    name = "elastic_search",
    type = IOType.SINK,
    help = "A sink connector that sends pulsar messages to elastic search",
    configClass = ElasticSearchConfig.class
)
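
Since the value of --sink-type must match a name attribute exactly (for example mongo rather than mongodb, and elastic_search with an underscore), a small guard can catch typos before the sink is created. This is only a sketch: is_listed_sink_type is a hypothetical helper, and its list is copied from the annotations above, so it reflects this version of the source and not necessarily what a given Pulsar installation has loaded.

```shell
#!/usr/bin/env bash
# is_listed_sink_type NAME: hypothetical helper. Succeeds if NAME is one
# of the connector names taken from the @Connector annotations listed in
# this article.
is_listed_sink_type() {
  case "$1" in
    cassandra|kafka|redis|rabbitmq|solr|mongo|hbase|elastic_search) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, is_listed_sink_type mongodb fails, while is_listed_sink_type mongo succeeds.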