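This article walks through the setup using the example from the official Pulsar documentation as its basis.
Preparing the Cassandra environment
1. Install Cassandra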
docker pull cassandra
2. Start Cassandra
docker run -d --rm --name=cassandra -p 9042:9042 cassandra
3. Connect to Cassandra with cqlsh and create the keyspace and table
docker exec -ti cassandra cqlsh localhost
Connected to Test Cluster at localhost:9042.
[cqlsh 5.0.1 | Cassandra 3.11.2 | CQL spec 3.4.4 | Native protocol v4]
Use HELP for help.
cqlsh>
Create the keyspace pulsar_test_keyspace
CREATE KEYSPACE pulsar_test_keyspace WITH replication = {'class':'SimpleStrategy', 'replication_factor':1};
Create the table pulsar_test_table
USE pulsar_test_keyspace;
CREATE TABLE pulsar_test_table (key text PRIMARY KEY, col text);
Preparing the Pulsar environment
1. Install Pulsar
docker pull apachepulsar/pulsar:2.4.2
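Alternatively, you can pull apachepulsar/pulsar-all:2.4.2.
That image bundles everything Pulsar provides, including the connectors, and takes up more space. I used the former.
2. Start Pulsar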
docker run -it -p 6650:6650 -p 8080:8080 apachepulsar/pulsar:2.4.2 bin/pulsar standalone
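Because I am not using the all image, the NAR package for the connector has to be downloaded separately. Next, enter the container so that the downloaded NAR package can be placed into Pulsar.
docker exec -it <container-id> bash
Pulsar's default connector directory is connectors.
- Create the directory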
mkdir connectors
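- Copy the downloaded NAR package into that directory, then restart the Pulsar container.
docker cp e:/pulsar/pulsar-io-cassandra-2.4.2.nar <container-id>:/pulsar/connectors
- Create cassandra-sink.yml with the following content. Pay particular attention here: because this is a Docker environment, do not write localhost or 127.0.0.1. Use the host machine's IP address instead.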
configs:
    roots: "192.168.1.58:9042"
    keyspace: "pulsar_test_keyspace"
    columnFamily: "pulsar_test_table"
    keyname: "key"
    columnName: "col"
- Copy the file into the Pulsar container
docker cp e:/pulsar/examples/cassandra-sink.yml <container-id>:/pulsar/examples
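Since the only value in cassandra-sink.yml that changes between machines is the host IP, the file can also be generated by a small script. The following is a minimal sketch: the function name write_cassandra_sink_config is hypothetical and not part of Pulsar, and the keyspace, table and column names are simply the ones used in this article.

```shell
# Hypothetical helper (not part of Pulsar): writes a cassandra-sink.yml
# for the given Cassandra host IP, using the keyspace, table and column
# names from this article.
write_cassandra_sink_config() {
  host_ip="$1"    # host machine IP reachable from the Pulsar container
  out_file="$2"   # path of the YAML file to write
  cat > "$out_file" <<EOF
configs:
    roots: "${host_ip}:9042"
    keyspace: "pulsar_test_keyspace"
    columnFamily: "pulsar_test_table"
    keyname: "key"
    columnName: "col"
EOF
}

# Example (the IP is the one used in this article):
# write_cassandra_sink_config 192.168.1.58 cassandra-sink.yml
```

The generated file can then be copied into the container with docker cp as described above.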
Creating the Cassandra sink
With the steps above complete, create the sink.
- Create the sink
./bin/pulsar-admin sinks create \
  --tenant public \
  --namespace default \
  --name cassandra-test-sink \
  --sink-type cassandra \
  --sink-config-file examples/cassandra-sink.yml \
  --inputs test_cassandra
- Verify the sink
./bin/pulsar-admin sinks status \
  --tenant public \
  --namespace default \
  --name cassandra-test-sink
The output is as follows:
{
"numInstances" : 1,
"numRunning" : 1,
"instances" : [ {
"instanceId" : 0,
"status" : {
"running" : true,
"error" : "",
"numRestarts" : 0,
"numReadFromPulsar" : 0,
"numSystemExceptions" : 0,
"latestSystemExceptions" : [ ],
"numSinkExceptions" : 0,
"latestSinkExceptions" : [ ],
"numWrittenToSink" : 0,
"lastReceivedTime" : 0,
"workerId" : "c-standalone-fw-localhost-8080"
}
}]
}
Check whether the value of running is true. If it is true, the sink was created successfully. If it is false, look in /pulsar/logs/functions/public/default/cassandra-test-sink/cassandra-test-sink-0.log for the cause of the failure.
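The running check can also be scripted rather than read by eye. The sketch below assumes the status output has the shape shown in this article. The helper name sink_is_running is hypothetical, and the check is a plain text match rather than real JSON parsing, so it only reports whether at least one instance has "running" : true.

```shell
# Hypothetical helper: reads `sinks status` JSON on stdin and succeeds
# (exit code 0) if at least one instance reports "running" : true.
# The top-level "numRunning" field is not matched by this pattern.
sink_is_running() {
  grep -Eq '"running"[[:space:]]*:[[:space:]]*true'
}

# Example, run from /pulsar inside the container:
# ./bin/pulsar-admin sinks status --tenant public --namespace default \
#   --name cassandra-test-sink | sink_is_running && echo "sink is running"
```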
Sending messages to verify Cassandra
for i in {0..9}; do bin/pulsar-client produce -m "key-$i" -n 1 test_cassandra; done
- Check the sink status
./bin/pulsar-admin sinks status \
  --tenant public \
  --namespace default \
  --name cassandra-test-sink
The output is as follows:
{
"numInstances" : 1,
"numRunning" : 1,
"instances" : [ {
"instanceId" : 0,
"status" : {
"running" : true,
"error" : "",
"numRestarts" : 0,
"numReadFromPulsar" : 10,
"numSystemExceptions" : 0,
"latestSystemExceptions" : [ ],
"numSinkExceptions" : 0,
"latestSinkExceptions" : [ ],
"numWrittenToSink" : 10,
"lastReceivedTime" : 1551685489136,
"workerId" : "c-standalone-fw-localhost-8080"
}
} ]
}
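The numReadFromPulsar field shows that 10 messages were read successfully, and numWrittenToSink shows that all 10 were written to the sink.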
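To compare the two counters without reading the JSON by hand, something like the following could be used. It is a sketch under the assumption that each counter appears on its own line as in the output above. The helper name status_counter is hypothetical, and it extracts values with sed rather than a JSON parser.

```shell
# Hypothetical helper: prints the first numeric value of the named field
# from the status JSON read on stdin (prints nothing if it is absent).
status_counter() {
  field="$1"
  sed -n "s/.*\"${field}\"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p" | head -n 1
}

# Example, run from /pulsar inside the container:
# status="$(./bin/pulsar-admin sinks status --tenant public \
#   --namespace default --name cassandra-test-sink)"
# read_count="$(printf '%s\n' "$status" | status_counter numReadFromPulsar)"
# written_count="$(printf '%s\n' "$status" | status_counter numWrittenToSink)"
# [ "$read_count" = "$written_count" ] && echo "all messages were written"
```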
- Verify Cassandra
docker exec -ti cassandra cqlsh localhost
Query the records in the table pulsar_test_table
cqlsh> use pulsar_test_keyspace;
cqlsh:pulsar_test_keyspace> select * from pulsar_test_table;
key | col
--------+--------
key-5 | key-5
key-0 | key-0
key-9 | key-9
key-2 | key-2
key-1 | key-1
key-3 | key-3
key-6 | key-6
key-7 | key-7
key-4 | key-4
key-8 | key-8
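Seeing this output confirms that the sink works end to end.
Notes
- About the --sink-type parameter used when creating the sink.
The official documentation describes it as "The sink's connector provider".
Reading the source code shows that the value to supply here is the name attribute of the connector. The sinks Pulsar currently provides are as follows:
- cassandra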
@Connector(
name = "cassandra",
type = IOType.SINK,
help = "The CassandraStringSink is used for moving messages from Pulsar to Cassandra.",
configClass = CassandraSinkConfig.class)
- kafka
@Connector(
name = "kafka",
type = IOType.SINK,
help = "The KafkaBytesSink is used for moving messages from Pulsar to Kafka.",
configClass = KafkaSinkConfig.class
)
- redis
@Connector(
name = "redis",
type = IOType.SINK,
help = "A sink connector is used for moving messages from Pulsar to Redis.",
configClass = RedisSinkConfig.class
)
- RabbitMQ
@Connector(
name = "rabbitmq",
type = IOType.SINK,
help = "A sink connector is used for moving messages from Pulsar to RabbitMQ.",
configClass = RabbitMQSinkConfig.class
)
- solr
@Connector(
name = "solr",
type = IOType.SINK,
help = "The SolrGenericRecordSink is used for moving messages from Pulsar to Solr.",
configClass = SolrSinkConfig.class
)
- mongodb
@Connector(
name = "mongo",
type = IOType.SINK,
help = "A sink connector that sends pulsar messages to mongodb",
configClass = MongoConfig.class
)
- Hbase
@Connector(
name = "hbase",
type = IOType.SINK,
help = "The HbaseGenericRecordSink is used for moving messages from Pulsar to Hbase.",
configClass = HbaseSinkConfig.class
)
- ElasticSearch
@Connector(
name = "elastic_search",
type = IOType.SINK,
help = "A sink connector that sends pulsar messages to elastic search",
configClass = ElasticSearchConfig.class
)
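Because --sink-type takes the name attribute and not the product name (for example mongo, not mongodb), a quick check before calling sinks create can catch typos. This is only a sketch: the helper is_known_sink_type is hypothetical, and its list is copied from the annotations quoted in this article for Pulsar 2.4.2, so other versions may differ.

```shell
# Sink names copied from the @Connector annotations listed in this article
# (Pulsar 2.4.2); other versions may ship a different set.
KNOWN_SINK_TYPES="cassandra kafka redis rabbitmq solr mongo hbase elastic_search"

# Hypothetical helper: succeeds if the argument is one of the names above.
is_known_sink_type() {
  for name in $KNOWN_SINK_TYPES; do
    [ "$name" = "$1" ] && return 0
  done
  return 1
}

# Example:
# is_known_sink_type cassandra && echo "ok to pass as --sink-type"
```

pulsar-admin also appears to offer a sinks available-sinks subcommand that lists the connectors a running cluster has actually loaded. Confirm this against your own version, as it would be a more reliable source than a hard-coded list.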