pykafka

Copyright notice: this is an original post by the blogger, released under the CC 4.0 BY-SA license; please include the original source link and this notice when reposting.
Original link: https://blog.youkuaiyun.com/q383965374/article/details/91454185
There are two popular Python client libraries for connecting to Kafka:

1. kafka-python

2. pykafka

kafka-python is the more widely used and mature of the two.

pykafka is the successor to Samsa; Samsa connected to ZooKeeper and accessed the Kafka cluster through it.

Difference: pykafka supports ZooKeeper, while kafka-python has no ZooKeeper support.

Using kafka-python

Documentation

https://kafka-python.readthedocs.io/en/master/apidoc/modules.html

https://kafka-python.readthedocs.io/en/master/index.html

https://pypi.org/project/kafka-python/

Producer

import json
import time

from kafka import KafkaProducer
from kafka.errors import KafkaError

producer = KafkaProducer(bootstrap_servers=['192.168.17.64:9092', '192.168.17.65:9092', '192.168.17.68:9092'])

# Assign a topic
topic = 'test'

def test():
    print('begin')
    n = 1
    try:
        while n <= 100:
            producer.send(topic, str(n).encode())
            print('send' + str(n))
            n += 1
            time.sleep(0.5)
    except KafkaError as e:
        print(e)
    finally:
        producer.close()
        print('done')

def test_json():
    msg_dict = {
        "sleep_time": 10,
        "db_config": {
            "database": "test_1",
            "host": "xxxx",
            "user": "root",
            "password": "root"
        },
        "table": "msg",
        "msg": "Hello World"
    }
    msg = json.dumps(msg_dict).encode()
    producer.send(topic, msg, partition=0)
    producer.close()

if __name__ == '__main__':
    test()


Possible issue: IOError: [Errno 24] Too many open files, caused by creating KafkaProducer repeatedly

A SimpleProducer used to be created inside each controller function. After switching to KafkaProducer, a new KafkaProducer was still being created in every controller, as shown below:

producer = KafkaProducer(bootstrap_servers=['{kafka_host}:{kafka_port}'.format(
    kafka_host=app.config['KAFKA_HOST'],
    kafka_port=app.config['KAFKA_PORT']
)])
message_string = json.dumps(message)
response = producer.send(kafka_topic, message_string.encode('utf-8'))
producer.close()

 

The cause is that every KafkaProducer holds a file descriptor which is not released when the controller finishes, so eventually no new KafkaProducer can be created.

The fix is to create one global KafkaProducer and share it across all controllers.
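
A minimal sketch of that fix, reusing the `app.config` keys from the snippet above; the `send_message` helper is illustrative, not part of the original code:

import json
from kafka import KafkaProducer

# Create the producer once at module level; every controller reuses it,
# so only one set of connections/file descriptors is held.
producer = KafkaProducer(bootstrap_servers=['{kafka_host}:{kafka_port}'.format(
    kafka_host=app.config['KAFKA_HOST'],
    kafka_port=app.config['KAFKA_PORT']
)])

def send_message(kafka_topic, message):
    # Controllers call this helper instead of constructing their own KafkaProducer.
    return producer.send(kafka_topic, json.dumps(message).encode('utf-8'))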

Caveat: use the send future's get() (RecordMetadata) with care

The official documentation contains the following example:

from kafka import KafkaProducer
from kafka.errors import KafkaError

producer = KafkaProducer(bootstrap_servers=['broker1:1234'])

# Asynchronous by default
future = producer.send('my-topic', b'raw_bytes')

# Block for 'synchronous' sends
try:
    record_metadata = future.get(timeout=10)
except KafkaError:
    # Decide what to do if produce request failed...
    log.exception()
    pass


KafkaProducer.send returns a future; calling future.get() blocks until the broker responds and then returns a RecordMetadata object describing the record. When sending a large volume of messages, calling get() on every future adds noticeable latency, so skip it when you do not care whether each individual message was delivered.
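
As a hedged sketch of that fire-and-forget style (broker address reused from the examples above): send without calling get() on each future and flush once before shutting down so buffered records still go out.

from kafka import KafkaProducer

producer = KafkaProducer(bootstrap_servers=['192.168.17.64:9092'])

for n in range(1000):
    # queue the record; do not block on the returned future
    producer.send('test', str(n).encode())

producer.flush()   # block once until everything buffered has been sent
producer.close()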

Consumer

#!/bin/env python
from kafka import KafkaConsumer

# connect to the Kafka cluster and pass the topic we want to consume
consumer = KafkaConsumer('test', group_id='test_group', bootstrap_servers=['192.168.17.64:9092', '192.168.17.65:9092', '192.168.17.68:9092'])
try:
    for msg in consumer:
        print(msg)
        print("%s:%d:%d: key=%s value=%s" % (msg.topic, msg.partition, msg.offset, msg.key, msg.value))
except KeyboardInterrupt as e:
    print(e)


Disabling auto-commit

Set automatic offset commits to False; by default the consumer starts from the latest offset.

import kafka

consumer = kafka.KafkaConsumer(bootstrap_servers=['192.168.17.64:9092', '192.168.17.65:9092', '192.168.17.68:9092'],
                               group_id='test_group_id',
                               auto_offset_reset='latest',
                               enable_auto_commit=False)
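
With auto-commit disabled you have to commit offsets yourself. A minimal sketch, reusing the consumer above and kafka-python's subscribe() and commit(); the topic name is illustrative:

consumer.subscribe(['test'])
for msg in consumer:
    print(msg.value)      # process the message first
    consumer.commit()     # then commit the offset for the group explicitly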

 

Sending data in batches

from kafka import KafkaClient
from kafka.producer import SimpleProducer

def send_data_2_kafka(datas):
    '''
    Send data to the Kafka parsing queue.
    KAFKABROKER, PARTNUM and TOPICNAME are constants defined elsewhere.
    Note: this uses the legacy SimpleProducer API; its `async` keyword collides
    with the reserved word in Python 3.7+.
    '''
    client = KafkaClient(hosts=KAFKABROKER.split(","), timeout=30)
    producer = SimpleProducer(client, async=False)

    curcount = len(datas) // PARTNUM
    for i in range(0, PARTNUM):
        start = i * curcount
        if i != PARTNUM - 1:
            end = (i + 1) * curcount
            curdata = datas[start:end]
            producer.send_messages(TOPICNAME, *curdata)
        else:
            curdata = datas[start:]
            producer.send_messages(TOPICNAME, *curdata)

    producer.stop()
    client.close()


Here PARTNUM is the number of partitions in the topic, which ensures the batched data is spread evenly across the Kafka partitions.

Consumer subscribing to multiple topics

# ======= Subscribe to multiple topics ==========

from kafka import KafkaConsumer
from kafka.structs import TopicPartition

consumer = KafkaConsumer(bootstrap_servers=['127.0.0.1:9092'])
consumer.subscribe(topics=('test', 'test0'))  # subscribe to the topics we want to consume
print(consumer.topics())
print(consumer.position(TopicPartition(topic='test', partition=0)))  # latest offset for this topic/partition
for message in consumer:
    print("%s:%d:%d: key=%s value=%s" % (message.topic, message.partition, message.offset, message.key, message.value))


Consumer polling on a schedule

Sometimes we do not need the data in real time, since that can become a performance bottleneck; it is enough to fetch the queued data periodically and process it in batches. In that case we can pull data actively:

from kafka import KafkaConsumer
import time

consumer = KafkaConsumer(group_id='123456', bootstrap_servers=['10.43.35.25:4531'])
consumer.subscribe(topics=('test_rhj',))
index = 0
while True:
    msg = consumer.poll(timeout_ms=5)  # fetch messages from Kafka
    print(msg)
    time.sleep(2)
    index += 1
    print('--------poll index is %s----------' % index)


Each poll returns whatever has been produced so far: poll() gives back a dict keyed by TopicPartition whose values are lists of records, and the dict is empty when there is nothing to fetch.
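
A small sketch of unpacking the dict that poll() returns, reusing the consumer from the snippet above:

records = consumer.poll(timeout_ms=5)
for tp, messages in records.items():          # tp is a TopicPartition
    for message in messages:
        print("%s:%d:%d: value=%s" % (tp.topic, tp.partition, message.offset, message.value))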

Consumer reading from the earliest offset

from kafka import KafkaConsumer

consumer = KafkaConsumer('test', auto_offset_reset='earliest', bootstrap_servers=['127.0.0.1:9092'])

for message in consumer:
    print("%s:%d:%d: key=%s value=%s" % (message.topic, message.partition, message.offset, message.key, message.value))


auto_offset_reset controls where to start when there is no valid offset: 'earliest' moves to the earliest available message, 'latest' to the newest; the default is 'latest'.
In the source this is defined as {'smallest': 'earliest', 'largest': 'latest'}.

Consumer setting the offset manually

# ========== Read messages from a specified position ===============
from kafka import KafkaConsumer
from kafka.structs import TopicPartition

consumer = KafkaConsumer('test', bootstrap_servers=['127.0.0.1:9092'])

print(consumer.partitions_for_topic("test"))  # partition info for the test topic
print(consumer.topics())  # list of topics
print(consumer.subscription())  # topics this consumer is subscribed to
print(consumer.assignment())  # topic/partition assignments for this consumer
print(consumer.beginning_offsets(consumer.assignment()))  # earliest offsets this consumer can read
consumer.seek(TopicPartition(topic='test', partition=0), 5)  # reset the offset and consume from offset 5
for message in consumer:
    print("%s:%d:%d: key=%s value=%s" % (message.topic, message.partition, message.offset, message.key, message.value))


Pausing and resuming a consumer

# ============== Pausing and resuming consumption ===========

from kafka import KafkaConsumer
from kafka.structs import TopicPartition
import time

consumer = KafkaConsumer(bootstrap_servers=['127.0.0.1:9092'])
consumer.subscribe(topics=('test',))
consumer.topics()
consumer.pause(TopicPartition(topic='test', partition=0))  # after pause(), the consumer cannot read until resume() is called
num = 0
while True:
    print(num)
    print(consumer.paused())  # partitions currently paused
    msg = consumer.poll(timeout_ms=5)
    print(msg)
    time.sleep(2)
    num = num + 1
    if num == 10:
        print("resume...")
        consumer.resume(TopicPartition(topic='test', partition=0))
        print("resume...")


Using pykafka

Documentation

http://pykafka.readthedocs.io/en/latest/
https://github.com/Parsely/pykafka

Points to note

For a Kafka plus ZooKeeper cluster: with samsa both the producer and the consumer connected through ZooKeeper, but in the pykafka documentation the producer connects directly to the list of Kafka brokers and only the consumer uses ZooKeeper.
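
A minimal sketch of that split, using the broker and ZooKeeper addresses that appear in the examples below:

from pykafka import KafkaClient

client = KafkaClient(hosts='192.168.17.64:9092,192.168.17.65:9092,192.168.17.68:9092')
topic = client.topics[b'test']

# The producer talks to the Kafka brokers directly.
with topic.get_sync_producer() as producer:
    producer.produce(b'hello')

# Only the (balanced) consumer goes through ZooKeeper.
consumer = topic.get_balanced_consumer(
    consumer_group=b'test_group',
    zookeeper_connect='192.168.17.64:2181,192.168.17.65:2181,192.168.17.68:2181')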

Producer

# coding=utf-8

import time
from pykafka import KafkaClient

class KafkaTest(object):
    """
    Exercise the commonly used Kafka APIs.
    """
    def __init__(self, host="192.168.0.10:9092"):
        self.host = host
        self.client = KafkaClient(hosts=self.host)

    def producer_partition(self, topic):
        """
        Inspect the producer's partitions, mainly to watch how the offset
        changes while producing messages.
        :return:
        """
        topic = self.client.topics[topic.encode()]
        partitions = topic.partitions
        print("all partitions {}".format(partitions))

        earliest_offset = topic.earliest_available_offsets()
        print("earliest available offset {}".format(earliest_offset))

        # check the offset before producing
        last_offset = topic.latest_available_offsets()
        print("latest available offset {}".format(last_offset))

        # produce a message synchronously
        p = topic.get_producer(sync=True)
        p.produce(str(time.time()).encode())

        # check how the offset changed
        last_offset = topic.latest_available_offsets()
        print("latest available offset {}".format(last_offset))

    def producer_designated_partition(self, topic):
        """
        Write messages to a designated partition. To control which partition
        a message goes to, pass a partitioner function when getting the
        producer and supply a partition key when producing.
        :return:
        """

        def assign_patition(pid, key):
            """
            Pick a specific partition; here we write to the first one (id=0).
            :param pid: list of partitions
            :param key: partition key
            :return:
            """
            print("assigning a partition for the message {} {}".format(pid, key))
            return pid[0]

        topic = self.client.topics[topic.encode()]
        p = topic.get_producer(sync=True, partitioner=assign_patition)
        p.produce(str(time.time()).encode(), partition_key=b"partition_key_0")

    def async_produce_message(self, topic):
        """
        Produce messages asynchronously: messages are pushed onto a queue and
        another thread sends them in one batch once the queue reaches a size
        threshold (min_queued_messages) or a time limit (linger_ms, 5s by default).
        :return:
        """
        topic = self.client.topics[topic.encode()]
        last_offset = topic.latest_available_offsets()
        print("latest offset {}".format(last_offset))

        # remember the initial offset
        old_offset = last_offset[0].offset[0]
        p = topic.get_producer(sync=False, partitioner=lambda pid, key: pid[0])
        p.produce(str(time.time()).encode())
        s_time = time.time()
        while True:
            last_offset = topic.latest_available_offsets()
            print("latest available offset {}".format(last_offset))
            if last_offset[0].offset[0] != old_offset:
                e_time = time.time()
                print('cost time {}'.format(e_time - s_time))
                break
            time.sleep(1)

    def get_produce_message_report(self, topic):
        """
        Check the delivery report of an asynchronous send; by default the
        report only becomes available after waiting about 5s.
        """
        topic = self.client.topics[topic.encode()]
        last_offset = topic.latest_available_offsets()
        print("latest offset {}".format(last_offset))
        p = topic.get_producer(sync=False, delivery_reports=True, partitioner=lambda pid, key: pid[0])
        p.produce(str(time.time()).encode())
        s_time = time.time()
        delivery_report = p.get_delivery_report()
        e_time = time.time()
        print('waited {}s, delivery report {}'.format(e_time - s_time, delivery_report))
        last_offset = topic.latest_available_offsets()
        print("latest offset {}".format(last_offset))

if __name__ == '__main__':
    host = '192.168.0.10:9092,192.168.0.12:9092,192.168.0.13:9092'
    kafka_ins = KafkaTest(host)
    topic = 'test'
    # kafka_ins.producer_partition(topic)
    # kafka_ins.producer_designated_partition(topic)
    # kafka_ins.async_produce_message(topic)
    kafka_ins.get_produce_message_report(topic)


Possible issue: delivery reports (and synchronous sends) blocking child processes

If multiple processes share a single pykafka client, only one of them can write data normally; if delivery reports (or synchronous sends) are used, the child processes block completely and become unusable.
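
A minimal sketch of the workaround, assuming each worker process builds its own KafkaClient instead of inheriting a shared one from the parent (broker address and topic are illustrative):

from multiprocessing import Process
from pykafka import KafkaClient

def worker(n):
    # each process creates its own client and producer; nothing is shared across processes
    client = KafkaClient(hosts='192.168.17.64:9092')
    topic = client.topics[b'test']
    with topic.get_sync_producer() as producer:
        producer.produce(('message from worker %d' % n).encode())

if __name__ == '__main__':
    procs = [Process(target=worker, args=(i,)) for i in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()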

Possible issue: Producer.produce accepts a bytes object as message

Sending data with producer.produce fails, as follows:

#!/bin/env python
from pykafka import KafkaClient
host = '192.168.17.64:9092,192.168.17.65:9092,192.168.17.68:9092'
client = KafkaClient(hosts = host)
topic = client.topics["test"]
with topic.get_sync_producer() as producer:
   for i in range(100):
       producer.produce('test message ' + str(i ** 2))

 

Error:

Traceback (most recent call last):
  File "TaxiKafkaProduce.py", line 15, in <module>
    producer.produce(('test message ' + str(i ** 2)))
  File "/root/anaconda3/lib/python3.6/site-packages/pykafka/producer.py", line 325, in produce
    "got '%s'", type(message))
TypeError: ("Producer.produce accepts a bytes object as message, but it got '%s'", <class 'str'>)

 

Kafka carries bytes rather than strings, so call encode() wherever strings are passed, both for client.topics and producer.produce(), as follows:

#!/bin/env python
from pykafka import KafkaClient
host = '192.168.17.64:9092,192.168.17.65:9092,192.168.17.68:9092'
client = KafkaClient(hosts = host)
topic = client.topics["test".encode()]
# This produces Kafka messages synchronously; the call only returns once the message has been confirmed by the cluster
with topic.get_sync_producer() as producer:
    for i in range(100):
        producer.produce(('test message ' + str(i ** 2)).encode())

 

Synchronous and asynchronous sends

from pykafka import KafkaClient

# several brokers can be passed
client = KafkaClient(hosts="192.168.17.64:9092,192.168.17.65:9092,192.168.17.68:9092")
# list all topics
print(client.topics)

topic = client.topics[b'test_kafka_topic']  # pick a topic

message = b"test message test message"

Once we have a topic, we can create a producer to send messages and produce data to Kafka:

with topic.get_sync_producer() as producer:
    producer.produce(message)

The example above produces Kafka messages synchronously; the call only returns after we have confirmation that the message reached the cluster.

# In production, to achieve high throughput, use asynchronous sends and enable the queued delivery-report interface with delivery_reports=True
import queue

producer = topic.get_producer(sync=False, delivery_reports=True)
producer.produce(message)
try:
    msg, exc = producer.get_delivery_report(block=False)
    if exc is not None:
        print('Failed to deliver msg {}: {}'.format(msg.partition_key, repr(exc)))
    else:
        print('Successfully delivered msg {}'.format(msg.partition_key))
except queue.Empty:
    pass


Consumer

pykafka provides two kinds of consumer: simple and balanced.

Use the simple consumer when you need to consume specific partitions and do not need automatic rebalancing (custom assignment); choose the balanced consumer when you want partitions assigned automatically.

# coding=utf-8

from pykafka import KafkaClient

class KafkaTest(object):
    def __init__(self, host="192.168.237.129:9092"):
        self.host = host
        self.client = KafkaClient(hosts=self.host)

    def simple_consumer(self, topic, offset=0):
        """
        Consume from a designated partition.
        :param offset:
        :return:
        """

        topic = self.client.topics[topic.encode()]
        partitions = topic.partitions
        last_offset = topic.latest_available_offsets()
        print("latest available offset {}".format(last_offset))  # all partitions
        consumer = topic.get_simple_consumer(b"simple_consumer_group", partitions=[partitions[0]])  # consume a single partition
        offset_list = consumer.held_offsets
        print("current partition offsets held by this consumer {}".format(offset_list))
        consumer.reset_offsets([(partitions[0], offset)])  # set the offset
        msg = consumer.consume()
        print("consumed: {}".format(msg.value.decode()))
        msg = consumer.consume()
        print("consumed: {}".format(msg.value.decode()))
        msg = consumer.consume()
        print("consumed: {}".format(msg.value.decode()))
        offset = consumer.held_offsets
        print("current partition offsets held by this consumer {}".format(offset))  # 3

    def balance_consumer(self, topic, offset=0):
        """
        Consume from Kafka with a balanced consumer.
        :return:
        """
        topic = self.client.topics[topic.encode()]
        # With managed=True the new-style rebalance protocol is used and ZooKeeper is not needed;
        # with False the rebalance is done through ZooKeeper, which must then be available.
        consumer = topic.get_balanced_consumer(b"consumer_group_balanced2", managed=True)
        partitions = topic.partitions
        print("partitions {}".format(partitions))
        earliest_offsets = topic.earliest_available_offsets()
        print("earliest available offsets {}".format(earliest_offsets))
        last_offsets = topic.latest_available_offsets()
        print("latest available offsets {}".format(last_offsets))
        offset = consumer.held_offsets
        print("current partition offsets held by this consumer {}".format(offset))
        while True:
            msg = consumer.consume()
            offset = consumer.held_offsets
            print("{}, current partition offsets held by this consumer {}".format(msg.value.decode(), offset))

if __name__ == '__main__':
    host = '192.168.17.64:9092,192.168.17.65:9092,192.168.17.68:9092'
    kafka_ins = KafkaTest(host)
    topic = 'test'
    # kafka_ins.simple_consumer(topic)
    kafka_ins.balance_consumer(topic)


Connecting to ZooKeeper

balanced_consumer = topic.get_balanced_consumer(
    consumer_group=b'testgroup',
    auto_commit_enable=True,  # when set to False, no consumer_group is needed
    zookeeper_connect='myZkClusterNode1.com:2181,myZkClusterNode2.com:2181/myZkChroot'  # connect to several ZooKeeper nodes
)

 

Using consumer_group and consumer_id

# -*- coding: utf-8 -*-
from pykafka import KafkaClient

host = '192.168.17.64:9092,192.168.17.65:9092,192.168.17.68:9092'
client = KafkaClient(hosts=host)

print(client.topics)

# Consumer
topic = client.topics['test'.encode()]
consumer = topic.get_simple_consumer(consumer_group='test_group',
                                     # when set to False, no consumer_group is needed; the topic can be read directly
                                     auto_commit_enable=True,
                                     auto_commit_interval_ms=1,
                                     # connect to several ZooKeeper nodes
                                     zookeeper_connect='192.168.17.64:2181,192.168.17.65:2181,192.168.17.68:2181',
                                     consumer_id='test_id')

for message in consumer:
    if message is not None:
        # print the offset and value of each received message
        print(message.offset, message.value)


Possible issue: AttributeError: 'SimpleConsumer' object has no attribute '_consumer_group'

This happens because Kafka needs bytes rather than str during transmission; prefixing the strings with b fixes it, as follows:

# -*- coding: utf-8 -*-
from pykafka import KafkaClient

host = '192.168.17.64:9092,192.168.17.65:9092,192.168.17.68:9092'
client = KafkaClient(hosts=host)

print(client.topics)

# Consumer
topic = client.topics['test'.encode()]
consumer = topic.get_simple_consumer(consumer_group=b'test_group', auto_commit_enable=True, auto_commit_interval_ms=1, consumer_id=b'test_id')

for message in consumer:
    if message is not None:
        print(message.offset, message.value.decode('utf-8'))


Avoiding duplicate consumption: discard messages that have already been consumed

When you do not want to re-consume historical data, use the auto_commit_enable parameter:

 consumer = topic.get_simple_consumer(consumer_group=b'test_group', 
                             auto_commit_enable=True, 
                             auto_commit_interval_ms=1, 
                             consumer_id=b'test_id')

 