Reposted from: https://blog.youkuaiyun.com/ricky110/article/details/79157043
https://blog.youkuaiyun.com/DilemmaVF/article/details/71124060
https://blog.youkuaiyun.com/yanhx1204/article/details/54965012
https://blog.youkuaiyun.com/see_you_see_me/article/details/78468421
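I. Introduction
Two third-party Python libraries are commonly used to connect to Kafka: kafka-python and pykafka. kafka-python has the larger user base and is the more mature of the two, but it has no ZooKeeper support. pykafka is the successor to Samsa and carries over Samsa's ZooKeeper connectivity: its producers connect directly to the list of Kafka brokers, and only its consumers use ZooKeeper. It works against a Kafka cluster.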
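II. pykafka
(1) Installing pykafka
Pick one of the following three ways to install pykafka, depending on your environment. The version used here is 2.7.0.
# Install from PyPI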
pip install pykafka
# Install with conda
conda install -c conda-forge pykafka
# Install with the pip that ships with Anaconda
/root/anaconda3/bin/pip install pykafka
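After installing, it can help to confirm which version actually landed in the active environment. The sketch below is not from the original post: it uses only the standard library (`importlib.metadata`, Python 3.8+), and the helper name `installed_version` is made up for illustration.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("pykafka")
    if version is None:
        print("pykafka is not installed in this environment")
    else:
        print("pykafka version: {}".format(version))
```

Run it with the same interpreter that the chosen pip or conda belongs to; otherwise it may report on a different environment.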
(2) pykafka API
1. Online documentation: http://pykafka.readthedocs.io/en/latest/ ; source code: https://github.com/Parsely/pykafka
2. Read the source directly in the pykafka installation directory, site-packages/pykafka/.
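If it is unclear where site-packages lives on a given machine, the import system can report it. This is a minimal sketch using only the standard library; `package_dir` is a hypothetical helper name, and it returns None when the package is not installed.

```python
import importlib.util
import os


def package_dir(name):
    """Return the directory a top-level package is imported from, or None."""
    spec = importlib.util.find_spec(name)
    if spec is None or spec.origin is None:
        return None
    return os.path.dirname(spec.origin)


if __name__ == "__main__":
    # Prints the pykafka source directory, or None if it is not installed.
    print(package_dir("pykafka"))
```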
(3) pykafka producer API
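The producer code in this section passes a partitioner to the producer: a callable that receives the list of partitions and the message's partition key, and returns the partition to write to. The sketch below illustrates that contract without needing a broker. `FakePartition`, `first_partition`, and `crc32_partition` are hypothetical names for illustration only, and the CRC32 routing is a swapped-in example, not pykafka's own hashing partitioner.

```python
import zlib
from collections import namedtuple

# Stand-in for pykafka's partition objects; only an `id` field is modelled.
FakePartition = namedtuple("FakePartition", ["id"])


def first_partition(partitions, key):
    """Always choose the first partition in the list, ignoring the key."""
    return partitions[0]


def crc32_partition(partitions, key):
    """Choose a partition from a CRC32 hash of the key, so equal keys
    always map to the same partition for a fixed partition list."""
    if key is None:
        raise ValueError("a partition key is required for key-based routing")
    return partitions[zlib.crc32(key) % len(partitions)]


if __name__ == "__main__":
    parts = [FakePartition(0), FakePartition(1), FakePartition(2)]
    print(first_partition(parts, b"partition_key_0"))
    print(crc32_partition(parts, b"partition_key_0"))
```

A fixed choice like `first_partition` is convenient for tests, while key-based routing keeps all messages with the same key in one partition and therefore in order relative to each other.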
# coding=utf-8
import time

from pykafka import KafkaClient


class KafkaTest(object):
    """
    Exercise commonly used Kafka APIs.
    """

    def __init__(self, host="192.168.237.129:9092"):
        self.host = host
        self.client = KafkaClient(hosts=self.host)

    def producer_partition(self, topic):
        """
        Inspect the topic's partitions, mainly to watch how the offsets
        change when a message is produced.
        :return:
        """
        topic = self.client.topics[topic.encode()]
        partitions = topic.partitions
        print(u"All partitions {}".format(partitions))

        earliest_offset = topic.earliest_available_offsets()
        print(u"Earliest available offsets {}".format(earliest_offset))

        # Check the offsets before producing a message
        last_offset = topic.latest_available_offsets()
        print(u"Latest available offsets {}".format(last_offset))

        # Produce a message synchronously
        p = topic.get_producer(sync=True)
        p.produce(str(time.time()).encode())

        # Check how the offsets changed
        last_offset = topic.latest_available_offsets()
        print(u"Latest available offsets {}".format(last_offset))
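
    def producer_designated_partition(self, topic):
        """
        Write a message to a designated partition. To control which
        partition a message goes to, pass a partitioner function when
        getting the producer, and pass a partition key when producing.
        :return:
        """

        def assign_partition(pid, key):
            """
            Pick a specific partition; this test writes to the first
            partition in the list (id=0 in this test).
            :param pid: list of partitions
            :param key: partition key of the message
            :return: the chosen partition
            """
            print("Assigning partition for message {} {}".format(pid, key))
            return pid[0]

        topic = self.client.topics[topic.encode()]
        p = topic.get_producer(sync=True, partitioner=assign_partition)
        p.produce(str(time.time()).encode(), partition_key=b"partition_key_0")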

    def async_produce_message(self, topic):
        """
        Produce messages asynchronously. Messages are pushed onto a queue,
        and another thread sends them in a batch once the number of queued
        messages reaches a threshold (min_queued_messages) or after a
        period of time (linger_ms, 5 s by default).
        :return:
        """
        topic = self.client.topics[topic.encode()]
        last_offset = topic.latest_available_offsets()
        print("Latest offsets {}".format(last_offset))

        # Record the initial offset of partition 0
        old_offset = last_offset[0].offset[0]
        p = topic.get_producer(sync=False, partitioner=lambda pid, key: pid[0])
        p.produce(str(time.time()).encode())