1. Class
public class KafkaConsumer<K,V>
extends java.lang.Object
implements Consumer<K,V>
2. Consumer Groups and Subscription
1) If consumers are not explicitly bound to topic partitions, the partitions are distributed evenly among the consumers in the group.
2) max.poll.interval.ms is the maximum time allowed for processing one batch of messages. max.poll.records is the maximum number of records returned by a single poll() (the batch must be processed within max.poll.interval.ms; if processing does not finish in time, the next poll() is not triggered, heartbeats may stop being sent, and the consumer may be considered dead).
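As a sketch, the two settings above are set like any other consumer property; the values here are illustrative, not recommendations:

```java
import java.util.Properties;

public class PollTuning {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Illustrative values: allow up to 5 minutes between poll() calls,
        // and cap each batch at 100 records so it can be processed in that window.
        props.put("max.poll.interval.ms", "300000");
        props.put("max.poll.records", "100");
        System.out.println(props.getProperty("max.poll.records"));
    }
}
```

The trade-off: a smaller max.poll.records shortens each batch, which makes it easier to stay within max.poll.interval.ms when per-record processing is slow.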
3) When a consumer dies, a rebalance occurs and its partitions are reassigned to the remaining consumers.
4) A consumer can also be assigned topic partitions manually. This suits scenarios such as:
- the consumer process maintains partition-related local state;
- the consumer itself is highly available and recovers automatically from failure (e.g. managed by YARN, Mesos, or AWS facilities), so Kafka's own failure detection is unnecessary.
Manual assignment and automatic assignment cannot be mixed.
Example:
String topic = "foo";
TopicPartition partition0 = new TopicPartition(topic, 0);
TopicPartition partition1 = new TopicPartition(topic, 1);
consumer.assign(Arrays.asList(partition0, partition1));
5) seek(TopicPartition, long), seekToBeginning(Collection), and seekToEnd(Collection) set the position to consume from. pause(Collection) and resume(Collection) pause and resume fetching (e.g. pause one partition while draining another, then resume once they are roughly caught up).
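A minimal sketch of the position and flow-control calls above; the topic name and target offset are illustrative, and a manually assigned consumer is assumed:

```java
import java.util.Arrays;
import java.util.Collections;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class SeekAndPause {
    static void demo(KafkaConsumer<String, String> consumer) {
        TopicPartition p0 = new TopicPartition("foo", 0);
        TopicPartition p1 = new TopicPartition("foo", 1);
        consumer.assign(Arrays.asList(p0, p1));

        consumer.seek(p0, 42L);                                   // jump p0 to offset 42 (illustrative)
        consumer.seekToBeginning(Collections.singletonList(p1));  // replay p1 from the start

        consumer.pause(Collections.singletonList(p0));   // poll() returns no records from p0
        // ... drain p1 until it catches up ...
        consumer.resume(Collections.singletonList(p0));  // p0 records flow again
    }
}
```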
6) KafkaConsumer is not thread-safe; keep each consumer confined to a single thread.
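Because the consumer is confined to one thread, shutdown from another thread is usually done with wakeup(), the one method that is safe to call concurrently. A sketch of that pattern:

```java
import java.util.concurrent.atomic.AtomicBoolean;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.errors.WakeupException;

public class SingleThreadLoop {
    private final AtomicBoolean closed = new AtomicBoolean(false);

    void run(KafkaConsumer<String, String> consumer) {
        try {
            while (!closed.get()) {
                consumer.poll(100); // all other consumer calls stay on this thread
            }
        } catch (WakeupException e) {
            if (!closed.get()) throw e; // swallow only during a real shutdown
        } finally {
            consumer.close();
        }
    }

    // May be called from any other thread.
    void shutdown(KafkaConsumer<String, String> consumer) {
        closed.set(true);
        consumer.wakeup(); // thread-safe; makes a blocked poll() throw WakeupException
    }
}
```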
3. Offset Management
Kafka's automatic offset commit is often unreliable, because the timed commit may fire before the messages have actually been processed.
Offsets can instead be stored outside Kafka, together with the processing results:
- store both results and offsets in a relational database;
- store both results and offsets locally;
- read each record's offset from its ConsumerRecord;
- on restart, use seek(TopicPartition, long) to resume from the stored position.
The benefit: handling results and offsets together lets them succeed or fail as a unit, giving "exactly once" semantics.
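A sketch of that scheme; readOffsetFromDb and saveToDb are hypothetical helpers standing in for a transactional store, not Kafka APIs:

```java
import java.util.Arrays;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class ExternalOffsets {
    static void run(KafkaConsumer<String, String> consumer) {
        TopicPartition tp = new TopicPartition("foo", 0);
        consumer.assign(Arrays.asList(tp));
        // On restart: resume from the offset stored next to the results.
        consumer.seek(tp, readOffsetFromDb(tp));

        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(100);
            for (ConsumerRecord<String, String> record : records) {
                // One transaction: the result plus the offset of the NEXT record to read.
                saveToDb(record.value(), record.offset() + 1);
            }
        }
    }

    static long readOffsetFromDb(TopicPartition tp) { /* hypothetical */ return 0L; }
    static void saveToDb(String result, long nextOffset) { /* hypothetical */ }
}
```

Note the stored value is record.offset() + 1: the position of the next record to consume, so a restart does not reprocess the last saved record.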
Manual offset management works well with manual partition assignment; with automatic assignment, however, partitions may move between consumers during a rebalance. This can be handled by providing a ConsumerRebalanceListener instance in the call to subscribe(Collection, ConsumerRebalanceListener) or subscribe(Pattern, ConsumerRebalanceListener). For example, when partitions are taken from a consumer, it should commit its offsets for those partitions in ConsumerRebalanceListener.onPartitionsRevoked(Collection). When partitions are assigned to a consumer, it should look up the offsets for the new partitions and initialize its position to them in ConsumerRebalanceListener.onPartitionsAssigned(Collection). Another common use of a ConsumerRebalanceListener is to flush any caches the application maintains for partitions that have moved elsewhere.
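A sketch of such a listener, assuming a hypothetical external offset store (saveOffsetToDb and readOffsetFromDb are placeholders):

```java
import java.util.Collection;
import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class SaveOffsetsOnRebalance implements ConsumerRebalanceListener {
    private final KafkaConsumer<String, String> consumer;

    public SaveOffsetsOnRebalance(KafkaConsumer<String, String> consumer) {
        this.consumer = consumer;
    }

    @Override
    public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
        // Partitions are being taken away: persist their current positions.
        for (TopicPartition tp : partitions)
            saveOffsetToDb(tp, consumer.position(tp));
    }

    @Override
    public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
        // Newly assigned partitions: initialize each to its stored position.
        for (TopicPartition tp : partitions)
            consumer.seek(tp, readOffsetFromDb(tp));
    }

    private void saveOffsetToDb(TopicPartition tp, long offset) { /* hypothetical */ }
    private long readOffsetFromDb(TopicPartition tp) { /* hypothetical */ return 0L; }
}
```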
4. Examples
Automatic offset commit:
Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("group.id", "test");
props.put("enable.auto.commit", "true");
props.put("auto.commit.interval.ms", "1000");
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
consumer.subscribe(Arrays.asList("foo", "bar"));
while (true) {
    ConsumerRecords<String, String> records = consumer.poll(100);
    for (ConsumerRecord<String, String> record : records)
        System.out.printf("offset = %d, key = %s, value = %s%n", record.offset(), record.key(), record.value());
}
Manual offset control:
Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("group.id", "test");
props.put("enable.auto.commit", "false");
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
consumer.subscribe(Arrays.asList("foo", "bar"));
final int minBatchSize = 200;
List<ConsumerRecord<String, String>> buffer = new ArrayList<>();
while (true) {
    ConsumerRecords<String, String> records = consumer.poll(100);
    for (ConsumerRecord<String, String> record : records) {
        buffer.add(record);
    }
    if (buffer.size() >= minBatchSize) {
        insertIntoDb(buffer);
        consumer.commitSync();
        buffer.clear();
    }
}
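For finer control than the batch commit above, offsets can be committed per partition; note that the committed offset is the position of the next message to read, hence last offset + 1. A sketch:

```java
import java.util.Collections;
import java.util.List;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

public class PerPartitionCommit {
    static void run(KafkaConsumer<String, String> consumer) {
        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(100);
            for (TopicPartition partition : records.partitions()) {
                List<ConsumerRecord<String, String>> partitionRecords = records.records(partition);
                for (ConsumerRecord<String, String> record : partitionRecords)
                    System.out.println(record.offset() + ": " + record.value());
                long lastOffset = partitionRecords.get(partitionRecords.size() - 1).offset();
                // Commit the position of the NEXT record for this partition, hence + 1.
                consumer.commitSync(Collections.singletonMap(partition, new OffsetAndMetadata(lastOffset + 1)));
            }
        }
    }
}
```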