To discuss offsets, we first need to know how to view them, along with the offset a consumer group has currently consumed up to:
./kafka-consumer-groups.sh --describe --bootstrap-server 192.168.153.128:9092 --group ConsumerGroup3
Consumer group 'ConsumerGroup3' has no active members: this consumer group currently has no active consumers.
For the topic test_topic, the other partitions hold no messages; only partition 2 has consumption records:
CURRENT-OFFSET: the offset the consumer has consumed up to
LOG-END-OFFSET: the largest offset in the partition (the log end offset)
LAG: the number of backlogged messages, i.e. LOG-END-OFFSET minus CURRENT-OFFSET
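For reference, the describe command prints a table along the following lines (the values here are illustrative, not taken from the session above; with no active members the consumer columns show dashes):
TOPIC       PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG  CONSUMER-ID  HOST  CLIENT-ID
test_topic  2          62              64              2    -            -     -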
1. Auto-commit
This commit mode has two important parameters:
enable.auto.commit=true (whether to enable auto-commit, true or false)
auto.commit.interval.ms=5000 (the interval between offset commits, default 5000 ms)
Every 5 seconds the consumer automatically commits the largest offset it received from the poll method. Auto-commit happens inside the polling loop: on every poll the consumer checks whether it is time to commit the offsets. However, this mode can cause both duplicate consumption and message loss.
Duplicate consumption: suppose we set auto.commit.interval.ms=60000. At 16:34 the consumer commits offset 62 and then pulls 2 more messages; at that moment the consumer owning partition 2 crashes and a rebalance occurs. (Transferring ownership of a partition from one consumer to another is called a rebalance; it generally happens when a consumer is added, a consumer shuts down, or the number of partitions changes.) Partition 2's messages are then consumed by another consumer, which starts from the offset committed at 16:34, so the 2 uncommitted messages are consumed again. Let's try it out in practice.
We start two consumers, consumer1 and consumer2:
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class ConsumerDemo { // wrapper class (name is ours) added so the fragment compiles

    private static final String TOPIC_NAME = "test_topic";

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers",
                "192.168.153.128:9092,192.168.153.128:9093,192.168.153.128:9094");
        props.put("group.id", "ConsumerGroup3");
        /* whether to commit offsets automatically */
        props.put("enable.auto.commit", "true");
        /* interval between automatic offset commits */
        props.put("auto.commit.interval.ms", "60000");
        props.put("session.timeout.ms", "30000");
        // props.put("auto.offset.reset", "earliest");
        props.put("auto.offset.reset", "latest");
        // deserializer classes
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        KafkaConsumer<String, String> consumer = new KafkaConsumer<String, String>(props);
        consumer.subscribe(Collections.singletonList(TOPIC_NAME));
        try {
            for (; ; ) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
                for (ConsumerRecord<String, String> record : records)
                    System.out.printf("Consumed message: topic=%s, partition=%d, offset=%d, key=%s, value=%s\n",
                            record.topic(), record.partition(), record.offset(), record.key(), record.value());
            }
        } finally {
            consumer.close();
        }
    }
}
Producer:
// props is assumed to hold the same bootstrap.servers plus StringSerializer for key and value
KafkaProducer<String, String> producer = new KafkaProducer<>(props);
for (int i = 0; i < 2; i++) {
    // send explicitly to partition 2 of test_topic
    ProducerRecord<String, String> record = new ProducerRecord<>(TOPIC_NAME, 2,
            "key01", "dress" + i);
    try {
        // send().get() blocks until the broker acknowledges, i.e. a synchronous send
        RecordMetadata result = producer.send(record).get();
        System.out.printf("Sync send: %s, partition: %d, offset: %d\n", result.topic(),
                result.partition(), result.offset());
    } catch (Exception e) {
        e.printStackTrace();
    }
}
The producer produced 2 messages:
Sync send: test_topic, partition: 2, offset: 62
Sync send: test_topic, partition: 2, offset: 63
Messages consumed by consumer2:
2019-01-31 17:02:17 Consumed message: topic=test_topic, partition=2, offset=62, key=key01, value=dress0
2019-01-31 17:02:17 Consumed message: topic=test_topic, partition=2, offset=63, key=key01, value=dress1
Now we shut down consumer2 and run ./kafka-consumer-groups.sh --describe --bootstrap-server 192.168.153.128:9092 --group ConsumerGroup3 on the server: the committed offset for partition 2 is still the old one, because the 60-second auto-commit interval had not elapsed.
Then look at the messages consumed by consumer1:
2019-01-31 17:02:50 Consumed message: topic=test_topic, partition=2, offset=62, key=key01, value=dress0
2019-01-31 17:02:50 Consumed message: topic=test_topic, partition=2, offset=63, key=key01, value=dress1
As you can see, the two messages were consumed twice.
Message loss: the consumer polls 100 new messages in one go and the offsets get committed, but the consumer crashes before it finishes processing them; a rebalance occurs and another consumer takes over the partition. The new consumer reads the old consumer's last committed offset, so the unprocessed messages are lost. Let's try it out in practice.
Producer:
for (int i = 0; i < 10; i++) {
    ProducerRecord<String, String> record = new ProducerRecord<>(TOPIC_NAME, 2,
            "key01", "dress" + i);
    try {
        RecordMetadata result = producer.send(record).get();
        System.out.printf("Sync send: %s, partition: %d, offset: %d\n", result.topic(),
                result.partition(), result.offset());
    } catch (Exception e) {
        e.printStackTrace();
    }
}
The producer produced 10 messages:
Sync send: test_topic, partition: 2, offset: 208
Sync send: test_topic, partition: 2, offset: 209
Sync send: test_topic, partition: 2, offset: 210
Sync send: test_topic, partition: 2, offset: 211
Sync send: test_topic, partition: 2, offset: 212
Sync send: test_topic, partition: 2, offset: 213
Sync send: test_topic, partition: 2, offset: 214
Sync send: test_topic, partition: 2, offset: 215
Sync send: test_topic, partition: 2, offset: 216
Sync send: test_topic, partition: 2, offset: 217
consumer2:
for (; ; ) {
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
    if (!records.isEmpty()) {
        System.out.println("consumer2 received this batch of messages!");
    }
    for (ConsumerRecord<String, String> record : records) {
        // process each record on its own thread, so poll() keeps running (and keeps auto-committing)
        new Thread(new Runnable() {
            @Override
            public void run() {
                try {
                    Thread.sleep(60000); // sleep to simulate the time spent processing the data
                } catch (InterruptedException e) {
                    e.printStackTrace();
                }
                System.out.printf(DateUtil.getDate() + " Consumed message: topic=%s, partition=%d, offset=%d, key=%s, value=%s\n",
                        record.topic(), record.partition(), record.offset(), record.key(), record.value());
            }
        }).start();
    }
}
consumer2 received this batch of messages!
consumer2 received this batch of messages!
consumer2 received this batch of messages!
consumer2 received this batch of messages!
consumer2 received this batch of messages!
consumer2 received this batch of messages!
So consumer2 has received this batch of messages but has not yet processed them. Because the offset-commit interval is shorter than the time it takes to process the data, the offsets have already been committed. Now we shut down consumer2 and send 10 more messages:
Sync send: test_topic, partition: 2, offset: 218
Sync send: test_topic, partition: 2, offset: 219
Sync send: test_topic, partition: 2, offset: 220
Sync send: test_topic, partition: 2, offset: 221
Sync send: test_topic, partition: 2, offset: 222
Sync send: test_topic, partition: 2, offset: 223
Sync send: test_topic, partition: 2, offset: 224
Sync send: test_topic, partition: 2, offset: 225
Sync send: test_topic, partition: 2, offset: 226
Sync send: test_topic, partition: 2, offset: 227
Now let's look at consumer1:
for (; ; ) {
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
    if (!records.isEmpty()) {
        System.out.println("consumer1 received this batch of messages!");
    }
    for (ConsumerRecord<String, String> record : records) {
        new Thread(new Runnable() {
            @Override
            public void run() {
                try {
                    Thread.sleep(60000); // sleep to simulate the time spent processing the data
                } catch (InterruptedException e) {
                    e.printStackTrace();
                }
                System.out.printf(DateUtil.getDate() + " Consumed message: topic=%s, partition=%d, offset=%d, key=%s, value=%s\n",
                        record.topic(), record.partition(), record.offset(), record.key(), record.value());
            }
        }).start();
    }
}
consumer1 received this batch of messages!
consumer1 received this batch of messages!
consumer1 received this batch of messages!
consumer1 received this batch of messages!
consumer1 received this batch of messages!
consumer1 received this batch of messages!
2019-02-32 09:27:13 Consumed message: topic=test_topic, partition=2, offset=220, key=key01, value=dress2
2019-02-32 09:27:13 Consumed message: topic=test_topic, partition=2, offset=218, key=key01, value=dress0
2019-02-32 09:27:13 Consumed message: topic=test_topic, partition=2, offset=219, key=key01, value=dress1
2019-02-32 09:27:13 Consumed message: topic=test_topic, partition=2, offset=221, key=key01, value=dress3
2019-02-32 09:27:13 Consumed message: topic=test_topic, partition=2, offset=222, key=key01, value=dress4
2019-02-32 09:27:13 Consumed message: topic=test_topic, partition=2, offset=223, key=key01, value=dress5
2019-02-32 09:27:13 Consumed message: topic=test_topic, partition=2, offset=225, key=key01, value=dress7
2019-02-32 09:27:13 Consumed message: topic=test_topic, partition=2, offset=224, key=key01, value=dress6
2019-02-32 09:27:13 Consumed message: topic=test_topic, partition=2, offset=227, key=key01, value=dress9
2019-02-32 09:27:13 Consumed message: topic=test_topic, partition=2, offset=226, key=key01, value=dress8
Even without looking at consumer1 we could tell that the 10 messages at offsets 208 to 217 have been lost: consumer1 starts consuming from offset 218 (the out-of-order printing above is just the processing threads finishing at different times).
So the auto-commit mode has real drawbacks: with synchronous processing, a rebalance easily causes duplicate consumption; with asynchronous processing, it easily causes message loss. Neither is what we want.
2. Committing the current offset
Set enable.auto.commit=false and use commitSync() to commit the latest offsets returned by the poll method. commitSync() returns as soon as the commit succeeds and throws an exception if it fails.
// the consumer is configured with enable.auto.commit=false
for (; ; ) {
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
    if (!records.isEmpty()) {
        System.out.println("consumer1 received this batch of messages!");
    }
    for (ConsumerRecord<String, String> record : records) {
        System.out.printf(DateUtil.getDate() + " Consumed message: topic=%s, partition=%d, offset=%d, key=%s, value=%s\n",
                record.topic(), record.partition(), record.offset(), record.key(), record.value());
    }
    try {
        consumer.commitSync(); // blocks until the broker responds; retries recoverable errors itself
    } catch (CommitFailedException e) {
        e.printStackTrace(); // unrecoverable commit failure
    }
}
3. Asynchronous commit
The shortcoming of the synchronous commit is that the application blocks from the moment the commit request is sent until the broker responds, which limits throughput. Throughput can be improved by committing less often, but then a rebalance produces more duplicated messages. The asynchronous commit, commitAsync(), sends the request and carries on without waiting for the reply. The trade-off is that commitAsync() does not retry a failed commit: by the time it retried, a later commit might already have succeeded, and retrying could overwrite a newer offset with an older one.
for (; ; ) {
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
    if (!records.isEmpty()) {
        System.out.println("consumer1 received this batch of messages!");
    }
    for (ConsumerRecord<String, String> record : records) {
        System.out.printf(DateUtil.getDate() + " Consumed message: topic=%s, partition=%d, offset=%d, key=%s, value=%s\n",
                record.topic(), record.partition(), record.offset(), record.key(), record.value());
    }
    consumer.commitAsync(); // fire and forget: send the commit request and keep polling
}
An asynchronous commit can also take a callback, which is invoked when the broker responds:
for (; ; ) {
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
    if (records.isEmpty()) {
        continue;
    }
    for (ConsumerRecord<String, String> record : records) {
        System.out.printf(DateUtil.getDate() + " Consumed message: topic=%s, partition=%d, offset=%d, key=%s, value=%s\n",
                record.topic(), record.partition(), record.offset(), record.key(), record.value());
    }
    consumer.commitAsync(new OffsetCommitCallback() {
        @Override
        public void onComplete(Map<TopicPartition, OffsetAndMetadata> offsets, Exception e) {
            if (e != null) {
                // on failure, log the offsets we tried to commit and the error
                System.out.println(offsets.toString());
                System.out.println(e.toString());
            }
        }
    });
}
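If you do want to retry asynchronous commits safely, a common pattern is to tag each commit with a monotonically increasing sequence number and retry from the callback only when no newer commit has been issued in the meantime. A minimal sketch (commitSeq is our own java.util.concurrent.atomic.AtomicLong, not part of the Kafka API; the second half goes inside the poll loop):
final AtomicLong commitSeq = new AtomicLong(0); // declared once, outside the poll loop

// inside the poll loop, after processing a batch:
final long seq = commitSeq.incrementAndGet(); // stamp this commit attempt
consumer.commitAsync(new OffsetCommitCallback() {
    @Override
    public void onComplete(Map<TopicPartition, OffsetAndMetadata> offsets, Exception e) {
        // retry only if this is still the most recent commit; the callback runs on the
        // consumer's polling thread, so calling commitSync here is safe
        if (e != null && seq == commitSeq.get()) {
            consumer.commitSync(offsets);
        }
    }
});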
4. Combining synchronous and asynchronous commits
If a commit failure happens on the very last commit before the consumer is closed or before a rebalance, we need to make sure that commit succeeds. That is when the combined synchronous/asynchronous commit is used: commit asynchronously inside the loop for speed, and commit synchronously on the way out, since commitSync() retries until it succeeds or hits an unrecoverable error.
try {
    for (; ; ) {
        ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
        if (records.isEmpty()) {
            continue;
        }
        for (ConsumerRecord<String, String> record : records) {
            System.out.printf(DateUtil.getDate() + " Consumed message: topic=%s, partition=%d, offset=%d, key=%s, value=%s\n",
                    record.topic(), record.partition(), record.offset(), record.key(), record.value());
        }
        consumer.commitAsync(); // fast path: a transient failure will be fixed by a later commit
    }
} finally {
    try {
        consumer.commitSync(); // final commit before closing: retried until success or fatal error
    } finally {
        consumer.close();
    }
}
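The combined pattern covers a clean shutdown, but for the rebalance case mentioned above you would normally commit from a ConsumerRebalanceListener registered when subscribing. A minimal sketch, assuming a currentOffsets map maintained while processing records (as in section 5 below):
final Map<TopicPartition, OffsetAndMetadata> currentOffsets = new HashMap<>();

consumer.subscribe(Collections.singletonList(TOPIC_NAME), new ConsumerRebalanceListener() {
    @Override
    public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
        // called before the rebalance takes our partitions away: commit what we have processed
        consumer.commitSync(currentOffsets);
    }

    @Override
    public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
        // nothing special to do when partitions are assigned
    }
});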
5. Committing a specific offset
If you want to commit in the middle of a batch, the consumer API lets you pass a map of the partitions and offsets you want to commit when calling commitSync and commitAsync.
Map<TopicPartition, OffsetAndMetadata> currentOffsets = new HashMap<TopicPartition, OffsetAndMetadata>();
int count = 0;
try {
    for (; ; ) {
        ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
        if (records.isEmpty()) {
            continue;
        }
        for (ConsumerRecord<String, String> record : records) {
            System.out.printf(DateUtil.getDate() + " Consumed message: topic=%s, partition=%d, offset=%d, key=%s, value=%s\n",
                    record.topic(), record.partition(), record.offset(), record.key(), record.value());
            // track record.offset() + 1: the committed offset should be the position of the next message to read
            currentOffsets.put(new TopicPartition(record.topic(), record.partition()),
                    new OffsetAndMetadata(record.offset() + 1, "no metadata"));
            if (count % 1000 == 0) {
                consumer.commitAsync(currentOffsets, null); // commit every 1000 records, no callback
            }
            count++;
        }
    }
} finally {
    try {
        consumer.commitSync();
    } finally {
        consumer.close();
    }
}
The article in the reference below is very well written and worth a read.
Reference: https://blog.youkuaiyun.com/ljheee/article/details/81605754