Kafka | Spring Boot Integration and Features

Kafka Overview

  • Message middleware comparison

Feature | ActiveMQ | RabbitMQ | RocketMQ | Kafka
Language | Java | Erlang | Java | Scala
Single-node throughput | ~10K msgs/s | ~10K msgs/s | ~100K msgs/s | ~1M msgs/s
Latency | ms | µs | ms | within ms
Availability | High (master-slave) | High (master-slave) | Very high (distributed) | Very high (distributed)
Features | Mature product, complete docs, broad protocol support | Strong concurrency, good performance, low latency | Fairly complete MQ features, good extensibility | Only the core MQ features; mainly used in the big-data field

  • Message middleware comparison — selection advice

Middleware | Recommendation
Kafka | Pursues high throughput; suited to data-collection workloads of internet services that generate large volumes of data
RocketMQ | Finance/internet domains with very high reliability requirements; highly stable, proven by multiple Alibaba Double 11 sales
RabbitMQ | Good performance and an active community; when data volume is not that large, the more feature-complete RabbitMQ is a good first choice
  • Kafka introduction
    Kafka is a distributed streaming platform, similar to a message queue or an enterprise messaging system. Official site: http://kafka.apache.org/
  1. Producer: the object that publishes messages to a topic (Kafka topic producer)
  2. Topic: Kafka sorts messages into categories; each category of messages is called a topic
  3. Consumer: the object that subscribes to topics and processes the published messages (consumer)
  4. Broker: published messages are stored on a group of servers called the Kafka cluster; each server in the cluster is a broker. Consumers can subscribe to one or more topics and pull data from the brokers to consume the published messages. (A minimal sketch of these roles follows below.)
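To make these roles concrete, here is a minimal sketch using the plain Kafka Java clients; the broker address and topic name are placeholders, and the Spring-Kafka sections later in this post wrap these same clients:

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.kafka.common.serialization.StringSerializer;

public class KafkaRolesSketch {
    public static void main(String[] args) {
        // Producer: publishes a record to the topic "demo-topic" on a broker
        Properties producerProps = new Properties();
        producerProps.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder broker
        producerProps.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        producerProps.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps)) {
            producer.send(new ProducerRecord<>("demo-topic", "key-1", "hello kafka"));
        }

        // Consumer: subscribes to the topic and pulls the published records from the brokers
        Properties consumerProps = new Properties();
        consumerProps.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        consumerProps.put(ConsumerConfig.GROUP_ID_CONFIG, "demo-group");
        consumerProps.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        consumerProps.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps)) {
            consumer.subscribe(Collections.singletonList("demo-topic"));
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
            records.forEach(r -> System.out.println(r.value()));
        }
    }
}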

Kafka Installation and Configuration

1 Kafka Cluster Setup (Unix)

1.1 Host Configuration

IP address | Hostname | Installed services | Alias
182.92.176.168 | iZ2zee9wadzha8z9dirc94Z | jdk, zookeeper, kafka | server1
182.92.148.185 | iZ2zee9wadzha8z9dirc96Z | jdk, zookeeper, kafka | server2
123.56.94.183 | iZ2zee9wadzha8z9dirc95Z | jdk, zookeeper, kafka | server3

1.2 Passwordless SSH Between Hosts

See: https://www.cnblogs.com/luzhanshi/p/13369797.html

1.3 Install the ZooKeeper Cluster

1. Download ZooKeeper:
    https://www.apache.org/dyn/closer.lua/zookeeper/zookeeper-3.7.0/apache-zookeeper-3.7.0-bin.tar.gz
2. Extract it into /usr/local/zookeeper/ on server1:
    tar -xzvf apache-zookeeper-3.7.0-bin.tar.gz
3. Go into the conf directory and copy zoo_sample.cfg to zoo.cfg in the same directory:
    cd apache-zookeeper-3.7.0-bin/conf
    cp zoo_sample.cfg zoo.cfg
4. Edit zoo.cfg as shown after these steps
5. Create the ZooKeeper data directory:
    mkdir /usr/local/zookeeper/apache-zookeeper-3.7.0-bin/zkdatas
6. Create the myid file and set its content to 1:
    cd /usr/local/zookeeper/apache-zookeeper-3.7.0-bin/zkdatas
    echo 1 > myid
7. Distribute to the other nodes: copy to node02 and change its myid to 2, copy to node03 and change its myid to 3:
    to node02: scp -r ./apache-zookeeper-3.7.0-bin server2:/usr/local/zookeeper
    to node03: scp -r apache-zookeeper-3.7.0-bin/ server3:/usr/local/zookeeper

8. Start/stop the ZooKeeper cluster on node01/node02/node03:
    /usr/local/zookeeper/apache-zookeeper-3.7.0-bin/bin/zkServer.sh start/stop

9. Check the cluster status:
    /usr/local/zookeeper/apache-zookeeper-3.7.0-bin/bin/zkServer.sh status

zoo.cfg file:

# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial 
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between 
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just 
# example sakes.
# ZooKeeper data directory (snapshots; also transaction logs, since dataLogDir is not set)
dataDir=/usr/local/zookeeper/apache-zookeeper-3.7.0-bin/zkdatas
# the port at which the clients will connect
clientPort=2181
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60
#
# Be sure to read the maintenance section of the 
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
autopurge.purgeInterval=1

## Metrics Providers
#
# https://prometheus.io Metrics Exporter
#metricsProvider.className=org.apache.zookeeper.metrics.prometheus.PrometheusMetricsProvider
#metricsProvider.httpPort=7000
#metricsProvider.exportJvmInfo=true

# Cluster server list. The number in server.N must match the content of that host's myid file.
# Of the two trailing ports, 2888 is used for data synchronization/communication and 3888 for leader election.
server.1=182.92.176.168:2888:3888
server.2=182.92.148.185:2888:3888
server.3=123.56.94.183:2888:3888

quorumListenOnAllIPs=true

1.4 Install the Kafka Cluster

1. Download Kafka:
    https://www.apache.org/dyn/closer.cgi?path=/kafka/2.8.0/kafka_2.12-2.8.0.tgz
2. Extract it into /usr/local/kafka/ on server1:
    tar -xzvf kafka_2.12-2.8.0.tgz
3. Go into the config directory to edit server.properties:
    cd /usr/local/kafka/kafka_2.12-2.8.0/config/
4. Edit server.properties as shown after these steps
5. Create the data directory:
    mkdir /usr/local/kafka/kafka_2.12-2.8.0/kfk-logs
6. Distribute to node02 and node03:
    first create the target directory on node02 and node03: mkdir -p /usr/local/kafka/kafka_2.12-2.8.0/
    to node02: scp -r /usr/local/kafka/kafka_2.12-2.8.0/ 182.92.148.185:/usr/local/kafka/
    to node03: scp -r /usr/local/kafka/kafka_2.12-2.8.0/ 123.56.94.183:/usr/local/kafka/
7. Edit server.properties on node02 and node03:
    node02: broker.id=1   advertised.host.name=182.92.148.185
    node03: broker.id=2   advertised.host.name=123.56.94.183

8. Start the Kafka cluster on node01/node02/node03; -daemon runs it as a background service and is followed by the configuration file to start with:
    /usr/local/kafka/kafka_2.12-2.8.0/bin/kafka-server-start.sh -daemon /usr/local/kafka/kafka_2.12-2.8.0/config/server.properties

9. Stop the Kafka cluster:
    /usr/local/kafka/kafka_2.12-2.8.0/bin/kafka-server-stop.sh
10. Run jps to verify that a Kafka process is running

server.properties file:

# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# see kafka.server.KafkaConfig for additional details and defaults

############################# Server Basics #############################

# The id of the broker. This must be set to a unique integer for each broker.
broker.id=0
port=9092
advertised.host.name=182.92.176.168

############################# Socket Server Settings #############################

# The address the socket server listens on. It will get the value returned from 
# java.net.InetAddress.getCanonicalHostName() if not configured.
#   FORMAT:
#     listeners = listener_name://host_name:port
#   EXAMPLE:
#     listeners = PLAINTEXT://your.host.name:9092
#listeners=PLAINTEXT://:9092

# Hostname and port the broker will advertise to producers and consumers. If not set, 
# it uses the value for "listeners" if configured.  Otherwise, it will use the value
# returned from java.net.InetAddress.getCanonicalHostName().
#advertised.listeners=PLAINTEXT://your.host.name:9092

# Maps listener names to security protocols, the default is for them to be the same. See the config documentation for more details
#listener.security.protocol.map=PLAINTEXT:PLAINTEXT,SSL:SSL,SASL_PLAINTEXT:SASL_PLAINTEXT,SASL_SSL:SASL_SSL

# The number of threads that the server uses for receiving requests from the network and sending responses to the network
num.network.threads=3

# The number of threads that the server uses for processing requests, which may include disk I/O
num.io.threads=8

# The send buffer (SO_SNDBUF) used by the socket server
socket.send.buffer.bytes=102400

# The receive buffer (SO_RCVBUF) used by the socket server
socket.receive.buffer.bytes=102400

# The maximum size of a request that the socket server will accept (protection against OOM)
socket.request.max.bytes=104857600


############################# Log Basics #############################

# A comma separated list of directories under which to store log files
log.dirs=/usr/local/kafka/kafka_2.12-2.8.0/kfk-logs

# The default number of log partitions per topic. More partitions allow greater
# parallelism for consumption, but this will also result in more files across
# the brokers.
num.partitions=1

# The number of threads per data directory to be used for log recovery at startup and flushing at shutdown.
# This value is recommended to be increased for installations with data dirs located in RAID array.
num.recovery.threads.per.data.dir=1

############################# Internal Topic Settings  #############################
# The replication factor for the group metadata internal topics "__consumer_offsets" and "__transaction_state"
# For anything other than development testing, a value greater than 1 is recommended to ensure availability such as 3.
offsets.topic.replication.factor=1
transaction.state.log.replication.factor=1
transaction.state.log.min.isr=1

############################# Log Flush Policy #############################

# Messages are immediately written to the filesystem but by default we only fsync() to sync
# the OS cache lazily. The following configurations control the flush of data to disk.
# There are a few important trade-offs here:
#    1. Durability: Unflushed data may be lost if you are not using replication.
#    2. Latency: Very large flush intervals may lead to latency spikes when the flush does occur as there will be a lot of data to flush.
#    3. Throughput: The flush is generally the most expensive operation, and a small flush interval may lead to excessive seeks.
# The settings below allow one to configure the flush policy to flush data after a period of time or
# every N messages (or both). This can be done globally and overridden on a per-topic basis.

# The number of messages to accept before forcing a flush of data to disk
#log.flush.interval.messages=10000

# The maximum amount of time a message can sit in a log before we force a flush
#log.flush.interval.ms=1000

############################# Log Retention Policy #############################

# The following configurations control the disposal of log segments. The policy can
# be set to delete segments after a period of time, or after a given size has accumulated.
# A segment will be deleted whenever *either* of these criteria are met. Deletion always happens
# from the end of the log.

# The minimum age of a log file to be eligible for deletion due to age
log.retention.hours=168

# A size-based retention policy for logs. Segments are pruned from the log unless the remaining
# segments drop below log.retention.bytes. Functions independently of log.retention.hours.
#log.retention.bytes=1073741824

# The maximum size of a log segment file. When this size is reached a new log segment will be created.
log.segment.bytes=1073741824

# The interval at which log segments are checked to see if they can be deleted according
# to the retention policies
log.retention.check.interval.ms=300000

############################# Zookeeper #############################

# Zookeeper connection string (see zookeeper docs for details).
# This is a comma separated host:port pairs, each corresponding to a zk
# server. e.g. "127.0.0.1:3000,127.0.0.1:3001,127.0.0.1:3002".
# You can also append an optional chroot string to the urls to specify the
# root directory for all kafka znodes.
# ZooKeeper cluster connection string
zookeeper.connect=182.92.176.168:2181,182.92.148.185:2181,123.56.94.183:2181

# Timeout in ms for connecting to zookeeper
zookeeper.connection.timeout.ms=18000


############################# Group Coordinator Settings #############################

# The following configuration specifies the time, in milliseconds, that the GroupCoordinator will delay the initial consumer rebalance.
# The rebalance will be further delayed by the value of group.initial.rebalance.delay.ms as new members join the group, up to a maximum of max.poll.interval.ms.
# The default value for this is 3 seconds.
# We override this to 0 here as it makes for a better out-of-the-box experience for development and testing.
# However, in production environments the default value of 3 seconds is more suitable as this will help to avoid unnecessary, and potentially expensive, rebalances during application startup.
group.initial.rebalance.delay.ms=0

1.5 Kafka Basics — Topics

# List topics:
    /usr/local/kafka/kafka_2.12-2.8.0/bin/kafka-topics.sh --list --zookeeper 182.92.176.168:2181,182.92.148.185:2181,123.56.94.183:2181

# Describe a topic:
/usr/local/kafka/kafka_2.12-2.8.0/bin/kafka-topics.sh  --describe --zookeeper 182.92.176.168:2181,182.92.148.185:2181,123.56.94.183:2181 --topic topic_1

# Create a topic
# --create: create a topic
# --zookeeper: the ZooKeeper cluster nodes
# --replication-factor 2: number of replicas
# --partitions 3: number of partitions
# --topic topic_1: the topic name

/usr/local/kafka/kafka_2.12-2.8.0/bin/kafka-topics.sh --create --zookeeper 182.92.176.168:2181,182.92.148.185:2181,123.56.94.183:2181 --replication-factor 2 --partitions 3 --topic topic_1

# Delete a topic
/usr/local/kafka/kafka_2.12-2.8.0/bin/kafka-topics.sh --delete --zookeeper server1:2181,server2:2181,server3:2181 --topic topic_1

1.6 Kafka Basics — Producer

# Start a console producer and publish messages
/usr/local/kafka/kafka_2.12-2.8.0/bin/kafka-console-producer.sh --broker-list 182.92.176.168:9092,182.92.148.185:9092,123.56.94.183:9092 --topic topic_1

1.7 Kafka Basics — Consumer

# Start a console consumer and consume messages:
/usr/local/kafka/kafka_2.12-2.8.0/bin/kafka-console-consumer.sh --bootstrap-server 182.92.176.168:9092,182.92.148.185:9092,123.56.94.183:9092 --from-beginning --topic topic_1

2 Simple Example

2.1 spring-kafka Configuration

# Kafka cluster addresses
spring.kafka.bootstrap-servers=172.18.2.1:9092,172.18.2.2:9092,172.18.2.3:9092

# Kafka producer
# acks: 0 = no acknowledgement, 1 = leader acknowledges, all = leader and all in-sync replicas acknowledge
spring.kafka.producer.acks=1
# Number of retries when a send fails
spring.kafka.producer.retries=3
# Compression type used by the producer; the default is no compression, valid values include none, gzip and snappy.
# Compression works best together with batching: the larger the batch, the better the compression ratio.
spring.kafka.producer.compression-type=snappy
# Maximum size of one batch, in bytes
spring.kafka.producer.batch-size=100000
# Upper bound on the batching delay: after linger.ms the batch is sent even if it has not reached batch-size
spring.kafka.producer.properties.linger.ms=5000
# Total memory available to the producer for buffering records waiting to be sent
spring.kafka.producer.buffer-memory=33554432
# Kafka can keep several unacknowledged requests in flight on one connection, which reduces overhead, but if a
# request fails and is retried the message order may change. Default: 5
spring.kafka.producer.properties.max.in.flight.requests.per.connection=20
# Serializer for the message key
spring.kafka.producer.key-serializer=org.apache.kafka.common.serialization.StringSerializer
# Serializer for the message value
spring.kafka.producer.value-serializer=org.springframework.kafka.support.serializer.JsonSerializer

# Kafka consumer
# String that uniquely identifies the consumer group this process belongs to; processes with the same group id
# form one consumer group. Default: ""
spring.kafka.consumer.group-id=group-datax
# Note: unclean.leader.election.enable is a broker/topic-level setting, not a consumer property. When false (the
# default), the leader cannot be elected from replicas outside the ISR, which avoids possible data loss at the
# cost of some availability.
#unclean.leader.election.enable=false
# If true, the consumer's offsets are committed automatically in the background; the committed offsets are picked
# up by a new consumer when this process dies. Default: true
spring.kafka.consumer.enable-auto-commit=true
spring.kafka.consumer.auto-commit-interval=1000
# What to do when there is no initial committed offset (default: latest):
# earliest - if a committed offset exists, resume from it; otherwise consume from the beginning of the partition
# latest   - if a committed offset exists, resume from it; otherwise consume only newly produced records
# none     - if every partition has a committed offset, resume from them; otherwise throw an exception
spring.kafka.consumer.auto-offset-reset=earliest
# Session timeout. If the consumer sends no heartbeat within this period it is considered dead and a rebalance is
# triggered. Must be within [group.min.session.timeout.ms, group.max.session.timeout.ms]. Default: 10000
spring.kafka.consumer.properties.session.timeout.ms=30000
# Deserializer for the message key
spring.kafka.consumer.key-deserializer=org.apache.kafka.common.serialization.StringDeserializer
# Deserializer for the message value
spring.kafka.consumer.value-deserializer=org.springframework.kafka.support.serializer.JsonDeserializer

# Kafka listener
# By default the application fails to start when a listened-to topic does not exist; set to false to avoid that error
spring.kafka.listener.missing-topics-fatal=false

# Kafka logging
# Only print ERROR level
logging.level.org.springframework.kafka=ERROR
logging.level.org.apache.kafka=ERROR


2.2 Producer Send Modes

Using the KafkaTemplate provided by Spring-Kafka, we implement two send modes: asynchronous and synchronous (a usage sketch follows the class below).

@Component
@RequiredArgsConstructor
public class ProducerUtil {

    private final KafkaTemplate<Object, Object> kafkaTemplate;

    /**
     * Send a message asynchronously.
     * @param topicName topic
     * @param key       message key (Kafka hashes it to choose the partition)
     * @param message   message body
     */
    public void asyncSend(String topicName, String key, String message) {
        kafkaTemplate.send(topicName, key, message);
    }

    /**
     * Send a message synchronously: calling get() on the returned ListenableFuture blocks
     * until the send result is available.
     * @param topicName topic
     * @param key       message key (Kafka hashes it to choose the partition)
     * @param message   message body
     * @return send result
     * @throws ExecutionException
     * @throws InterruptedException
     */
    public SendResult<Object, Object> syncSend(String topicName, String key, String message) throws ExecutionException, InterruptedException {
        return kafkaTemplate.send(topicName, key, message).get();
    }
}
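For reference, a hypothetical caller that exercises both send modes; the service class, topic name and key below are assumptions, not part of the original example:

import java.util.concurrent.ExecutionException;

import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.springframework.kafka.support.SendResult;
import org.springframework.stereotype.Service;

@Slf4j
@Service
@RequiredArgsConstructor
public class MessageService {

    private final ProducerUtil producerUtil;

    public void sendDemo() throws ExecutionException, InterruptedException {
        // Fire-and-forget: returns immediately, delivery happens in the background
        producerUtil.asyncSend("kafka-test", "order-1", "async payload");

        // Blocks until the broker acknowledges, then exposes the record metadata
        SendResult<Object, Object> result = producerUtil.syncSend("kafka-test", "order-1", "sync payload");
        log.info("sent to partition {} at offset {}",
                result.getRecordMetadata().partition(),
                result.getRecordMetadata().offset());
    }
}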

2.3 Consumer Consumption Modes

Clustered consumption: multiple consumer instances in the same consumer group each consume different partitions of the same topic. With enough consumers, each partition has at most one consumer instance; when there are fewer consumers than partitions, a single consumer handles several partitions, so the number of consumer instances should match the number of partitions. By default, a given consumer group consumes a topic in this clustered mode, i.e. each message is consumed by only one instance within the group.

// Multiple consumer instances in the same consumer group
@Slf4j
@Component
@RequiredArgsConstructor
public class KafkaConsumer1 {

    @KafkaListener(topics = "kafka-test", groupId = "group_1")
    public void message(String message){
        log.info("Consumer KafkaConsumer1, group group_1, " + message);
    }
}

@Slf4j
@Component
@RequiredArgsConstructor
public class KafkaConsumer2 {

    @KafkaListener(topics = "kafka-test", groupId = "group_1")
    public void message(String message){
        log.info("Consumer KafkaConsumer2, group group_1, " + message);
    }
}

1. Within the same consumer group, when topic kafka-test has only one partition, only KafkaConsumer1 consumes. The consumer logs also show that consumption within a single Kafka partition is sequential. So when the message volume of a topic spikes, adding consumer instances alone may not help much, because instances beyond the partition count stay idle; the partition count and the number of consumer instances need to be scaled together (a partition-expansion sketch follows after item 2).

(consumer log screenshots omitted)

2. Within the same consumer group, each message is consumed only once; when topic kafka-test has two partitions, both KafkaConsumer1 and KafkaConsumer2 consume.
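Expanding the partition count of an existing topic, as suggested in item 1, can be done with the Kafka AdminClient. A minimal sketch, with the broker address and target partition count assumed:

import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewPartitions;

public class ExpandPartitions {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker address
        try (AdminClient admin = AdminClient.create(props)) {
            // Increase kafka-test from 1 to 2 partitions so the second consumer instance gets work
            admin.createPartitions(Collections.singletonMap("kafka-test", NewPartitions.increaseTo(2)))
                 .all()
                 .get();
        }
    }
}

Note that partitions can only be increased, never decreased, and existing keys may map to different partitions after the change.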

Broadcast consumption: the same topic's messages being consumed by multiple consumers is called broadcast consumption. Since Kafka defaults to clustered consumption, broadcast consumption is implemented by giving each application instance a different groupId, i.e. each instance is its own consumer group.

// Multiple consumer groups on the same topic
@Slf4j
@Component
@RequiredArgsConstructor
public class KafkaConsumer1 {

    @KafkaListener(topics = "kafka-test", groupId = "group_1")
    public void message(String message){
        log.info("Consumer KafkaConsumer1, group group_1, " + message);
    }
}

@Slf4j
@Component
@RequiredArgsConstructor
public class KafkaConsumer2 {

    @KafkaListener(topics = "kafka-test", groupId = "group_2")
    public void message(String message){
        log.info("Consumer KafkaConsumer2, group group_2, " + message);
    }
}

3. With different consumer groups, the same message is consumed once by every group.

3 Producer — Batch Sending

The Kafka producer uses a RecordAccumulator to quietly collect messages destined for the same partition of the same topic and then submits them to the Kafka broker as one batch. The main knobs are:

[Size] batch-size: maximum size of one batch, in bytes; a full batch is sent immediately.

[Memory] buffer-memory: total memory available for buffering records awaiting transmission; when it is exhausted, further sends block until space frees up.

[Time] linger.ms: maximum time, in milliseconds, to wait before sending a batch; after this delay the batch is sent even if it is not full.

A batch is therefore flushed as soon as either the size or the time threshold is reached.
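The same batching knobs can also be set programmatically instead of via the YAML shown in 3.1 below; a minimal sketch, with the bootstrap address and bean names assumed:

import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.StringSerializer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.core.DefaultKafkaProducerFactory;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.kafka.core.ProducerFactory;

@Configuration
public class BatchingProducerConfig {

    @Bean
    public ProducerFactory<String, String> producerFactory() {
        Map<String, Object> props = new HashMap<>();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed address
        props.put(ProducerConfig.BATCH_SIZE_CONFIG, 16384);        // size threshold: bytes per batch
        props.put(ProducerConfig.LINGER_MS_CONFIG, 30000);         // time threshold: ms to wait before flushing
        props.put(ProducerConfig.BUFFER_MEMORY_CONFIG, 33554432L); // total buffer memory for pending records
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        return new DefaultKafkaProducerFactory<>(props);
    }

    @Bean
    public KafkaTemplate<String, String> kafkaTemplate(ProducerFactory<String, String> producerFactory) {
        return new KafkaTemplate<>(producerFactory);
    }
}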

3.1 Producer Configuration

spring:
  kafka:
    bootstrap-servers:
      - 10.30.4.62:31090
    producer:
      acks: 1 # 0 = no ack; 1 = leader acks; all = leader and all in-sync replicas ack
      retries: 3 # number of retries when a send fails
      key-serializer: org.apache.kafka.common.serialization.StringSerializer # key serializer
      value-serializer: org.springframework.kafka.support.serializer.JsonSerializer # value serializer
      batch-size: 16384 # maximum size of one batch, in bytes
      buffer-memory: 33554432 # total memory for buffering records awaiting transmission
      properties:
        linger:
          ms: 30000 # upper bound on the batching delay: after 30 * 1000 ms the batch is sent even if batch-size has not been reached
    consumer:
      auto-offset-reset: earliest # start a new consumer group from the earliest offset
      key-deserializer: org.apache.kafka.common.serialization.StringDeserializer
      value-deserializer: org.springframework.kafka.support.serializer.JsonDeserializer # value deserializer
    listener:
      missing-topics-fatal: false

logging:
  level:
    org:
      springframework:
        kafka: ERROR # limit Kafka logging to ERROR
      apache:
        kafka: ERROR

3.2 Consumer Side

@Slf4j
@Component
public class KafkaConsumer1 {
    @KafkaListener(topics = "kafka-test", groupId = "group_1")
    public void message(String message){
        log.info("Consumer KafkaConsumer1, group group_1, " + message);
    }
}

3.3 Producer Side

@Test
public void test() {
    for (int i = 0; i < 3; i++) {
        producerUtil.asyncSend("kafka-test", String.valueOf(i), "message " + i);
        try {
            Thread.sleep(10 * 1000L);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
    }
}

From the results, although the test sends one message every 10 seconds, the three messages are actually delivered in a single request (linger.ms is 30 s here), which is corroborated by the consumer receiving all three messages at almost the same time.

(consumer log screenshot omitted)

4 Consumer — Batch Consumption

4.1 Consumer Configuration

spring:
  kafka:
    bootstrap-servers:
      - 10.30.4.62:31090
    producer:
      acks: 1 # 0 = no ack; 1 = leader acks; all = leader and all in-sync replicas ack
      retries: 3 # number of retries when a send fails
      key-serializer: org.apache.kafka.common.serialization.StringSerializer # key serializer
      value-serializer: org.springframework.kafka.support.serializer.JsonSerializer # value serializer
    consumer:
      auto-offset-reset: earliest # start a new consumer group from the earliest offset
      key-deserializer: org.apache.kafka.common.serialization.StringDeserializer
      value-deserializer: org.springframework.kafka.support.serializer.JsonDeserializer # value deserializer
      fetch-max-wait: 20000 # maximum time, in ms, a poll blocks while waiting for at least fetch-min-size bytes
      fetch-min-size: 10 # minimum amount of data, in bytes, a poll fetches
      max-poll-records: 100 # maximum number of records returned by a single poll
    listener:
      type: BATCH # listener type; the default SINGLE handles one message at a time, BATCH receives a list of messages
      missing-topics-fatal: false

logging:
  level:
    org:
      springframework:
        kafka: ERROR # limit Kafka logging to ERROR
      apache:
        kafka: ERROR

4.2 Consumer Side

// Messages are received as a List
@Slf4j
@Component
@RequiredArgsConstructor
public class KafkaConsumer1 {
    @KafkaListener(topics = "kafka-test", groupId = "group_1")
    public void message(List<String> messages){
        log.info("Consumer KafkaConsumer1, group group_1, " + messages + ", message count: " + messages.size());
    }
}
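For context, a hypothetical producer-side loop in the style of the test in 3.3; the topic name and message count are taken from the result described next:

@Test
public void testBatchConsume() {
    // Send 1,000 small messages; with the batch listener they arrive in List<String> chunks
    for (int i = 0; i < 1000; i++) {
        producerUtil.asyncSend("kafka-test", String.valueOf(i), "message " + i);
    }
}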

From the results, the consumer receives messages as a List: the 1,000 test messages arrive in 12 batches.
(consumer log screenshot omitted)

5 Consumer — Concurrent Consumption

Based on @KafkaListener(concurrency = 2), Spring-Kafka creates 2 Kafka Consumers, each assigned to its own thread that pulls and processes messages from its own topic partitions, consuming serially within that thread. This achieves concurrent, multi-threaded consumption, and the concurrency setting replaces the need to create multiple consumer bean instances. (A container-factory-based alternative is sketched below.)
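Instead of the per-listener concurrency attribute, the concurrency can also be set once on the listener container factory; a minimal sketch, with the configuration class and factory bean name assumed:

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.config.ConcurrentKafkaListenerContainerFactory;
import org.springframework.kafka.core.ConsumerFactory;

@Configuration
public class ListenerConcurrencyConfig {

    @Bean
    public ConcurrentKafkaListenerContainerFactory<String, String> kafkaListenerContainerFactory(
            ConsumerFactory<String, String> consumerFactory) {
        ConcurrentKafkaListenerContainerFactory<String, String> factory =
                new ConcurrentKafkaListenerContainerFactory<>();
        factory.setConsumerFactory(consumerFactory);
        // Two consumer threads per @KafkaListener that uses this factory
        factory.setConcurrency(2);
        // Keep batch listening enabled, matching the List<String> listener signature used in this section
        factory.setBatchListener(true);
        return factory;
    }
}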

5.1 Consumer Side

@Slf4j
@Component
@RequiredArgsConstructor
public class KafkaConsumer1 {
    @KafkaListener(topics = "kafka-test", groupId = "group_1", concurrency = "2")
    public void message(List<String> messages){
        log.info("Consumer KafkaConsumer1, group group_1, " + messages + ", current thread: " + Thread.currentThread().getName());
    }
}

From the results, two threads consume the messages, and each message is consumed on only one thread.
(consumer log screenshot omitted)

6 Ordered Messages

First, the common definitions of message ordering:

Partially ordered: the producer sends related messages to the same message queue (here, the same partition).

Strictly ordered: on top of partial ordering, the consumer also consumes strictly in order.

As the examples above show, a Spring-Kafka consumer naturally consumes the messages of a single partition of a topic in order. So all that remains is for the producer to send related messages to the same partition of the topic. As long as we specify a message key when sending, the producer hashes the key (modulo the partition count) to pick the partition. (A sketch of this key-to-partition mapping follows below.)
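Roughly, for a non-null key the default partitioner behaves like the sketch below (simplified; the real logic lives in the producer's DefaultPartitioner):

import java.nio.charset.StandardCharsets;

import org.apache.kafka.common.utils.Utils;

public class KeyPartitionSketch {

    // murmur2 hash of the serialized key, made non-negative, modulo the partition count
    public static int partitionFor(String key, int numPartitions) {
        byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8);
        return Utils.toPositive(Utils.murmur2(keyBytes)) % numPartitions;
    }

    public static void main(String[] args) {
        // The same key always maps to the same partition, which is what keeps related messages ordered
        System.out.println(partitionFor("001", 3));
    }
}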

6.1 Producer Side

@Test
public void test() throws ExecutionException, InterruptedException {
    for (int i = 0; i < 10; i++) {
        String key = "001";
        SendResult<Object, Object> result = producerUtil.syncSend("kafka-test", key, "message " + i);
        log.info("[testSyncSend][key: [{}], partition: [{}]]", key, result.getRecordMetadata().partition());
    }
}

6.2 Consumer Side

@Slf4j
@Component
@RequiredArgsConstructor
public class KafkaConsumer1 {

    @KafkaListener(topics = "kafka-test", groupId = "group_1")
    public void message(List<String> messages){
        log.info("Consumer KafkaConsumer1, group group_1, " + messages + ", current thread: " + Thread.currentThread().getName());
    }
}

Producer log:

(producer log screenshot omitted)

Consumer log:

(consumer log screenshot omitted)

The producer log shows that with the same key all messages land in the same partition; the consumer log shows that consumption within one partition on one thread is in order.
