The previous post used console commands to produce and consume Kafka messages; this one writes the producer and consumer programs in Java.
The producer first builds a ProducerRecord, which holds the topic, partition, key, and value, and then sends it with the send() method. Because the record travels over the network, the key and value must be serialized. A custom partitioner can also be plugged in, as shown in the examples below. Note that messages are sent in batches: a record is not actually transmitted until its batch is ready to go.
Environment setup
With the Kafka cluster from the earlier article running, create a new Maven project named kafkatest in IDEA and use the following pom.xml so the dependencies are imported automatically.
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>kafkatest</groupId>
    <artifactId>kafkatest</artifactId>
    <version>1.0-SNAPSHOT</version>
    <name>kafkatest</name>
    <url>http://maven.apache.org</url>
    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <maven.compiler.source>1.8</maven.compiler.source>
        <maven.compiler.target>1.8</maven.compiler.target>
    </properties>
    <dependencies>
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>3.8.1</version>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.kafka</groupId>
            <artifactId>kafka_2.11</artifactId>
            <version>1.0.0</version>
        </dependency>
    </dependencies>
</project>
Writing the producer: producer.send(record)
When sending messages, consider the application scenario:
messages must not be lost and must not be duplicated (e.g. financial transactions);
a small amount of loss and some latency are acceptable in exchange for high throughput (e.g. user behavior logging).
The producer calls send() to hand the record to the broker.
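As a rough sketch (the values are illustrative; the acks setting is explained in detail in the configuration section below), the two scenarios map onto different producer settings:
// financial-style scenario: wait for every in-sync replica to acknowledge, and retry transient failures
props.put("acks", "all");
props.put("retries", "3");
// logging-style scenario: do not wait for any acknowledgment, trading safety for throughput
// props.put("acks", "0");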
Three ways to call send()
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;
Properties props = new Properties();
props.put("bootstrap.servers", "bigdata01:9092,bigdata02:9092,bigdata03:9092");
props.put("key.serializer", StringSerializer.class.getName());
props.put("value.serializer", StringSerializer.class.getName());
KafkaProducer<String, String> producer = new KafkaProducer<>(props);
for (int i = 0; i < 100; i++) {
    producer.send(new ProducerRecord<String, String>("javatopic", Integer.toString(i), "messager:" + i));
}
producer.close(); // flushes records that have not yet reached the batch size
send() is asynchronous: it appends the record to an in-memory buffer and returns immediately, which is what makes batching possible.
The producer keeps a buffer of unsent records for each partition; the buffer size is configurable (props.put("batch.size", 16384);).
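A minimal sketch of the batching-related settings (linger.ms and buffer.memory are not part of the example above; the values shown are illustrative):
props.put("batch.size", 16384);        // maximum size in bytes of one per-partition batch
props.put("linger.ms", 5);             // wait up to 5 ms for more records before sending a non-full batch
props.put("buffer.memory", 33554432);  // total memory the producer may use for buffering (32 MB is the default)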
Fire-and-forget
Delivery is not guaranteed: the producer retries automatically, but messages can still be lost. The example above uses this mode.
Synchronous send
send() returns a Future<RecordMetadata>; calling get() on it blocks until the broker responds:
// record is a ProducerRecord<String, String> built as in the example above
try {
    Future<RecordMetadata> futurerm = producer.send(record);
    RecordMetadata rm = futurerm.get(); // blocks until the broker acknowledges the record
    long offset = rm.offset();
    int partition = rm.partition();
    String topic = rm.topic();
    System.out.println("topic:" + topic + ",partition:" + partition + ",offset:" + offset);
    producer.close();
} catch (Exception e) {
    e.printStackTrace();
}
Asynchronous send
Pass a Callback to send():
for (int i = 0; i < 100; i++) {
    try {
        producer.send(new ProducerRecord<String, String>("javatopic", Integer.toString(i), "messagecallback:" + i), new MyCallback());
    } catch (Exception e) {
        e.printStackTrace();
    }
}
producer.close();
The callback class looks like this:
package producer;
import org.apache.kafka.clients.producer.Callback;
import org.apache.kafka.clients.producer.RecordMetadata;
public class MyCallback implements Callback {
    @Override
    public void onCompletion(RecordMetadata recordMetadata, Exception e) {
        if (e != null) {
            // the send failed: handle the exception
            e.printStackTrace();
        } else {
            long offset = recordMetadata.offset();
            int partition = recordMetadata.partition();
            String topic = recordMetadata.topic();
            System.out.println("topic:" + topic + ",partition:" + partition + ",offset:" + offset);
        }
    }
}
Multi-threaded mode
package producer;
import java.util.Properties;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;
public class MultiProducer extends Thread {
    // KafkaProducer is thread-safe, so a single instance could also be shared across threads;
    // here each thread creates its own producer
    private KafkaProducer<String, String> producer;
    private String topicName;
    public MultiProducer(String topic) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "bigdata01:9092,bigdata02:9092,bigdata03:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());
        producer = new KafkaProducer<>(props);
        topicName = topic;
    }
    @Override
    public void run() {
        int messageCount = 0;
        while (messageCount < 100) {
            // send to the topic passed in to the constructor
            producer.send(new ProducerRecord<String, String>(topicName, Integer.toString(messageCount), "Multimessage:" + messageCount), new MyCallback());
            messageCount++;
            producer.flush(); // force the buffered records out each iteration
        }
        producer.close();
    }
    public static void main(String[] args) {
        ExecutorService es = Executors.newFixedThreadPool(3);
        for (int i = 0; i < 20; i++) {
            es.execute(new MultiProducer("multijavatopic"));
        }
        es.shutdown();
    }
}
Custom partitioner
If no partitioner is specified, the default partitioner chooses the partition by hashing the key; if the key is null, records are distributed round-robin across the partitions. If the number of partitions later increases, the same key is no longer guaranteed to land on the same partition, so it is best to create enough partitions up front rather than adding them later.
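One way to sidestep the key-to-partition mapping entirely is to fill in the ProducerRecord fields explicitly. A minimal sketch of the available constructors (the topic name javatopic follows the earlier examples):
// topic + value: no key, so the default partitioner spreads records across partitions
ProducerRecord<String, String> r1 = new ProducerRecord<>("javatopic", "hello");
// topic + key + value: records with the same key are hashed to the same partition
ProducerRecord<String, String> r2 = new ProducerRecord<>("javatopic", "key1", "hello");
// topic + partition + key + value: the partition is fixed and the partitioner is bypassed
ProducerRecord<String, String> r3 = new ProducerRecord<>("javatopic", 0, "key1", "hello");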
A custom partitioner implements the Partitioner interface:
public interface Partitioner extends Configurable, Closeable {
    /**
     * Compute the partition for the given record.
     *
     * @param topic The topic name
     * @param key The key to partition on (or null if no key)
     * @param keyBytes The serialized key to partition on (or null if no key)
     * @param value The value to partition on or null
     * @param valueBytes The serialized value to partition on or null
     * @param cluster The current cluster metadata
     */
    public int partition(String topic, Object key, byte[] keyBytes, Object value, byte[] valueBytes, Cluster cluster);
    /**
     * This is called when partitioner is closed.
     */
    public void close();
}
Kafka's default partitioner, DefaultPartitioner, implements this interface; it hashes the key and takes the result modulo the number of partitions:
...
// hash the keyBytes to choose a partition
return Utils.toPositive(Utils.murmur2(keyBytes)) % numPartitions;
// murmur2: generates a 32-bit murmur2 hash from the byte array
// toPositive: return number & 0x7fffffff; a single bitwise AND turns negative numbers positive
An example of a custom partitioner:
package producer;
import java.util.List;
import java.util.Map;
import org.apache.kafka.clients.producer.Partitioner;
import org.apache.kafka.common.Cluster;
import org.apache.kafka.common.PartitionInfo;
import org.apache.kafka.common.record.InvalidRecordException; // location may differ between client versions
import org.apache.kafka.common.utils.Utils;
public class MyPartitioner implements Partitioner {
    @Override
    public int partition(String topic, Object key, byte[] keyBytes, Object value, byte[] valueBytes, Cluster cluster) {
        List<PartitionInfo> partitions = cluster.partitionsForTopic(topic);
        int numPartitions = partitions.size();
        if (keyBytes == null) {
            throw new InvalidRecordException("null key is not allowed");
        }
        if (key.equals("1")) {
            System.out.println("My Partitioner for key 1");
            return numPartitions - 1; // records with key "1" always go to the last partition
        }
        // otherwise fall back to the default behavior: hash the key modulo the partition count
        return Utils.toPositive(Utils.murmur2(keyBytes)) % numPartitions;
    }
    @Override
    public void close() {
    }
    @Override
    public void configure(Map<String, ?> configs) {
    }
}
Register the custom partitioner in the producer configuration by its fully qualified class name:
props.put("partitioner.class","producer.MyPartitioner");
Configuration parameters
The parameters are documented on the Kafka website: go to http://kafka.apache.org, open DOCUMENTATION in the left menu, and search for "producer configs" to jump to the parameter descriptions.
acks: the criterion for a successful write
acks defines when a produce request counts as complete: the number of acknowledgments the producer requires the leader to have received before considering a request complete.
acks can take four values: 0, 1, all, and -1. With 0 the producer treats a record as sent as soon as it goes out and does not wait for any reply from the broker, giving the highest throughput. With 1 the send succeeds once the leader has written the record and responded; throughput then depends on whether messages are sent synchronously or asynchronously. all and -1 are equivalent: the send only succeeds after all replicas have acknowledged the record, which is the strictest and safest mode. Even if a broker crashes, the message is not lost because replicas exist.
In the Java code above it can be set like this; note that the value is a String, not an int.
props.put("acks", "1");
Message retention
These are configured in the broker's server.properties file.
offsets.retention.minutes
offsets.retention.minutes is an int, default 10080 (version 2.0.0 raised the default offset retention from 1 day to 7 days, i.e. 10080 minutes). It controls how long committed consumer offsets are kept, which directly affects where consumers resume reading.
log.retention.minutes
How long topic data is kept. It is recommended to set offsets.retention.minutes larger than log.retention.minutes; otherwise the topic data has not expired but the offsets have, and the topic can no longer be read from the recorded position.
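For example (the concrete values are illustrative only), the recommendation could look like this in server.properties:
# committed offsets outlive the topic data below
offsets.retention.minutes=10080
# topic data kept for 3 days, shorter than the offset retention
log.retention.minutes=4320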
How do I change the default maximum message size?
max.partition.fetch.bytes controls the maximum amount of data fetched per partition and defaults to 1048576 bytes; note that it is a consumer-side setting, while the producer-side limit on a single request is max.request.size (also 1048576 bytes by default). Either can be raised, for example:
props.put("max.partition.fetch.bytes", "10485760");
Number of send retries (in the new Java producer this is the retries setting; message.send.max.retries was the old Scala producer's name):
props.put("retries", "3");
Batch size (the new Java producer batches by bytes via batch.size rather than by message count; batch.num.messages applied to the old Scala producer):
props.put("batch.size", "16384");
Source code analysis
The send() method
public Future<RecordMetadata> send(ProducerRecord<K, V> record, Callback callback) {
    ProducerRecord<K, V> interceptedRecord = this.interceptors == null ? record : this.interceptors.onSend(record);
    return this.doSend(interceptedRecord, callback);
}
which in turn calls doSend():
private Future<RecordMetadata> doSend(ProducerRecord<K, V> record, Callback callback) {
    // TopicPartition stores the topic and partition and implements hashCode, equals and toString
    TopicPartition tp = null;
    try {
        // first make sure the metadata for the topic is available
        ClusterAndWaitTime clusterAndWaitTime = waitOnMetadata(record.topic(), record.partition(), maxBlockTimeMs);
        long remainingWaitMs = Math.max(0, maxBlockTimeMs - clusterAndWaitTime.waitedOnMetadataMs);
        Cluster cluster = clusterAndWaitTime.cluster;
        // serialize the key and value
        byte[] serializedKey;
        try {
            serializedKey = keySerializer.serialize(record.topic(), record.headers(), record.key());
        } catch (ClassCastException cce) {...}
        byte[] serializedValue;
        try {
            serializedValue = valueSerializer.serialize(record.topic(), record.headers(), record.value());
        } catch (ClassCastException cce) {...}
        // work out which partition the record goes to
        int partition = this.partition(record, serializedKey, serializedValue, cluster);
        tp = new TopicPartition(record.topic(), partition); // the resulting topic/partition pair
        this.setReadOnly(record.headers());
        Header[] headers = record.headers().toArray();
        int serializedSize = AbstractRecords.estimateSizeInBytesUpperBound(this.apiVersions.maxUsableProduceMagic(), this.compressionType, serializedKey, serializedValue, headers);
        this.ensureValidRecordSize(serializedSize);
        // timestamp: use the record's own timestamp if present, otherwise the current time
        long timestamp = record.timestamp() == null ? this.time.milliseconds() : record.timestamp().longValue();
        this.log.trace("Sending record {} with callback {} to topic {} partition {}", new Object[]{record, callback, record.topic(), Integer.valueOf(partition)});
        // wrap the user callback together with the interceptors
        Callback interceptCallback = this.interceptors == null ? callback : new KafkaProducer.InterceptorCallback(callback, this.interceptors, tp);
        if (this.transactionManager != null && this.transactionManager.isTransactional()) {
            this.transactionManager.maybeAddPartitionToTransaction(tp);
        }
        // the core step: append the record to the accumulator's batch for this partition
        RecordAppendResult result = this.accumulator.append(tp, timestamp, serializedKey, serializedValue, headers, (Callback) interceptCallback, remainingWaitMs);
        if (result.batchIsFull || result.newBatchCreated) {
            this.log.trace("Waking up the sender since topic {} partition {} is either full or getting a new batch", record.topic(), Integer.valueOf(partition));
            this.sender.wakeup();
        }
        return result.future;
    } catch (Exception e) {
        ... // the various exception-handling branches are omitted
    }
}
The this.accumulator.append() call packs the record into a batch; the batches for each partition are kept in a deque:
Deque<ProducerBatch> dq = this.getOrCreateDeque(tp);
The deque (an ArrayDeque) is looked up, or created on first use, from the TopicPartition:
private Deque<ProducerBatch> getOrCreateDeque(TopicPartition tp) {
    Deque<ProducerBatch> d = (Deque) this.batches.get(tp);
    if (d != null) {
        return d;
    } else {
        // no deque yet for this partition: create one and publish it atomically
        d = new ArrayDeque();
        Deque<ProducerBatch> previous = (Deque) this.batches.putIfAbsent(tp, d);
        return (Deque) (previous == null ? d : previous);
    }
}
Here this.batches is a ConcurrentMap<TopicPartition, Deque<ProducerBatch>> keyed by the TopicPartition, so the get(tp) call shows that records for the same topic and partition share one deque. When a batch is full, a new batch is created and appended to the deque:
ProducerBatch batch = new ProducerBatch(tp, recordsBuilder, this.time.milliseconds());
...
dq.addLast(batch);
References:
http://kafka.apache.org/documentation/#producerconfigs
http://kafka.apache.org/21/javadoc/index.html?org/apache/kafka/clients/producer/KafkaProducer.html