The previous post used console commands to produce and consume Kafka messages; this one writes the producer and consumer programs in Java.
The producer first builds a ProducerRecord, which holds the topic, partition, key, and value, and then sends it with the send() method. Because the record travels over the network, the key and value must be serialized. A custom partitioner can also be plugged in, as shown in the examples below. Note that messages are sent in batches: a record is not actually transmitted until its batch is ready to go.
Environment setup
With the Kafka cluster from the earlier article running, create a new Maven project named kafkatest in IDEA and use the following pom.xml so the dependencies are imported automatically.
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>kafkatest</groupId>
    <artifactId>kafkatest</artifactId>
    <version>1.0-SNAPSHOT</version>
    <name>kafkatest</name>
    <url>http://maven.apache.org</url>
    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <maven.compiler.source>1.8</maven.compiler.source>
        <maven.compiler.target>1.8</maven.compiler.target>
    </properties>
    <dependencies>
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>3.8.1</version>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.kafka</groupId>
            <artifactId>kafka_2.11</artifactId>
            <version>1.0.0</version>
        </dependency>
    </dependencies>
</project>
Writing the producer: producer.send(record)
When sending messages, consider the application scenario:
messages must not be lost and must not be duplicated (e.g. financial transactions);
a small amount of loss and some latency are acceptable in exchange for high throughput (e.g. user behavior logging).
The producer calls send() to hand the record to the broker.
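As a rough sketch (the values are illustrative; the acks setting is explained in detail in the configuration section below), the two scenarios map onto different producer settings:
// financial-style scenario: wait for every in-sync replica to acknowledge, and retry transient failures
props.put("acks", "all");
props.put("retries", "3");
// logging-style scenario: do not wait for any acknowledgment, trading safety for throughput
// props.put("acks", "0");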
Three ways to call send()
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;
Properties props = new Properties();
props.put("bootstrap.servers", "bigdata01:9092,bigdata02:9092,bigdata03:9092");
props.put("key.serializer", StringSerializer.class.getName());
props.put("value.serializer", StringSerializer.class.getName());
KafkaProducer<String, String> producer = new KafkaProducer<>(props);
for (int i = 0; i < 100; i++) {
    producer.send(new ProducerRecord<String, String>("javatopic", Integer.toString(i), "messager:" + i));
}
producer.close(); // flushes records that have not yet reached the batch size
send() is asynchronous: it appends the record to an in-memory buffer and returns immediately, which is what makes batching possible.
The producer keeps a buffer of unsent records for each partition; the buffer size is configurable (props.put("batch.size", 16384);).
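A minimal sketch of the batching-related settings (linger.ms and buffer.memory are not part of the example above; the values shown are illustrative):
props.put("batch.size", 16384);        // maximum size in bytes of one per-partition batch
props.put("linger.ms", 5);             // wait up to 5 ms for more records before sending a non-full batch
props.put("buffer.memory", 33554432);  // total memory the producer may use for buffering (32 MB is the default)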
Fire-and-forget
Delivery is not guaranteed: the producer retries automatically, but messages can still be lost. The example above uses this mode.
Synchronous send
send() returns a Future<RecordMetadata>; calling get() on it blocks until the broker responds:
// record is a ProducerRecord<String, String> built as in the example above
try {
    Future<RecordMetadata> futurerm = producer.send(record);
    RecordMetadata rm = futurerm.get(); // blocks until the broker acknowledges the record
    long offset = rm.offset();
    int partition = rm.partition();
    String topic = rm.topic();
    System.out.println("topic:" + topic + ",partition:" + partition + ",offset:" + offset);
    producer.close();
} catch (Exception e) {
    e.printStackTrace();
}
Asynchronous send
Pass a Callback to send():
for (int i = 0; i < 100; i++) {
    try {
        producer.send(new ProducerRecord<String, String>("javatopic", Integer.toString(i), "messagecallback:" + i), new MyCallback());
    } catch (Exception e) {
        e.printStackTrace();
    }
}
producer.close();
The callback class looks like this:
package producer;
import org.apache.kafka.clients.producer.Callback;
import org.apache.kafka.clients.producer.RecordMetadata;
public class MyCallback implements Callback {
    @Override
    public void onCompletion(RecordMetadata recordMetadata, Exception e) {
        if (e != null) {
            // the send failed: handle the exception
            e.printStackTrace();
        } else {
            long offset = recordMetadata.offset();
            int partition = recordMetadata.partition();
            String topic = recordMetadata.topic();
            System.out.println("topic:" + topic + ",partition:" + partition + ",offset:" + offset);
        }
    }
}
Multi-threaded mode
package producer;
import java.util.Properties;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;
public class MultiProducer extends Thread {
    // KafkaProducer is thread-safe, so a single instance could also be shared across threads;
    // here each thread creates its own producer
    private KafkaProducer<String, String> producer;
    private String topicName;
    public MultiProducer(String topic) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "bigdata01:9092,bigdata02:9092,bigdata03:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());
        producer = new KafkaProducer<>(props);
        topicName = topic;
    }
    @Override
    public void run() {
        int messageCount = 0;
        while (messageCount < 100) {
            // send to the topic passed in to the constructor
            producer.send(new ProducerRecord<String, String>(topicName, Integer.toString(messageCount), "Multimessage:" + messageCount), new MyCallback());
            messageCount++;
            producer.flush(); // force the buffered records out each iteration
        }
        producer.close();
    }
    public static void main(String[] args) {
        ExecutorService es = Executors.newFixedThreadPool(3);
        for (int i = 0; i < 20; i++) {
            es.execute(new MultiProducer("multijavatopic"));
        }
        es.shutdown();
    }
}
Custom partitioner
If no partitioner is specified, the default partitioner chooses the partition by hashing the key; if the key is null, records are distributed round-robin across the partitions. If the number of partitions later increases, the same key is no longer guaranteed to land on the same partition, so it is best to create enough partitions up front rather than adding them later.
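One way to sidestep the key-to-partition mapping entirely is to fill in the ProducerRecord fields explicitly. A minimal sketch of the available constructors (the topic name javatopic follows the earlier examples):
// topic + value: no key, so the default partitioner spreads records across partitions
ProducerRecord<String, String> r1 = new ProducerRecord<>("javatopic", "hello");
// topic + key + value: records with the same key are hashed to the same partition
ProducerRecord<String, String> r2 = new ProducerRecord<>("javatopic", "key1", "hello");
// topic + partition + key + value: the partition is fixed and the partitioner is bypassed
ProducerRecord<String, String> r3 = new ProducerRecord<>("javatopic", 0, "key1", "hello");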
A custom partitioner implements the Partitioner interface:
public interface Partitioner extends Configurable, Closeable {
    /**
     * Compute the partition for the given record.
     *
     * @param topic The topic name
     * @param key The key to partition on (or null if no key)
     * @param keyBytes The serialized key to partition on (or null if no key)
     * @param value The value to partition on or null
     * @param valueBytes The serialized value to partition on or null
     * @param cluster The current cluster metadata
     */
    public int partition(String topic, Object key, byte[] keyBytes, Object value, byte[] valueBytes, Cluster cluster);
    /**
     * This is called when partitioner is closed.
     */
    public void close();
}
Kafka's default partitioner, DefaultPartitioner, implements this interface; it hashes the key and takes the result modulo the number of partitions:
...
// hash the keyBytes to choose a partition
return Utils.toPositive(Utils.murmur2(keyBytes)) % numPartitions;
// murmur2: generates a 32-bit murmur2 hash from the byte array
// toPositive: return number & 0x7fffffff; a single bitwise AND turns negative numbers positive
An example of a custom partitioner:
package producer;
import java.util.List;
import java.util.Map;
import org.apache.kafka.clients.producer.Partitioner;
import org.apache.kafka.common.Cluster;
import org.apache.kafka.common.PartitionInfo;
import org.apache.kafka.common.record.InvalidRecordException; // location may differ between client versions
import org.apache.kafka.common.utils.Utils;
public class MyPartitioner implements Partitioner {
    @Override
    public int partition(String topic, Object key, byte[] keyBytes, Object value, byte[] valueBytes, Cluster cluster) {
        List<PartitionInfo> partitions = cluster.partitionsForTopic(topic);
        int numPartitions = partitions.size();
        if (keyBytes == null) {
            throw new InvalidRecordException("null key is not allowed");
        }
        if (key.equals("1")) {
            System.out.println("My Partitioner for key 1");
            return numPartitions - 1; // records with key "1" always go to the last partition
        }
        // otherwise fall back to the default behavior: hash the key modulo the partition count
        return Utils.toPositive(Utils.murmur2(keyBytes)) % numPartitions;
    }
    @Override
    public void close() {
    }
    @Override
    public void configure(Map<String, ?> configs) {
    }
}
Register the custom partitioner in the producer configuration by its fully qualified class name:
props.put("partitioner.class","producer.MyPartitioner");
Configuration parameters
The parameters are documented on the Kafka website: go to http://kafka.apache.org, open DOCUMENTATION in the left menu, and search for "producer configs" to jump to the parameter descriptions.
acks: the criterion for a successful write
acks defines when a produce request counts as complete: the number of acknowledgments the producer requires the leader to have received before considering a request complete.
acks can take four values: 0, 1, all, and -1. With 0 the producer treats a record as sent as soon as it goes out and does not wait for any reply from the broker, giving the highest throughput. With 1 the send succeeds once the leader has written the record and responded; throughput then depends on whether messages are sent synchronously or asynchronously. all and -1 are equivalent: the send only succeeds after all replicas have acknowledged the record, which is the strictest and safest mode. Even if a broker crashes, the message is not lost because replicas exist.
In the Java code above it can be set like this; note that the value is a String, not an int.
props.put("acks", "1");
Message retention
These are configured in the broker's server.properties file.
offsets.retention.minutes
offsets.retention.minutes is an int, default 10080 (version 2.0.0 raised the default offset retention from 1 day to 7 days, i.e. 10080 minutes). It controls how long committed consumer offsets are kept, which directly affects where consumers resume reading.
log.retention.minutes
How long topic data is kept. It is recommended to set offsets.retention.minutes larger than log.retention.minutes; otherwise the topic data has not expired but the offsets have, and the topic can no longer be read from the recorded position.
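For example (the concrete values are illustrative only), the recommendation could look like this in server.properties:
# committed offsets outlive the topic data below
offsets.retention.minutes=10080
# topic data kept for 3 days, shorter than the offset retention
log.retention.minutes=4320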
How do I change the default maximum message size?
max.partition.fetch.bytes controls the maximum amount of data fetched per partition and defaults to 1048576 bytes; note that it is a consumer-side setting, while the producer-side limit on a single request is max.request.size (also 1048576 bytes by default). Either can be raised, for example:
props.put("max.partition.fetch.bytes", "10485760");
Number of send retries (in the new Java producer this is the retries setting; message.send.max.retries was the old Scala producer's name):
props.put("retries", "3");
Batch size (the new Java producer batches by bytes via batch.size rather than by message count; batch.num.messages applied to the old Scala producer):
props.put("batch.size", "16384");
Source code analysis
The send() method
public Future<RecordMetadata> send(ProducerRecord<K, V> record, Callback callback) {
    ProducerRecord<K, V> interceptedRecord = this.interceptors == null ? record : this.interceptors.onSend(record);
    return this.doSend(interceptedRecord, callback);
}
which in turn calls doSend():
private Future<RecordMetadata> doSend(ProducerRecord<K, V> record, Callback callback) {
    // TopicPartition stores the topic and partition and implements hashCode, equals and toString
    TopicPartition tp = null;
    try {
        // first make sure the metadata for the topic is available
        ClusterAndWaitTime clusterAndWaitTime = waitOnMetadata(record.topic(), record.partition(), maxBlockTimeMs);
        long remainingWaitMs = Math.max(0, maxBlockTimeMs - clusterAndWaitTime.waitedOnMetadataMs);
        Cluster cluster = clusterAndWaitTime.cluster;
        // serialize the key and value
        byte[] serializedKey;
        try {
            serializedKey = keySerializer.serialize(record.topic(), record.headers(), record.key());
        } catch (ClassCastException cce) {...}
        byte[] serializedValue;
        try {
            serializedValue = valueSerializer.serialize(record.topic(), record.headers(), record.value());
        } catch (ClassCastException cce) {...}
        // work out which partition the record goes to
        int partition = this.partition(record, serializedKey, serializedValue, cluster);
        tp = new TopicPartition(record.topic(), partition); // the resulting topic/partition pair
        this.setReadOnly(record.headers());
        Header[] headers = record.headers().toArray();
        int serializedSize = AbstractRecords.estimateSizeInBytesUpperBound(this.apiVersions.maxUsableProduceMagic(), this.compressionType, serializedKey, serializedValue, headers);
        this.ensureValidRecordSize(serializedSize);
        // timestamp: use the record's own timestamp if present, otherwise the current time
        long timestamp = record.timestamp() == null ? this.time.milliseconds() : record.timestamp().longValue();
        this.log.trace("Sending record {} with callback {} to topic {} partition {}", new Object[]{record, callback, record.topic(), Integer.valueOf(partition)});
        // wrap the user callback together with the interceptors
        Callback interceptCallback = this.interceptors == null ? callback : new KafkaProducer.InterceptorCallback(callback, this.interceptors, tp);
        if (this.transactionManager != null && this.transactionManager.isTransactional()) {
            this.transactionManager.maybeAddPartitionToTransaction(tp);
        }
        // the core step: append the record to the accumulator's batch for this partition
        RecordAppendResult result = this.accumulator.append(tp, timestamp, serializedKey, serializedValue, headers, (Callback) interceptCallback, remainingWaitMs);
        if (result.batchIsFull || result.newBatchCreated) {
            this.log.trace("Waking up the sender since topic {} partition {} is either full or getting a new batch", record.topic(), Integer.valueOf(partition));
            this.sender.wakeup();
        }
        return result.future;
    } catch (Exception e) {
        ... // the various exception-handling branches are omitted
    }
}
The this.accumulator.append() call packs the record into a batch; the batches for each partition are kept in a deque:
Deque<ProducerBatch> dq = this.getOrCreateDeque(tp);
The deque (an ArrayDeque) is looked up, or created on first use, from the TopicPartition:
private Deque<ProducerBatch> getOrCreateDeque(TopicPartition tp) {
    Deque<ProducerBatch> d = (Deque) this.batches.get(tp);
    if (d != null) {
        return d;
    } else {
        // no deque yet for this partition: create one and publish it atomically
        d = new ArrayDeque();
        Deque<ProducerBatch> previous = (Deque) this.batches.putIfAbsent(tp, d);
        return (Deque) (previous == null ? d : previous);
    }
}
Here this.batches is a ConcurrentMap<TopicPartition, Deque<ProducerBatch>> keyed by the TopicPartition, so the get(tp) call shows that records for the same topic and partition share one deque. When a batch is full, a new batch is created and appended to the deque:
ProducerBatch batch = new ProducerBatch(tp, recordsBuilder, this.time.milliseconds());
...
dq.addLast(batch);
References:
http://kafka.apache.org/documentation/#producerconfigs
http://kafka.apache.org/21/javadoc/index.html?org/apache/kafka/clients/producer/KafkaProducer.html