producer 发布消息
1、内容
将数据转化为“消息”,并通过网络发送给Broker,每条数据被称为“消息”,每条信息表示为一个三元组,
即<topic,key,message>
2、
3、写入方式
Producer 通过 Push 的方式将消息推送到Broker,Broker需要将消息持久化到磁盘而非内存中,这就必须采用高效地数据写入和存储方式,因此Broker将收到的每条消息 append 到 patition 中,并结合基于offset的数据组织方式(属于顺序写磁盘,这就是顺序写磁盘效率比随机写内存要高,以此来保障 kafka 吞吐率的原因)。
测试证明,相同环境下,顺序写速度为600MB/s,随机写速度为100KB/s。
4、消息路由
producer 发送消息到 broker 时,会根据分区算法选择将其存储到哪一个 partition。其路由机制为:
1. 指定了 patition,则直接使用;
2. 未指定 patition 但指定 key,通过对 key 的 value 进行hash 选出一个 patition;
3. patition 和 key 都未指定,使用轮询选出一个 patition。
附代码实现:
//创建消息实例
public ProducerRecord(String topic, Integer partition, Long timestamp, K key, V value) {
if (topic == null)
throw new IllegalArgumentException("Topic cannot be null");
if (timestamp != null && timestamp < 0)
throw new IllegalArgumentException("Invalid timestamp " + timestamp);
this.topic = topic;
this.partition = partition;
this.key = key;
this.value = value;
this.timestamp = timestamp;
}
//计算 patition,如果指定了 patition 则直接使用,否则使用 key 计算
private int partition(ProducerRecord<K, V> record, byte[] serializedKey , byte[] serializedValue, Cluster cluster) {
Integer partition = record.partition();
if (partition != null) {
List<PartitionInfo> partitions = cluster.partitionsForTopic(record.topic());
int lastPartition = partitions.size() - 1;
if (partition < 0 || partition > lastPartition) {
throw new IllegalArgumentException(String.format("Invalid partition given with record: %d is not in the range [0...%d].", partition, lastPartition));
}
return partition;
}
return this.partitioner.partition(record.topic(), record.key(), serializedKey, record.value(), serializedValue, cluster);
}
// 使用 key 选取 patition
public int partition(String topic, Object key, byte[] keyBytes, Object value, byte[] valueBytes, Cluster cluster) {
List<PartitionInfo> partitions = cluster.partitionsForTopic(topic);
int numPartitions = partitions.size();
if (keyBytes == null) {
int nextValue = counter.getAndIncrement();
List<PartitionInfo> availablePartitions = cluster.availablePartitionsForTopic(topic);
if (availablePartitions.size() > 0) {
int part = DefaultPartitioner.toPositive(nextValue) % availablePartitions.size();
return availablePartitions.get(part).partition();
} else {
return DefaultPartitioner.toPositive(nextValue) % numPartitions;
}
} else {
//对 keyBytes 进行 hash 选出一个 patition
public int partition(String topic, Object key, byte[] keyBytes, Object value, byte[] valueBytes, Cluster cluster) {
List<PartitionInfo> partitions = cluster.partitionsForTopic(topic);
int numPartitions = partitions.size();
if (keyBytes == null) {
int nextValue = counter.getAndIncrement();
List<PartitionInfo> availablePartitions = cluster.availablePartitionsForTopic(topic);
if (availablePartitions.size() > 0) {
int part = DefaultPartitioner.toPositive(nextValue) % availablePartitions.size();
return availablePartitions.get(part).partition();
} else {
return DefaultPartitioner.toPositive(nextValue) % numPartitions;
}
} else {
//对 keyBytes 进行 hash 选出一个 patition
return DefaultPartitioner.toPositive(Utils.murmur2(keyBytes)) % numPartitions;
}
}