Table of Contents
1. Kafka Principles
1.1 Kafka Components
1.2 Kafka Clusters
1.3 Elections
1.4 Functions Implemented by the Broker
1.5 How Things Work Once There Is a Cluster
1.5.1 Starting the Service
1.5.2 Creating a Topic
1.5.3 Producing Data
1.5.4 Consuming Data
2. Basic Kafka Operations
2.0 Kafka GUI Tools
2.1 Topic Operations
2.1.1 List the existing topics
.\kafka-topics.bat --bootstrap-server localhost:9092 --list
2.1.2 Create a topic
.\kafka-topics.bat --bootstrap-server localhost:9092 --topic sk_test --create
2.1.3 Describe a specific topic
.\kafka-topics.bat --bootstrap-server localhost:9092 --topic sk_test --describe
2.1.4 Modify a topic
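For example, the partition count of a topic can be increased (it can never be decreased) with a command of the following form, using the --alter option:
.\kafka-topics.bat --bootstrap-server localhost:9092 --topic sk_test --alter --partitions 3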
2.1.5 Delete a topic
On Windows, deleting a topic is problematic and can force the Kafka process to shut down, so deletions should be performed on Linux whenever possible.
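On Linux, the deletion command takes the following form:
./kafka-topics.sh --bootstrap-server localhost:9092 --topic sk_test --delete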
2.1.6 Check the message backlog (lag) of each topic
PS D:\kafka_2.12-3.6.1\bin\windows> .\kafka-consumer-groups.bat --bootstrap-server iotmq-node01:2909 --describe --group iot
GROUP TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG CONSUMER-ID HOST CLIENT-ID
iot shuiku-H34-topic 0 - 655103 - consumer-iot-1-786bc4e3-8da8-4d77-9125-94822855f25c /64.97.209.75 consumer-iot-1
iot shuiku-prod-H66-topic 0 4539328 4539328 0 consumer-iot-1-ffe66b70-c353-4d8b-b4da-ef3460d30ad4 /64.97.209.85 consumer-iot-1
iot shuiku-prod-zc-topic 0 370 370 0 consumer-iot-2-415cc450-710b-4f60-a460-03e79f5fecce /64.97.209.85 consumer-iot-2
iot shuiku-H32-topic 0 - 2311150 - consumer-iot-4-67cd2688-4d25-4395-a119-babeee9ad247 /64.97.209.75 consumer-iot-4
iot shuiku-H66-topic 0 - 1 - consumer-iot-3-bc5c9c15-84ee-471f-a080-d3c428b7a4fb /64.97.209.75 consumer-iot-3
iot shuiku-prod-H34-topic 0 49846792 49846793 1 consumer-iot-1-6738f854-055f-41fe-9d1e-cf980cbffdb3 /64.97.209.75 consumer-iot-1
iot shuiku-H33-topic 0 - 4049756 - consumer-iot-5-1513db72-7e1d-463b-a481-345513b46f33 /64.97.209.75 consumer-iot-5
iot shuiku-prod-H36-topic 0 29259634 29259634 0 consumer-iot-1-45c18d08-1ad5-4954-acbb-52f9057e2128 /64.97.209.75 consumer-iot-1
iot shuiku-prod-H33-topic 0 76083088 76083094 6 consumer-iot-4-440acb7a-c0db-4572-8a82-9d8df450be62 /64.97.209.85 consumer-iot-4
iot shuiku-prod-H32-topic 0 174950529 175155866 205337 consumer-iot-3-bd8dee10-4388-4c82-bd2a-9e5abe87ec71 /64.97.209.85 consumer-iot-3
iot shuiku-H36-topic 0 - 132537 - consumer-iot-2-87363eb1-0179-4918-839c-11882e1c5570 /64.97.209.75 consumer-iot-2
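In this output, LAG = LOG-END-OFFSET - CURRENT-OFFSET, i.e. the number of messages written but not yet consumed. A "-" in the CURRENT-OFFSET and LAG columns means the group has no committed offset for that partition yet, so the lag cannot be computed.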
2.1.7 View the live data of a specific topic
[root@slt-lt-paas11 bin]# ./kafka-console-consumer.sh --topic dujiatai-rainfall-prod-topic --bootstrap-server 10.10.2.112:10321
{"accumulatedPrecipitation":0.0,"alarm":0,"createdAt":1749717237,"createdUser":0,"createdUserName":"","currentPrecipitation":0.0,"dayPrecipitation":0.0,"deviceId":10152,"deviceName":"港洲外垸","fiveData":0.0,"hour_precipitation":null,"id":242999054,"mnNo":"4200220011","obDate":1749716700000,"obTime":"2506121625","receiveType":0,"sendType":51,"signal_quality":null,"stationId":7074,"stationName":"港洲外垸","status":1,"supplyVoltage":14.2,"updatedAt":0,"updatedUser":0,"updatedUserName":"","waterLevel":-30.0,"waterLevelReal":0.0}
{"accumulatedPrecipitation":0.0,"alarm":0,"createdAt":1749717209,"createdUser":0,"createdUserName":"","currentPrecipitation":0.0,"dayPrecipitation":0.0,"deviceId":6173,"deviceName":"702号闸(闸后)","fiveData":0.0,"hour_precipitation":null,"id":242998988,"mnNo":"4200016102","obDate":1749716700000,"obTime":"2506121625","receiveType":0,"sendType":51,"signal_quality":null,"stationId":7018,"stationName":"702号闸","status":1,"supplyVoltage":13.5,"updatedAt":0,"updatedUser":0,"updatedUserName":"","waterLevel":-30.0,"waterLevelReal":0.0}
{"accumulatedPrecipitation":0.0,"alarm":0,"createdAt":1749717164,"createdUser":0,"createdUserName":"","currentPrecipitation":0.0,"dayPrecipitation":0.0,"deviceId":10156,"deviceName":"香城垸","fiveData":0.0,"hour_precipitation":null,"id":242998862,"mnNo":"4200220015","obDate":1749716700000,"obTime":"2506121625","receiveType":0,"sendType":51,"signal_quality":null,"stationId":7078,"stationName":"香城垸","status":1,"supplyVoltage":13.7,"updatedAt":0,"updatedUser":0,"updatedUserName":"","waterLevel":-30.0,"waterLevelReal":0.0}
{"accumulatedPrecipitation":0.0,"alarm":0,"createdAt":1749717129,"createdUser":0,"createdUserName":"","currentPrecipitation":0.0,"dayPrecipitation":0.0,"deviceId":6175,"deviceName":"南屏闸(闸后)","fiveData":0.0,"hour_precipitation":null,"id":242998795,"mnNo":"4200016105","obDate":1749717000000,"obTime":"2506121630","receiveType":0,"sendType":51,"signal_quality":null,"stationId":7021,"stationName":"南屏闸","status":1,"supplyVoltage":14.0,"updatedAt":0,"updatedUser":0,"updatedUserName":"","waterLevel":-30.0,"waterLevelReal":0.0}
{"accumulatedPrecipitation":0.0,"alarm":0,"createdAt":1749717097,"createdUser":0,"createdUserName":"","currentPrecipitation":0.0,"dayPrecipitation":0.0,"deviceId":10152,"deviceName":"港洲外垸","fiveData":0.0,"hour_precipitation":null,"id":242998732,"mnNo":"4200220011","obDate":1749717000000,"obTime":"2506121630","receiveType":0,"sendType":51,"signal_quality":null,"stationId":7074,"stationName":"港洲外垸","status":1,"supplyVoltage":14.2,"updatedAt":0,"updatedUser":0,"updatedUserName":"","waterLevel":-30.0,"waterLevelReal":0.0}
{"accumulatedPrecipitation":0.0,"alarm":0,"createdAt":1749717092,"createdUser":0,"createdUserName":"","currentPrecipitation":0.0,"dayPrecipitation":0.0,"deviceId":10158,"deviceName":"荒五里垸","fiveData":0.0,"hour_precipitation":null,"id":242998723,"mnNo":"4200220017","obDate":1749717000000,"obTime":"2506121630","receiveType":0,"sendType":51,"signal_quality":null,"stationId":7080,"stationName":"荒五里垸","status":1,"supplyVoltage":13.6,"updatedAt":0,"updatedUser":0,"updatedUserName":"","waterLevel":-30.0,"waterLevelReal":0.0}
{"accumulatedPrecipitation":0.0,"alarm":0,"createdAt":1749717068,"createdUser":0,"createdUserName":"","currentPrecipitation":0.0,"dayPrecipitation":0.0,"deviceId":10142,"deviceName":"新农垸","fiveData":0.0,"hour_precipitation":null,"id":242998658,"mnNo":"4200220001","obDate":1749717000000,"obTime":"2506121630","receiveType":0,"sendType":51,"signal_quality":null,"stationId":7064,"stationName":"新农垸","status":1,"supplyVoltage":14.1,"updatedAt":0,"updatedUser":0,"updatedUserName":"","waterLevel":-30.0,"waterLevelReal":0.0}
{"accumulatedPrecipitation":0.0,"alarm":0,"createdAt":1749717059,"createdUser":0,"createdUserName":"","currentPrecipitation":0.0,"dayPrecipitation":0.0,"deviceId":6173,"deviceName":"702号闸(闸后)","fiveData":0.0,"hour_precipitation":null,"id":242998612,"mnNo":"4200016102","obDate":1749717000000,"obTime":"2506121630","receiveType":0,"sendType":51,"signal_quality":null,"stationId":7018,"stationName":"702号闸","status":1,"supplyVoltage":13.5,"updatedAt":0,"updatedUser":0,"updatedUserName":"","waterLevel":-30.0,"waterLevelReal":0.0}
{"accumulatedPrecipitation":0.0,"alarm":0,"createdAt":1749717023,"createdUser":0,"createdUserName":"","currentPrecipitation":0.0,"dayPrecipitation":0.0,"deviceId":10156,"deviceName":"香城垸","fiveData":0.0,"hour_precipitation":null,"id":242997958,"mnNo":"4200220015","obDate":1749717000000,"obTime":"2506121630","receiveType":0,"sendType":51,"signal_quality":null,"stationId":7078,"stationName":"香城垸","status":1,"supplyVoltage":13.7,"updatedAt":0,"updatedUser":0,"updatedUserName":"","waterLevel":-30.0,"waterLevelReal":0.0}
2.2 Producer and Consumer
Producer
.\kafka-console-producer.bat --bootstrap-server localhost:9092 --topic sk_test
Consumer
.\kafka-console-consumer.bat --bootstrap-server localhost:9092 --topic sk_test
2.2.1 Implementing a Producer and a Consumer in Java
2.2.1.1 Producer
package org.example.producer;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.HashMap;
import java.util.Map;

public class KafkaProducerClient {
    public static void main(String[] args) {
        // 1. Create the producer object
        Map<String, Object> configMap = new HashMap<>();
        configMap.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        configMap.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        configMap.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        KafkaProducer<String, String> producer = new KafkaProducer<>(configMap);

        // // 2. Create a record
        // ProducerRecord<String, String> record = new ProducerRecord<>("sk_test", "key", "value");
        // // 3. Send the record to Kafka through the producer
        // producer.send(record);
        for (int i = 0; i < 10; i++) {
            ProducerRecord<String, String> record = new ProducerRecord<>("sk_test", "key" + i, "value" + i);
            producer.send(record);
        }

        // 4. Close the producer object
        producer.close();
    }
}
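Note that producer.send() is asynchronous: records are accumulated in an internal buffer and transmitted by a background I/O thread, which is why close() matters here; it flushes any buffered records before releasing resources.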
2.2.1.2 Consumer
package org.example.consumer;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.time.Duration;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class KafkaConsumerClient {
    public static void main(String[] args) {
        // 1. Create the consumer object
        Map<String, Object> consumerConfig = new HashMap<>();
        consumerConfig.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        consumerConfig.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        consumerConfig.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        consumerConfig.put(ConsumerConfig.GROUP_ID_CONFIG, "sk_group");
        KafkaConsumer<String, String> kafkaConsumer = new KafkaConsumer<>(consumerConfig);
        kafkaConsumer.subscribe(Collections.singletonList("sk_test"));

        // 2. Poll records from the Kafka topic
        // ConsumerRecords<String, String> records = kafkaConsumer.poll(Duration.ofMillis(100));
        // for (ConsumerRecord<String, String> record : records) {
        //     System.out.println(record);
        // }
        while (true) {
            final ConsumerRecords<String, String> records = kafkaConsumer.poll(Duration.ofMillis(100));
            for (ConsumerRecord<String, String> record : records) {
                System.out.println(record);
            }
        }

        // 3. Close the consumer object
        // kafkaConsumer.close();
    }
}
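Because the while (true) loop never exits, the close() call at step 3 stays commented out and unreachable. In a real application the loop would usually be terminated by a flag flipped from a shutdown hook, combined with kafkaConsumer.wakeup() to interrupt a blocked poll(), after which close() runs in a finally block.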
3 The Basic Modules of the Broker
3.1 SocketServer
SocketServer mainly consists of:
(1) Acceptor: listens for socket connections
(2) Processor: forwards socket requests and responses
(3) RequestChannel: buffers socket requests and responses
3.1.1 Acceptor initialization:
(1) Open the socket server
(2) Register for ACCEPT events
(3) Listen for ACCEPT events on this ServerChannel; when one occurs, hand the corresponding SocketChannel over to a Processor thread in round-robin fashion
3.1.2 Processor initialization:
(1) When a new SocketChannel arrives, register its OP_READ event so that client requests can be received
(2) Fetch the response to a client request from the response queue of the RequestChannel, then trigger the OP_WRITE event.
(3) Listen for events on the selector:
For a read event, a new request has arrived and must be moved into the request queue of the RequestChannel;
For a write event, an earlier request has finished processing, and its response must be fetched from the response queue of the RequestChannel and sent back to the client;
For a close event, the client has closed the socket connection, and the server should release the corresponding resources.
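A minimal Java NIO sketch of the Acceptor side described above (not Kafka's actual code; the Processor interface here is a placeholder for the real Processor thread, which would register OP_READ on the channel and serve it afterwards):
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

public class Acceptor implements Runnable {
    // Placeholder for the real Processor thread.
    public interface Processor {
        void accept(SocketChannel channel);
    }

    private final Selector selector;
    private final ServerSocketChannel serverChannel;
    private final Processor[] processors;
    private int next = 0;

    public Acceptor(int port, Processor[] processors) throws IOException {
        this.processors = processors;
        this.selector = Selector.open();
        this.serverChannel = ServerSocketChannel.open();
        serverChannel.bind(new InetSocketAddress(port));
        serverChannel.configureBlocking(false);
        // (2) Register interest in ACCEPT events only
        serverChannel.register(selector, SelectionKey.OP_ACCEPT);
    }

    @Override
    public void run() {
        while (!Thread.currentThread().isInterrupted()) {
            try {
                // (3) Wait for ACCEPT events on the ServerChannel
                if (selector.select(500) == 0) continue;
                for (SelectionKey key : selector.selectedKeys()) {
                    if (key.isAcceptable()) {
                        SocketChannel client = serverChannel.accept();
                        client.configureBlocking(false);
                        // Hand the SocketChannel to a Processor round-robin
                        processors[next].accept(client);
                        next = (next + 1) % processors.length;
                    }
                }
                selector.selectedKeys().clear();
            } catch (IOException e) {
                throw new RuntimeException(e);
            }
        }
    }
}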
3.1.3 RequestChannel
A RequestChannel contains one blocking Request queue and numProcessors (3 by default) blocking Response queues.
Processor threads listen for OP_READ events and move Requests into the RequestChannel's Request queue. The KafkaRequestHandler threads inside KafkaRequestHandlerPool take Requests out of that queue, process them, put the corresponding Responses back into the RequestChannel's Response queues, and trigger the OP_WRITE event that the Processor threads listen for; finally, the Processor threads send the Responses back to the clients.
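A minimal Java sketch of this structure (Request and Response are placeholder types here, not Kafka's real classes):
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Simplified RequestChannel: one shared blocking request queue and
// one blocking response queue per Processor thread.
public class RequestChannel {
    public static class Request {}
    public static class Response {}

    private final BlockingQueue<Request> requestQueue;
    private final List<BlockingQueue<Response>> responseQueues = new ArrayList<>();

    public RequestChannel(int numProcessors, int queueSize) {
        requestQueue = new ArrayBlockingQueue<>(queueSize);
        for (int i = 0; i < numProcessors; i++) {
            responseQueues.add(new ArrayBlockingQueue<>(queueSize));
        }
    }

    // Called by a Processor thread after an OP_READ event.
    public void sendRequest(Request request) throws InterruptedException {
        requestQueue.put(request);
    }

    // Called by KafkaRequestHandler threads; blocks until a request arrives.
    public Request receiveRequest() throws InterruptedException {
        return requestQueue.take();
    }

    // Called once a request has been processed; the real code would also
    // wake the owning Processor so that it registers OP_WRITE.
    public void sendResponse(int processorId, Response response) throws InterruptedException {
        responseQueues.get(processorId).put(response);
    }

    // Called by the owning Processor thread before writing to the socket.
    public Response receiveResponse(int processorId) {
        return responseQueues.get(processorId).poll();
    }
}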
3.2 KafkaRequestHandlerPool
Essentially a thread pool.
KafkaRequestHandlerPool internally starts a number of KafkaRequestHandler threads and passes the RequestChannel object and the KafkaApis object to them, because each KafkaRequestHandler needs to take Requests out of the former's requestQueue and use the latter to carry out the actual business logic.
The main steps of a KafkaRequestHandler are:
(1) Call requestChannel.receiveRequest to obtain a request from the RequestChannel's blocking Request queue; if none is available, keep trying in a while loop until a new request arrives.
(2) Check the type of the request:
If it is of type RequestChannel.AllDone, exit the thread;
Otherwise, let apis process the request; apis is responsible for putting the corresponding response into the requestChannel's blocking Response queue.
Thus, the Processor threads of SocketServer and the KafkaRequestHandler threads of KafkaRequestHandlerPool use SocketServer's RequestChannel to implement a simple producer-consumer model, as the sketch below illustrates.
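A sketch of the consumer side of that model, building on the RequestChannel sketch above (the ALL_DONE sentinel plays the role of RequestChannel.AllDone, and KafkaApis is a placeholder interface):
public class KafkaRequestHandler implements Runnable {
    // Shutdown sentinel, mirroring RequestChannel.AllDone in the real code.
    public static final RequestChannel.Request ALL_DONE = new RequestChannel.Request();

    // Placeholder for the real KafkaApis component.
    public interface KafkaApis {
        void handle(RequestChannel.Request request);
    }

    private final RequestChannel requestChannel;
    private final KafkaApis apis;

    public KafkaRequestHandler(RequestChannel requestChannel, KafkaApis apis) {
        this.requestChannel = requestChannel;
        this.apis = apis;
    }

    @Override
    public void run() {
        while (true) {
            try {
                // (1) Block until a request is available
                RequestChannel.Request request = requestChannel.receiveRequest();
                // (2) Shutdown sentinel: exit the thread
                if (request == ALL_DONE) {
                    return;
                }
                // Otherwise KafkaApis processes the request and enqueues
                // the response into the matching Response queue
                apis.handle(request);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }
}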
3.3 KafkaApis
KafkaApis is responsible for the actual business logic.
KafkaApis relies mainly on the following four components:
(1) LogManager: reading and writing logs
(2) ReplicaManager: synchronizing the partition replicas of topics
(3) OffsetManager: managing offsets
(4) KafkaScheduler: scheduling and managing timed tasks
3.3.1 LogManager
3.3.1.1 How Kafka logs are organized
LogManager uses logs to manage all the logs inside a Broker Server.
private val logs = new Pool[TopicAndPartition, Log]()
Data for the different partitions of the different topics is indexed by TopicAndPartition.
A Log uses segments to manage its partition data; it contains multiple log segments, i.e. LogSegment objects.
3.3.1.2 How Kafka reads messages
(1) Use startOffset to locate the LogSegment it falls in. segments is a ConcurrentSkipListMap (a skip-list data structure)
(2) Read the data
<1> Use startOffset to find the exact physical position within the log, in bytes
<1.1> Binary-search the index for the nearest entry less than or equal to startOffset
<1.2> Open the log file and resolve the real physical offset
<2> Assemble the offset metadata
<3> If maxOffset is set, compute from its value the actual number of bytes to read
<4> Read the data with the read method of FileMessageSet, which takes a physical offset and a length
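A rough Java sketch of step (1), assuming a placeholder LogSegment type: segments are keyed by base offset in a ConcurrentSkipListMap, and floorEntry returns the segment with the greatest base offset less than or equal to startOffset:
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;

public class LogLookupSketch {
    public static class LogSegment {
        // A real segment pairs a data file (.log) with an index file (.index);
        // the index maps relative offsets to physical byte positions, and a
        // binary search over it yields the nearest position <= startOffset.
    }

    // base offset -> segment
    private final ConcurrentSkipListMap<Long, LogSegment> segments =
            new ConcurrentSkipListMap<>();

    public LogSegment segmentFor(long startOffset) {
        Map.Entry<Long, LogSegment> entry = segments.floorEntry(startOffset);
        if (entry == null) {
            throw new IllegalArgumentException("offset " + startOffset + " is below the log start");
        }
        return entry.getValue();
    }
}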
3.3.1.3 Starting LogManager
(1) Initialize logs
Every log directory of a Broker Server contains a recovery-point-offset-checkpoint file. This file records the state of all log files under that directory, i.e. the mapping from TopicAndPartition to offset. logs, that is the Pool[TopicAndPartition, Log] object, is initialized from this file.
(2) Start the background scheduled tasks and maintenance threads
<1> cleanupLogs: deletes stale and redundant data
The smallest unit of deletion is a LogSegment. Kafka implements circular overwriting of messages through time-based and size-based configuration.
<2> flushDirtyLogs: periodically flushes dirty data; the smallest unit of flushing is a LogSegment (a LogSegment contains a data file and an index file)
<3> checkpointRecoveryPointOffsets: periodically flushes the metadata held by logs, i.e. TopicAndPartition and offsets, to the recovery-point-offset-checkpoint file.
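For reference, recovery-point-offset-checkpoint is a plain-text file; in the versions discussed here its layout is roughly a version line, an entry count, and one "topic partition offset" line per entry (the values below are illustrative):
0
2
sk_test 0 655103
sk_test 1 4539328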
2.3.2 ReplicaManager
AR(Assign Repicas):已经分配给Patition的副本
ISR(In-Sync Replicas):处于同步状态的副本
ReplicaFetcherThread(数据副本拉取线程)是位于Broker Server上的线程。单个ReplicaFetcherThread只负责某个Broker Server上的部分TopicAndPartition的Replica的同步,单个ReplicaFetcherThread不会负责跨多个Broker Server上的TopicAndPartition的Replica同步。
HighWatermark(高水位线):本质上代表的是ISR中所有replicas的last commited message的最小起始便宜量,即在这偏移之前的数据都被ISR中的replicas接收,但是在这偏移之后的数据被ISR中的部分replicas所接收。
The becomeLeaderOrFollower flow
When a Replica is assigned to a Broker Server, it may become a Leader replica or a Follower replica, and it may also switch between the two states; in each case the becomeLeaderOrFollower flow is entered.
(1) Check whether the request is current: if the epoch inside the request is smaller than the current epoch, the request is stale.
(2) If the current leader epoch is smaller than the leader epoch carried in the request, the request is valid
(3) Pick out the Replicas on this Broker Server that are about to become Leaders
(4) Pick out the Replicas on this Broker Server that are about to become Followers
(5) Enter the flow for becoming a Leader;
Enter the flow for becoming a Follower;
(6) Start the highwatermark-checkpoint thread, which is responsible for flushing the HighWatermark
(7) Shut down idle fetch threads.
makeLeaders: the flow for becoming a Leader
(1) If the Replica is becoming a Leader, remove any fetch requests targeting it
(2) Try to build the structured AR (assigned replicas) information: create it if it does not exist, reuse it if it does
(3) Remove Replicas that no longer exist
(4) Initialize the LogOffsetMetadata of the Leader-state Replica;
Initialize the LogOffsetMetadata of the Follower-state Replicas;
(5) Initialize the HighWatermark of the Leader-state Replica
makeFollowers: the flow for becoming a Follower
5 Kafka Optimization
5.1 Resource Configuration
5.1.1 Operating system
Kafka's network clients are built on the Selector facility of Java NIO. On Linux, Selector is implemented with epoll; on Windows it is implemented with select. Kafka therefore performs better when deployed on Linux.
When data is transferred between disk and network, Linux can use the zero-copy mechanism to improve performance, whereas Windows supports zero-copy only to a limited extent.
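As a minimal illustration of that zero-copy path, assuming a hypothetical local segment file and a reachable socket address: FileChannel.transferTo maps to sendfile(2) on Linux, so the bytes move from the page cache to the NIC without entering user space.
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.FileChannel;
import java.nio.channels.SocketChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class ZeroCopySend {
    public static void main(String[] args) throws IOException {
        try (FileChannel file = FileChannel.open(
                     Paths.get("00000000000000000000.log"), // hypothetical segment file
                     StandardOpenOption.READ);
             SocketChannel socket = SocketChannel.open(
                     new InetSocketAddress("localhost", 9000))) { // hypothetical peer
            long position = 0;
            long remaining = file.size();
            while (remaining > 0) {
                // transferTo may send fewer bytes than requested, so loop
                long sent = file.transferTo(position, remaining, socket);
                position += sent;
                remaining -= sent;
            }
        }
    }
}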
In summary, it is recommended to deploy Kafka on Linux.
5.1.2 Choosing disks
Kafka's I/O consists of sequential reads and writes. Under this I/O pattern there is little performance difference between mechanical disks and SSDs.
Kafka already provides its own redundancy mechanism and achieves load balancing through its partition design, so the disks do not need to be combined into a RAID array.
5.1.2.1 Estimating disk space requirements
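A common back-of-the-envelope formula (an assumption here, not a rule mandated by Kafka):
disk space ≈ message rate × average message size × retention time × replication factor ÷ compression ratio × (1 + headroom)
For example, with 1,000 messages/s of 1 KB each, 7 days of retention, 3 replicas, no compression, and 20% headroom:
1,000 × 1 KB × 86,400 s ≈ 86.4 GB per day
86.4 GB × 7 days × 3 replicas × 1.2 ≈ 2.2 TB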
5.1.3 Network bandwidth
5.1.4 Memory configuration
5.1.5 Choosing CPUs
5.2 Cluster Fault Tolerance
5.2.1 Replica placement strategy
5.2.2 Failover plan
5.2.3 Data backup and recovery
5.3 Parameter Tuning
5.4 Data Compression and Batch Sending