1. Overview of Apache Kafka
Apache Kafka is a distributed streaming platform, which carries three meanings:
1. Publish/Subscribe: a message queue (MQ) system
2. Process: real-time processing of streaming data
3. Store: stream data is stored across the distributed cluster with a safe, fault-tolerant, redundant storage mechanism
Pros and cons of Kafka:
Pros:
- 1. Supports multiple producers and multiple consumers
- 2. Supports horizontal scaling of brokers
- 3. Kafka's replication mechanism guarantees that data is not lost
- 4. High-performance message handling with second-level latency (partitioned reads/writes, sequential writes, zero-copy)
Cons:
- 1. Data is sent in batches, so it is not truly real-time
- 2. Ordering is only guaranteed within a single partition; global message ordering cannot be guaranteed
- 3. Built-in monitoring is limited; a monitoring plugin is needed
- 4. Depends on ZooKeeper for metadata management
Architecture diagram
Core concepts
1. Broker: a Kafka service instance, identified by a unique id
2. Topic: a collection of messages of a certain category
3. Partition: a topic consists of several partitions; multiple partitions provide distributed storage of the data and load balancing of processing
4. Replication-factor: the number of redundant copies of a partition, including the leader partition itself
5. Offset: the position identifier of a record within a partition
6. Record (data entry): key/value/timestamp
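To see how keys, partitions, and offsets relate: Kafka's default partitioner hashes a record's serialized key (using murmur2) modulo the partition count, so records with the same key always land in the same partition and are therefore ordered relative to each other. The sketch below illustrates the principle with `String.hashCode` as a simplified stand-in for murmur2; `KeyPartitioner` is a name made up for this example, not a Kafka API.

```java
public class KeyPartitioner {
    // Simplified stand-in for Kafka's default partitioner.
    // Kafka actually applies murmur2 to the serialized key bytes, but the
    // principle is the same: hash the key, take it modulo the number of
    // partitions, so a given key always maps to the same partition.
    public static int partitionFor(String key, int numPartitions) {
        // mask the sign bit so the result is non-negative
        return (key.hashCode() & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        // The same key routes to the same partition on every call,
        // which is what gives Kafka its per-partition ordering guarantee.
        System.out.println("user-42 -> partition " + partitionFor("user-42", 3));
        System.out.println("user-42 -> partition " + partitionFor("user-42", 3));
    }
}
```

This is also why the cons list above notes that only per-partition ordering is guaranteed: two records with different keys may hash to different partitions, and there is no ordering across partitions.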
Producers and consumers in Kafka
Producer
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;
import java.util.Properties;
import java.util.UUID;
// Producer
public class Producer {
    public static void main(String[] args) {
        // Producer configuration
        Properties properties = new Properties();
        properties.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "HadoopNode01:9092,HadoopNode02:9092,HadoopNode03:9092");
        // String serialization for both key and value
        properties.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        properties.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        // Create the Kafka producer
        KafkaProducer<String, String> producer = new KafkaProducer<>(properties);
        // Build a record and publish it
        ProducerRecord<String, String> record = new ProducerRecord<>("t2", UUID.randomUUID().toString(), "Hello World");
        producer.send(record);
        // Release resources
        producer.flush();
        producer.close();
    }

    public static void producerDemo(String user) {
        // Producer configuration
        Properties properties = new Properties();
        properties.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "HadoopNode01:9092,HadoopNode02:9092,HadoopNode03:9092");
        // String serialization for both key and value
        properties.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        properties.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        // Create the Kafka producer
        KafkaProducer<String, String> producer = new KafkaProducer<>(properties);
        // Build a record and publish it
        ProducerRecord<String, String> record = new ProducerRecord<>("t2", UUID.randomUUID().toString(), user);
        System.out.println(user);
        producer.send(record);
        // Release resources
        producer.flush();
        producer.close();
    }
}
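The two producer methods above duplicate their configuration block line for line. One way to keep it in one place is a small factory helper; the sketch below uses the plain string config keys (`bootstrap.servers`, `key.serializer`, `value.serializer`), which are the values the `ProducerConfig` constants resolve to, so it compiles without the Kafka client on the classpath. `ProducerProps` is a helper name invented for this example.

```java
import java.util.Properties;

public class ProducerProps {
    // Build the shared producer configuration once, so every producer
    // method can reuse it instead of repeating the same put() calls.
    public static Properties producerProps(String bootstrapServers) {
        Properties props = new Properties();
        // the broker list, e.g. "HadoopNode01:9092,HadoopNode02:9092,HadoopNode03:9092"
        props.put("bootstrap.servers", bootstrapServers);
        // String serialization for both key and value
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        return props;
    }
}
```

With this helper, both `main` and the demo method would start from `new KafkaProducer<>(ProducerProps.producerProps(...))` and only differ in the record they send.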
Consumer
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;
import java.time.Duration;
import java.util.Arrays;
import java.util.Properties;
// Consumer
public class Consumer {
    public static void main(String[] args) {
        // Consumer configuration
        Properties properties = new Properties();
        properties.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "HadoopNode01:9092,HadoopNode02:9092,HadoopNode03:9092");
        // Start from the earliest offset when the group has no committed offset
        properties.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
        // Disable automatic offset commits; offsets are committed manually below.
        // (With auto commit enabled, AUTO_COMMIT_INTERVAL_MS_CONFIG would control
        // how often offsets are committed, e.g. every 5 seconds.)
        properties.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, false);
        // Deserializers: byte[] -> String for both key and value
        properties.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        properties.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        // A consumer must specify its consumer group
        properties.put(ConsumerConfig.GROUP_ID_CONFIG, "group1");
        // Create the Kafka consumer
        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(properties);
        // Subscribe to the topic
        consumer.subscribe(Arrays.asList("t4"));
        // Pull newly produced records
        while (true) {
            // poll blocks for up to 10 seconds waiting for new records
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(10));
            for (ConsumerRecord<String, String> record : records) {
                System.out.println(record.key() + "\t" + record.value() + "\t" + record.topic()
                        + "\t" + record.offset() + "\t" + record.timestamp() + "\t" + record.partition());
            }
            // Manually commit the offsets of the records just processed
            consumer.commitSync();
        }
    }
}
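The `group.id` set above matters because Kafka divides a topic's partitions among the consumers that share a group: each partition is owned by exactly one consumer in the group, which is how adding consumers scales out processing. The sketch below simulates a round-robin-style assignment in plain Java; it is an illustration of the idea, not Kafka's actual assignor API (Kafka's built-in `RangeAssignor` hands out contiguous ranges instead), and `GroupAssignmentSketch` is a name made up for this example.

```java
import java.util.ArrayList;
import java.util.List;

public class GroupAssignmentSketch {
    // Illustrative assignment: partition p goes to consumer (p % numConsumers).
    // The key property matches Kafka's: every partition is owned by exactly
    // one consumer in the group, so records are never processed twice
    // within the same group.
    public static List<List<Integer>> assign(int numPartitions, int numConsumers) {
        List<List<Integer>> assignment = new ArrayList<>();
        for (int c = 0; c < numConsumers; c++) {
            assignment.add(new ArrayList<>());
        }
        for (int p = 0; p < numPartitions; p++) {
            assignment.get(p % numConsumers).add(p);
        }
        return assignment;
    }

    public static void main(String[] args) {
        // 6 partitions of topic "t4" split across 2 consumers in "group1":
        // consumer 0 gets [0, 2, 4], consumer 1 gets [1, 3, 5]
        System.out.println(assign(6, 2));
    }
}
```

A consequence worth noting: with more consumers than partitions, the extra consumers sit idle, so the partition count caps a group's parallelism.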