Overview
Kafka is a distributed messaging middleware.
Kafka® is used for building real-time data pipelines and streaming apps. It is horizontally scalable, fault-tolerant, wicked fast, and runs in production in thousands of companies.
A streaming platform has three key capabilities:
- Publish and subscribe to streams of records, similar to a message queue or an enterprise messaging system.
- Store streams of records in a fault-tolerant, durable way.
- Process streams of records as they occur.
Kafka is generally used for two broad classes of applications:
- Building real-time streaming data pipelines that reliably move data between systems or applications
- Building real-time streaming applications that transform or react to streams of data
Concepts
- producer: the client that publishes records to Kafka
- consumer: the client that subscribes to topics and processes the records
- broker: a Kafka server node; the brokers together form the cluster
- topic: a named category of records; producers publish to topics and consumers subscribe to them
Architecture
- The Producer API allows an application to publish a stream of records to one or more Kafka topics (see the sketch after this list).
- The Consumer API allows an application to subscribe to one or more topics and process the stream of records produced to them.
- The Streams API allows an application to act as a stream processor, consuming an input stream from one or more topics and producing an output stream to one or more output topics, effectively transforming input streams into output streams.
- The Connector API allows building and running reusable producers or consumers that connect Kafka topics to existing applications or data systems. For example, a connector to a relational database might capture every change to a table.
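As a minimal illustration of the Producer API, the sketch below uses the new Java producer client (org.apache.kafka.clients.producer.KafkaProducer, shipped with Kafka 0.9). The broker address and topic name here are placeholder assumptions; the hands-on sections later in this note use the older Scala producer API instead.

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

import java.util.Properties;

public class ProducerApiSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        // assumed broker address; replace with your own broker list
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        KafkaProducer<String, String> producer = new KafkaProducer<String, String>(props);
        // publish one record to a hypothetical topic named "demo_topic"
        producer.send(new ProducerRecord<String, String>("demo_topic", "hello kafka"));
        producer.close();
    }
}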
Configuration
Kafka requires a matching ZooKeeper installation as a prerequisite.
The key Kafka configuration lives in $KAFKA_HOME/config/server.properties:
- broker.id -> broker id (must be unique per broker)
- listeners -> listening port
- host.name -> hostname of the current machine
- log.dirs -> directory where Kafka stores its log (data) files
- zookeeper.connect -> ZooKeeper address
# The id of the broker. This must be set to a unique integer for each broker.
broker.id=0
############################# Socket Server Settings #############################
listeners=PLAINTEXT://:9092
# The port the socket server listens on
#port=9092
# Hostname the broker will bind to. If not set, the server will bind to all interfaces
host.name=hadoop000
# Hostname the broker will advertise to producers and consumers. If not set, it uses the
# value for "host.name" if configured. Otherwise, it will use the value returned from
# java.net.InetAddress.getCanonicalHostName().
#advertised.host.name=<hostname routable by clients>
# The port to publish to ZooKeeper for clients to use. If this is not set,
# it will publish the same port that the broker binds to.
#advertised.port=<port accessible by clients>
# The number of threads handling network requests
num.network.threads=3
# The number of threads doing disk I/O
num.io.threads=8
# The send buffer (SO_SNDBUF) used by the socket server
socket.send.buffer.bytes=102400
# The receive buffer (SO_RCVBUF) used by the socket server
socket.receive.buffer.bytes=102400
# The maximum size of a request that the socket server will accept (protection against OOM)
socket.request.max.bytes=104857600
############################# Log Basics #############################
# A comma separated list of directories under which to store log files
log.dirs=/home/hadoop/app/tmp/kafka-logs
# The default number of log partitions per topic. More partitions allow greater
# parallelism for consumption, but this will also result in more files across
# the brokers.
num.partitions=1
# The number of threads per data directory to be used for log recovery at startup and flushing at shutdown.
# This value is recommended to be increased for installations with data dirs located in RAID array.
num.recovery.threads.per.data.dir=1
############################# Zookeeper #############################
# Zookeeper connection string (see zookeeper docs for details).
# This is a comma separated host:port pairs, each corresponding to a zk
# server. e.g. "127.0.0.1:3000,127.0.0.1:3001,127.0.0.1:3002".
# You can also append an optional chroot string to the urls to specify the
# root directory for all kafka znodes.
zookeeper.connect=hadoop000:2181
# Timeout in ms for connecting to zookeeper
zookeeper.connection.timeout.ms=6000
Startup
Start the ZooKeeper server:
./zkServer.sh start
Start Kafka:
kafka-server-start.sh $KAFKA_HOME/config/server.properties
Single-node, single-broker deployment
Create a topic (replication factor 1, one partition):
kafka-topics.sh --create --zookeeper hadoop000:2181 --replication-factor 1 --partitions 1 --topic tp_hello_topic
List all topics:
kafka-topics.sh --list --zookeeper hadoop000:2181
Produce messages:
kafka-console-producer.sh --broker-list hadoop000:9092 --topic tp_hello_topic
Consume messages:
kafka-console-consumer.sh --zookeeper hadoop000:2181 --topic tp_hello_topic --from-beginning
Messages sent from the producer console are received on the consumer console.
Note that creating, listing, and consuming use the ZooKeeper address, while producing uses the broker address.
Single-node, multi-broker deployment
Create the configuration files server-1.properties, server-2.properties, and server-3.properties, changing:
- broker.id -> 1 / 2 / 3
- listeners -> ports 9093 / 9094 / 9095
- log.dirs -> kafka-logs-1 / kafka-logs-2 / kafka-logs-3
Start the three brokers:
kafka-server-start.sh -daemon $KAFKA_HOME/config/server-1.properties &
kafka-server-start.sh -daemon $KAFKA_HOME/config/server-2.properties &
kafka-server-start.sh -daemon $KAFKA_HOME/config/server-3.properties &
Create a topic:
kafka-topics.sh --create --zookeeper hadoop000:2181 --replication-factor 3 --partitions 1 --topic tp_replicated_topic
Describe the topic:
kafka-topics.sh --describe --zookeeper hadoop000:2181 --topic tp_replicated_topic
Topic:tp_replicated_topic PartitionCount:1 ReplicationFactor:3 Configs:
Topic: tp_replicated_topic Partition: 0 Leader: 2 Replicas: 2,3,1 Isr: 2,3,1
- Leader: the broker that is the leader for this partition (it handles all reads and writes)
- Replicas: the brokers holding a replica of this partition
- Isr: the in-sync replicas, i.e. the replicas that are currently alive and caught up with the leader
Produce messages:
kafka-console-producer.sh --broker-list hadoop000:9093,hadoop000:9094,hadoop000:9095 --topic tp_replicated_topic
Consume messages:
kafka-console-consumer.sh --zookeeper hadoop000:2181 --topic tp_replicated_topic
Messages sent from the producer console are received on the consumer console.
IntelliJ IDEA environment
pom.xml:
<properties>
    <scala.version>2.11.8</scala.version>
    <kafka.version>0.9.0.0</kafka.version>
</properties>
Kafka dependency:
<dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka_2.11</artifactId>
    <version>${kafka.version}</version>
</dependency>
Java API - Producer
Note: first adjust the Kafka configuration on the server by adding two parameters to server.properties:
advertised.host.name -> the machine's IP address (192.168.1.9)
advertised.port -> the port (9092)
Otherwise you may later hit an error like:
Exception in thread "Thread-0" kafka.common.FailedToSendMessageException: Failed to send messages after 3 tries.
Back to the task. The plan is: build the Properties -> create a Producer -> start a new thread that sends messages -> run it -> check the result on the consumer side.
Create KafkaProperties.java, which defines the ZooKeeper address, the topic name, and the broker list:
package com.taipark.spark.kafka;

public class KafkaProperties {
    // ZooKeeper address
    public static final String ZK = "192.168.1.9:2181";
    // topic name
    public static final String TOPID = "tp_hello_topic";
    // broker list
    public static final String BROKER_LIST = "192.168.1.9:9092";
}
Create KafkaProducer.java:
package com.taipark.spark.kafka;

import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;

import java.util.Properties;

public class KafkaProducer extends Thread {

    private String topic;

    private Producer<Integer, String> producer;

    public KafkaProducer(String topic) {
        this.topic = topic;

        Properties properties = new Properties();
        properties.put("metadata.broker.list", KafkaProperties.BROKER_LIST);
        properties.put("serializer.class", "kafka.serializer.StringEncoder");
        // request.required.acks = 1 - the leader writes the message to its local log and acknowledges immediately
        properties.put("request.required.acks", "1");

        producer = new Producer<Integer, String>(new ProducerConfig(properties));
    }

    @Override
    public void run() {
        int messageNo = 1;
        while (true) {
            String message = "message_" + messageNo;
            producer.send(new KeyedMessage<Integer, String>(topic, message));
            System.out.println("Sent:" + message);
            messageNo++;

            try {
                Thread.sleep(2000);
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }
}
Create KafkaClientApp.java to test:
package com.taipark.spark.kafka;

public class KafkaClientApp {
    public static void main(String[] args) {
        new KafkaProducer(KafkaProperties.TOPID).start();
    }
}
Start the consumer on the server side:
kafka-console-consumer.sh --zookeeper hadoop000:2181 --topic tp_hello_topic
Run it in IDEA:
OK~
Java API - Consumer
Create KafkaConsumer.java:
package com.taipark.spark.kafka;

import kafka.consumer.Consumer;
import kafka.consumer.ConsumerConfig;
import kafka.consumer.ConsumerIterator;
import kafka.consumer.KafkaStream;
import kafka.javaapi.consumer.ConsumerConnector;

import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

public class KafkaConsumer extends Thread {

    private String topic;

    public KafkaConsumer(String topic) {
        this.topic = topic;
    }

    private ConsumerConnector createConnector() {
        Properties properties = new Properties();
        properties.put("zookeeper.connect", KafkaProperties.ZK);
        properties.put("group.id", KafkaProperties.GROUP_ID);
        return Consumer.createJavaConsumerConnector(new ConsumerConfig(properties));
    }

    @Override
    public void run() {
        ConsumerConnector consumer = createConnector();

        Map<String, Integer> topicCountMap = new HashMap<String, Integer>();
        topicCountMap.put(topic, 1);

        // key: topic
        // value: List<KafkaStream<byte[], byte[]>>, the streams for that topic
        Map<String, List<KafkaStream<byte[], byte[]>>> messageStream = consumer.createMessageStreams(topicCountMap);
        KafkaStream<byte[], byte[]> stream = messageStream.get(topic).get(0); // get(0): the single stream we requested
        ConsumerIterator<byte[], byte[]> iterator = stream.iterator();
        while (iterator.hasNext()) {
            String message = new String(iterator.next().message());
            System.out.println("rec:" + message);
        }
    }
}
Add a GROUP_ID to KafkaProperties.java:
public static final String GROUP_ID = "test_group_1";
Update KafkaClientApp.java and run it:
package com.taipark.spark.kafka;

public class KafkaClientApp {
    public static void main(String[] args) {
        new KafkaProducer(KafkaProperties.TOPID).start();
        new KafkaConsumer(KafkaProperties.TOPID).start();
    }
}
Done~
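For reference, Kafka 0.9 also ships the new Java consumer client (org.apache.kafka.clients.consumer.KafkaConsumer), which talks to the brokers directly instead of going through ZooKeeper. The following is only a minimal sketch of that alternative, assuming the same broker (192.168.1.9:9092), topic (tp_hello_topic), and group id (test_group_1) used above; it is not the approach taken in this note.

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

import java.util.Arrays;
import java.util.Properties;

public class NewConsumerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        // assumed broker address; matches BROKER_LIST above
        props.put("bootstrap.servers", "192.168.1.9:9092");
        props.put("group.id", "test_group_1");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<String, String>(props);
        consumer.subscribe(Arrays.asList("tp_hello_topic"));
        while (true) {
            // poll blocks for up to 100 ms waiting for new records
            ConsumerRecords<String, String> records = consumer.poll(100);
            for (ConsumerRecord<String, String> record : records) {
                System.out.println("rec:" + record.value());
            }
        }
    }
}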