Overview
Kafka is a distributed messaging middleware.
Kafka® is used for building real-time data pipelines and streaming apps. It is horizontally scalable, fault-tolerant, wicked fast, and runs in production in thousands of companies.
A streaming platform has three key capabilities:
- Publish and subscribe to streams of records, similar to a message queue or an enterprise messaging system.
- Store streams of records in a fault-tolerant, durable way.
- Process streams of records as they occur.
Kafka is generally used for two broad classes of applications:
- Building real-time streaming data pipelines that reliably move data between systems or applications
- Building real-time streaming applications that transform or react to streams of data
Concepts
- producer: the client that publishes records to Kafka
- consumer: the client that subscribes to topics and processes the records
- broker: a Kafka server node; the brokers together form the cluster
- topic: a named category of records; producers publish to topics and consumers subscribe to them
Architecture
- The Producer API allows an application to publish a stream of records to one or more Kafka topics (see the sketch after this list).
- The Consumer API allows an application to subscribe to one or more topics and process the stream of records produced to them.
- The Streams API allows an application to act as a stream processor, consuming an input stream from one or more topics and producing an output stream to one or more output topics, effectively transforming input streams into output streams.
- The Connector API allows building and running reusable producers or consumers that connect Kafka topics to existing applications or data systems. For example, a connector to a relational database might capture every change to a table.
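As a minimal illustration of the Producer API, the sketch below uses the new Java producer client (org.apache.kafka.clients.producer.KafkaProducer, shipped with Kafka 0.9). The broker address and topic name here are placeholder assumptions; the hands-on sections later in this note use the older Scala producer API instead.

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

import java.util.Properties;

public class ProducerApiSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        // assumed broker address; replace with your own broker list
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        KafkaProducer<String, String> producer = new KafkaProducer<String, String>(props);
        // publish one record to a hypothetical topic named "demo_topic"
        producer.send(new ProducerRecord<String, String>("demo_topic", "hello kafka"));
        producer.close();
    }
}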
Configuration
Kafka requires a matching ZooKeeper installation as a prerequisite.
The key Kafka configuration lives in $KAFKA_HOME/config/server.properties:
- broker.id -> broker id (must be unique per broker)
- listeners -> listening port
- host.name -> hostname of the current machine
- log.dirs -> directory where Kafka stores its log (data) files
- zookeeper.connect -> ZooKeeper address
# The id of the broker. This must be set to a unique integer for each broker.
broker.id=0
############################# Socket Server Settings #############################
listeners=PLAINTEXT://:9092
# The port the socket server listens on
#port=9092
# Hostname the broker will bind to. If not set, the server will bind to all interfaces
host.name=hadoop000
# Hostname the broker will advertise to producers and consumers. If not set, it uses the
# value for "host.name" if configured. Otherwise, it will use the value returned from
# java.net.InetAddress.getCanonicalHostName().
#advertised.host.name=<hostname routable by clients>
# The port to publish to ZooKeeper for clients to use. If this is not set,
# it will publish the same port that the broker binds to.
#advertised.port=<port accessible by clients>
# The number of threads handling network requests
num.network.threads=3
# The number of threads doing disk I/O
num.io.threads=8
# The send buffer (SO_SNDBUF) used by the socket server
socket.send.buffer.bytes=102400
# The receive buffer (SO_RCVBUF) used by the socket server
socket.receive.buffer.bytes=102400
# The maximum size of a request that the socket server will accept (protection against OOM)
socket.request.max.bytes=104857600
############################# Log Basics #############################
# A comma separated list of directories under which to store log files
log.dirs=/home/hadoop/app/tmp/kafka-logs
# The default number of log partitions per topic. More partitions allow greater
# parallelism for consumption, but this will also result in more files across
# the brokers.
num.partitions=1
# The number of threads per data directory to be used for log recovery at startup and flushing at shutdown.
# This value is recommended to be increased for installations with data dirs located in RAID array.
num.recovery.threads.per.data.dir=1
############################# Zookeeper #############################
# Zookeeper connection string (see zookeeper docs for details).
# This is a comma separated host:port pairs, each corresponding to a zk
# server. e.g. "127.0.0.1:3000,127.0.0.1:3001,127.0.0.1:3002".
# You can also append an optional chroot string to the urls to specify the
# root directory for all kafka znodes.
zookeeper.connect=hadoop000:2181
# Timeout in ms for connecting to zookeeper
zookeeper.connection.timeout.ms=6000
Startup
Start the ZooKeeper server:
./zkServer.sh start
Start Kafka:
kafka-server-start.sh $KAFKA_HOME/config/server.properties
Single-node, single-broker deployment
Create a topic (replication factor 1, one partition):
kafka-topics.sh --create --zookeeper hadoop000:2181 --replication-factor 1 --partitions 1 --topic tp_hello_topic
List all topics:
kafka-topics.sh --list --zookeeper hadoop000:2181
Produce messages:
kafka-console-producer.sh --broker-list hadoop000:9092 --topic tp_hello_topic
Consume messages:
kafka-console-consumer.sh --zookeeper hadoop000:2181 --topic tp_hello_topic --from-beginning
Messages sent from the producer console are received on the consumer console.
Note that creating, listing, and consuming use the ZooKeeper address, while producing uses the broker address.
Single-node, multi-broker deployment
Create the configuration files server-1.properties, server-2.properties, and server-3.properties, changing:
- broker.id -> 1 / 2 / 3
- listeners -> ports 9093 / 9094 / 9095
- log.dirs -> kafka-logs-1 / kafka-logs-2 / kafka-logs-3
Start the three brokers:
kafka-server-start.sh -daemon $KAFKA_HOME/config/server-1.properties &
kafka-server-start.sh -daemon $KAFKA_HOME/config/server-2.properties &
kafka-server-start.sh -daemon $KAFKA_HOME/config/server-3.properties &
Create a topic:
kafka-topics.sh --create --zookeeper hadoop000:2181 --replication-factor 3 --partitions 1 --topic tp_replicated_topic
Describe the topic:
kafka-topics.sh --describe --zookeeper hadoop000:2181 --topic tp_replicated_topic
Topic:tp_replicated_topic PartitionCount:1 ReplicationFactor:3 Configs:
Topic: tp_replicated_topic Partition: 0 Leader: 2 Replicas: 2,3,1 Isr: 2,3,1
- Leader: the broker that is the leader for this partition (it handles all reads and writes)
- Replicas: the brokers holding a replica of this partition
- Isr: the in-sync replicas, i.e. the replicas that are currently alive and caught up with the leader
Produce messages:
kafka-console-producer.sh --broker-list hadoop000:9093,hadoop000:9094,hadoop000:9095 --topic tp_replicated_topic
Consume messages:
kafka-console-consumer.sh --zookeeper hadoop000:2181 --topic tp_replicated_topic
Messages sent from the producer console are received on the consumer console.
IntelliJ IDEA environment
pom.xml:
<properties>
    <scala.version>2.11.8</scala.version>
    <kafka.version>0.9.0.0</kafka.version>
</properties>
Kafka dependency:
<dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka_2.11</artifactId>
    <version>${kafka.version}</version>
</dependency>
Java API - Producer
Note: first adjust the Kafka configuration on the server by adding two parameters to server.properties:
advertised.host.name -> the machine's IP address (192.168.1.9)
advertised.port -> the port (9092)
Otherwise you may later hit an error like:
Exception in thread "Thread-0" kafka.common.FailedToSendMessageException: Failed to send messages after 3 tries.
Back to the task. The plan is: build the Properties -> create a Producer -> start a new thread that sends messages -> run it -> check the result on the consumer side.
Create KafkaProperties.java, which defines the ZooKeeper address, the topic name, and the broker list:
package com.taipark.spark.kafka;

public class KafkaProperties {
    // ZooKeeper address
    public static final String ZK = "192.168.1.9:2181";
    // topic name
    public static final String TOPID = "tp_hello_topic";
    // broker list
    public static final String BROKER_LIST = "192.168.1.9:9092";
}
Create KafkaProducer.java:
package com.taipark.spark.kafka;

import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;

import java.util.Properties;

public class KafkaProducer extends Thread {

    private String topic;

    private Producer<Integer, String> producer;

    public KafkaProducer(String topic) {
        this.topic = topic;

        Properties properties = new Properties();
        properties.put("metadata.broker.list", KafkaProperties.BROKER_LIST);
        properties.put("serializer.class", "kafka.serializer.StringEncoder");
        // request.required.acks = 1 - the leader writes the message to its local log and acknowledges immediately
        properties.put("request.required.acks", "1");

        producer = new Producer<Integer, String>(new ProducerConfig(properties));
    }

    @Override
    public void run() {
        int messageNo = 1;
        while (true) {
            String message = "message_" + messageNo;
            producer.send(new KeyedMessage<Integer, String>(topic, message));
            System.out.println("Sent:" + message);
            messageNo++;

            try {
                Thread.sleep(2000);
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }
}
Create KafkaClientApp.java to test:
package com.taipark.spark.kafka;

public class KafkaClientApp {
    public static void main(String[] args) {
        new KafkaProducer(KafkaProperties.TOPID).start();
    }
}
Start the consumer on the server side:
kafka-console-consumer.sh --zookeeper hadoop000:2181 --topic tp_hello_topic
Run it in IDEA:
OK~
Java API - Consumer
Create KafkaConsumer.java:
package com.taipark.spark.kafka;

import kafka.consumer.Consumer;
import kafka.consumer.ConsumerConfig;
import kafka.consumer.ConsumerIterator;
import kafka.consumer.KafkaStream;
import kafka.javaapi.consumer.ConsumerConnector;

import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

public class KafkaConsumer extends Thread {

    private String topic;

    public KafkaConsumer(String topic) {
        this.topic = topic;
    }

    private ConsumerConnector createConnector() {
        Properties properties = new Properties();
        properties.put("zookeeper.connect", KafkaProperties.ZK);
        properties.put("group.id", KafkaProperties.GROUP_ID);
        return Consumer.createJavaConsumerConnector(new ConsumerConfig(properties));
    }

    @Override
    public void run() {
        ConsumerConnector consumer = createConnector();

        Map<String, Integer> topicCountMap = new HashMap<String, Integer>();
        topicCountMap.put(topic, 1);

        // key: topic
        // value: List<KafkaStream<byte[], byte[]>>, the streams for that topic
        Map<String, List<KafkaStream<byte[], byte[]>>> messageStream = consumer.createMessageStreams(topicCountMap);
        KafkaStream<byte[], byte[]> stream = messageStream.get(topic).get(0); // get(0): the single stream we requested
        ConsumerIterator<byte[], byte[]> iterator = stream.iterator();
        while (iterator.hasNext()) {
            String message = new String(iterator.next().message());
            System.out.println("rec:" + message);
        }
    }
}
Add a GROUP_ID to KafkaProperties.java:
public static final String GROUP_ID = "test_group_1";
Update KafkaClientApp.java and run it:
package com.taipark.spark.kafka;

public class KafkaClientApp {
    public static void main(String[] args) {
        new KafkaProducer(KafkaProperties.TOPID).start();
        new KafkaConsumer(KafkaProperties.TOPID).start();
    }
}
Done~
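For reference, Kafka 0.9 also ships the new Java consumer client (org.apache.kafka.clients.consumer.KafkaConsumer), which talks to the brokers directly instead of going through ZooKeeper. The following is only a minimal sketch of that alternative, assuming the same broker (192.168.1.9:9092), topic (tp_hello_topic), and group id (test_group_1) used above; it is not the approach taken in this note.

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

import java.util.Arrays;
import java.util.Properties;

public class NewConsumerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        // assumed broker address; matches BROKER_LIST above
        props.put("bootstrap.servers", "192.168.1.9:9092");
        props.put("group.id", "test_group_1");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<String, String>(props);
        consumer.subscribe(Arrays.asList("tp_hello_topic"));
        while (true) {
            // poll blocks for up to 100 ms waiting for new records
            ConsumerRecords<String, String> records = consumer.poll(100);
            for (ConsumerRecord<String, String> record : records) {
                System.out.println("rec:" + record.value());
            }
        }
    }
}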