Kafka Getting Started Guide

Overview

Kafka is a message-oriented middleware.

Kafka® is used for building real-time data pipelines and streaming apps. It is horizontally scalable, fault-tolerant, wicked fast, and runs in production in thousands of companies.


A streaming platform has three key capabilities:

  • Publish and subscribe to streams of records, similar to a message queue or enterprise messaging system.
  • Store streams of records in a fault-tolerant, durable way.
  • Process streams of records as they occur.

Kafka is generally used for two broad classes of applications:

  • Building real-time streaming data pipelines that reliably move data between systems or applications
  • Building real-time streaming applications that transform or react to streams of data

Concepts

  • producer: publishes (produces) messages to topics
  • consumer: subscribes to topics and processes the messages
  • broker: a Kafka server node; a cluster consists of one or more brokers
  • topic: a named category to which records are published and from which consumers read

Architecture

  • The Producer API allows an application to publish a stream of records to one or more Kafka topics.
  • The Consumer API allows an application to subscribe to one or more topics and process the stream of records produced to them.
  • The Streams API allows an application to act as a stream processor, consuming an input stream from one or more topics and producing an output stream to one or more output topics, effectively transforming input streams into output streams.
  • The Connector API allows building and running reusable producers or consumers that connect Kafka topics to existing applications or data systems. For example, a connector to a relational database might capture every change to a table.
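
For a quick feel of the Producer API, here is a minimal sketch using the new-style org.apache.kafka.clients producer that ships with Kafka from 0.9 onward (the walkthrough later in this guide uses the older kafka.javaapi producer instead). The class name, broker address, and topic are illustrative, chosen to match the setup configured below:

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

import java.util.Properties;

public class ProducerApiSketch {
    public static void main(String[] args) {
        Properties properties = new Properties();
        // illustrative broker address, matching the single-broker setup below
        properties.put("bootstrap.servers", "hadoop000:9092");
        properties.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        properties.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        Producer<String, String> producer = new KafkaProducer<String, String>(properties);
        // fire-and-forget: send one record, then release the client's resources
        producer.send(new ProducerRecord<String, String>("tp_hello_topic", "hello"));
        producer.close();
    }
}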

Configuration

A prerequisite for the Kafka environment is a matching Zookeeper installation.

The key Kafka configuration lives in $KAFKA_HOME/config/server.properties:

  • broker.id -> broker ID (must be unique per broker)
  • listeners -> listening port
  • host.name -> hostname of the current machine
  • log.dirs -> directories where Kafka stores its log (message data) files
  • zookeeper.connect -> Zookeeper address

# The id of the broker. This must be set to a unique integer for each broker.
broker.id=0

############################# Socket Server Settings #############################

listeners=PLAINTEXT://:9092

# The port the socket server listens on
#port=9092

# Hostname the broker will bind to. If not set, the server will bind to all interfaces
host.name=hadoop000

# Hostname the broker will advertise to producers and consumers. If not set, it uses the
# value for "host.name" if configured.  Otherwise, it will use the value returned from
# java.net.InetAddress.getCanonicalHostName().
#advertised.host.name=<hostname routable by clients>

# The port to publish to ZooKeeper for clients to use. If this is not set,
# it will publish the same port that the broker binds to.
#advertised.port=<port accessible by clients>

# The number of threads handling network requests
num.network.threads=3

# The number of threads doing disk I/O
num.io.threads=8

# The send buffer (SO_SNDBUF) used by the socket server
socket.send.buffer.bytes=102400

# The receive buffer (SO_RCVBUF) used by the socket server
socket.receive.buffer.bytes=102400

# The maximum size of a request that the socket server will accept (protection against OOM)
socket.request.max.bytes=104857600

############################# Log Basics #############################

# A comma separated list of directories under which to store log files
log.dirs=/home/hadoop/app/tmp/kafka-logs

# The default number of log partitions per topic. More partitions allow greater
# parallelism for consumption, but this will also result in more files across
# the brokers.
num.partitions=1

# The number of threads per data directory to be used for log recovery at startup and flushing at shutdown.
# This value is recommended to be increased for installations with data dirs located in RAID array.
num.recovery.threads.per.data.dir=1

############################# Zookeeper #############################

# Zookeeper connection string (see zookeeper docs for details).
# This is a comma separated host:port pairs, each corresponding to a zk
# server. e.g. "127.0.0.1:3000,127.0.0.1:3001,127.0.0.1:3002".
# You can also append an optional chroot string to the urls to specify the
# root directory for all kafka znodes.
zookeeper.connect=hadoop000:2181

# Timeout in ms for connecting to zookeeper
zookeeper.connection.timeout.ms=6000

Startup

Start the Zookeeper server:

./zkServer.sh start

Start Kafka:

kafka-server-start.sh $KAFKA_HOME/config/server.properties

Single-node, single-broker deployment

Create a topic (replication factor 1, one partition):

kafka-topics.sh --create --zookeeper hadoop000:2181 --replication-factor 1 --partitions 1 --topic tp_hello_topic

List all topics:

kafka-topics.sh --list --zookeeper hadoop000:2181

Produce messages:

kafka-console-producer.sh --broker-list hadoop000:9092 --topic tp_hello_topic

Consume messages:

kafka-console-consumer.sh --zookeeper hadoop000:2181 --topic tp_hello_topic --from-beginning 

Messages sent on the producer side are received on the consumer side.

Note that creating, listing, and consuming use the Zookeeper address, while producing uses the broker address.

Single-node, multi-broker deployment

Create the config files server-1.properties, server-2.properties, and server-3.properties, changing the following (see the sketch after this list):

  • broker.id -> 1/2/3
  • listeners -> ports 9093/9094/9095
  • log.dirs -> kafka-logs-1/kafka-logs-2/kafka-logs-3
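
For example, the changed lines in server-1.properties might look like this (the log.dirs base path is assumed to mirror the single-broker config above):

broker.id=1
listeners=PLAINTEXT://:9093
log.dirs=/home/hadoop/app/tmp/kafka-logs-1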

Start the three brokers:

kafka-server-start.sh -daemon $KAFKA_HOME/config/server-1.properties & 
kafka-server-start.sh -daemon $KAFKA_HOME/config/server-2.properties & 
kafka-server-start.sh -daemon $KAFKA_HOME/config/server-3.properties &

Create a topic:

kafka-topics.sh --create --zookeeper hadoop000:2181 --replication-factor 3 --partitions 1 --topic tp_replicated_topic

View the topic's details:

kafka-topics.sh --describe --zookeeper hadoop000:2181 --topic tp_replicated_topic

Topic:tp_replicated_topic	PartitionCount:1	ReplicationFactor:3	Configs:
	Topic: tp_replicated_topic	Partition: 0	Leader: 2	Replicas: 2,3,1	Isr: 2,3,1

  • Leader: the broker that is the leader for this partition
  • Replicas: the brokers holding replicas of this partition
  • Isr: the in-sync replicas, i.e. the subset of replicas that are currently alive and caught up to the leader

Produce messages:

kafka-console-producer.sh --broker-list hadoop000:9093,hadoop000:9094,hadoop000:9095 --topic tp_replicated_topic

Consume messages:

kafka-console-consumer.sh --zookeeper hadoop000:2181 --topic tp_replicated_topic

Messages sent on the producer side are received on the consumer side.

IDEA Environment

pom.xml:

  <properties>
    <scala.version>2.11.8</scala.version>
    <kafka.version>0.9.0.0</kafka.version>
  </properties>

The Kafka dependency:

    <dependency>
      <groupId>org.apache.kafka</groupId>
      <artifactId>kafka_2.11</artifactId>
      <version>${kafka.version}</version>
    </dependency>
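
Note that the kafka_2.11 artifact is the Kafka build compiled against Scala 2.11, matching the scala.version property above.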

Java API - Producer

Note: first adjust the configuration of the Kafka installation on the server by adding two parameters to server.properties:

advertised.host.name -> the machine's IP address (192.168.1.9)

advertised.port -> the port number (9092)
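
That is, the two added lines:

advertised.host.name=192.168.1.9
advertised.port=9092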

Otherwise you may later hit this error:

Exception in thread "Thread-0" kafka.common.FailedToSendMessageException: Failed to send messages after 3 tries.

Back to the task. The code plan: build a Properties object -> create a Producer -> send messages from a new thread -> run it -> check the results from a Consumer.

Create KafkaProperties.java, specifying the Zookeeper address, the topic ID, and the broker list:

package com.taipark.spark.kafka;

public class KafkaProperties {
    public static final String ZK = "192.168.1.9:2181";
    public static final String TOPID = "tp_hello_topic";
    public static final String BROKER_LIST = "192.168.1.9:9092";
}

Create KafkaProducer.java:

package com.taipark.spark.kafka;

import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;

import java.util.Properties;

public class KafkaProducer extends Thread{

    private String topic;

    private Producer<Integer,String> producer;

    public KafkaProducer(String topic){
        this.topic = topic;

        Properties properties = new Properties();
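        // the old producer takes its broker list via metadata.broker.list;
        // the new org.apache.kafka.clients producer uses bootstrap.servers instead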
        properties.put("metadata.broker.list",KafkaProperties.BROKER_LIST);
        properties.put("serializer.class","kafka.serializer.StringEncoder");
        // request.required.acks = 1 - means the leader will write the message to its local log and immediately acknowledge
        properties.put("request.required.acks","1");
        producer = new Producer<Integer, String>(new ProducerConfig(properties));
    }

    @Override
    public void run() {
        int messageNo = 1;

        while(true){
            String message = "message_" + messageNo;
            producer.send(new KeyedMessage<Integer, String>(topic,message));
            System.out.println("Sent:" + message);
            messageNo++;
            try{
                Thread.sleep(2000);
            }catch (Exception e){
                e.printStackTrace();
            }
        }
    }
}

Create KafkaClientApp.java to test it:

package com.taipark.spark.kafka;

public class KafkaClientApp {
    public static void main(String[] args) {
        new KafkaProducer(KafkaProperties.TOPID).start();
    }
}

Start the consumer on the server:

kafka-console-consumer.sh --zookeeper hadoop000:2181 --topic tp_hello_topic

Run it in IDEA:

OK~

Java API - Consumer

Create KafkaConsumer.java:

package com.taipark.spark.kafka;

import kafka.consumer.Consumer;
import kafka.consumer.ConsumerConfig;
import kafka.consumer.ConsumerIterator;
import kafka.consumer.KafkaStream;
import kafka.javaapi.consumer.ConsumerConnector;

import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

public class KafkaConsumer extends Thread{

    private String topic;

    public KafkaConsumer(String topic){
        this.topic =topic;
    }

    private ConsumerConnector createConnector(){

        Properties properties = new Properties();
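        // the 0.9 high-level consumer coordinates and tracks offsets through Zookeeper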
        properties.put("zookeeper.connect",KafkaProperties.ZK);
        properties.put("group.id",KafkaProperties.GROUP_ID);
        return Consumer.createJavaConsumerConnector(new ConsumerConfig(properties));
    }


    @Override
    public void run() {
        ConsumerConnector consumer = createConnector();
        Map<String,Integer> topicCountMap = new HashMap<String, Integer>();
        topicCountMap.put(topic,1);

        //String: the topic
        //List<KafkaStream<byte[], byte[]>>: the corresponding data streams
        Map<String, List<KafkaStream<byte[], byte[]>>> messageStream = consumer.createMessageStreams(topicCountMap);

        KafkaStream<byte[], byte[]> stream = messageStream.get(topic).get(0);    //get(0): take the single stream we requested for this topic
        ConsumerIterator<byte[], byte[]> iterator= stream.iterator();

        while (iterator.hasNext()){
            String message = new String(iterator.next().message());
            System.out.println("rec:" + message);
        }
    }
}

Add a GROUP_ID to KafkaProperties.java:

    public static final String GROUP_ID = "test_group_1";

Update KafkaClientApp.java to start both, and run:

package com.taipark.spark.kafka;

public class KafkaClientApp {
    public static void main(String[] args) {
        new KafkaProducer(KafkaProperties.TOPID).start();
        new KafkaConsumer(KafkaProperties.TOPID).start();
    }
}

Done~

 
