Kafka

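This article introduces Kafka, an efficient messaging system. Compared with traditional enterprise messaging systems, Kafka is distributed and offers high throughput, which makes it well suited to large-scale log processing. The article also explains Kafka's core concepts, such as topics, partitions, producers, and consumers, and discusses how it implements data persistence, efficient transfer, and state management.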

About Kafka

Overview

Kafka is a distributed messaging system for collecting and delivering high volumes of log data with low latency. On the one hand, Kafka is distributed and scalable, and offers high throughput. On the other hand, Kafka provides an API similar to that of a messaging system and allows applications to consume log events in real time.

Traditional Messaging System

Traditional enterprise messaging systems tend not to be a good fit for log processing, for the following reasons:
1. There is a mismatch in the features that enterprise systems offer, such as out-of-order processing and other features that log processing does not need.
2. Many systems do not treat throughput as their primary design constraint; for example, they offer no batch operations.
3. They are weak in distributed support.
4. Many messaging systems assume near-immediate consumption of messages, so the queue of unconsumed messages is always fairly small.

Kafka

A stream of messages of a particular type is defined by a topic. A producer can publish messages to a topic. The published messages are then stored at a set of servers called brokers. To balance load, a topic is divided into multiple partitions, and each broker stores one or more of those partitions. A consumer can subscribe to one or more topics from the brokers, and consume the subscribed messages by pulling data from the brokers.
Kafka Architecture

1. Efficiency on Single Partition
  • Simple Storage:
    Every time a producer publishes a message to a partition, the broker simply appends the message to the last segment file. The segment files are flushed to disk only after a configurable number of messages have been published or a certain amount of time has elapsed. A message is exposed to consumers only after it has been flushed. A message stored in Kafka does not have an explicit message id. Instead, each message is addressed by its logical offset in the log.
    Kafka Log
  • Efficient Transfer:
    Kafka avoids explicitly caching messages in memory at the Kafka layer. Instead, it relies on the underlying file system page cache, which avoids double buffering. Since Kafka does not cache messages in process at all, it has very little garbage collection overhead, which makes an efficient implementation in a VM-based language feasible. In addition, Kafka is a multi-subscriber system, and a single message may be consumed multiple times by different consumer applications.
  • Stateless Broker:
    The information about how much each consumer has consumed is kept by the consumer itself, not by the broker, which reduces complexity and overhead on the broker. Because the broker does not know which messages have been consumed, a message is automatically deleted once it has been retained in the broker longer than a certain period, typically 7 days. As a side benefit, a consumer can deliberately rewind to an old offset and re-consume data.
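
The single-partition behaviour described above can be sketched in a few lines of Python. This is a toy in-memory model, not Kafka's implementation: the class name `PartitionLog` and the parameters `flush_every` and `retention_seconds` are invented for illustration, there are no segment files, and the offset here is simply a message count. It shows three ideas from the text: messages become visible only after a flush, the consumer (not the broker) holds the offset, and retention is time-based regardless of consumption.

```python
import time


class PartitionLog:
    """Toy in-memory model of one partition's append-only log (hypothetical)."""

    def __init__(self, flush_every=3, retention_seconds=7 * 24 * 3600):
        self.flush_every = flush_every            # messages buffered before a flush
        self.retention_seconds = retention_seconds
        self.base_offset = 0                      # offset of the oldest retained message
        self.flushed = []                         # (timestamp, payload), visible to consumers
        self.pending = []                         # appended but not yet flushed

    def append(self, payload, now=None):
        """Append a message; flush once enough messages have accumulated."""
        now = time.time() if now is None else now
        self.pending.append((now, payload))
        if len(self.pending) >= self.flush_every:
            self.flush()

    def flush(self):
        """Make all pending messages visible to consumers."""
        self.flushed.extend(self.pending)
        self.pending.clear()

    def read(self, offset, max_messages=10):
        """Return (next_offset, payloads) starting at a logical offset.

        The caller keeps track of next_offset; the log stores no
        per-consumer state.
        """
        if offset < self.base_offset:
            raise ValueError("offset has been deleted by retention")
        start = offset - self.base_offset
        batch = self.flushed[start:start + max_messages]
        return offset + len(batch), [payload for _, payload in batch]

    def expire(self, now=None):
        """Drop messages older than the retention period, consumed or not."""
        now = time.time() if now is None else now
        while self.flushed and now - self.flushed[0][0] > self.retention_seconds:
            self.flushed.pop(0)
            self.base_offset += 1
```

A consumer that wants to re-consume data simply calls `read` again with an older offset, as long as that offset has not yet been removed by `expire`.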
2. Distributed Coordination
  • A partition within a topic is the smallest unit of parallelism. Kafka has the concept of consumer groups. Each consumer group consists of one or more consumers that jointly consume a set of subscribed topics, and each message is delivered to only one of the consumers within the group.
  • Kafka has no central “master” node. Instead, consumers coordinate among themselves in a decentralized fashion. To facilitate the coordination, Kafka employs a highly available consensus service, Zookeeper. Zookeeper has a very simple, file-system-like API with the following functions:
    Zookeeper can create a path, set the value of a path, read the value of a path, delete a path, and list the children of a path. It also does a few more interesting things: (a) one can register a watcher on a path and get notified when the children of a path or the value of a path have changed; (b) a path can be created as ephemeral (as opposed to persistent), which means that if the creating client is gone, the path is automatically removed by the Zookeeper server; (c) Zookeeper replicates its data to multiple servers, which makes the data highly reliable and available. Kafka uses Zookeeper for the following tasks: (1) detecting the addition and the removal of brokers and consumers, (2) triggering a rebalance process in each consumer when the above events happen, and (3) maintaining the consumption relationship and keeping track of the consumed offset of each partition.
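
To make the decentralized rebalance more concrete, here is a minimal sketch of how each consumer in a group could compute its own share of partitions without a master. It assumes every consumer sees the same membership and partition lists (for example, by listing children of a Zookeeper path). The function name `assign_partitions` and the range-style split are illustrative assumptions, not Kafka's exact algorithm.

```python
def assign_partitions(partitions, consumers, me):
    """Return the partitions that consumer `me` should own.

    Every consumer runs this on the same sorted inputs, so all members
    reach the same split without a central coordinator.
    """
    partitions = sorted(partitions)
    consumers = sorted(consumers)
    index = consumers.index(me)
    share, extra = divmod(len(partitions), len(consumers))
    # The first `extra` consumers each take one additional partition.
    start = index * share + min(index, extra)
    end = start + share + (1 if index < extra else 0)
    return partitions[start:end]


if __name__ == "__main__":
    parts = [0, 1, 2, 3, 4]
    # Two consumers in the group, then a third one joins and triggers a rebalance.
    for group in (["c1", "c2"], ["c1", "c2", "c3"]):
        print({c: assign_partitions(parts, group, c) for c in group})
```

Because each partition ends up in exactly one consumer's slice, each message is delivered to only one consumer within the group, which matches the consumer group behaviour described above.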
3. Delivery Guarantees
  • In general, Kafka only guarantees at-least-once delivery.
  • Kafka guarantees that messages from a single partition are delivered to a consumer in order.
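
At-least-once delivery means a consumer may see the same message again, for example if it crashes after processing a message but before recording its new offset. The sketch below simulates that case with a plain Python list standing in for a partition. The names `consume`, `crash_before_commit_at`, and `make_idempotent` are hypothetical helpers written for this illustration and are not part of any Kafka client API.

```python
def consume(messages, committed_offset, handle, crash_before_commit_at=None):
    """Process messages in order from committed_offset; return the new committed offset.

    If crash_before_commit_at is set, the consumer "crashes" right after
    handling that offset, so the offset is never recorded as consumed.
    """
    for offset in range(committed_offset, len(messages)):
        handle(offset, messages[offset])
        if offset == crash_before_commit_at:
            return committed_offset  # simulated crash: progress was not saved
        committed_offset = offset + 1
    return committed_offset


def make_idempotent(handle):
    """Wrap a handler so that a redelivered offset is processed only once."""
    processed = set()

    def wrapper(offset, message):
        if offset in processed:
            return  # duplicate delivery, skip it
        handle(offset, message)
        processed.add(offset)

    return wrapper
```

If the application cannot tolerate duplicates, it has to add its own de-duplication, for instance by remembering processed offsets as `make_idempotent` does. Messages are still handled in partition order in both cases.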

Install Kafka

Download and extract Kafka from http://kafka.apache.org/downloads.html. In this walkthrough it is extracted to C:\Kafka.

Set Up Kafka

  1. Go to your Kafka config directory. In this walkthrough it is C:\Kafka\kafka_2.11-0.10.1.0\config
  2. Edit file “server.properties”
  3. Find the line “log.dirs=/tmp/kafka-logs” and change it to “log.dirs=C:/Kafka/kafka_2.11-0.10.1.0/kafka-logs”. Use forward slashes (or doubled backslashes), because a single backslash is an escape character in a Java properties file. The exact location of the log directory does not matter.
  4. If your ZooKeeper is running on another machine or cluster, change “zookeeper.connect=localhost:2181” to your own host and port. This demo uses the same machine, so no change is needed. The Kafka port and broker.id are also configurable in this file. Leave the other settings as they are.
  5. Kafka will run on its default port, 9092, and connect to the default ZooKeeper port, 2181.

Run Kafka Server

  1. Go to your Kafka installation directory C:\Kafka\kafka_2.11-0.10.1.0\
  2. Open a command prompt here by pressing Shift + right click and choosing the “Open command window here” option.
  3. Type .\bin\windows\kafka-server-start.bat .\config\server.properties and press Enter. (Before you run the Kafka server, make sure ZooKeeper is already running.)
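
To confirm that both services are listening after the steps above, a small check like the following can help. It is a minimal sketch using only the Python standard library, and it assumes the default ports mentioned in this article (2181 for ZooKeeper, 9092 for Kafka) on the local machine. The helper name `port_is_open` is made up for this example. A successful TCP connection only shows that something is listening on the port, not that the broker is fully healthy.

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Default ports from this article; adjust if server.properties was changed.
    for name, port in (("ZooKeeper", 2181), ("Kafka", 9092)):
        state = "reachable" if port_is_open("localhost", port) else "not reachable"
        print(f"{name} on localhost:{port} is {state}")
```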