Introduction to Kafka, a Distributed Publish-Subscribe Messaging System

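Kafka is a distributed publish-subscribe messaging system: producers push data to brokers for storage, and consumers pull data from brokers and process it. The system relies on ZooKeeper for coordination and request management, which underpins its high-performance distributed operation. At runtime, ZooKeeper starts first, followed by the Kafka servers; producers locate a broker through ZooKeeper and push data to it, and consumers locate a broker through ZooKeeper and consume data from it.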

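The Kafka website defines Kafka as "A distributed publish-subscribe messaging system". More precisely, then, Kafka is a system for publishing messages and subscribing to them. The publish-subscribe concept matters, because it is the natural starting point for explaining Kafka's design.

Which features of Kafka attract programmers to use it?

The Apache website gives the following introduction:

1. Fast (high throughput: a single Kafka broker can read and write hundreds of megabytes of data per second)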

A single Kafka broker can handle hundreds of megabytes of reads and writes per second from thousands of clients.

2. Scalable (can be expanded without downtime)

Kafka is designed to allow a single cluster to serve as the central data backbone for a large organization. It can be elastically and transparently expanded without downtime. Data streams are partitioned and spread over a cluster of machines to allow data streams larger than the capability of any single machine and to allow clusters of co-ordinated consumers.

3. Durable (messages are persisted)

Messages are persisted on disk and replicated within the cluster to prevent data loss. Each broker can handle terabytes of messages without performance impact.

4. Distributed by Design (distributed)

Kafka has a modern cluster-centric design that offers strong durability and fault-tolerance guarantees.
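The "Scalable" point above says that data streams are partitioned and spread over a cluster of machines. The idea can be illustrated with a minimal sketch, assuming messages carry a string key and that a key is mapped to a partition by a stable hash. The helper names `choose_partition` and `spread` are hypothetical, and CRC32 is only a stand-in here, not the hash function the Kafka clients actually use.

```python
import zlib
from collections import defaultdict


def choose_partition(key: str, num_partitions: int) -> int:
    """Map a message key to a partition index (hypothetical helper)."""
    # zlib.crc32 is deterministic across runs, unlike Python's built-in hash().
    # It is a stand-in for illustration, not Kafka's own partitioner hash.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def spread(messages, num_partitions):
    """Group (key, value) pairs by the partition each key hashes to."""
    partitions = defaultdict(list)
    for key, value in messages:
        partitions[choose_partition(key, num_partitions)].append(value)
    return dict(partitions)
```

Because the same key always lands in the same partition, messages for one key keep their relative order, while different keys can be stored and served by different machines.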

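For now, let us call the publishing side the producer, the subscribing side the consumer, and the storage layer in between the broker. With those names we can sketch roughly the following picture:

The producer generates data and hands it to the broker for storage. When a consumer needs the data, it fetches it from the broker and then carries out whatever processing the data requires.

At first glance this looks far too simple. Kafka is supposed to be distributed; does placing the producer, the broker, and the consumer on three different machines really count as distributed? Consider the diagram from the official Kafka documentation:

Multiple brokers work together, producers and consumers are embedded in the various pieces of business logic and called frequently, and ZooKeeper coordinates requests and forwarding among the three. Together they form a high-performance distributed publish-subscribe messaging system. One detail in the diagram deserves attention: the producer-to-broker direction is push, meaning data is sent to the broker as soon as it is available, whereas the consumer-to-broker direction is pull, meaning the consumer actively fetches the data rather than the broker sending it to the consumer.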
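The push and pull directions described above can be shown with a small in-memory sketch. It is not the Kafka client API: the `Broker` and `Consumer` classes, the `push`, `pull`, and `poll` methods, and the `events` topic are all hypothetical names, and the "broker" is just a Python list per topic. The point is only that the producer appends as soon as it has data, while the consumer decides when to ask and how much to take.

```python
class Broker:
    """In-memory stand-in for a broker: one append-only log per topic."""

    def __init__(self):
        self.logs = {}

    def push(self, topic, message):
        """Producer side: append a message and return its offset."""
        log = self.logs.setdefault(topic, [])
        log.append(message)
        return len(log) - 1

    def pull(self, topic, offset, max_messages=10):
        """Consumer side: return up to max_messages starting at offset."""
        log = self.logs.get(topic, [])
        return log[offset:offset + max_messages]


class Consumer:
    """Tracks its own position and asks the broker for data when ready."""

    def __init__(self, broker, topic):
        self.broker = broker
        self.topic = topic
        self.offset = 0

    def poll(self, max_messages=10):
        batch = self.broker.pull(self.topic, self.offset, max_messages)
        self.offset += len(batch)  # advance only by what was actually read
        return batch


broker = Broker()
for text in ("m0", "m1", "m2"):
    broker.push("events", text)          # producer pushes as data appears

consumer = Consumer(broker, "events")
first = consumer.poll(max_messages=2)    # consumer pulls at its own pace
second = consumer.poll(max_messages=2)
```

In this sketch the broker never calls the consumer. A slow consumer simply polls less often and picks up where its offset left off, which is the practical benefit of the pull direction.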

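The order in which the whole system runs:

1. Start the ZooKeeper server.

2. Start the Kafka server (broker), which registers itself with ZooKeeper.

3. When a producer has data, it first locates a broker through ZooKeeper and then stores the data in that broker.

4. When a consumer wants to consume data, it first locates the corresponding broker through ZooKeeper and then consumes from it.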
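The four steps above can be walked through with a toy simulation. Everything here is an assumption made for illustration: `Registry` stands in for ZooKeeper, `SimpleBroker` for a Kafka server, and the broker-selection rule in `find_broker` is invented. The sketch mirrors the flow as this article describes it; newer Kafka clients are configured with a list of broker addresses (`bootstrap.servers`) rather than a ZooKeeper address.

```python
class Registry:
    """Stand-in for the coordination service (ZooKeeper in the text)."""

    def __init__(self):
        self.brokers = {}

    def register(self, broker_id, broker):
        self.brokers[broker_id] = broker

    def find_broker(self, topic):
        """Pick a broker for a topic; the same topic maps to the same broker."""
        if not self.brokers:
            raise RuntimeError("no broker registered yet")
        ids = sorted(self.brokers)
        # Invented selection rule: sum of the topic's bytes modulo broker count.
        return self.brokers[ids[sum(topic.encode("utf-8")) % len(ids)]]


class SimpleBroker:
    """Stand-in for a Kafka server: one in-memory log per topic."""

    def __init__(self):
        self.logs = {}

    def append(self, topic, message):
        self.logs.setdefault(topic, []).append(message)

    def read(self, topic, offset):
        return self.logs.get(topic, [])[offset:]


registry = Registry()                                         # step 1
registry.register(0, SimpleBroker())                          # step 2
registry.register(1, SimpleBroker())
registry.find_broker("orders").append("orders", "order-1")    # step 3
received = registry.find_broker("orders").read("orders", 0)   # step 4
```

The ordering constraint is visible in `find_broker`: if a producer or consumer asks for a broker before any broker has registered, the lookup fails, which is why the coordination service and the brokers must be up first.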