Apache Kafka is a distributed streaming platform that can:
1. Publish and subscribe to streams of records.
2. Store streams of records in a fault-tolerant way.
3. Process streams of records in real time.
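Use cases:
1. Building real-time streaming data pipelines between different systems or applications.
2. Building real-time streaming applications that transform or react to streams of data.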
Kafka runs as a cluster.
The unit in which a Kafka cluster stores streams of records is called a topic.
Each record consists of a key, a value, and a timestamp.
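To make the record and topic ideas concrete, here is a minimal in-memory sketch in Python. It is a toy model, not Kafka's client API: the names `Record`, `ToyTopic`, `append`, and `read_from`, the `page-views` topic, and the sample data are all hypothetical and chosen only for illustration. The position returned by `append` loosely mirrors what Kafka calls an offset.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Record:
    """Toy stand-in for a record: a key, a value, and a timestamp."""
    key: bytes
    value: bytes
    timestamp_ms: int


class ToyTopic:
    """Hypothetical in-memory topic: a named, append-only list of records."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._log: List[Record] = []

    def append(self, record: Record) -> int:
        """Store the record and return its position in the log."""
        self._log.append(record)
        return len(self._log) - 1

    def read_from(self, position: int) -> List[Record]:
        """Return a copy of every record at or after the given position."""
        return self._log[position:]


# Illustrative data only; a real cluster would persist and replicate these records.
topic = ToyTopic("page-views")
first = topic.append(Record(key=b"user-1", value=b"/home", timestamp_ms=1000))
second = topic.append(Record(key=b"user-2", value=b"/cart", timestamp_ms=2000))
```

Because records are only ever appended, a reader can resume from any earlier position and see the same records again, which is the property the sketch is meant to highlight.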
Kafka has four core APIs:
1. The Producer API allows an application to publish a stream of records to one or more Kafka topics.
2. The Consumer API allows an application to subscribe to one or more topics and process the stream of records produced to them.
3. The Streams API allows an application to act as a stream processor, consuming an input stream from one or more topics and producing an output stream to one or more output topics, effectively transforming the input streams to output streams.
4. The Connector API allows building and running reusable producers or consumers that connect Kafka topics to existing applications or data systems. For example, a connector to a relational database might capture every change to a table.
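The roles of the first three APIs can be sketched with a small in-memory model in Python. This is an assumption-laden illustration, not the real Kafka client library: `ToyBroker`, `produce`, `consume`, `stream_process`, and the topic names are hypothetical, messages are reduced to (key, value) pairs, and everything runs in one process.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

# A message here is just a (key, value) pair; timestamps are omitted for brevity.
Message = Tuple[str, str]


class ToyBroker:
    """Hypothetical in-memory stand-in for a cluster: topic name -> list of messages."""

    def __init__(self) -> None:
        self.topics: Dict[str, List[Message]] = defaultdict(list)

    def publish(self, topic: str, message: Message) -> None:
        self.topics[topic].append(message)

    def read(self, topic: str) -> List[Message]:
        # Return a copy so callers cannot mutate the stored log.
        return list(self.topics[topic])


def produce(broker: ToyBroker, topic: str, messages: Iterable[Message]) -> None:
    """Producer role: publish a stream of messages to a topic."""
    for message in messages:
        broker.publish(topic, message)


def consume(broker: ToyBroker, topics: Iterable[str]) -> List[Message]:
    """Consumer role: read the messages of one or more subscribed topics."""
    return [message for topic in topics for message in broker.read(topic)]


def stream_process(
    broker: ToyBroker,
    input_topic: str,
    output_topic: str,
    transform: Callable[[str], str],
) -> None:
    """Stream-processor role: read an input topic, transform each value, write to an output topic."""
    for key, value in broker.read(input_topic):
        broker.publish(output_topic, (key, transform(value)))


broker = ToyBroker()
produce(broker, "raw-words", [("k1", "hello"), ("k2", "kafka")])
stream_process(broker, "raw-words", "upper-words", str.upper)
result = consume(broker, ["upper-words"])
```

The input topic is left unchanged by the processing step, so other consumers can still read the original stream. A connector would play the producer or consumer role on behalf of an external system such as a database; it is left out of the sketch.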