Kafka Series: Core Principles of Kafka Streams (Part 1)

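Kafka Streams is a client library for processing and analyzing data in Kafka. It provides stream processing concepts such as event time versus processing time, windowing support, and fault-tolerant local state. It scales seamlessly from a proof of concept to large-scale production and supports exactly-once processing semantics. This article also covers stream processing topologies, the notion of time, stream-table duality, and aggregation and windowing operations, with emphasis on how Kafka Streams handles out-of-order data and manages state.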

Kafka Streams is a client library for processing and analyzing data stored in Kafka. It builds upon important stream processing concepts such as properly distinguishing between event time and processing time, windowing support, and simple yet efficient management and real-time querying of application state.

Kafka Streams has a low barrier to entry: you can quickly write and run a small-scale proof of concept on a single machine, and you only need to run additional instances of your application on multiple machines to scale up to high-volume production workloads. Kafka Streams transparently handles the load balancing of multiple instances of the same application by leveraging Kafka's parallelism model.

Some highlights of Kafka Streams:

  • Designed as a simple and lightweight client library, which can be easily embedded in any Java application and integrated with any existing packaging, deployment, and operational tools that users have for their streaming applications.

  • Has no external dependencies on systems other than Apache Kafka itself as the internal messaging layer; notably, it uses Kafka's partitioning model to horizontally scale processing while maintaining strong ordering guarantees.

  • Supports fault-tolerant local state, which enables very fast and efficient stateful operations like windowed joins and aggregations.

  • Supports exactly-once processing semantics to guarantee that each record is processed once and only once, even when a Streams client or a Kafka broker fails in the middle of processing.

  • Employs one-record-at-a-time processing to achieve millisecond processing latency, and supports event-time based windowing operations even when records arrive out of order.

  • Offers the necessary stream processing primitives, along with a high-level Streams DSL and a low-level Processor API.
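
To make the event-time windowing highlight more concrete, here is a minimal sketch in plain Java. It is not the Kafka Streams API: the class `EventTimeWindowSketch` and its methods are hypothetical names, and the fixed-size, non-overlapping window is an assumption chosen for simplicity. It only shows that a record is assigned to a window by its own event timestamp, so a record that arrives late still lands in the window it belongs to. How late records are bounded and how window state is stored are left out.

```java
import java.util.Map;
import java.util.TreeMap;

// Toy illustration of event-time window assignment; NOT the Kafka Streams API.
class EventTimeWindowSketch {

    // Start of the fixed-size window that contains the given event timestamp.
    static long windowStart(long eventTimeMs, long windowSizeMs) {
        return eventTimeMs - Math.floorMod(eventTimeMs, windowSizeMs);
    }

    // Counts records per window using event timestamps, one record at a time.
    // The array order stands in for arrival order, which may differ from event order.
    static Map<Long, Integer> countPerWindow(long[] eventTimesMs, long windowSizeMs) {
        Map<Long, Integer> counts = new TreeMap<>();
        for (long ts : eventTimesMs) {
            counts.merge(windowStart(ts, windowSizeMs), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // The record with event time 3000 arrives after the one with event time 7000.
        long[] arrivals = {1_000L, 7_000L, 3_000L};
        System.out.println(countPerWindow(arrivals, 5_000L)); // prints {0=2, 5000=1}
    }
}
```

The late record with event time 3000 is still counted in the window starting at 0, because the assignment depends only on the timestamp carried by the record and not on when it was seen.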

We first summarize the key concepts of Kafka Streams.

Stream Processing Topology
  • A stream is the most important abstraction provided by Kafka Streams: it represents an unbounded, continuously updating data set. A stream is an ordered, replayable, and fault-tolerant sequence of immutable data records, where a data record is defined as a key-value pair.

  • A stream processing application is any program that makes use of the Kafka Streams library. It defines its computational logic through one or more processor topologies, where a processor topology is a graph of stream processors (nodes) that are connected by streams (edges).

  • A stream processor is a node in the processor topology. It represents a processing step that transforms data in streams: it receives one input record at a time from its upstream processors in the topology, applies its operation to that record, and may subsequently produce one or more output records for its downstream processors.

There are two special processors in the topology:

  • Source Processor: A source processor is a special type of stream processor that does not have any upstream processors. It produces an input stream to its topology from one or multiple Kafka topics by consuming records from these topics and forwarding them to its downstream processors.

  • Sink Processor: A sink processor is a special type of stream processor that does not have downstream processors. It sends any records received from its upstream processors to a specified Kafka topic.

Note that normal processor nodes can also access other remote systems while processing the current record. The processed results can therefore either be streamed back into Kafka or written to an external system.
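
The node roles described above can be sketched as a toy model in plain Java. This is not the Kafka Streams API: `TopologySketch`, `Node`, `Source`, `FlatMapValues`, and `Sink` are hypothetical names, and in-memory lists stand in for Kafka topics. The sketch only shows the shape of the graph: a source with no upstream node, a regular processor that handles one record at a time and may emit several output records, and a sink with no downstream node.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Toy model of a processor topology; NOT the Kafka Streams API.
class TopologySketch {

    // A node receives one key-value record at a time.
    interface Node {
        void process(String key, String value);
    }

    // Sink: no downstream nodes. The list stands in for an output Kafka topic.
    static final class Sink implements Node {
        final List<String> topic = new ArrayList<>();

        @Override
        public void process(String key, String value) {
            topic.add(key + "=" + value);
        }
    }

    // Regular processor: applies its operation to one record and forwards
    // every resulting value to its downstream node.
    static final class FlatMapValues implements Node {
        private final Function<String, List<String>> operation;
        private final Node downstream;

        FlatMapValues(Function<String, List<String>> operation, Node downstream) {
            this.operation = operation;
            this.downstream = downstream;
        }

        @Override
        public void process(String key, String value) {
            for (String out : operation.apply(value)) {
                downstream.process(key, out);
            }
        }
    }

    // Source: no upstream nodes. It reads records from a list that stands in
    // for an input Kafka topic and forwards them downstream in order.
    static final class Source {
        private final Node downstream;

        Source(Node downstream) {
            this.downstream = downstream;
        }

        void consume(List<String[]> topic) {
            for (String[] record : topic) {
                downstream.process(record[0], record[1]);
            }
        }
    }

    // Wires source -> splitter -> sink and returns the contents of the output "topic".
    static List<String> run(List<String[]> inputTopic) {
        Sink sink = new Sink();
        Node splitter = new FlatMapValues(v -> Arrays.asList(v.split(" ")), sink);
        new Source(splitter).consume(inputTopic);
        return sink.topic;
    }

    public static void main(String[] args) {
        List<String[]> input = new ArrayList<>();
        input.add(new String[] {"k1", "hello kafka"});
        input.add(new String[] {"k2", "streams"});
        System.out.println(run(input)); // prints [k1=hello, k1=kafka, k2=streams]
    }
}
```

In this model, the edges of the graph are the `downstream` references, and a regular node such as `FlatMapValues` would be the place where a call to an external system could happen while the current record is being processed.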

Kafka Streams offers two ways to define the stream processing topology: the Kafka Streams DSL provides the most common data transformation operations, such as map, filter, join, and aggregations, out of the box; the lower-level Processor API allows developers to define and connect custom processors as well as to interact with state stores.

A processor topology is merely a logical abstraction for your stream processing code. At runtime, the logical topology is instantiated and replicated inside the application for parallel processing.
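
As a rough illustration of what "instantiated and replicated" can mean, the following plain Java sketch creates one independent copy of a tiny topology per input partition. It is a simplified model and not how Kafka Streams is implemented: `ReplicationSketch`, `TopologyCopy`, and `partitionFor` are hypothetical names, and the `hashCode`-based partitioner is an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of replicating one logical topology per partition; NOT the Kafka Streams API.
class ReplicationSketch {

    // One runtime copy of the logical topology. The "topology" here is a single
    // step that counts records per key, with state local to this copy.
    static final class TopologyCopy {
        final Map<String, Integer> counts = new HashMap<>();

        void process(String key) {
            counts.merge(key, 1, Integer::sum);
        }
    }

    // Hypothetical partitioner: the same key always maps to the same partition.
    static int partitionFor(String key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    // Creates one topology copy per partition and routes each record to the
    // copy that owns its partition. Copies never share state.
    static List<TopologyCopy> run(List<String> keys, int numPartitions) {
        List<TopologyCopy> copies = new ArrayList<>();
        for (int p = 0; p < numPartitions; p++) {
            copies.add(new TopologyCopy());
        }
        for (String key : keys) {
            copies.get(partitionFor(key, numPartitions)).process(key);
        }
        return copies;
    }

    public static void main(String[] args) {
        List<String> keys = new ArrayList<>();
        keys.add("a");
        keys.add("b");
        keys.add("a");
        List<TopologyCopy> copies = run(keys, 2);
        for (int p = 0; p < copies.size(); p++) {
            System.out.println("partition " + p + " -> " + copies.get(p).counts);
        }
    }
}
```

Because every record for a given key reaches the same copy, each copy can keep its own local state and the copies can run in parallel without coordinating with each other.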

Time

A critical aspect in stream processing is the notion of time, and how it is modeled and integrated. For ex
