Storm model:
A Storm application is modeled as a topology, i.e. a graph where nodes are operators and edges represent data flows among such operators.
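In other words, a Storm application is modeled as a topology, a directed acyclic graph (DAG) whose nodes are operators and whose edges represent the data flows between those operators.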
A Storm cluster can run topologies (Storm's jargon for an application) made up of several processing components. Components of a topology can be either spouts, which act as event producers, or bolts, which implement the processing logic.
Events emitted by a spout constitute a stream that can be transformed by passing through one or multiple bolts where its events are processed.
Therefore, a topology represents a graph of stream transformations. When a topology is submitted, Storm schedules its execution in the cluster, i.e., it assigns the execution of each spout and each bolt to one of the nodes forming the cluster.
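A topology represents a graph of stream transformations. Once a topology is submitted to Storm, it is scheduled for execution in the cluster.
That is, the executors of each spout and bolt component are assigned to the physical worker nodes of the cluster.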
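The topology-as-graph model described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the component names (`sentence_spout`, `split_bolt`, `count_bolt`) and the helper `topological_order` are hypothetical and are not part of Storm's API. The sketch stores the topology as an adjacency list and uses Kahn's algorithm to list components so that every producer precedes its consumers, which also checks that the graph is acyclic.

```python
from collections import deque

# Hypothetical topology: component name -> list of downstream components.
topology = {
    "sentence_spout": ["split_bolt"],
    "split_bolt": ["count_bolt"],
    "count_bolt": [],
}


def topological_order(graph):
    """Order components so that each producer precedes its consumers.

    Uses Kahn's algorithm and raises ValueError if the graph has a cycle.
    Every downstream component must also appear as a key of `graph`.
    """
    # Count incoming edges for every component.
    indegree = {node: 0 for node in graph}
    for targets in graph.values():
        for target in targets:
            indegree[target] += 1

    # Components without incoming edges are the event sources (spouts).
    ready = deque(node for node, degree in indegree.items() if degree == 0)
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for target in graph[node]:
            indegree[target] -= 1
            if indegree[target] == 0:
                ready.append(target)

    if len(order) != len(graph):
        raise ValueError("topology contains a cycle")
    return order


print(topological_order(topology))
```

In this toy model the only component with no incoming edge is the spout, which matches its role as the event producer.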
A computation in Storm is represented by a topology, that is, a graph where nodes are operators that encapsulate processing logic and edges model data flows among operators. In Storm's jargon, such a node is called a component. The unit of information that is exchanged among components is referred to as a tuple, that is, a named list of values. There are two types of components:
The unit of information exchanged among components is called a tuple, i.e. a named list of values.
(i) spouts, that model event sources and usually wrap the actual generators of input events so as to provide a common mechanism to feed data into a topology, and
(ii) bolts, that encapsulate the specific processing logic such as filtering, transforming and correlating tuples.
Bolts encapsulate the specific processing logic, such as filtering, transforming and correlating tuples.
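To make the roles of tuples, spouts and bolts concrete, here is a minimal Python sketch. It does not use Storm itself: tuples are modeled as dicts (a named list of values), and the functions `number_spout`, `even_filter_bolt` and `square_bolt` are hypothetical names chosen for illustration. The spout produces a stream, one bolt filters it, and a second bolt transforms it.

```python
def number_spout(limit):
    """Hypothetical spout: emits one tuple per integer in [0, limit)."""
    for n in range(limit):
        # A tuple is a named list of values, modeled here as a dict.
        yield {"value": n}


def even_filter_bolt(stream):
    """Hypothetical filtering bolt: forwards only tuples with an even value."""
    for tup in stream:
        if tup["value"] % 2 == 0:
            yield tup


def square_bolt(stream):
    """Hypothetical transforming bolt: adds a 'square' field to each tuple."""
    for tup in stream:
        yield {"value": tup["value"], "square": tup["value"] ** 2}


# Wire the components together: spout -> filter bolt -> transform bolt.
result = list(square_bolt(even_filter_bolt(number_spout(6))))
print(result)
```

Chaining generators keeps the sketch short; in a real Storm topology the components run in separate executors and exchange tuples over the network or in-process queues.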
The software component of Nimbus in charge of deciding how to deploy a topology is called the scheduler. On the basis of the topology configuration, the scheduler has to perform the deployment in two consecutive phases: (1) assign executors to workers, (2) assign workers to slots.
The scheduling module in Nimbus is responsible for deciding how the topology components are deployed. Based on the topology configuration, the scheduler works in two phases: 1. assign executors to workers; 2. assign workers to slots.
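The two phases can be sketched as follows. This is a simplified round-robin illustration, not Storm's actual scheduler code: the function names, the executor labels and the `(node, port)` slot representation are assumptions made for the example.

```python
def assign_executors_to_workers(executors, num_workers):
    """Phase 1: spread executors over workers in round-robin order."""
    workers = [[] for _ in range(num_workers)]
    for index, executor in enumerate(executors):
        workers[index % num_workers].append(executor)
    return workers


def assign_workers_to_slots(num_workers, slots):
    """Phase 2: give each worker one free slot, taken in list order.

    `slots` is a list of (node, port) pairs; both fields are illustrative.
    """
    if num_workers > len(slots):
        raise ValueError("not enough free slots for the requested workers")
    return {worker: slots[worker] for worker in range(num_workers)}


# Hypothetical executors of a small topology and the free slots of a cluster.
executors = ["spout-0", "split-0", "split-1", "count-0", "count-1"]
slots = [("node1", 6700), ("node1", 6701), ("node2", 6700)]

workers = assign_executors_to_workers(executors, num_workers=2)
placement = assign_workers_to_slots(len(workers), slots)

for worker_id, assigned in enumerate(workers):
    print(worker_id, placement[worker_id], assigned)
```

Taking slots in list order places both workers on `node1` here; a scheduler that cares about load or network traffic would choose slots differently, which is where topology-aware strategies come in.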
Focus of the paper:
Requiring two distinct levels, one for tasks and one for executors, is dictated by a requirement on dynamic

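This article takes an in-depth look at scheduling optimization in Apache Storm, covering two topology-based scheduling strategies, one offline and one online. It describes the design basis of the scheduler and the implementation of the algorithms, and proposes load-balancing objectives for measuring scheduling quality. It also shares the related experimental results, offering insight into Storm's scheduling mechanism.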





