For Kafka:
Each topic is divided into partitions; a message's key determines which partition it is written to.
Partitions live on brokers; each broker can hold partitions from different topics.
Each consumer group contains one or more consumers; each consumer can receive data from multiple partitions.
A producer writing to a topic spreads its messages across that topic's partitions.
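
To make the Kafka points above concrete, here is a minimal sketch in plain Python (no Kafka client involved). The names `partition_for` and `assign_partitions` are made up for illustration, and `zlib.crc32` is only a stand-in hash; Kafka's own default partitioner uses a different hash function, and its real assignment strategies are configurable.

```python
import zlib
from typing import Dict, List


def partition_for(key: str, num_partitions: int) -> int:
    """Map a message key to a partition index.

    Assumption: crc32 is a stand-in hash, not the one Kafka uses.
    The point is only that the same key always maps to the same partition.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def assign_partitions(partitions: List[int], consumers: List[str]) -> Dict[str, List[int]]:
    """Spread the partitions of a topic round-robin over one consumer group."""
    assignment: Dict[str, List[int]] = {consumer: [] for consumer in consumers}
    for i, partition in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(partition)
    return assignment


if __name__ == "__main__":
    num_partitions = 4
    for key in ["user-1", "user-2", "user-1"]:
        print(key, "->", partition_for(key, num_partitions))
    # Two consumers in one group share the four partitions.
    print(assign_partitions(list(range(num_partitions)), ["c1", "c2"]))
```

With four partitions and two consumers, each consumer ends up reading two partitions, which matches the note that one consumer can receive data from several partitions.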

For YARN:
A NodeManager runs on each machine and is responsible for launching processes on that machine.
The ResourceManager talks to all of the NodeManagers to tell them what to run.
The ApplicationMaster is application-specific code that runs inside the YARN cluster.
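
The relationship between the three YARN pieces can be sketched as a toy model. This is not the YARN API: the classes below only borrow the component names, and `allocate`, `request_containers`, `submit_application`, and the round-robin placement are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class NodeManager:
    """Launches processes on one machine."""
    host: str
    running: List[str] = field(default_factory=list)

    def launch(self, command: str) -> None:
        self.running.append(command)


@dataclass
class ResourceManager:
    """Knows every NodeManager and tells each one what to run."""
    nodes: List[NodeManager]
    _next: int = 0

    def allocate(self, command: str) -> NodeManager:
        # Assumption: simple round-robin placement across nodes.
        node = self.nodes[self._next % len(self.nodes)]
        self._next += 1
        node.launch(command)
        return node


class ApplicationMaster:
    """Application-specific code that asks for containers for its tasks."""

    def __init__(self, rm: ResourceManager, num_tasks: int) -> None:
        self.rm = rm
        self.num_tasks = num_tasks

    def request_containers(self) -> None:
        for i in range(self.num_tasks):
            self.rm.allocate(f"task-{i}")


def submit_application(rm: ResourceManager, num_tasks: int) -> Dict[str, List[str]]:
    """Start the ApplicationMaster in the cluster, then let it request its tasks."""
    rm.allocate("application-master")
    ApplicationMaster(rm, num_tasks).request_containers()
    return {node.host: list(node.running) for node in rm.nodes}


if __name__ == "__main__":
    cluster = ResourceManager([NodeManager("node-a"), NodeManager("node-b")])
    print(submit_application(cluster, num_tasks=3))
```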

For Samza:
Samza supports two kinds of processing:
stateless processing: retains no state associated with a message once it has been processed
stateful processing: requires recording some state about a message even after it has been processed
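
The difference between the two kinds of processing can be shown with a small sketch in plain Python. It does not use the Samza API; `to_upper` and `WordCounter` are hypothetical names chosen for the example.

```python
from collections import defaultdict
from typing import Dict


def to_upper(message: str) -> str:
    """Stateless: the output depends only on the current message."""
    return message.upper()


class WordCounter:
    """Stateful: keeps a running count that outlives each message."""

    def __init__(self) -> None:
        self.counts: Dict[str, int] = defaultdict(int)

    def process(self, message: str) -> int:
        self.counts[message] += 1
        return self.counts[message]


if __name__ == "__main__":
    counter = WordCounter()
    for msg in ["a", "b", "a"]:
        print(to_upper(msg), counter.process(msg))
```

The counter has to survive between messages, which is why stateful jobs need somewhere to keep that state.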
Samza supports two notions of time: processing time (when a message is processed) and event time (the timestamp embedded at the source).
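
A rough sketch of why the choice of time matters: the same events can land in different windows depending on which timestamp is used. The function names and the sample data are assumptions for illustration, not Samza operators.

```python
from typing import Dict, List, Tuple


def window_start(timestamp: int, size: int) -> int:
    """Start of the fixed-size window that contains `timestamp`."""
    return timestamp - timestamp % size


def count_by_window(events: List[Tuple[int, int]], size: int,
                    use_event_time: bool) -> Dict[int, int]:
    """Count events per window.

    Each event is an (event_time, arrival_time) pair in seconds:
    event_time is set at the source, arrival_time is when it gets processed.
    """
    counts: Dict[int, int] = {}
    for event_time, arrival_time in events:
        ts = event_time if use_event_time else arrival_time
        start = window_start(ts, size)
        counts[start] = counts.get(start, 0) + 1
    return counts


if __name__ == "__main__":
    # Two events happened before t=10 but arrived after it.
    events = [(1, 2), (8, 12), (9, 11)]
    print(count_by_window(events, size=10, use_event_time=True))
    print(count_by_window(events, size=10, use_event_time=False))
```

With event time all three events fall into the first window; with processing time the two late arrivals are counted in the next one.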
Samza guarantees that each record is processed at least once.
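
One common way to get at-least-once behaviour is to commit the read offset only after messages have been processed, so a crash replays anything not yet committed. The sketch below simulates that idea in plain Python; `consume`, `checkpoint_every`, and `crash_before` are hypothetical names, and this is not how Samza's checkpointing is actually implemented.

```python
from typing import List, Optional


def consume(messages: List[str], checkpoint: int, checkpoint_every: int,
            processed: List[str], crash_before: Optional[int] = None) -> int:
    """Process messages starting at `checkpoint`; return the last committed offset.

    The offset is committed only after messages were processed, so a simulated
    crash forgets uncommitted progress and those messages are seen again.
    """
    committed = checkpoint
    offset = checkpoint
    while offset < len(messages):
        if offset == crash_before:
            return committed  # crash: restart will resume from here
        processed.append(messages[offset])
        offset += 1
        if offset % checkpoint_every == 0:
            committed = offset
    return offset  # everything processed, commit the final offset


if __name__ == "__main__":
    messages = ["m0", "m1", "m2", "m3", "m4"]
    processed: List[str] = []
    last = consume(messages, 0, checkpoint_every=2, processed=processed, crash_before=3)
    last = consume(messages, last, checkpoint_every=2, processed=processed)
    print(last, processed)
```

In this run the message at offset 2 is handled twice: no record is lost, but duplicates are possible, which is what "at least once" means.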
Samza's coordinator supports both an embedded library model (like Kafka) and a framework model (like Flink).
Samza supports both in-order and out-of-order processing.
Each thread runs one or more tasks.

Reference: http://samza.apache.org/learn/documentation/latest/core-concepts/core-concepts.html
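This post covers the basics of Kafka, YARN, and Samza. In Kafka, topics are partitioned by key, partitions live on brokers, and consumer groups and producers handle data in different ways; YARN consists of the NodeManager, ResourceManager, and ApplicationMaster; Samza supports stateless and stateful processing, has two notions of time, guarantees that each record is processed at least once, and supports multiple processing modes.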