Flink Physical Partitioning
Rebalancing (round-robin partitioning), the default strategy
Records are sent to the downstream tasks in a round-robin fashion.
import org.apache.flink.streaming.api.scala._

val fsEnv = StreamExecutionEnvironment.getExecutionEnvironment
fsEnv.socketTextStream("HadoopNode00", 9999)
  .rebalance                               // distribute lines round-robin to the downstream subtasks
  .flatMap(line => line.split("\\s+"))
  .map(word => (word, 1))
  .printToErr("test")
fsEnv.execute("FlinkWordCounts")
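One way to see the round-robin distribution is to print the index of the downstream subtask that handles each word. The following is a minimal sketch of my own, not from the original post: it reuses the same socket source, uses a hypothetical job name RebalanceObservation, and relies on RichMapFunction and getIndexOfThisSubtask from Flink's public API.

import org.apache.flink.api.common.functions.RichMapFunction
import org.apache.flink.streaming.api.scala._

val fsEnv = StreamExecutionEnvironment.getExecutionEnvironment
fsEnv.socketTextStream("HadoopNode00", 9999)
  .flatMap(line => line.split("\\s+"))
  .rebalance                               // round-robin to the 4 map subtasks below
  .map(new RichMapFunction[String, String] {
    // prefix each word with the index of the subtask that received it
    override def map(word: String): String =
      s"subtask-${getRuntimeContext.getIndexOfThisSubtask}: $word"
  })
  .setParallelism(4)
  .printToErr("rebalance")
fsEnv.execute("RebalanceObservation")

With parallelism 4, consecutive words should cycle through subtask-0 to subtask-3.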
Random partitioning
Records are sent to randomly chosen downstream tasks.
val fsEnv = StreamExecutionEnvironment.getExecutionEnvironment
fsEnv.socketTextStream("HadoopNode00", 9999)
  .shuffle                                 // send each line to a randomly chosen downstream subtask
  .flatMap(line => line.split("\\s+"))
  .map(word => (word, 1))
  .printToErr("test")
fsEnv.execute("FlinkWordCounts")
Rescaling
Each upstream partition sends its records round-robin to a subset of the downstream partitions, and the upstream and downstream parallelisms should be integer multiples of each other. Unlike rebalance, records are only redistributed within these local groups rather than across all downstream subtasks (see the observation sketch after the example below).

val fsEnv = StreamExecutionEnvironment.getExecutionEnvironment
fsEnv.socketTextStream("HadoopNode00", 9999)
  .flatMap(line => line.split("\\s+"))
  .setParallelism(4)                       // 4 upstream flatMap subtasks
  .rescale                                 // each downstream subtask reads from a fixed group of upstream subtasks
  .map(word => (word, 1))
  .setParallelism(2)                       // 2 downstream map subtasks
  .print("test")
  .setParallelism(2)
fsEnv.execute("FlinkWordCounts")
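To make the local grouping visible, a sketch like the following (my own illustration, not from the original post) tags each word with the upstream subtask that produced it. With 4 upstream flatMap subtasks rescaled to 2 downstream map subtasks, each downstream subtask should only ever report words from a fixed pair of upstream indexes.

import org.apache.flink.api.common.functions.{RichFlatMapFunction, RichMapFunction}
import org.apache.flink.streaming.api.scala._
import org.apache.flink.util.Collector

val fsEnv = StreamExecutionEnvironment.getExecutionEnvironment
fsEnv.socketTextStream("HadoopNode00", 9999)
  .flatMap(new RichFlatMapFunction[String, (Int, String)] {
    // tag each word with the index of the upstream subtask that emitted it
    override def flatMap(line: String, out: Collector[(Int, String)]): Unit = {
      val upstream = getRuntimeContext.getIndexOfThisSubtask
      line.split("\\s+").foreach(word => out.collect((upstream, word)))
    }
  })
  .setParallelism(4)
  .rescale
  .map(new RichMapFunction[(Int, String), String] {
    // report which upstream subtask fed which downstream subtask
    override def map(value: (Int, String)): String =
      s"upstream-${value._1} -> downstream-${getRuntimeContext.getIndexOfThisSubtask}: ${value._2}"
  })
  .setParallelism(2)
  .print("rescale")
  .setParallelism(2)
fsEnv.execute("RescaleObservation")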
Broadcasting
Every upstream record is broadcast to all downstream partitions.
val fsEnv = StreamExecutionEnvironment.getExecutionEnvironment
fsEnv.socketTextStream("HadoopNode00", 9999)
  .broadcast                               // copy every line to all downstream flatMap subtasks
  .flatMap(line => line.split("\\s+"))
  .map(word => (word, 1))
  .print("test")
fsEnv.execute("FlinkWordCounts")
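Since the socket source always runs with parallelism 1, broadcasting means every input line is duplicated once per downstream subtask. A minimal sketch of my own (hypothetical job name BroadcastObservation) that makes the duplication easy to count by fixing the parallelism to 3: each line typed into the socket should be printed three times, once under each subtask prefix.

import org.apache.flink.streaming.api.scala._

val fsEnv = StreamExecutionEnvironment.getExecutionEnvironment
fsEnv.setParallelism(3)                       // 3 downstream map subtasks
fsEnv.socketTextStream("HadoopNode00", 9999)  // the socket source runs with parallelism 1
  .broadcast                                  // copy every line to all 3 map subtasks
  .map(line => s"copy seen by a map subtask: $line")
  .print("broadcast")                         // every input line is printed 3 times
fsEnv.execute("BroadcastObservation")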
Custom partitioning
Route records with a user-defined Partitioner.
import org.apache.flink.api.common.functions.Partitioner
import org.apache.flink.streaming.api.scala._

val fsEnv = StreamExecutionEnvironment.getExecutionEnvironment
fsEnv.socketTextStream("HadoopNode00", 9999)
  .flatMap(line => line.split("\\s+"))
  .map(word => (word, 1))
  .partitionCustom(new Partitioner[String] {
    override def partition(key: String, numPartitions: Int): Int = {
      // key.hashCode & Integer.MAX_VALUE masks the sign bit so the result is non-negative
      (key.hashCode & Integer.MAX_VALUE) % numPartitions
    }
  }, t => t._1)
  .print("test")
fsEnv.execute("Custom Partitions")
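The masking in partition() matters because String.hashCode may be negative, and a negative result from % would not be a valid partition index. A small standalone sketch (plain Scala, no Flink required; the word list is chosen arbitrarily) to compare the raw and masked arithmetic:

val numPartitions = 4
val words = Seq("flink", "storm", "spark", "kafka", "hadoop")
words.foreach { key =>
  val raw    = key.hashCode % numPartitions                        // can be negative when hashCode is negative
  val masked = (key.hashCode & Integer.MAX_VALUE) % numPartitions  // always in [0, numPartitions)
  println(s"$key hash=${key.hashCode} raw=$raw masked=$masked")
}

An equivalent alternative is Math.floorMod(key.hashCode, numPartitions), which also returns a non-negative index for a positive numPartitions.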
This post walks through the stream partitioning strategies in Apache Flink: Rebalancing (the default round-robin partitioning), Random partitioning, Rescaling, Broadcasting, and Custom partitioning. Each strategy is illustrated with a concrete code example, which helps in choosing and applying the right partitioning strategy for a given scenario.