Data Exchange Strategies
Data exchange strategies define how data is distributed to tasks in the physical execution graph. A strategy can be chosen automatically by the execution engine, depending on the semantics of the operator, or specified explicitly by us. The common data exchange strategies are the following.
- Forward: the forward strategy sends data from a task directly to the receiving task. If both tasks run on the same physical machine (which the task scheduler usually ensures), this strategy avoids network communication.
- Broadcast: the broadcast strategy sends every record to all parallel tasks of an operator. Because it replicates the data and involves network communication, it is rather expensive.
- Key-based: the key-based strategy partitions data by key, guaranteeing that records with the same key are processed by the same task.
- Random: the random strategy distributes records uniformly across the operator's tasks in order to spread the load evenly over the compute tasks.
Flink's Distributed Transformation Operators
Partitioning operations correspond to these data exchange strategies. They define how events are distributed among tasks. When we write programs with the DataStream API, the system automatically chooses a partitioning strategy and routes data to the right place based on the semantics of the operators and the configured parallelism. Sometimes, however, we need to control the partitioning strategy at the application level, or define a custom one. For example, if we know the data is skewed, we may want to load-balance the stream and send it evenly to the following operators; or the application's business logic may require that all parallel tasks of an operator receive the same data; or none of the built-in strategies fits and a custom strategy is needed.
The keyBy() method is different from the distributed transformation operators: all of the distributed transformation operators produce a DataStream, whereas keyBy() produces a KeyedStream, which has its own keyed state.
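The type difference is visible directly in the API. A minimal sketch (class name and element values are made up for illustration):
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.KeyedStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class KeyByVsShuffle {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        DataStream<String> words = env.fromElements("a", "b", "a", "c");

        // keyBy() returns a KeyedStream: records with the same key go to the same subtask,
        // and keyed state / keyed windows become available downstream.
        KeyedStream<String, String> keyed = words.keyBy(value -> value);

        // The repartitioning methods only change routing and return a plain DataStream;
        // no keyed state is involved.
        DataStream<String> shuffled = words.shuffle();

        keyed.print("keyed");
        shuffled.print("shuffled");
        env.execute("keyBy vs shuffle");
    }
}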
Random
Random data exchange is implemented by the DataStream.shuffle() method. The shuffle method distributes records randomly to the parallel tasks of the downstream operator.
Round-Robin
rebalance() uses a round-robin load-balancing algorithm to distribute the input stream evenly across the parallel tasks that follow.
Rescale
rescale() also uses a round-robin algorithm, but only sends data to a subset of the following parallel tasks. In essence, when the numbers of sender and receiver tasks differ, the rescale partitioning strategy provides a lightweight form of load balancing. It is more efficient when the number of receiver tasks is a multiple of the number of sender tasks.
The fundamental difference between rebalance() and rescale() is how the task connections are established: rebalance() creates communication channels between every sender task and every receiver task, whereas rescale() only creates channels between each task and a subset of the downstream operator's parallel tasks.
Broadcast
broadcast() replicates the input stream and sends every record to all parallel tasks of the downstream operator.
Global
global() sends all records of the input stream to the first parallel task of the downstream operator. This must be used with care: sending all data to a single task can put heavy pressure on the application.
Custom
When none of the partitioning strategies provided by Flink fits, we can define our own with partitionCustom(). This method takes a Partitioner object, which implements the partitioning logic, together with the field or key of the stream on which to partition.
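As a quick sketch of what this looks like, the example below uses a made-up EvenOddPartitioner (purely hypothetical, for illustration) that sends even keys to the first partition and odd keys to the last one:
import org.apache.flink.api.common.functions.Partitioner;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CustomPartitionDemo {

    // Hypothetical partitioner: even keys go to partition 0, odd keys to the last partition.
    public static class EvenOddPartitioner implements Partitioner<Integer> {
        @Override
        public int partition(Integer key, int numPartitions) {
            return key % 2 == 0 ? 0 : numPartitions - 1;
        }
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.fromElements(1, 2, 3, 4, 5, 6)
            // the second argument is the KeySelector that extracts the value to partition on
            .partitionCustom(new EvenOddPartitioner(), value -> value)
            .print("custom:").setParallelism(2);
        env.execute("partitionCustom demo");
    }
}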
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class RepartitionOperator {
    // shuffle:   distributes records randomly to the parallel tasks of the downstream operator
    // rebalance: distributes the input evenly to the following parallel tasks using round-robin
    // rescale:   also round-robin, but only to a subset of the following parallel tasks
    // broadcast: replicates the input and sends every record to all downstream parallel tasks
    // global:    sends all records to the first parallel task of the downstream operator
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // shuffle: random distribution; internally
        //   private Random random = new Random();
        //   public int selectChannel(SerializationDelegate<StreamRecord<T>> record) { return random.nextInt(numberOfChannels); }
        env
            .fromElements(1, 2, 3, 4).setParallelism(1)
            .shuffle()
            .print("shuffle:").setParallelism(2);

        // rebalance: round-robin; internally
        //   nextChannelToSendTo = (nextChannelToSendTo + 1) % numberOfChannels;
        env
            .fromElements(1, 2, 3, 4).setParallelism(1)
            .rebalance()
            .print("rebalance:").setParallelism(2);

        // rescale: round-robin over a subset of the following parallel tasks
        env
            .fromElements(1, 2, 3, 4).setParallelism(1)
            .rescale()
            .print("rescale:").setParallelism(2);

        // broadcast: every record is copied to all downstream parallel tasks
        env
            .fromElements(1, 2, 3, 4).setParallelism(1)
            .broadcast()
            .print("broadcast:").setParallelism(2);

        // global: every record goes to the first downstream parallel task
        env
            .fromElements(1, 2, 3, 4).setParallelism(1)
            .global()
            .print("global:").setParallelism(2);

        env.execute();
    }
}
Source Code Analysis of the Transformation Operators
Flink version 1.16.0
The distributed transformation operators from the previous section (Random, Round-Robin, Rescale, and so on) are implemented as concrete subclasses of the StreamPartitioner abstract class. Their rescaling behavior is defined by the SubtaskStateMapper enum, which each partitioner exposes through the getUpstreamSubtaskStateMapper and getDownstreamSubtaskStateMapper methods to describe the relationship between upstream and downstream.
StreamPartitioner
/** A special {@link ChannelSelector} for use in streaming programs. */
@Internal
public abstract class StreamPartitioner<T>
implements ChannelSelector<SerializationDelegate<StreamRecord<T>>>, Serializable {
private static final long serialVersionUID = 1L;
// Number of output channels
protected int numberOfChannels;
@Override
public void setup(int numberOfChannels) {
this.numberOfChannels = numberOfChannels;
}
// Whether the broadcast mode is used
@Override
public boolean isBroadcast() {
return false;
}
// Copy method
public abstract StreamPartitioner<T> copy();
@Override
public boolean equals(Object o) {
if (this == o) {
return true;
}
if (o == null || getClass() != o.getClass()) {
return false;
}
final StreamPartitioner<?> that = (StreamPartitioner<?>) o;
return numberOfChannels == that.numberOfChannels;
}
@Override
public int hashCode() {
return Objects.hash(numberOfChannels);
}
/**
* Determines which saved upstream state data has to be processed when the upstream is rescaled during job recovery.
* Defines the behavior of this partitioner, when upstream rescaled during recovery of in-flight
* data.
*/
public SubtaskStateMapper getUpstreamSubtaskStateMapper() {
return SubtaskStateMapper.ARBITRARY;
}
/**
* Determines which saved downstream state data has to be processed when the downstream is rescaled during job recovery.
* Defines the behavior of this partitioner, when downstream rescaled during recovery of
* in-flight data.
*/
public abstract SubtaskStateMapper getDownstreamSubtaskStateMapper();
// isPointwise determines the correspondence between upstream and downstream
// false: no fixed correspondence between upstream and downstream subtasks
// true: there is a fixed correspondence between upstream and downstream subtasks
public abstract boolean isPointwise();
}
isPointwise marks whether this partitioner uses a point-to-point distribution pattern. Flink has two distribution patterns:
/**
* A distribution pattern determines, which sub tasks of a producing task are connected to which
* consuming sub tasks.
*
* <p>It affects how {@link ExecutionVertex} and {@link IntermediateResultPartition} are connected
* in {@link EdgeManagerBuildUtil}
*/
public enum DistributionPattern {
/** Each upstream subtask is connected to every downstream subtask
* Each producing sub task is connected to each sub task of the consuming task. */
ALL_TO_ALL,
/** Each upstream subtask is connected to one or more downstream subtasks
* Each producing sub task is connected to one or more subtask(s) of the consuming task. */
POINTWISE
}
The StreamPartitioner abstract class implements the ChannelSelector interface, whose key method is selectChannel. Every concrete subclass of StreamPartitioner must implement selectChannel to control how elements are routed.
/**
* The {@link ChannelSelector} determines to which logical channels a record should be written to.
*
* @param <T> the type of record which is sent through the attached output gate
*/
public interface ChannelSelector<T extends IOReadableWritable> {
/**
* Number of output channels
* Initializes the channel selector with the number of output channels.
*
* @param numberOfChannels the total number of output channels which are attached to respective
* output gate.
*/
void setup(int numberOfChannels);
/**
* Returns the index of the selected channel; this method decides which channel an upstream record is written to.
* Broadcast partitioners do not need to implement this method.
* The argument is an element of the data stream, and the method derives from it the downstream channel to send to.
* Returns the logical channel index, to which the given record should be written. It is illegal
* to call this method for broadcast channel selectors and this method can remain not
* implemented in that case (for example by throwing {@link UnsupportedOperationException}).
*
* @param record the record to determine the output channels for.
* @return an integer number which indicates the index of the output channel through which the
* record shall be forwarded.
*/
int selectChannel(T record);
/**
* Returns whether this selector is of broadcast type, sending data to all downstream channels
*
* Returns whether the channel selector always selects all the output channels.
*
* @return true if the selector is for broadcast mode.
*/
boolean isBroadcast();
}
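To make the contract concrete, here is a minimal sketch of what a StreamPartitioner subclass has to provide. Note that StreamPartitioner is marked @Internal, so this is purely illustrative of how the built-in partitioners below are structured (the class name and routing rule are made up), not a recommended user extension point:
import org.apache.flink.runtime.checkpoint.SubtaskStateMapper;
import org.apache.flink.runtime.plugable.SerializationDelegate;
import org.apache.flink.streaming.runtime.partitioner.StreamPartitioner;
import org.apache.flink.streaming.runtime.streamrecord.StreamRecord;

public class EvenOddStreamPartitioner extends StreamPartitioner<Integer> {
    private static final long serialVersionUID = 1L;

    @Override
    public int selectChannel(SerializationDelegate<StreamRecord<Integer>> record) {
        int value = record.getInstance().getValue();
        // Even values go to channel 0, odd values to channel 1 (falls back to 0 if there is only one channel).
        return value % 2 == 0 ? 0 : Math.min(1, numberOfChannels - 1);
    }

    @Override
    public StreamPartitioner<Integer> copy() {
        return this;
    }

    @Override
    public SubtaskStateMapper getDownstreamSubtaskStateMapper() {
        // No rescaling guarantee in this sketch, so replicate state and let downstream filter.
        return SubtaskStateMapper.FULL;
    }

    @Override
    public boolean isPointwise() {
        return false;
    }

    @Override
    public String toString() {
        return "EVEN_ODD";
    }
}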
SubtaskStateMapper
SubtaskStateMapper provides six implementations: ARBITRARY, FIRST, FULL, RANGE, ROUND_ROBIN, and UNSUPPORTED.
/**
* The {@code SubtaskStateMapper} narrows down the subtasks that need to be read during rescaling to
* recover from a particular subtask when in-flight data has been stored in the checkpoint.
*
* <p>Mappings of old subtasks to new subtasks may be unique or non-unique. A unique assignment
* means that a particular old subtask is only assigned to exactly one new subtask. Non-unique
* assignments require filtering downstream. That means that the receiver side has to cross-verify
* for a deserialized record if it truly belongs to the new subtask or not. Most {@code
* SubtaskStateMapper} will only produce unique assignments and are thus optimal. Some rescaler,
* such as {@link #RANGE}, create a mixture of unique and non-unique mappings, where downstream
* tasks need to filter on some mapped subtasks.
*/
@Internal
public enum SubtaskStateMapper {
ARBITRARY
Essentially delegates to the ROUND_ROBIN implementation.
/**
* Extra state is redistributed to other subtasks without any specific guarantee (only that up-
* and downstream are matched).
*/
ARBITRARY {
@Override
public int[] getOldSubtasks(
int newSubtaskIndex, int oldNumberOfSubtasks, int newNumberOfSubtasks) {
// The current implementation uses round robin but that may be changed later.
return ROUND_ROBIN.getOldSubtasks(
newSubtaskIndex, oldNumberOfSubtasks, newNumberOfSubtasks);
}
},
ROUND_ROBIN
Downstream scale-up: no impact (new subtask indexes beyond the old parallelism simply receive nothing).
Downstream scale-down: the old subtasks' data is distributed to the new subtasks in round-robin order.
/**
* Redistributes subtask state in a round robin fashion. Returns a mapping of {@code newIndex ->
* oldIndexes}. The mapping is accessed by using {@code Bitset oldIndexes =
* mapping.get(newIndex)}.
*
* <p>For {@code oldParallelism < newParallelism}, that mapping is trivial. For example if
* oldParallelism = 6 and newParallelism = 10.
*
* <table>
* <thead><td>New index</td><td>Old indexes</td></thead>
* <tr><td>0</td><td>0</td></tr>
* <tr><td>1</td><td>1</td></tr>
* <tr><td span="2" align="center">...</td></tr>
* <tr><td>5</td><td>5</td></tr>
* <tr><td>6</td><td></td></tr>
* <tr><td span="2" align="center">...</td></tr>
* <tr><td>9</td><td></td></tr>
* </table>
*
* <p>For {@code oldParallelism > newParallelism}, new indexes get multiple assignments by
* wrapping around assignments in a round-robin fashion. For example if oldParallelism = 10 and
* newParallelism = 4.
*
* <table>
* <thead><td>New index</td><td>Old indexes</td></thead>
* <tr><td>0</td><td>0, 4, 8</td></tr>
* <tr><td>1</td><td>1, 5, 9</td></tr>
* <tr><td>2</td><td>2, 6</td></tr>
* <tr><td>3</td><td>3, 7</td></tr>
* </table>
*/
ROUND_ROBIN {
@Override
public int[] getOldSubtasks(
int newSubtaskIndex, int oldNumberOfSubtasks, int newNumberOfSubtasks) {
final IntArrayList subtasks =
new IntArrayList(oldNumberOfSubtasks / newNumberOfSubtasks + 1);
for (int subtask = newSubtaskIndex;
subtask < oldNumberOfSubtasks;
subtask += newNumberOfSubtasks) {
subtasks.add(subtask);
}
return subtasks.toArray();
}
},
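The mapping in the javadoc table can be reproduced with a small snippet like the following (a sketch that assumes flink-runtime 1.16 is on the classpath; the enum lives in org.apache.flink.runtime.checkpoint):
import org.apache.flink.runtime.checkpoint.SubtaskStateMapper;

import java.util.Arrays;

public class RoundRobinMappingDemo {
    public static void main(String[] args) {
        // Scale down from 10 old subtasks to 4 new subtasks.
        // Expected (cf. the javadoc table): 0 -> [0, 4, 8], 1 -> [1, 5, 9], 2 -> [2, 6], 3 -> [3, 7]
        for (int newIndex = 0; newIndex < 4; newIndex++) {
            int[] oldIndexes = SubtaskStateMapper.ROUND_ROBIN.getOldSubtasks(newIndex, 10, 4);
            System.out.println(newIndex + " -> " + Arrays.toString(oldIndexes));
        }
    }
}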
FIRST
All old subtasks' data is sent to the first new subtask.
If the new subtask's index is 0, the method returns the indexes of all old subtasks.
If the new subtask's index is not 0, it returns EMPTY, i.e. the empty int[0] array.
/** Restores extra subtasks to the first subtask. */
FIRST {
@Override
public int[] getOldSubtasks(
int newSubtaskIndex, int oldNumberOfSubtasks, int newNumberOfSubtasks) {
return newSubtaskIndex == 0 ? IntStream.range(0, oldNumberOfSubtasks).toArray() : EMPTY;
}
},
FULL
Every new subtask receives the data of all old subtasks.
/**
* Replicates the state to all subtasks. This rescaling causes a huge overhead and completely
* relies on filtering the data downstream.
*
* <p>This strategy should only be used as a fallback.
*/
FULL {
@Override
public int[] getOldSubtasks(
int newSubtaskIndex, int oldNumberOfSubtasks, int newNumberOfSubtasks) {
return IntStream.range(0, oldNumberOfSubtasks).toArray();
}
@Override
public boolean isAmbiguous() {
return true;
}
},
RANGE
/**
* Remaps old ranges to new ranges. For minor rescaling that means that new subtasks are mostly
* assigned 2 old subtasks.
*
* <p>Example:<br>
* old assignment: 0 -> [0;43); 1 -> [43;87); 2 -> [87;128)<br>
* new assignment: 0 -> [0;64]; 1 -> [64;128)<br>
* subtask 0 recovers data from old subtask 0 + 1 and subtask 1 recovers data from old subtask 1
* + 2
*
* <p>For all downscale from n to [n-1 .. n/2], each new subtasks get exactly two old subtasks
* assigned.
*
* <p>For all upscale from n to [n+1 .. 2*n-1], most subtasks get two old subtasks assigned,
* except the two outermost.
*
* <p>Larger scale factors ({@code <n/2}, {@code >2*n}), will increase the number of old
* subtasks accordingly. However, they will also create more unique assignment, where an old
* subtask is exclusively assigned to a new subtask. Thus, the number of non-unique mappings is
* upper bound by 2*n.
*/
RANGE {
@Override
public int[] getOldSubtasks(
int newSubtaskIndex, int oldNumberOfSubtasks, int newNumberOfSubtasks) {
// Upper bound of the max parallelism: 1 << 15 = 32768
int maxParallelism = KeyGroupRangeAssignment.UPPER_BOUND_MAX_PARALLELISM;
// Compute the KeyGroupRange that the new subtask is responsible for
final KeyGroupRange newRange =
KeyGroupRangeAssignment.computeKeyGroupRangeForOperatorIndex(
maxParallelism, newNumberOfSubtasks, newSubtaskIndex);
// Compute the first old subtask index covered by newRange
final int start =
KeyGroupRangeAssignment.computeOperatorIndexForKeyGroup(
maxParallelism, oldNumberOfSubtasks, newRange.getStartKeyGroup());
// Compute the last old subtask index covered by newRange
final int end =
KeyGroupRangeAssignment.computeOperatorIndexForKeyGroup(
maxParallelism, oldNumberOfSubtasks, newRange.getEndKeyGroup());
// Return the range of old subtask indexes
return IntStream.range(start, end + 1).toArray();
}
@Override
public boolean isAmbiguous() {
return true;
}
},
How the KeyGroupRange is computed
Essentially, the key-group space of size maxParallelism is split into parallelism contiguous ranges.
org.apache.flink.runtime.state.KeyGroupRangeAssignment#computeKeyGroupRangeForOperatorIndex
int start = ((operatorIndex * maxParallelism + parallelism - 1) / parallelism);
int end = ((operatorIndex + 1) * maxParallelism - 1) / parallelism;
return new KeyGroupRange(start, end);
Computing the operator (subtask) index for a key group
org.apache.flink.runtime.state.KeyGroupRangeAssignment#computeOperatorIndexForKeyGroup
return keyGroupId * parallelism / maxParallelism;
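As a quick check of these two formulas against the RANGE javadoc example above (down-scaling from 3 to 2 subtasks), the following sketch uses maxParallelism = 128 for readability; the mapper itself uses the upper bound 32768, which yields the same result:
import org.apache.flink.runtime.state.KeyGroupRange;
import org.apache.flink.runtime.state.KeyGroupRangeAssignment;

public class RangeMappingDemo {
    public static void main(String[] args) {
        int maxParallelism = 128;
        int oldParallelism = 3;
        int newParallelism = 2;
        for (int newIndex = 0; newIndex < newParallelism; newIndex++) {
            KeyGroupRange newRange = KeyGroupRangeAssignment.computeKeyGroupRangeForOperatorIndex(
                    maxParallelism, newParallelism, newIndex);
            int start = KeyGroupRangeAssignment.computeOperatorIndexForKeyGroup(
                    maxParallelism, oldParallelism, newRange.getStartKeyGroup());
            int end = KeyGroupRangeAssignment.computeOperatorIndexForKeyGroup(
                    maxParallelism, oldParallelism, newRange.getEndKeyGroup());
            // Expected: new subtask 0 reads old subtasks 0..1, new subtask 1 reads old subtasks 1..2
            System.out.println("new subtask " + newIndex + " reads old subtasks " + start + ".." + end);
        }
    }
}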
UNSUPPORTED
Throws an exception directly.
UNSUPPORTED {
@Override
public int[] getOldSubtasks(
int newSubtaskIndex, int oldNumberOfSubtasks, int newNumberOfSubtasks) {
throw new UnsupportedOperationException(
"Cannot rescale the given pointwise partitioner.\n"
+ "Did you change the partitioner to forward or rescale?\n"
+ "It may also help to add an explicit shuffle().");
}
};
KeyGroupStreamPartitioner
Partitioner selects the target channel based on the key group index.
KeyGroupStreamPartitioner hashes the key twice (murmurHash(key.hashCode())), then takes the result modulo the maximum parallelism (default 128) to obtain a keyGroupId, and finally derives the downstream partition index as keyGroupId * parallelism / maxParallelism, which determines where the record is sent.
/**
* Partitioner selects the target channel based on the key group index.
*
* @param <T> Type of the elements in the Stream being partitioned
*/
@Internal
public class KeyGroupStreamPartitioner<T, K> extends StreamPartitioner<T>
implements ConfigurableStreamPartitioner {
private static final long serialVersionUID = 1L;
private final KeySelector<T, K> keySelector;
private int maxParallelism;
public KeyGroupStreamPartitioner(KeySelector<T, K> keySelector, int maxParallelism) {
Preconditions.checkArgument(maxParallelism > 0, "Number of key-groups must be > 0!");
this.keySelector = Preconditions.checkNotNull(keySelector);
this.maxParallelism = maxParallelism;
}
public int getMaxParallelism() {
return maxParallelism;
}
@Override
public int selectChannel(SerializationDelegate<StreamRecord<T>> record) {
K key;
try {
// Extract the key from the record
key = keySelector.getKey(record.getInstance().getValue());
} catch (Exception e) {
throw new RuntimeException(
"Could not extract key from " + record.getInstance().getValue(), e);
}
// Assign the key to a key group and map it to the target channel index
return KeyGroupRangeAssignment.assignKeyToParallelOperator(
key, maxParallelism, numberOfChannels);
}
@Override
public SubtaskStateMapper getDownstreamSubtaskStateMapper() {
return SubtaskStateMapper.RANGE;
}
@Override
public boolean isPointwise() {
return false;
}
@Override
public String toString() {
return "HASH";
}
...
}
org.apache.flink.runtime.state.KeyGroupRangeAssignment#assignKeyToParallelOperator
return computeOperatorIndexForKeyGroup(
maxParallelism, parallelism, assignToKeyGroup(key, maxParallelism));
org.apache.flink.runtime.state.KeyGroupRangeAssignment#assignToKeyGroup
// Take the key's hashCode
return computeKeyGroupForKeyHash(key.hashCode(), maxParallelism);
org.apache.flink.runtime.state.KeyGroupRangeAssignment#computeKeyGroupForKeyHash
// Apply murmurHash, then take the result modulo the max parallelism
return MathUtils.murmurHash(keyHash) % maxParallelism;
// After the two hashes, computeOperatorIndexForKeyGroup maps the key group id to the operator (channel) index
org.apache.flink.runtime.state.KeyGroupRangeAssignment#computeOperatorIndexForKeyGroup
return keyGroupId * parallelism / maxParallelism;
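The same computation is exposed end-to-end by assignKeyToParallelOperator, so the channel chosen for a given key can be checked with a small sketch like this (key values and parallelism are arbitrary):
import org.apache.flink.runtime.state.KeyGroupRangeAssignment;

public class KeyToChannelDemo {
    public static void main(String[] args) {
        int maxParallelism = 128; // default maximum parallelism
        int parallelism = 4;      // number of downstream channels
        for (String key : new String[] {"user-1", "user-2", "user-3"}) {
            int channel = KeyGroupRangeAssignment.assignKeyToParallelOperator(key, maxParallelism, parallelism);
            System.out.println(key + " -> channel " + channel);
        }
    }
}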
RebalancePartitioner
The rebalance() operator is a true round-robin operation: upstream records are sent to the downstream tasks in turn.
/**
* Partitioner that distributes the data equally by cycling through the output channels.
*
* @param <T> Type of the elements in the Stream being rebalanced
*/
@Internal
public class RebalancePartitioner<T> extends StreamPartitioner<T> {
private static final long serialVersionUID = 1L;
private int nextChannelToSendTo;
// Channel selector setup: the first record goes to a randomly chosen downstream channel
@Override
public void setup(int numberOfChannels) {
super.setup(numberOfChannels);
nextChannelToSendTo = ThreadLocalRandom.current().nextInt(numberOfChannels);
}
// Subsequent records are distributed round-robin by incrementing the index and taking the modulo
@Override
public int selectChannel(SerializationDelegate<StreamRecord<T>> record) {
nextChannelToSendTo = (nextChannelToSendTo + 1) % numberOfChannels;
return nextChannelToSendTo;
}
@Override
public SubtaskStateMapper getDownstreamSubtaskStateMapper() {
return SubtaskStateMapper.ROUND_ROBIN;
}
public StreamPartitioner<T> copy() {
return this;
}
@Override
public boolean isPointwise() {
return false;
}
@Override
public String toString() {
return "REBALANCE";
}
}
RescalePartitioner
How rescale connects upstream and downstream depends on their parallelism: with 2 upstream and 4 downstream tasks, each upstream task feeds two downstream tasks; with 4 upstream and 2 downstream tasks, two upstream tasks feed one downstream task. If the parallelisms are not multiples of each other, downstream tasks will have differing numbers of inputs (see the sketch after this list).
● It differs from rebalance in two ways: the round-robin starts from the first downstream partition, and the distribution pattern is pointwise.
● rescale keeps more data processing local and reduces network I/O, so it performs better, but its data balancing is not as good as rebalance.
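A minimal sketch of the pointwise grouping (parallelism values chosen arbitrarily, 2 source subtasks feeding 4 sink subtasks):
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class RescaleDemo {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // With rescale(), each of the 2 source subtasks only feeds "its" 2 of the 4 print subtasks
        // (pointwise), whereas rebalance() would connect every source subtask to all 4 of them.
        env.fromSequence(0, 15).setParallelism(2)
                .rescale()
                .print("rescale 2->4").setParallelism(4);
        env.execute("rescale demo");
    }
}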
/**
* Partitioner that distributes the data equally by cycling through the output channels. This
* distributes only to a subset of downstream nodes because {@link
* org.apache.flink.streaming.api.graph.StreamingJobGraphGenerator} instantiates a {@link
* DistributionPattern#POINTWISE} distribution pattern when encountering {@code RescalePartitioner}.
*
* <p>The subset of downstream operations to which the upstream operation sends elements depends on
* the degree of parallelism of both the upstream and downstream operation. For example, if the
* upstream operation has parallelism 2 and the downstream operation has parallelism 4, then one
* upstream operation would distribute elements to two downstream operations while the other
* upstream operation would distribute to the other two downstream operations. If, on the other
* hand, the downstream operation has parallelism 2 while the upstream operation has parallelism 4
* then two upstream operations will distribute to one downstream operation while the other two
* upstream operations will distribute to the other downstream operations.
*
* <p>In cases where the different parallelisms are not multiples of each other one or several
* downstream operations will have a differing number of inputs from upstream operations.
*
* @param <T> Type of the elements in the Stream being rescaled
*/
@Internal
public class RescalePartitioner<T> extends StreamPartitioner<T> {
private static final long serialVersionUID = 1L;
private int nextChannelToSendTo = -1;
@Override
public int selectChannel(SerializationDelegate<StreamRecord<T>> record) {
if (++nextChannelToSendTo >= numberOfChannels) {
nextChannelToSendTo = 0;
}
return nextChannelToSendTo;
}
@Override
public SubtaskStateMapper getDownstreamSubtaskStateMapper() {
return SubtaskStateMapper.UNSUPPORTED;
}
@Override
public SubtaskStateMapper getUpstreamSubtaskStateMapper() {
return SubtaskStateMapper.UNSUPPORTED;
}
public StreamPartitioner<T> copy() {
return this;
}
@Override
public String toString() {
return "RESCALE";
}
@Override
public boolean isPointwise() {
return true;
}
}
GlobalPartitioner
Sends upstream data to the downstream partition with subtask ID = 0.
/**
* Partitioner that sends all elements to the downstream operator with subtask ID=0.
*
* @param <T> Type of the elements in the Stream being partitioned
*/
@Internal
public class GlobalPartitioner<T> extends StreamPartitioner<T> {
private static final long serialVersionUID = 1L;
@Override
public int selectChannel(SerializationDelegate<StreamRecord<T>> record) {
return 0;
}
@Override
public StreamPartitioner<T> copy() {
return this;
}
@Override
public SubtaskStateMapper getDownstreamSubtaskStateMapper() {
return SubtaskStateMapper.FIRST;
}
@Override
public boolean isPointwise() {
return false;
}
@Override
public String toString() {
return "GLOBAL";
}
}
ShufflePartitioner
The shuffle() operator selects a downstream partition at random via Random().
/**
* Partitioner that distributes the data equally by selecting one output channel randomly.
*
* @param <T> Type of the Tuple
*/
@Internal
public class ShufflePartitioner<T> extends StreamPartitioner<T> {
private static final long serialVersionUID = 1L;
private Random random = new Random();
@Override
public int selectChannel(SerializationDelegate<StreamRecord<T>> record) {
return random.nextInt(numberOfChannels);
}
@Override
public SubtaskStateMapper getDownstreamSubtaskStateMapper() {
return SubtaskStateMapper.ROUND_ROBIN;
}
@Override
public StreamPartitioner<T> copy() {
return new ShufflePartitioner<T>();
}
@Override
public boolean isPointwise() {
return false;
}
@Override
public String toString() {
return "SHUFFLE";
}
}
ForwardPartitioner
A partitioner that forwards elements only to the locally running downstream operation, which effectively avoids network transfer.
/**
* Partitioner that forwards elements only to the locally running downstream operation.
*
* @param <T> Type of the elements in the Stream
*/
@Internal
public class ForwardPartitioner<T> extends StreamPartitioner<T> {
private static final long serialVersionUID = 1L;
@Override
public int selectChannel(SerializationDelegate<StreamRecord<T>> record) {
return 0;
}
public StreamPartitioner<T> copy() {
return this;
}
@Override
public boolean isPointwise() {
return true;
}
@Override
public String toString() {
return "FORWARD";
}
@Override
public SubtaskStateMapper getDownstreamSubtaskStateMapper() {
return SubtaskStateMapper.UNSUPPORTED;
}
@Override
public SubtaskStateMapper getUpstreamSubtaskStateMapper() {
return SubtaskStateMapper.UNSUPPORTED;
}
}
BroadcastPartitioner
Upstream data is sent to all downstream partitions; each downstream instance keeps a complete copy of the upstream operator's data and can then read it locally.
/**
* Partitioner that selects all the output channels.
*
* @param <T> Type of the elements in the Stream being broadcast
*/
@Internal
public class BroadcastPartitioner<T> extends StreamPartitioner<T> {
private static final long serialVersionUID = 1L;
/**
* Note: Broadcast mode could be handled directly for all the output channels in record writer,
* so it is no need to select channels via this method.
*/
@Override
public int selectChannel(SerializationDelegate<StreamRecord<T>> record) {
throw new UnsupportedOperationException(
"Broadcast partitioner does not support select channels.");
}
@Override
public SubtaskStateMapper getUpstreamSubtaskStateMapper() {
return SubtaskStateMapper.UNSUPPORTED;
}
@Override
public SubtaskStateMapper getDownstreamSubtaskStateMapper() {
return SubtaskStateMapper.UNSUPPORTED;
}
@Override
public boolean isBroadcast() {
return true;
}
@Override
public StreamPartitioner<T> copy() {
return this;
}
@Override
public boolean isPointwise() {
return false;
}
@Override
public String toString() {
return "BROADCAST";
}
}
CustomPartitionerWrapper
/**
* Partitions a DataStream on the key returned by the selector, using a custom partitioner. This
* method takes the key selector to get the key to partition on, and a partitioner that accepts
* the key type.
*
* <p>Note: This method works only on single field keys, i.e. the selector cannot return tuples
* of fields.
*
* @param partitioner The partitioner to assign partitions to keys.
* @param keySelector The KeySelector with which the DataStream is partitioned.
* @return The partitioned DataStream.
* @see KeySelector
*/
public <K> DataStream<T> partitionCustom(
Partitioner<K> partitioner, KeySelector<T, K> keySelector) {
return setConnectionType(
new CustomPartitionerWrapper<>(clean(partitioner), clean(keySelector)));
}
// private helper method for custom partitioning
private <K> DataStream<T> partitionCustom(Partitioner<K> partitioner, Keys<T> keys) {
KeySelector<T, K> keySelector =
KeySelectorUtil.getSelectorForOneKey(
keys, partitioner, getType(), getExecutionConfig());
return setConnectionType(
new CustomPartitionerWrapper<>(clean(partitioner), clean(keySelector)));
}