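Windows are at the heart of how Flink processes unbounded streams. A window splits an unbounded event stream into buckets of finite size, and computations are then applied to each bucket.
A window is built from three components:
1. Window assigner (WindowAssigner)
2. Trigger
3. Evictor
Every element that reaches the window operator is first handed to the WindowAssigner, which decides which window or windows the element belongs to. A window itself is only an identifier and does not store any elements; the elements are kept in state.
Each window has a Trigger that decides when the window fires or is purged.
After a window fires, its elements pass through the Evictor (if one is configured), which filters them.
The remaining elements are sent to the downstream window function for computation.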
Window Assigner
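Window assigners fall into three main categories:
1. Global window assigner
2. Tumbling window assigner
3. Sliding window assigner
Main responsibilities of WindowAssigner
Assigning windows: assignWindows
Providing the default trigger: getDefaultTrigger
Assigner context information: WindowAssignerContext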
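Window Trigger
org.apache.flink.streaming.api.windowing.triggers.TriggerResult
CONTINUE: do nothing
FIRE: evaluate the window and emit the result; the elements are retained
PURGE: clear all elements in the window and discard it without evaluating the window function
FIRE_AND_PURGE: evaluate the window, then clear its elements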
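To make the four outcomes concrete, here is a minimal plain-Java sketch of an enum modeled on TriggerResult. The name TriggerResultSketch and the describe() helper are illustrative assumptions, not Flink code; only the four constants and the fire/purge distinction are taken from the list above.

```java
/** Illustrative model of the four trigger outcomes; this is not Flink's own class. */
enum TriggerResultSketch {
    CONTINUE(false, false),
    FIRE(true, false),
    PURGE(false, true),
    FIRE_AND_PURGE(true, true);

    private final boolean fire;
    private final boolean purge;

    TriggerResultSketch(boolean fire, boolean purge) {
        this.fire = fire;
        this.purge = purge;
    }

    boolean isFire() {
        return fire;
    }

    boolean isPurge() {
        return purge;
    }

    /** Hypothetical helper: what a window operator would do for this outcome. */
    String describe() {
        if (fire && purge) {
            return "evaluate the window, then clear its elements";
        }
        if (fire) {
            return "evaluate the window, keep its elements";
        }
        if (purge) {
            return "clear the elements without evaluating";
        }
        return "do nothing";
    }

    public static void main(String[] args) {
        for (TriggerResultSketch r : values()) {
            System.out.println(r + ": " + r.describe());
        }
    }
}
```

Splitting the decision into two flags is what lets a trigger such as CountTrigger return FIRE and keep the elements for a later evaluation.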
Evictor
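Evictor: literally "one who evicts". After the Trigger fires and before the window is processed, the Evictor (if one is configured) removes the elements that are not needed from the window, acting like a filter.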
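Tumbling Windows
The tumbling window assigner assigns each element to a window of the specified window size. Tumbling windows have a fixed size and do not overlap. For example, with a tumbling window of 5 minutes, the current window is evaluated and a new window is started every five minutes, as shown in the figure below.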
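import org.apache.flink.api.common.eventtime.SerializableTimestampAssigner;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.datastream.KeyedStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.datastream.WindowedStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.windowing.ProcessWindowFunction;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
import org.apache.flink.util.Collector;

import java.sql.Timestamp;
import java.time.Duration;

public class TumblingWindowsBaseEventTime {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(1);
        /*
        Tumbling Windows
        1. Windows do not overlap
        2. When a window fires it is evaluated and its elements are discarded
         */
        // Read the source from a socket
        DataStreamSource<String> socketTextStream = env.socketTextStream("10.199.241.213", 9099);
        // Convert each line "key seconds" into a Tuple2 of (key, timestamp in ms)
        SingleOutputStreamOperator<Tuple2<String, Long>> mapStream = socketTextStream.map(new MapFunction<String, Tuple2<String, Long>>() {
            @Override
            public Tuple2<String, Long> map(String val) throws Exception {
                String[] elems = val.split(" ");
                return Tuple2.of(elems[0], Long.parseLong(elems[1]) * 1000L);
            }
        });
        // Assign timestamps and watermarks (no out-of-orderness tolerated)
        SingleOutputStreamOperator<Tuple2<String, Long>> withWatermarkStream = mapStream.assignTimestampsAndWatermarks(
                WatermarkStrategy
                        .<Tuple2<String, Long>>forBoundedOutOfOrderness(Duration.ofSeconds(0L))
                        .withTimestampAssigner(new SerializableTimestampAssigner<Tuple2<String, Long>>() {
                            @Override
                            public long extractTimestamp(Tuple2<String, Long> elems, long l) {
                                return elems.f1;
                            }
                        })
        );
        // Key by the first field
        KeyedStream<Tuple2<String, Long>, String> keyedStream = withWatermarkStream.keyBy(r -> r.f0);
        // Open a 5 s tumbling window
        WindowedStream<Tuple2<String, Long>, String, TimeWindow> timeWindowWindowedStream = keyedStream.window(TumblingEventTimeWindows.of(Time.seconds(5L)));
        // Count the elements in each window
        SingleOutputStreamOperator<String> countWinElemsStream = timeWindowWindowedStream.process(
                new ProcessWindowFunction<Tuple2<String, Long>, String, String, TimeWindow>() {
                    @Override
                    public void process(String key, Context context, Iterable<Tuple2<String, Long>> elements, Collector<String> out) throws Exception {
                        System.out.println("current watermark: " + context.currentWatermark() + " -> " + new Timestamp(context.currentWatermark()) + " :::: current time: " + new Timestamp(System.currentTimeMillis()));
                        Timestamp winStart = new Timestamp(context.window().getStart());
                        Timestamp winEnd = new Timestamp(context.window().getEnd());
                        // Note: getExactSizeIfKnown() returns -1 if the Iterable is not a sized collection
                        long cnt = elements.spliterator().getExactSizeIfKnown();
                        out.collect("key = " + key + "\t window [ " + winStart + " - " + winEnd + " ) has " + cnt + " element(s)");
                    }
                }
        );
        countWinElemsStream.print();
        env.execute();
    }
}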
[hdfs@hdfs03 ~]$ nc -lk 9099
a 1
a 5
> current watermark: 4999 -> 1970-01-01 08:00:04.999 :::: current time: 2023-02-20 18:20:08.86
> key = a window [ 1970-01-01 08:00:00.0 - 1970-01-01 08:00:05.0 ) has 1 element(s)
a 4 (late data)
a 7
a 9
a 8
a 10
> current watermark: 9999 -> 1970-01-01 08:00:09.999 :::: current time: 2023-02-20 18:20:24.033
> key = a window [ 1970-01-01 08:00:05.0 - 1970-01-01 08:00:10.0 ) has 4 element(s)
a 11
a 15
> current watermark: 14999 -> 1970-01-01 08:00:14.999 :::: current time: 2023-02-20 18:20:31.526
> key = a window [ 1970-01-01 08:00:10.0 - 1970-01-01 08:00:15.0 ) has 2 element(s)
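The counts in the session above (1, 4 and 2) can be reproduced with a small plain-Java simulation that needs no Flink dependency. This is only a sketch under stated assumptions: a single key, zero allowed lateness, and a watermark of "largest timestamp seen minus 1 ms" that advances after every element (Flink emits watermarks periodically, which in an interactive nc session amounts to roughly the same thing). The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Plain-Java model of a 5 s tumbling event-time window for one key (illustration only). */
class TumblingEventTimeSketch {
    static final long SIZE = 5_000L;

    static long windowStart(long timestamp) {
        return timestamp - Math.floorMod(timestamp, SIZE);
    }

    static List<String> run(long[] timestamps) {
        TreeMap<Long, Integer> open = new TreeMap<>(); // window start -> element count
        List<String> fired = new ArrayList<>();
        long watermark = Long.MIN_VALUE;
        for (long ts : timestamps) {
            long start = windowStart(ts);
            long maxTimestamp = start + SIZE - 1;
            // An element is late (and dropped) if its window has already been passed by the watermark.
            if (maxTimestamp > watermark) {
                open.merge(start, 1, Integer::sum);
            }
            // Assumed watermark rule: bounded out-of-orderness of 0 gives maxSeenTimestamp - 1.
            watermark = Math.max(watermark, ts - 1);
            // Fire every window whose maxTimestamp is covered by the watermark.
            while (!open.isEmpty() && open.firstKey() + SIZE - 1 <= watermark) {
                Map.Entry<Long, Integer> e = open.pollFirstEntry();
                fired.add("[" + e.getKey() + ", " + (e.getKey() + SIZE) + ") -> " + e.getValue());
            }
        }
        return fired;
    }

    public static void main(String[] args) {
        // Same input as the nc session: seconds 1, 5, 4 (late), 7, 9, 8, 10, 11, 15
        long[] input = {1000L, 5000L, 4000L, 7000L, 9000L, 8000L, 10000L, 11000L, 15000L};
        for (String line : run(input)) {
            System.out.println(line);
        }
        // prints:
        // [0, 5000) -> 1
        // [5000, 10000) -> 4
        // [10000, 15000) -> 2
    }
}
```

The element "a 4" is dropped because the window [0, 5000) had already fired at watermark 4999, while "a 8" is still accepted because its window [5000, 10000) was open when it arrived.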
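import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.datastream.KeyedStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.datastream.WindowedStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.windowing.ProcessWindowFunction;
import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
import org.apache.flink.util.Collector;

import java.sql.Timestamp;

public class TumblingWindowsBaseProcessingTime {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(1);
        /*
        Tumbling Windows
        1. Windows do not overlap
        2. When a window fires it is evaluated and its elements are discarded
         */
        // Read the source from a socket
        DataStreamSource<String> socketTextStream = env.socketTextStream("10.199.241.213", 9099);
        // Convert each line "key seconds" into a Tuple2 of (key, timestamp in ms)
        SingleOutputStreamOperator<Tuple2<String, Long>> mapStream = socketTextStream.map(new MapFunction<String, Tuple2<String, Long>>() {
            @Override
            public Tuple2<String, Long> map(String val) throws Exception {
                String[] elems = val.split(" ");
                return Tuple2.of(elems[0], Long.parseLong(elems[1]) * 1000L);
            }
        });
        // Assign watermarks. Processing-time windows do not need this step; no timestamp
        // assigner is set, so the watermark stays at Long.MIN_VALUE, as the output shows.
        SingleOutputStreamOperator<Tuple2<String, Long>> withWatermarkStream = mapStream.assignTimestampsAndWatermarks(
                WatermarkStrategy.forMonotonousTimestamps()
        );
        // Key by the first field
        KeyedStream<Tuple2<String, Long>, String> keyedStream = withWatermarkStream.keyBy(r -> r.f0);
        // Open a 5 s tumbling window
        WindowedStream<Tuple2<String, Long>, String, TimeWindow> timeWindowWindowedStream = keyedStream.window(TumblingProcessingTimeWindows.of(Time.seconds(5L)));
        // Count the elements in each window
        SingleOutputStreamOperator<String> countWinElemsStream = timeWindowWindowedStream.process(
                new ProcessWindowFunction<Tuple2<String, Long>, String, String, TimeWindow>() {
                    @Override
                    public void process(String key, Context context, Iterable<Tuple2<String, Long>> elements, Collector<String> out) throws Exception {
                        System.out.println("current watermark: " + context.currentWatermark() + " :::: current time: " + new Timestamp(System.currentTimeMillis()));
                        Timestamp winStart = new Timestamp(context.window().getStart());
                        Timestamp winEnd = new Timestamp(context.window().getEnd());
                        long cnt = elements.spliterator().getExactSizeIfKnown();
                        out.collect("key = " + key + "\t window [ " + winStart + " - " + winEnd + " ) has " + cnt + " element(s)");
                    }
                }
        );
        countWinElemsStream.print();
        env.execute();
    }
}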
[hdfs@hdfs03 ~]$ nc -lk 9099
a 1
a 5
a 3
> the 3 records above were entered within 5 s; then wait for the window to fire
> current watermark: -9223372036854775808 :::: current time: 2023-02-20 18:31:35.017
> key = a window [ 2023-02-20 18:31:30.0 - 2023-02-20 18:31:35.0 ) has 3 element(s)
a 10
> wait 5 s for the window to fire
> current watermark: -9223372036854775808 :::: current time: 2023-02-20 18:31:40.001
> key = a window [ 2023-02-20 18:31:35.0 - 2023-02-20 18:31:40.0 ) has 1 element(s)
a 2
> wait 5 s for the window to fire
> current watermark: -9223372036854775808 :::: current time: 2023-02-20 18:31:45.007
> key = a window [ 2023-02-20 18:31:40.0 - 2023-02-20 18:31:45.0 ) has 1 element(s)
Tumbling event-time window assigner: TumblingEventTimeWindows
# org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows
protected TumblingEventTimeWindows(long size, long offset, WindowStagger windowStagger) {
if (Math.abs(offset) >= size) {
throw new IllegalArgumentException(
"TumblingEventTimeWindows parameters must satisfy abs(offset) < size");
}
this.size = size;
this.globalOffset = offset;
this.windowStagger = windowStagger;
}
Assigning windows
public Collection<TimeWindow> assignWindows(
Object element, long timestamp, WindowAssignerContext context) {
if (timestamp > Long.MIN_VALUE) {
if (staggerOffset == null) {
staggerOffset =
windowStagger.getStaggerOffset(context.getCurrentProcessingTime(), size);
}
// Long.MIN_VALUE is currently assigned when no timestamp is present
long start =
TimeWindow.getWindowStartWithOffset(
timestamp, (globalOffset + staggerOffset) % size, size);
return Collections.singletonList(new TimeWindow(start, start + size));
} else {
throw new RuntimeException(
"Record has Long.MIN_VALUE timestamp (= no timestamp marker). "
+ "Is the time characteristic set to 'ProcessingTime', or did you forget to call "
+ "'DataStream.assignTimestampsAndWatermarks(...)'?");
}
}
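The window start used in assignWindows comes down to modular arithmetic on the timestamp. The sketch below is a plain-Java approximation of that calculation, not Flink's TimeWindow.getWindowStartWithOffset itself: it uses Math.floorMod so that negative timestamps also round down, and it ignores the stagger offset. The class name WindowStartSketch is made up for illustration.

```java
/** Approximation of the tumbling window start calculation (illustration only). */
class WindowStartSketch {

    /** Start of the window of length size, shifted by offset, that contains timestamp. */
    static long windowStart(long timestamp, long offset, long size) {
        if (Math.abs(offset) >= size) {
            // Same constraint as the assigner's constructor: abs(offset) < size
            throw new IllegalArgumentException("parameters must satisfy abs(offset) < size");
        }
        return timestamp - Math.floorMod(timestamp - offset, size);
    }

    public static void main(String[] args) {
        System.out.println(windowStart(7_000L, 0L, 5_000L));     // 5000: window [5000, 10000)
        System.out.println(windowStart(6_000L, 2_000L, 5_000L)); // 2000: window [2000, 7000)
        System.out.println(windowStart(-1L, 0L, 5_000L));        // -5000: window [-5000, 0)
    }
}
```

Because the start is derived from the timestamp alone, windows are aligned to the epoch (plus the offset) rather than to the first element, which is why the examples above print boundaries such as 08:00:05.0 and 18:31:30.0.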
Default trigger
public Trigger<Object, TimeWindow> getDefaultTrigger(StreamExecutionEnvironment env) {
return EventTimeTrigger.create();
}
org.apache.flink.streaming.api.windowing.triggers.EventTimeTrigger
// Called for every incoming element
public TriggerResult onElement(
Object element, long timestamp, TimeWindow window, TriggerContext ctx)
throws Exception {
if (window.maxTimestamp() <= ctx.getCurrentWatermark()) {
// if the watermark is already past the window fire immediately
return TriggerResult.FIRE;
} else {
ctx.registerEventTimeTimer(window.maxTimestamp());
return TriggerResult.CONTINUE;
}
}
// Firing rule for event-time windows
public TriggerResult onEventTime(long time, TimeWindow window, TriggerContext ctx) {
return time == window.maxTimestamp() ? TriggerResult.FIRE : TriggerResult.CONTINUE;
}
TumblingProcessingTimeWindows
# org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows
Constructor
private TumblingProcessingTimeWindows(long size, long offset, WindowStagger windowStagger) {
if (Math.abs(offset) >= size) {
throw new IllegalArgumentException(
"TumblingProcessingTimeWindows parameters must satisfy abs(offset) < size");
}
this.size = size;
this.globalOffset = offset;
this.windowStagger = windowStagger;
}
// Window assignment rule
public Collection<TimeWindow> assignWindows(
Object element, long timestamp, WindowAssignerContext context) {
final long now = context.getCurrentProcessingTime();
if (staggerOffset == null) {
staggerOffset =
windowStagger.getStaggerOffset(context.getCurrentProcessingTime(), size);
}
long start =
TimeWindow.getWindowStartWithOffset(
now, (globalOffset + staggerOffset) % size, size);
return Collections.singletonList(new TimeWindow(start, start + size));
}
// Default trigger
public Trigger<Object, TimeWindow> getDefaultTrigger(StreamExecutionEnvironment env) {
return ProcessingTimeTrigger.create();
}
org.apache.flink.streaming.api.windowing.triggers.ProcessingTimeTrigger
public TriggerResult onElement(
Object element, long timestamp, TimeWindow window, TriggerContext ctx) {
ctx.registerProcessingTimeTimer(window.maxTimestamp());
return TriggerResult.CONTINUE;
}
@Override
public TriggerResult onProcessingTime(long time, TimeWindow window, TriggerContext ctx) {
return TriggerResult.FIRE;
}
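ProcessingTimeTrigger registers a timer at window.maxTimestamp() for every element, yet the window fires only once. The plain-Java sketch below models why, under two assumptions: maxTimestamp() equals the window end minus 1 ms, and timers with the same timestamp are deduplicated (modeled here with a sorted set). The class and its methods are hypothetical and do not use Flink's timer service.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Models deduplicated processing-time timers for one key (illustration only). */
class ProcessingTimeTimerSketch {
    private final TreeSet<Long> timers = new TreeSet<>();

    /** Assumed equivalent of TimeWindow.maxTimestamp(): the last millisecond inside the window. */
    static long maxTimestamp(long windowEnd) {
        return windowEnd - 1;
    }

    /** Every element registers a timer; registering the same timestamp again changes nothing. */
    void onElement(long windowEnd) {
        timers.add(maxTimestamp(windowEnd));
    }

    /** Returns the timers that are due once the clock reaches processingTime. */
    List<Long> advanceTo(long processingTime) {
        List<Long> due = new ArrayList<>();
        while (!timers.isEmpty() && timers.first() <= processingTime) {
            due.add(timers.pollFirst());
        }
        return due;
    }

    public static void main(String[] args) {
        ProcessingTimeTimerSketch sketch = new ProcessingTimeTimerSketch();
        // Three elements fall into the window [0, 5000)
        sketch.onElement(5_000L);
        sketch.onElement(5_000L);
        sketch.onElement(5_000L);
        System.out.println(sketch.advanceTo(4_998L)); // []
        System.out.println(sketch.advanceTo(4_999L)); // [4999]
        System.out.println(sketch.advanceTo(9_999L)); // []
    }
}
```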
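Sliding Windows
The sliding window assigner assigns elements to windows of fixed length. As with the tumbling window assigner, the size of the windows is configured by the window size parameter. An additional window slide parameter controls how frequently a new sliding window is started. Sliding windows therefore overlap if the slide is smaller than the window size, and in that case an element is assigned to multiple windows.
For example, you could have windows of size 10 minutes that slide by 5 minutes. Every 5 minutes you then get a window containing the events that arrived during the last 10 minutes, as shown in the figure below.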
import org.apache.flink.api.common.eventtime.SerializableTimestampAssigner;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.windowing.ProcessWindowFunction;
import org.apache.flink.streaming.api.windowing.assigners.SlidingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
import org.apache.flink.util.Collector;

import java.sql.Timestamp;
import java.time.Duration;

public class SlidingWindowsBaseEventTime {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(1);
        /*
        Sliding Windows
        1. Windows overlap
        2. A window fires every slide interval; elements that also belong to the next window are kept
        3. As the lower bound of the window advances, elements that fall before it are dropped
         */
        SingleOutputStreamOperator<Tuple2<String, Long>> withWatermarkStream = env
                .socketTextStream("10.199.241.213", 9099)
                .map(new MapFunction<String, Tuple2<String, Long>>() {
                    @Override
                    public Tuple2<String, Long> map(String val) throws Exception {
                        String[] elems = val.split(" ");
                        return Tuple2.of(elems[0], Long.parseLong(elems[1]) * 1000L);
                    }
                })
                .assignTimestampsAndWatermarks(
                        WatermarkStrategy
                                .<Tuple2<String, Long>>forBoundedOutOfOrderness(Duration.ofSeconds(0))
                                .withTimestampAssigner(new SerializableTimestampAssigner<Tuple2<String, Long>>() {
                                    @Override
                                    public long extractTimestamp(Tuple2<String, Long> elem, long l) {
                                        return elem.f1;
                                    }
                                })
                );
        SingleOutputStreamOperator<String> countWinProcessStream = withWatermarkStream
                .keyBy(r -> r.f0)
                // Window size 5 s, slide 3 s
                .window(SlidingEventTimeWindows.of(Time.seconds(5), Time.seconds(3)))
                .process(new ProcessWindowFunction<Tuple2<String, Long>, String, String, TimeWindow>() {
                    @Override
                    public void process(String key, Context context, Iterable<Tuple2<String, Long>> elements, Collector<String> out) throws Exception {
                        System.out.println("current watermark = " + context.currentWatermark() + " --> " + new Timestamp(context.currentWatermark()) + " ::: current time = " + new Timestamp(System.currentTimeMillis()));
                        Timestamp winStart = new Timestamp(context.window().getStart());
                        Timestamp winEnd = new Timestamp(context.window().getEnd());
                        long cnt = elements.spliterator().getExactSizeIfKnown();
                        out.collect("key = " + key + "\t window [ " + winStart + " - " + winEnd + " ) has " + cnt + " element(s)");
                    }
                });
        countWinProcessStream.print();
        env.execute();
    }
}
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.windowing.ProcessWindowFunction;
import org.apache.flink.streaming.api.windowing.assigners.SlidingProcessingTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
import org.apache.flink.util.Collector;

import java.sql.Timestamp;

public class SlidingWindowsBaseProcessingTime {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(1);
        /*
        Sliding Windows
        1. Windows overlap
        2. A window fires every slide interval; elements that also belong to the next window are kept
        3. As the lower bound of the window advances, elements that fall before it are dropped
         */
        SingleOutputStreamOperator<Tuple2<String, Long>> withWatermarkStream = env
                .socketTextStream("10.199.241.213", 9099)
                .map(new MapFunction<String, Tuple2<String, Long>>() {
                    @Override
                    public Tuple2<String, Long> map(String val) throws Exception {
                        String[] elems = val.split(" ");
                        return Tuple2.of(elems[0], Long.parseLong(elems[1]) * 1000L);
                    }
                })
                .assignTimestampsAndWatermarks(
                        WatermarkStrategy.forMonotonousTimestamps()
                );
        SingleOutputStreamOperator<String> countWinProcessStream = withWatermarkStream
                .keyBy(r -> r.f0)
                // Window size 5 s, slide 3 s
                .window(SlidingProcessingTimeWindows.of(Time.seconds(5), Time.seconds(3)))
                .process(new ProcessWindowFunction<Tuple2<String, Long>, String, String, TimeWindow>() {
                    @Override
                    public void process(String key, Context context, Iterable<Tuple2<String, Long>> elements, Collector<String> out) throws Exception {
                        System.out.println("current watermark = " + context.currentWatermark() + " --> " + new Timestamp(context.currentWatermark()) + " ::: current time = " + new Timestamp(System.currentTimeMillis()));
                        Timestamp winStart = new Timestamp(context.window().getStart());
                        Timestamp winEnd = new Timestamp(context.window().getEnd());
                        long cnt = elements.spliterator().getExactSizeIfKnown();
                        out.collect("key = " + key + "\t window [ " + winStart + " - " + winEnd + " ) has " + cnt + " element(s)");
                    }
                });
        countWinProcessStream.print();
        env.execute();
    }
}
SlidingEventTimeWindows
# org.apache.flink.streaming.api.windowing.assigners.SlidingEventTimeWindows
// Constructor
protected SlidingEventTimeWindows(long size, long slide, long offset) {
if (Math.abs(offset) >= slide || size <= 0) {
throw new IllegalArgumentException(
"SlidingEventTimeWindows parameters must satisfy "
+ "abs(offset) < slide and size > 0");
}
this.size = size;
this.slide = slide;
this.offset = offset;
}
// Assign windows
public Collection<TimeWindow> assignWindows(
Object element, long timestamp, WindowAssignerContext context) {
if (timestamp > Long.MIN_VALUE) {
List<TimeWindow> windows = new ArrayList<>((int) (size / slide));
long lastStart = TimeWindow.getWindowStartWithOffset(timestamp, offset, slide);
for (long start = lastStart; start > timestamp - size; start -= slide) {
windows.add(new TimeWindow(start, start + size));
}
return windows;
} else {
throw new RuntimeException(
"Record has Long.MIN_VALUE timestamp (= no timestamp marker). "
+ "Is the time characteristic set to 'ProcessingTime', or did you forget to call "
+ "'DataStream.assignTimestampsAndWatermarks(...)'?");
}
}
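The loop in assignWindows is easier to follow with concrete numbers. The sketch below reimplements it in plain Java for the 5 s / 3 s configuration used in the examples. It is an approximation for illustration: the window start is computed with Math.floorMod instead of calling TimeWindow.getWindowStartWithOffset, and the class name SlidingAssignSketch is made up.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java rendition of the sliding window assignment loop (illustration only). */
class SlidingAssignSketch {

    /** Returns every window [start, start + size) that contains the timestamp, newest first. */
    static List<String> assign(long timestamp, long size, long slide, long offset) {
        List<String> windows = new ArrayList<>();
        // Start of the most recent window that contains the timestamp
        long lastStart = timestamp - Math.floorMod(timestamp - offset, slide);
        // Walk backwards one slide at a time while the window still covers the timestamp
        for (long start = lastStart; start > timestamp - size; start -= slide) {
            windows.add("[" + start + ", " + (start + size) + ")");
        }
        return windows;
    }

    public static void main(String[] args) {
        System.out.println(assign(7_000L, 5_000L, 3_000L, 0L)); // [[6000, 11000), [3000, 8000)]
        System.out.println(assign(2_000L, 5_000L, 3_000L, 0L)); // [[0, 5000)]
    }
}
```

With a size of 5 s and a slide of 3 s, an element therefore lands in either one or two windows, depending on where its timestamp falls relative to the slide boundaries.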
// Default trigger
public Trigger<Object, TimeWindow> getDefaultTrigger(StreamExecutionEnvironment env) {
return EventTimeTrigger.create();
}
org.apache.flink.streaming.api.windowing.triggers.EventTimeTrigger
public TriggerResult onElement(
Object element, long timestamp, TimeWindow window, TriggerContext ctx)
throws Exception {
if (window.maxTimestamp() <= ctx.getCurrentWatermark()) {
// if the watermark is already past the window fire immediately
return TriggerResult.FIRE;
} else {
ctx.registerEventTimeTimer(window.maxTimestamp());
return TriggerResult.CONTINUE;
}
}
public TriggerResult onEventTime(long time, TimeWindow window, TriggerContext ctx) {
return time == window.maxTimestamp() ? TriggerResult.FIRE : TriggerResult.CONTINUE;
}
SlidingProcessingTimeWindows
# org.apache.flink.streaming.api.windowing.assigners.SlidingProcessingTimeWindows
// Constructor
private SlidingProcessingTimeWindows(long size, long slide, long offset) {
if (Math.abs(offset) >= slide || size <= 0) {
throw new IllegalArgumentException(
"SlidingProcessingTimeWindows parameters must satisfy "
+ "abs(offset) < slide and size > 0");
}
this.size = size;
this.slide = slide;
this.offset = offset;
}
// Assign windows
public Collection<TimeWindow> assignWindows(
Object element, long timestamp, WindowAssignerContext context) {
timestamp = context.getCurrentProcessingTime();
List<TimeWindow> windows = new ArrayList<>((int) (size / slide));
long lastStart = TimeWindow.getWindowStartWithOffset(timestamp, offset, slide);
for (long start = lastStart; start > timestamp - size; start -= slide) {
windows.add(new TimeWindow(start, start + size));
}
return windows;
}
// Default trigger
public Trigger<Object, TimeWindow> getDefaultTrigger(StreamExecutionEnvironment env) {
return ProcessingTimeTrigger.create();
}
org.apache.flink.streaming.api.windowing.triggers.ProcessingTimeTrigger
public TriggerResult onElement(
Object element, long timestamp, TimeWindow window, TriggerContext ctx) {
ctx.registerProcessingTimeTimer(window.maxTimestamp());
return TriggerResult.CONTINUE;
}
public TriggerResult onProcessingTime(long time, TimeWindow window, TriggerContext ctx) {
return TriggerResult.FIRE;
}
Global Windows
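The global window assigner assigns all elements with the same key to a single global window. This windowing scheme is only useful if you also specify a custom trigger. Otherwise no computation is performed, because a global window has no natural end at which the accumulated elements could be processed.
Take a sliding count window as an example: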
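import org.apache.flink.api.common.eventtime.SerializableTimestampAssigner;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.windowing.ProcessWindowFunction;
import org.apache.flink.streaming.api.windowing.windows.GlobalWindow;
import org.apache.flink.util.Collector;

import java.sql.Timestamp;
import java.time.Duration;

public class CountWindowsBaseEventTime {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(1);
        /*
        Count Window
        The window fires every `slide` elements and keeps at most `size` of the most recent
        elements (older ones are evicted); for a tumbling count window the slide equals the size
         */
        SingleOutputStreamOperator<Tuple2<String, Long>> withWatermarkStream = env
                .socketTextStream("10.199.241.213", 9099)
                .map(new MapFunction<String, Tuple2<String, Long>>() {
                    @Override
                    public Tuple2<String, Long> map(String val) throws Exception {
                        String[] elems = val.split(" ");
                        return Tuple2.of(elems[0], Long.parseLong(elems[1]) * 1000L);
                    }
                })
                .assignTimestampsAndWatermarks(
                        WatermarkStrategy
                                .<Tuple2<String, Long>>forBoundedOutOfOrderness(Duration.ofSeconds(0))
                                .withTimestampAssigner(new SerializableTimestampAssigner<Tuple2<String, Long>>() {
                                    @Override
                                    public long extractTimestamp(Tuple2<String, Long> elem, long l) {
                                        return elem.f1;
                                    }
                                })
                );
        SingleOutputStreamOperator<String> countWinProcessStream = withWatermarkStream
                .keyBy(r -> r.f0)
                // Window size 10 elements, slide 5 elements
                .countWindow(10, 5)
                .process(new ProcessWindowFunction<Tuple2<String, Long>, String, String, GlobalWindow>() {
                    @Override
                    public void process(String key, Context context, Iterable<Tuple2<String, Long>> elements, Collector<String> out) throws Exception {
                        System.out.println("current watermark = " + context.currentWatermark() + " --> " + new Timestamp(context.currentWatermark()) + " ::: current time = " + new Timestamp(System.currentTimeMillis()));
                        // A GlobalWindow has no start or end, so the current watermark is printed instead
                        Timestamp win = new Timestamp(context.currentWatermark());
                        long cnt = elements.spliterator().getExactSizeIfKnown();
                        out.collect("key = " + key + "\t window [ " + win + " ) has " + cnt + " element(s)");
                    }
                });
        // Print the results, as the other examples do; without a sink the collected output is lost
        countWinProcessStream.print();
        env.execute();
    }
}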
GlobalWindows
# org.apache.flink.streaming.api.windowing.assigners.GlobalWindows
Assigning windows
public Collection<GlobalWindow> assignWindows(
Object element, long timestamp, WindowAssignerContext context) {
return Collections.singletonList(GlobalWindow.get());
}
Default trigger
public Trigger<Object, GlobalWindow> getDefaultTrigger(StreamExecutionEnvironment env) {
return new NeverTrigger();
}
CountEvictor
# org.apache.flink.streaming.api.windowing.evictors.CountEvictor
Evictor
@Override
public void evictBefore(
Iterable<TimestampedValue<Object>> elements, int size, W window, EvictorContext ctx) {
if (!doEvictAfter) {
evict(elements, size, ctx);
}
}
@Override
public void evictAfter(
Iterable<TimestampedValue<Object>> elements, int size, W window, EvictorContext ctx) {
if (doEvictAfter) {
evict(elements, size, ctx);
}
}
private void evict(Iterable<TimestampedValue<Object>> elements, int size, EvictorContext ctx) {
if (size <= maxCount) {
return;
} else {
int evictedCount = 0;
for (Iterator<TimestampedValue<Object>> iterator = elements.iterator();
iterator.hasNext(); ) {
iterator.next();
evictedCount++;
if (evictedCount > size - maxCount) {
break;
} else {
iterator.remove();
}
}
}
}
CountTrigger
# org.apache.flink.streaming.api.windowing.triggers.CountTrigger
Trigger
@Override
public TriggerResult onElement(Object element, long timestamp, W window, TriggerContext ctx)
throws Exception {
ReducingState<Long> count = ctx.getPartitionedState(stateDesc);
count.add(1L);
if (count.get() >= maxCount) {
count.clear();
return TriggerResult.FIRE;
}
return TriggerResult.CONTINUE;
}
@Override
public TriggerResult onEventTime(long time, W window, TriggerContext ctx) {
return TriggerResult.CONTINUE;
}
@Override
public TriggerResult onProcessingTime(long time, W window, TriggerContext ctx)
throws Exception {
return TriggerResult.CONTINUE;
}
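How CountTrigger and CountEvictor cooperate in a sliding count window can be modeled in a few lines of plain Java. The sketch assumes that countWindow(size, slide) combines a global window with a count trigger on the slide and a count evictor on the size, and that evicted elements are also removed from the stored window contents. CountWindowSketch and its method are hypothetical names and do not use Flink classes.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Models a sliding count window for one key: fire every slide elements, keep at most size. */
class CountWindowSketch {
    private final int size;
    private final int slide;
    private final Deque<String> buffer = new ArrayDeque<>(); // contents of the single global window
    private int sinceLastFire = 0;                            // counter kept by the trigger

    CountWindowSketch(int size, int slide) {
        this.size = size;
        this.slide = slide;
    }

    /** Returns how many elements the window function sees, or -1 if the trigger does not fire. */
    int onElement(String element) {
        buffer.addLast(element);
        sinceLastFire++;
        if (sinceLastFire < slide) {
            return -1; // CONTINUE
        }
        sinceLastFire = 0; // FIRE: reset the counter but keep the elements
        // Evict from the front until at most size elements remain
        while (buffer.size() > size) {
            buffer.removeFirst();
        }
        return buffer.size();
    }

    public static void main(String[] args) {
        CountWindowSketch window = new CountWindowSketch(10, 5);
        List<Integer> firings = new ArrayList<>();
        for (int i = 1; i <= 20; i++) {
            int seen = window.onElement("e" + i);
            if (seen >= 0) {
                firings.add(seen);
            }
        }
        System.out.println(firings); // [5, 10, 10, 10]
    }
}
```

The first firing sees only 5 elements because the window has not yet filled up; from the second firing on, each evaluation covers the 10 most recent elements.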
Session Windows
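The session window assigner groups elements by sessions of activity. Unlike tumbling and sliding windows, session windows do not overlap and have no fixed start or end time. A session window closes when it has not received any elements for a certain period of time, that is, after a gap of inactivity. The assigner can be configured with either a fixed session gap or a session gap extractor function that defines dynamically how long the period of inactivity is. Once that period has elapsed, the current session closes and subsequent elements are assigned to a new session window.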
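import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.windowing.ProcessWindowFunction;
import org.apache.flink.streaming.api.windowing.assigners.ProcessingTimeSessionWindows;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
import org.apache.flink.util.Collector;

import java.sql.Timestamp;
import java.util.Iterator;

public class SessionWindows {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(1);
        /*
        Session Windows
        1. If no element arrives within the gap, the window fires and its elements are cleared
        nc -lk 9099
        q
        w
        e
        r
        t
        y
        [waiting at least 5s]
        1
        w
        e
        r
        t
        ...
         */
        env
                .socketTextStream("127.0.0.1", 9099)
                .keyBy(event -> 1)
                // Fire the window after 5 s without new elements
                .window(ProcessingTimeSessionWindows.withGap(Time.seconds(5)))
                .process(
                        // ProcessWindowFunction<IN, OUT, KEY, W extends Window>
                        new ProcessWindowFunction<String, String, Integer, TimeWindow>() {
                            @Override
                            public void process(Integer key, Context context, Iterable<String> elements, Collector<String> out) throws Exception {
                                Timestamp winStart = new Timestamp(context.window().getStart());
                                Timestamp winEnd = new Timestamp(context.window().getEnd());
                                int count = 0;
                                Iterator<String> iterator = elements.iterator();
                                while (iterator.hasNext()) {
                                    iterator.next();
                                    count++;
                                }
                                String result = String.format("window [ %s - %s ] contains %d element(s)", winStart, winEnd, count);
                                out.collect(result);
                            }
                        })
                .print();
        env.execute();
    }
}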
ProcessingTimeSessionWindows
# org.apache.flink.streaming.api.windowing.assigners.ProcessingTimeSessionWindows
// Assign windows
@Override
public Collection<TimeWindow> assignWindows(
Object element, long timestamp, WindowAssignerContext context) {
long currentProcessingTime = context.getCurrentProcessingTime();
return Collections.singletonList(
new TimeWindow(currentProcessingTime, currentProcessingTime + sessionTimeout));
}
// Default trigger
@Override
public Trigger<Object, TimeWindow> getDefaultTrigger(StreamExecutionEnvironment env) {
return ProcessingTimeTrigger.create();
}
// Factory for a session assigner whose gap is extracted dynamically from each element
public static <T> DynamicProcessingTimeSessionWindows<T> withDynamicGap(
SessionWindowTimeGapExtractor<T> sessionWindowTimeGapExtractor) {
return new DynamicProcessingTimeSessionWindows<>(sessionWindowTimeGapExtractor);
}
DynamicProcessingTimeSessionWindows
# org.apache.flink.streaming.api.windowing.assigners.DynamicProcessingTimeSessionWindows
// Assign windows
public Collection<TimeWindow> assignWindows(
T element, long timestamp, WindowAssignerContext context) {
long currentProcessingTime = context.getCurrentProcessingTime();
long sessionTimeout = sessionWindowTimeGapExtractor.extract(element);
if (sessionTimeout <= 0) {
throw new IllegalArgumentException("Dynamic session time gap must satisfy 0 < gap");
}
return Collections.singletonList(
new TimeWindow(currentProcessingTime, currentProcessingTime + sessionTimeout));
}
// Default trigger
public Trigger<T, TimeWindow> getDefaultTrigger(StreamExecutionEnvironment env) {
return (Trigger<T, TimeWindow>) ProcessingTimeTrigger.create();
}
// Merge windows
public void mergeWindows(Collection<TimeWindow> windows, MergeCallback<TimeWindow> c) {
TimeWindow.mergeWindows(windows, c);
}
org.apache.flink.streaming.api.windowing.windows.TimeWindow#mergeWindows
public static void mergeWindows(
Collection<TimeWindow> windows, MergingWindowAssigner.MergeCallback<TimeWindow> c) {
// sort the windows by the start time and then merge overlapping windows
List<TimeWindow> sortedWindows = new ArrayList<>(windows);
Collections.sort(
sortedWindows,
new Comparator<TimeWindow>() {
@Override
public int compare(TimeWindow o1, TimeWindow o2) {
return Long.compare(o1.getStart(), o2.getStart());
}
});
List<Tuple2<TimeWindow, Set<TimeWindow>>> merged = new ArrayList<>();
Tuple2<TimeWindow, Set<TimeWindow>> currentMerge = null;
for (TimeWindow candidate : sortedWindows) {
if (currentMerge == null) {
currentMerge = new Tuple2<>();
currentMerge.f0 = candidate;
currentMerge.f1 = new HashSet<>();
currentMerge.f1.add(candidate);
} else if (currentMerge.f0.intersects(candidate)) {
currentMerge.f0 = currentMerge.f0.cover(candidate);
currentMerge.f1.add(candidate);
} else {
merged.add(currentMerge);
currentMerge = new Tuple2<>();
currentMerge.f0 = candidate;
currentMerge.f1 = new HashSet<>();
currentMerge.f1.add(candidate);
}
}
if (currentMerge != null) {
merged.add(currentMerge);
}
for (Tuple2<TimeWindow, Set<TimeWindow>> m : merged) {
if (m.f1.size() > 1) {
c.merge(m.f1, m.f0);
}
}
}
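The effect of the merge step can be sketched without Flink: every element first gets its own window [t, t + gap), and windows that intersect are folded into one covering window. The following plain-Java version is a simplified model of that idea with a fixed gap; it returns only the merged sessions and skips the merge callback. The class name SessionMergeSketch is made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Simplified model of session window merging with a fixed gap (illustration only). */
class SessionMergeSketch {

    static List<String> sessions(long[] arrivalTimes, long gap) {
        long[] sorted = arrivalTimes.clone();
        Arrays.sort(sorted); // sorting the timestamps also sorts the per-element windows by start
        List<String> result = new ArrayList<>();
        boolean open = false;
        long start = 0L;
        long end = 0L;
        for (long t : sorted) {
            long candidateStart = t;
            long candidateEnd = t + gap;
            if (!open) {
                start = candidateStart;
                end = candidateEnd;
                open = true;
            } else if (start <= candidateEnd && end >= candidateStart) {
                // Intersecting (or touching) windows are replaced by the window covering both
                start = Math.min(start, candidateStart);
                end = Math.max(end, candidateEnd);
            } else {
                result.add("[" + start + ", " + end + ")");
                start = candidateStart;
                end = candidateEnd;
            }
        }
        if (open) {
            result.add("[" + start + ", " + end + ")");
        }
        return result;
    }

    public static void main(String[] args) {
        long[] arrivals = {0L, 2_000L, 4_000L, 12_000L, 13_000L};
        System.out.println(sessions(arrivals, 5_000L)); // [[0, 9000), [12000, 18000)]
    }
}
```

The first three elements arrive less than 5 s apart and collapse into one session; the pause before 12000 exceeds the gap, so a second session starts there.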
References
https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/dev/datastream/operators/windows/