An analysis of Flink watermarks, windows, and triggers
stream
.keyBy(...) <- keyed versus non-keyed windows
.window(...) <- required: "assigner"
[.trigger(...)] <- optional: "trigger" (else default trigger)
[.evictor(...)] <- optional: "evictor" (else no evictor)
[.allowedLateness(...)] <- optional: "lateness" (else zero)
[.sideOutputLateData(...)] <- optional: "output tag" (else no side output for late data)
.reduce/aggregate/fold/apply() <- required: "function"
[.getSideOutput(...)] <- optional: "output tag"
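This post uses one simple windowed job (every 10 s, count how often operating-condition records were uploaded during the last 60 s) to explain watermarks and the execution flow of event-time processing, where event time is the time the record was collected.
1. Code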
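// Every 10 s, count the records each device uploaded during the last 60 s (event time).
DataStream<DeviResult> aggDs = filterds
        .assignTimestampsAndWatermarks(
                // Tolerate records that arrive up to 1 s out of order.
                WatermarkStrategy.<DeviPOJO>forBoundedOutOfOrderness(Duration.ofSeconds(1))
                        .withTimestampAssigner(new SerializableTimestampAssigner<DeviPOJO>() {
                            @Override
                            public long extractTimestamp(DeviPOJO o, long l) {
                                // Use the collection time carried by the record as event time.
                                return o.eventTimestamp;
                            }
                        }))
        .keyBy(x -> x.deviceID)
        // 60 s windows that slide every 10 s.
        .window(SlidingEventTimeWindows.of(Time.seconds(60), Time.seconds(10)))
        // No extra lateness: window state is dropped once the watermark reaches the window's max timestamp.
        .allowedLateness(Time.seconds(0))
        .process(new ProcessWindowFunction<DeviPOJO, DeviResult, String, TimeWindow>() {
            @Override
            public void process(String s, Context context, Iterable<DeviPOJO> iterable,
                    Collector<DeviResult> collector) throws Exception {
                // Count the records buffered in this window for the current key.
                long count = 0L;
                for (DeviPOJO d : iterable) {
                    count++;
                }
                collector.collect(new DeviResult(s, count,
                        String.valueOf(context.window().getEnd()),
                        context.window().maxTimestamp()));
            }
        });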
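To make the window assignment in the job above concrete, here is a minimal plain-Java sketch of how a sliding assigner maps one timestamp to windows. It does not use Flink: the class and method names are hypothetical, the window offset is assumed to be 0, and the arithmetic is a simplified version of what a sliding event-time assigner does.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java sketch of sliding event-time window assignment (hypothetical, no Flink dependency). */
final class SlidingWindowSketch {

    /** Returns the [start, end) pair of every window that contains the timestamp, newest first. */
    static List<long[]> assignWindows(long timestamp, long sizeMs, long slideMs) {
        List<long[]> windows = new ArrayList<>();
        // Start of the latest window containing the timestamp (assumption: window offset is 0).
        long lastStart = timestamp - Math.floorMod(timestamp, slideMs);
        // Walk backwards one slide at a time while the window still covers the timestamp.
        for (long start = lastStart; start > timestamp - sizeMs; start -= slideMs) {
            windows.add(new long[] {start, start + sizeMs});
        }
        return windows;
    }

    /** Largest timestamp that still belongs to a window whose exclusive end is endMs. */
    static long maxTimestamp(long endMs) {
        return endMs - 1;
    }

    public static void main(String[] args) {
        // A record collected at t = 65 s, with 60 s windows sliding every 10 s.
        for (long[] w : assignWindows(65_000L, 60_000L, 10_000L)) {
            System.out.println("[" + w[0] + ", " + w[1] + ") maxTimestamp=" + maxTimestamp(w[1]));
        }
    }
}
```

With a 60 s size and a 10 s slide, every record belongs to 60 / 10 = 6 windows, so the ProcessWindowFunction counts the same record six times, once per window. The window end is exclusive, which is why maxTimestamp is the end minus 1 ms.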
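2. Execution flow
Building the watermark
timestampAssigner.extractTimestamp extracts the eventTimestamp of the newest record
The eventTimestamp is compared with maxTimestamp, and the larger of the two becomes the new maxTimestamp
The watermark is created with timestamp maxTimestamp - maxOutOfOrderness (the tolerated out-of-orderness) - 1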
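The three steps above can be sketched in plain Java. This is an illustration under assumptions rather than Flink's own class: the name BoundedOutOfOrdernessSketch and its methods are hypothetical, and the periodic emission of the watermark is reduced to a method that is called on demand.

```java
/** Plain-Java sketch of a bounded-out-of-orderness watermark generator (hypothetical, no Flink dependency). */
final class BoundedOutOfOrdernessSketch {
    private final long maxOutOfOrdernessMs;
    private long maxTimestamp;

    BoundedOutOfOrdernessSketch(long maxOutOfOrdernessMs) {
        this.maxOutOfOrdernessMs = maxOutOfOrdernessMs;
        // Start low enough that the first watermark is Long.MIN_VALUE without underflow.
        this.maxTimestamp = Long.MIN_VALUE + maxOutOfOrdernessMs + 1;
    }

    /** Steps 1 and 2: take the extracted event timestamp and keep the largest one seen so far. */
    void onEvent(long eventTimestamp) {
        maxTimestamp = Math.max(maxTimestamp, eventTimestamp);
    }

    /** Step 3: the watermark trails the largest timestamp by the out-of-orderness bound plus 1 ms. */
    long currentWatermark() {
        return maxTimestamp - maxOutOfOrdernessMs - 1;
    }

    public static void main(String[] args) {
        BoundedOutOfOrdernessSketch gen = new BoundedOutOfOrdernessSketch(1_000L);
        gen.onEvent(10_000L);
        System.out.println(gen.currentWatermark()); // 8999
        gen.onEvent(9_500L); // an out-of-order record does not move the watermark backwards
        System.out.println(gen.currentWatermark()); // 8999
    }
}
```

With the 1 s bound used in the job, a record stamped 10 000 ms produces a watermark of 8 999 ms, and a later record stamped 9 500 ms leaves it unchanged because maxTimestamp never decreases.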
Creating the trigger
Every SlidingEventTimeWindows assigner comes with a default trigger; for event-time windows the default is EventTimeTrigger
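Running the trigger (WindowOperator)
Every incoming record invokes the onElement callback, i.e. EventTimeTrigger::onElement
If the window's max timestamp is less than or equal to the current watermark, it returns FIRE, which immediately runs the window's reduce/aggregate/process function (this case applies when allowedLateness keeps a window alive after the watermark has passed it); otherwise it registers an event-time timer at the window's max timestamp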
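The onElement decision described above can be sketched as follows. This is a simplified stand-in and not the Flink class: OnElementSketch, its TriggerResult enum, and the timers set are hypothetical, and Flink's per-key, per-window timer service is reduced to a single sorted set.

```java
import java.util.TreeSet;

/** Plain-Java sketch of the onElement decision of an event-time trigger (hypothetical, no Flink dependency). */
final class OnElementSketch {
    enum TriggerResult { CONTINUE, FIRE }

    // Registered event-time timers; a set, so registering the same timestamp twice is a no-op.
    final TreeSet<Long> timers = new TreeSet<>();

    TriggerResult onElement(long windowMaxTimestamp, long currentWatermark) {
        if (windowMaxTimestamp <= currentWatermark) {
            // The watermark already passed this window: evaluate it right away.
            return TriggerResult.FIRE;
        }
        // Otherwise wait until the watermark reaches the window's max timestamp.
        timers.add(windowMaxTimestamp);
        return TriggerResult.CONTINUE;
    }

    public static void main(String[] args) {
        OnElementSketch trigger = new OnElementSketch();
        System.out.println(trigger.onElement(69_999L, 8_999L));  // CONTINUE, timer registered
        System.out.println(trigger.onElement(69_999L, 70_500L)); // FIRE, late record in a live window
    }
}
```

In the job from section 1, allowedLateness is 0, so a record whose windows the watermark has already passed is treated as late and does not reach the trigger; there the FIRE branch of onElement is not expected to run, and windows fire through the timer path instead.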
StreamTaskNetworkInput
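EventTimeTrigger::onEventTime
When the StreamTask receives a watermark, every event-time timer that the watermark has reached fires and calls onEventTime. If the timer's time equals the window's max timestamp (that is, the watermark has reached the end of the window), it returns FIRE and the downstream computation runs immediately; otherwise it returns CONTINUE and does nothing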
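The timer path can be sketched in plain Java as well. Everything here is hypothetical scaffolding (OnEventTimeSketch, registerTimer, advanceWatermark), with a single TreeMap standing in for Flink's timer service; the second timer in the demo models a cleanup timer under an assumed 5 s allowedLateness, purely to show when CONTINUE is returned.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Plain-Java sketch of timers firing as the watermark advances (hypothetical, no Flink dependency). */
final class OnEventTimeSketch {
    enum TriggerResult { CONTINUE, FIRE }

    // Timer time -> max timestamp of the window the timer belongs to (one window per timer here).
    final TreeMap<Long, Long> timers = new TreeMap<>();

    void registerTimer(long timerTime, long windowMaxTimestamp) {
        timers.put(timerTime, windowMaxTimestamp);
    }

    /** FIRE only when the timer was set for the window's max timestamp. */
    static TriggerResult onEventTime(long timerTime, long windowMaxTimestamp) {
        return timerTime == windowMaxTimestamp ? TriggerResult.FIRE : TriggerResult.CONTINUE;
    }

    /** Fires every timer at or below the watermark; returns the max timestamps of windows that fired. */
    List<Long> advanceWatermark(long watermark) {
        List<Long> fired = new ArrayList<>();
        while (!timers.isEmpty() && timers.firstKey() <= watermark) {
            Map.Entry<Long, Long> timer = timers.pollFirstEntry();
            if (onEventTime(timer.getKey(), timer.getValue()) == TriggerResult.FIRE) {
                fired.add(timer.getValue());
            }
        }
        return fired;
    }

    public static void main(String[] args) {
        OnEventTimeSketch op = new OnEventTimeSketch();
        op.registerTimer(69_999L, 69_999L); // timer registered by onElement
        op.registerTimer(74_999L, 69_999L); // assumed cleanup timer, 5 s after the window's max timestamp
        System.out.println(op.advanceWatermark(60_000L)); // nothing is due yet
        System.out.println(op.advanceWatermark(69_999L)); // the window fires
        System.out.println(op.advanceWatermark(80_000L)); // cleanup timer only: CONTINUE, nothing fires
    }
}
```

The sketch shows why the comparison is on equality: a timer set for any other time, such as a later cleanup time, wakes the trigger without re-running the window function.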
The user-defined ProcessWindowFunction is invoked