Flink API - Window

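Windows are at the heart of how Flink processes unbounded streams. A window splits an infinite stream of events into finite buckets, and computations are then applied to each bucket.
A window is built from three key components:
1. Window assigner (WindowAssigner)
2. Trigger
3. Evictor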
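Every element that arrives at a window operator is first handed to the WindowAssigner, which decides which window or windows the element belongs to. A window itself is only an ID-like handle and does not store elements; the actual data is kept in state.
Every window has a Trigger that decides when the window fires or is cleaned up.
After a window fires, its elements pass through the Evictor (if one is configured), which filters them.
The remaining elements are sent to the downstream window function for computation.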

Window Assigner

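Window assigners fall into three main categories:
1. Global window assigner
2. Tumbling window assigner
3. Sliding window assigner
Session windows, covered at the end of this post, are handled by merging window assigners.
Main members of WindowAssigner:
assignWindows: assigns an element to one or more windows
getDefaultTrigger: returns the default trigger for the assigner
WindowAssignerContext: context information available to the assigner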

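Trigger

org.apache.flink.streaming.api.windowing.triggers.TriggerResult
CONTINUE: do nothing
FIRE: evaluate the window and emit results; the elements are kept
PURGE: discard the elements in the window without evaluating it
FIRE_AND_PURGE: evaluate the window, then discard its elements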
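
The four results can be read as combinations of two independent flags, "fire" and "purge". The following is a minimal plain-Java sketch of that idea; SketchTriggerResult and describe() are hypothetical names for illustration and are not the Flink class itself.

```java
/** Simplified stand-in for Flink's TriggerResult (illustration only). */
enum SketchTriggerResult {
    CONTINUE(false, false),
    FIRE(true, false),
    PURGE(false, true),
    FIRE_AND_PURGE(true, true);

    private final boolean fire;
    private final boolean purge;

    SketchTriggerResult(boolean fire, boolean purge) {
        this.fire = fire;
        this.purge = purge;
    }

    boolean isFire() {
        return fire;
    }

    boolean isPurge() {
        return purge;
    }

    /** Describes what a window operator would do for this result. */
    String describe() {
        if (fire && purge) {
            return "evaluate, then clear elements";
        }
        if (fire) {
            return "evaluate, keep elements";
        }
        if (purge) {
            return "clear elements without evaluating";
        }
        return "do nothing";
    }

    public static void main(String[] args) {
        for (SketchTriggerResult r : values()) {
            System.out.println(r + ": " + r.describe());
        }
    }
}
```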

Evictor

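Evictor: after the Trigger fires, the Evictor (if one is configured) can remove unwanted elements from the window before and/or after the window function is applied. It acts much like a filter.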

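Tumbling Windows

The tumbling window assigner assigns each element to a window of a specified size. Tumbling windows have a fixed size and do not overlap. For example, with a 5-minute tumbling window, the current window is evaluated and a new window is started every five minutes.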

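import org.apache.flink.api.common.eventtime.SerializableTimestampAssigner;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.datastream.KeyedStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.datastream.WindowedStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.windowing.ProcessWindowFunction;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
import org.apache.flink.util.Collector;

import java.sql.Timestamp;
import java.time.Duration;

public class TumblingWindowsBaseEventTime {

    public static void main(String[] args) throws Exception {

        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        env.setParallelism(1);

        /*
            Tumbling Windows

                1. Windows do not overlap
                2. When a window fires, it is evaluated and its elements are then discarded

         */

        // Read the source data from a socket
        DataStreamSource<String> socketTextStream = env.socketTextStream("10.199.241.213", 9099);

        // Convert each input line into a Tuple2 (key, timestamp in milliseconds)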
        SingleOutputStreamOperator<Tuple2<String, Long>> mapStream = socketTextStream.map(new MapFunction<String, Tuple2<String, Long>>() {
            @Override
            public Tuple2<String, Long> map(String val) throws Exception {
                String[] elems = val.split(" ");
                return Tuple2.of(elems[0], Long.parseLong(elems[1]) * 1000L);
            }
        });

        // Assign timestamps and watermarks (0 s allowed out-of-orderness)
        SingleOutputStreamOperator<Tuple2<String, Long>> withWatermarkStream = mapStream.assignTimestampsAndWatermarks(
                WatermarkStrategy
                        .<Tuple2<String, Long>>forBoundedOutOfOrderness(Duration.ofSeconds(0L))
                        .withTimestampAssigner(new SerializableTimestampAssigner<Tuple2<String, Long>>() {
                            @Override
                            public long extractTimestamp(Tuple2<String, Long> elems, long l) {
                                return elems.f1;
                            }
                        })
        );

        // Key the stream by the first field
        KeyedStream<Tuple2<String, Long>, String> keyedStream = withWatermarkStream.keyBy(r -> r.f0);

        // Open a 5 s tumbling event-time window
        WindowedStream<Tuple2<String, Long>, String, TimeWindow> timeWindowWindowedStream = keyedStream.window(TumblingEventTimeWindows.of(Time.seconds(5L)));

        // Count the elements in each window
        SingleOutputStreamOperator<String> countWinElemsStream = timeWindowWindowedStream.process(
                new ProcessWindowFunction<Tuple2<String, Long>, String, String, TimeWindow>() {
                    @Override
                    public void process(String key, Context context, Iterable<Tuple2<String, Long>> elements, Collector<String> out) throws Exception {

                        System.out.println("current watermark: " + context.currentWatermark() + " -> " + new Timestamp(context.currentWatermark()) + " :::: current time: " + new Timestamp(System.currentTimeMillis()) );
                        Timestamp winStart = new Timestamp(context.window().getStart());
                        Timestamp winEnd = new Timestamp(context.window().getEnd());

                        // getExactSizeIfKnown() returns -1 when the Iterable cannot report its size
                        long cnt = elements.spliterator().getExactSizeIfKnown();

                        out.collect("key = " + key + "\t window [ " + winStart + " - " + winEnd + " ) has " + cnt + " element(s)");

                    }
                }
        );

        countWinElemsStream.print();

        env.execute();


    }

}

[hdfs@hdfs03 ~]$ nc -lk 9099
a 1
a 5
    > current watermark: 4999 -> 1970-01-01 08:00:04.999 :::: current time: 2023-02-20 18:20:08.86
    > key = a	 window [ 1970-01-01 08:00:00.0 - 1970-01-01 08:00:05.0 ) has 1 element(s)
a 4    (late record: its window [0 s, 5 s) has already fired, so it is dropped)
a 7
a 9
a 8
a 10
    > current watermark: 9999 -> 1970-01-01 08:00:09.999 :::: current time: 2023-02-20 18:20:24.033
    > key = a	 window [ 1970-01-01 08:00:05.0 - 1970-01-01 08:00:10.0 ) has 4 element(s)
a 11
a 15
    > current watermark: 14999 -> 1970-01-01 08:00:14.999 :::: current time: 2023-02-20 18:20:31.526
    > key = a	 window [ 1970-01-01 08:00:10.0 - 1970-01-01 08:00:15.0 ) has 2 element(s)




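import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.datastream.KeyedStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.datastream.WindowedStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.windowing.ProcessWindowFunction;
import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
import org.apache.flink.util.Collector;

import java.sql.Timestamp;

public class TumblingWindowsBaseProcessingTime {

    public static void main(String[] args) throws Exception {

        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        env.setParallelism(1);

        /*
            Tumbling Windows

                1. Windows do not overlap
                2. When a window fires, it is evaluated and its elements are then discarded

         */

        // Read the source data from a socket
        DataStreamSource<String> socketTextStream = env.socketTextStream("10.199.241.213", 9099);

        // Convert each input line into a Tuple2 (key, timestamp in milliseconds)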
        SingleOutputStreamOperator<Tuple2<String, Long>> mapStream = socketTextStream.map(new MapFunction<String, Tuple2<String, Long>>() {
            @Override
            public Tuple2<String, Long> map(String val) throws Exception {
                String[] elems = val.split(" ");
                return Tuple2.of(elems[0], Long.parseLong(elems[1]) * 1000L);
            }
        });

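        // Processing-time windows do not need watermarks. No timestamp assigner is set here,
        // so the watermark stays at Long.MIN_VALUE (see the output below).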
        SingleOutputStreamOperator<Tuple2<String, Long>> withWatermarkStream = mapStream.assignTimestampsAndWatermarks(
                WatermarkStrategy.forMonotonousTimestamps()
        );

        // Key the stream by the first field
        KeyedStream<Tuple2<String, Long>, String> keyedStream = withWatermarkStream.keyBy(r -> r.f0);

        // Open a 5 s tumbling processing-time window
        WindowedStream<Tuple2<String, Long>, String, TimeWindow> timeWindowWindowedStream = keyedStream.window(TumblingProcessingTimeWindows.of(Time.seconds(5L)));

        // Count the elements in each window
        SingleOutputStreamOperator<String> countWinElemsStream = timeWindowWindowedStream.process(
                new ProcessWindowFunction<Tuple2<String, Long>, String, String, TimeWindow>() {
                    @Override
                    public void process(String key, Context context, Iterable<Tuple2<String, Long>> elements, Collector<String> out) throws Exception {

                        System.out.println("current watermark: " + context.currentWatermark() + " :::: current time: " + new Timestamp(System.currentTimeMillis()) );
                        Timestamp winStart = new Timestamp(context.window().getStart());
                        Timestamp winEnd = new Timestamp(context.window().getEnd());

                        // getExactSizeIfKnown() returns -1 when the Iterable cannot report its size
                        long cnt = elements.spliterator().getExactSizeIfKnown();

                        out.collect("key = " + key + "\t window [ " + winStart + " - " + winEnd + " ) has " + cnt + " element(s)");

                    }
                }
        );

        countWinElemsStream.print();

        env.execute();


    }

}


[hdfs@hdfs03 ~]$ nc -lk 9099
a 1
a 5
a 3
    > the 3 records above are entered within 5 s, then we wait for the window to fire
    > current watermark: -9223372036854775808 :::: current time: 2023-02-20 18:31:35.017
    > key = a	 window [ 2023-02-20 18:31:30.0 - 2023-02-20 18:31:35.0 ) has 3 element(s)
a 10
    > wait 5 s for the window to fire
    > current watermark: -9223372036854775808 :::: current time: 2023-02-20 18:31:40.001
    > key = a	 window [ 2023-02-20 18:31:35.0 - 2023-02-20 18:31:40.0 ) has 1 element(s)
a 2
    > wait 5 s for the window to fire
    > current watermark: -9223372036854775808 :::: current time: 2023-02-20 18:31:45.007
    > key = a	 window [ 2023-02-20 18:31:40.0 - 2023-02-20 18:31:45.0 ) has 1 element(s)


TumblingEventTimeWindows (tumbling event-time window assigner)

# org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows
    
    protected TumblingEventTimeWindows(long size, long offset, WindowStagger windowStagger) {
        if (Math.abs(offset) >= size) {
            throw new IllegalArgumentException(
                    "TumblingEventTimeWindows parameters must satisfy abs(offset) < size");
        }

        this.size = size;
        this.globalOffset = offset;
        this.windowStagger = windowStagger;
    }


    // Assign windows
    public Collection<TimeWindow> assignWindows(
            Object element, long timestamp, WindowAssignerContext context) {
        if (timestamp > Long.MIN_VALUE) {
            if (staggerOffset == null) {
                staggerOffset =
                        windowStagger.getStaggerOffset(context.getCurrentProcessingTime(), size);
            }
            // Long.MIN_VALUE is currently assigned when no timestamp is present
            long start =
                    TimeWindow.getWindowStartWithOffset(
                            timestamp, (globalOffset + staggerOffset) % size, size);
            return Collections.singletonList(new TimeWindow(start, start + size));
        } else {
            throw new RuntimeException(
                    "Record has Long.MIN_VALUE timestamp (= no timestamp marker). "
                            + "Is the time characteristic set to 'ProcessingTime', or did you forget to call "
                            + "'DataStream.assignTimestampsAndWatermarks(...)'?");
        }
    }
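
The call to TimeWindow.getWindowStartWithOffset above rounds a timestamp down to the start of its window. A minimal plain-Java sketch of that rounding follows; WindowStartSketch and windowStart are hypothetical names, and the use of Math.floorMod is my own choice rather than a copy of the Flink implementation.

```java
final class WindowStartSketch {

    /**
     * Start of the window of the given size that contains the timestamp,
     * with window boundaries shifted by offset. All values are in milliseconds.
     */
    static long windowStart(long timestamp, long offset, long size) {
        // floorMod keeps the remainder non-negative, so negative timestamps also round down
        return timestamp - Math.floorMod(timestamp - offset, size);
    }

    public static void main(String[] args) {
        long size = 5_000L;
        for (long ts : new long[] {1_000L, 5_000L, 7_000L, 10_000L}) {
            long start = windowStart(ts, 0L, size);
            System.out.println(ts + " -> [" + start + ", " + (start + size) + ")");
        }
    }
}
```

With a 5 s window and no offset, the timestamps 1 s, 5 s, 7 s and 10 s land in the windows starting at 0 s, 5 s, 5 s and 10 s, which matches the window boundaries in the sample run above.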

    // Default trigger
    public Trigger<Object, TimeWindow> getDefaultTrigger(StreamExecutionEnvironment env) {
        return EventTimeTrigger.create();
    }

org.apache.flink.streaming.api.windowing.triggers.EventTimeTrigger
    // Called for every element that arrives
    public TriggerResult onElement(
            Object element, long timestamp, TimeWindow window, TriggerContext ctx)
            throws Exception {
        if (window.maxTimestamp() <= ctx.getCurrentWatermark()) {
            // if the watermark is already past the window fire immediately
            return TriggerResult.FIRE;
        } else {
            ctx.registerEventTimeTimer(window.maxTimestamp());
            return TriggerResult.CONTINUE;
        }
    }
    // Firing rule for event-time windows
    public TriggerResult onEventTime(long time, TimeWindow window, TriggerContext ctx) {
        return time == window.maxTimestamp() ? TriggerResult.FIRE : TriggerResult.CONTINUE;
    }
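
To see how this firing rule produces the counts 1, 4 and 2 in the event-time sample run earlier, the following plain-Java sketch replays the same timestamps without Flink. It rests on several simplifying assumptions of mine: a single key, a watermark of "largest timestamp seen minus 1 ms" updated after every record (Flink emits watermarks periodically), zero allowed lateness, and hypothetical names (EventTimeTumblingReplay, replay).

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class EventTimeTumblingReplay {

    /** Start of the tumbling window containing the timestamp (offset 0). */
    static long windowStart(long timestamp, long size) {
        return timestamp - Math.floorMod(timestamp, size);
    }

    /** Replays timestamps in arrival order; returns "start=count" for each fired window. */
    static List<String> replay(long[] timestamps, long size) {
        TreeMap<Long, Integer> open = new TreeMap<>(); // window start -> element count
        List<String> fired = new ArrayList<>();
        long maxTimestamp = Long.MIN_VALUE;
        long watermark = Long.MIN_VALUE;

        for (long ts : timestamps) {
            long start = windowStart(ts, size);
            long windowMaxTimestamp = start + size - 1;
            if (windowMaxTimestamp > watermark) {
                open.merge(start, 1, Integer::sum);
            } // otherwise the record is late and is dropped

            // Watermark with zero out-of-orderness: largest timestamp seen so far minus 1 ms
            maxTimestamp = Math.max(maxTimestamp, ts);
            watermark = maxTimestamp - 1;

            // Fire every window whose max timestamp has been reached by the watermark
            Iterator<Map.Entry<Long, Integer>> it = open.entrySet().iterator();
            while (it.hasNext()) {
                Map.Entry<Long, Integer> window = it.next();
                if (window.getKey() + size - 1 > watermark) {
                    break; // windows are sorted by start, later ones cannot fire yet
                }
                fired.add(window.getKey() + "=" + window.getValue());
                it.remove();
            }
        }
        return fired;
    }

    public static void main(String[] args) {
        long[] input = {1000, 5000, 4000, 7000, 9000, 8000, 10000, 11000, 15000};
        System.out.println(replay(input, 5000)); // prints [0=1, 5000=4, 10000=2]
    }
}
```

The record with timestamp 4 s arrives after the watermark has reached 4999, so its window has already fired and the record is not counted.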


TumblingProcessingTimeWindows

# org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows

    // Constructor
    private TumblingProcessingTimeWindows(long size, long offset, WindowStagger windowStagger) {
        if (Math.abs(offset) >= size) {
            throw new IllegalArgumentException(
                    "TumblingProcessingTimeWindows parameters must satisfy abs(offset) < size");
        }

        this.size = size;
        this.globalOffset = offset;
        this.windowStagger = windowStagger;
    }

    // Window assignment rule
    public Collection<TimeWindow> assignWindows(
            Object element, long timestamp, WindowAssignerContext context) {
        final long now = context.getCurrentProcessingTime();
        if (staggerOffset == null) {
            staggerOffset =
                    windowStagger.getStaggerOffset(context.getCurrentProcessingTime(), size);
        }
        long start =
                TimeWindow.getWindowStartWithOffset(
                        now, (globalOffset + staggerOffset) % size, size);
        return Collections.singletonList(new TimeWindow(start, start + size));
    }

    // Default trigger
    public Trigger<Object, TimeWindow> getDefaultTrigger(StreamExecutionEnvironment env) {
        return ProcessingTimeTrigger.create();
    }

org.apache.flink.streaming.api.windowing.triggers.ProcessingTimeTrigger	
       
    public TriggerResult onElement(
            Object element, long timestamp, TimeWindow window, TriggerContext ctx) {
        ctx.registerProcessingTimeTimer(window.maxTimestamp());
        return TriggerResult.CONTINUE;
    }

    @Override
    public TriggerResult onProcessingTime(long time, TimeWindow window, TriggerContext ctx) {
        return TriggerResult.FIRE;
    }
    

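Sliding Windows

The sliding window assigner assigns elements to windows of fixed length. As with the tumbling window assigner, the window size is set by the window size parameter. An additional window slide parameter controls how often a new sliding window starts. If the slide is smaller than the window size, sliding windows overlap, and an element is assigned to multiple windows.
For example, you can have windows of size 10 minutes that slide by 5 minutes. You then get a window every 5 minutes that contains the events that arrived during the last 10 minutes.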
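
As a rough illustration of how one element ends up in several overlapping windows, here is a minimal plain-Java sketch that needs no Flink dependency. SlidingAssignSketch and assign are hypothetical names, the offset is assumed to be 0, and the loop follows the same backwards walk as the assignWindows source shown later in this post.

```java
import java.util.ArrayList;
import java.util.List;

final class SlidingAssignSketch {

    /** Returns {start, end} pairs of every sliding window that contains the timestamp. */
    static List<long[]> assign(long timestamp, long size, long slide) {
        List<long[]> windows = new ArrayList<>();
        // Latest window start that is not after the timestamp (offset assumed to be 0)
        long lastStart = timestamp - Math.floorMod(timestamp, slide);
        // Walk backwards one slide at a time while the window still covers the timestamp
        for (long start = lastStart; start > timestamp - size; start -= slide) {
            windows.add(new long[] {start, start + size});
        }
        return windows;
    }

    public static void main(String[] args) {
        long minute = 60_000L;
        // An element at minute 12 with size 10 min and slide 5 min
        for (long[] w : assign(12 * minute, 10 * minute, 5 * minute)) {
            System.out.println("[" + w[0] / minute + " min, " + w[1] / minute + " min)");
        }
    }
}
```

An element at minute 12 falls into the windows [10 min, 20 min) and [5 min, 15 min), so with size 10 and slide 5 every element belongs to two windows.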

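import org.apache.flink.api.common.eventtime.SerializableTimestampAssigner;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.windowing.ProcessWindowFunction;
import org.apache.flink.streaming.api.windowing.assigners.SlidingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
import org.apache.flink.util.Collector;

import java.sql.Timestamp;
import java.time.Duration;

public class SlidingWindowsBaseEventTime {

    public static void main(String[] args) throws Exception {

        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(1);

        /*
            Sliding Windows

                1. Windows overlap
                2. A window fires every slide interval; elements shared with later windows are kept
                3. An element is discarded once the last window that contains it has fired

         */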


        SingleOutputStreamOperator<Tuple2<String, Long>> withWatermarkStream = env
                .socketTextStream("10.199.241.213", 9099)
                .map(new MapFunction<String, Tuple2<String, Long>>() {
                    @Override
                    public Tuple2<String, Long> map(String val) throws Exception {
                        String[] elems = val.split(" ");
                        return Tuple2.of(elems[0], Long.parseLong(elems[1]) * 1000L);
                    }
                })
                .assignTimestampsAndWatermarks(
                        WatermarkStrategy
                                .<Tuple2<String, Long>>forBoundedOutOfOrderness(Duration.ofSeconds(0))
                                .withTimestampAssigner(new SerializableTimestampAssigner<Tuple2<String, Long>>() {
                                    @Override
                                    public long extractTimestamp(Tuple2<String, Long> elem, long l) {
                                        return elem.f1;
                                    }
                                })
                );

        SingleOutputStreamOperator<String> countWinProcessStream = withWatermarkStream
                .keyBy(r -> r.f0)
                // Window size 5 s, slide 3 s
                .window(SlidingEventTimeWindows.of(Time.seconds(5), Time.seconds(3)))
                .process(new ProcessWindowFunction<Tuple2<String, Long>, String, String, TimeWindow>() {
                    @Override
                    public void process(String key, Context context, Iterable<Tuple2<String, Long>> elements, Collector<String> out) throws Exception {

                        System.out.println("current watermark = " + context.currentWatermark() + " --> " + new Timestamp(context.currentWatermark()) + " ::: current time = " + new Timestamp(System.currentTimeMillis()));

                        Timestamp winStart = new Timestamp(context.window().getStart());
                        Timestamp winEnd = new Timestamp(context.window().getEnd());

                        long cnt = elements.spliterator().getExactSizeIfKnown();

                        out.collect("key = " + key + "\t window [ " + winStart + " - " + winEnd + " ) has " + cnt + " element(s)");


                    }
                });

        countWinProcessStream.print();


        env.execute();


    }

}

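import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.windowing.ProcessWindowFunction;
import org.apache.flink.streaming.api.windowing.assigners.SlidingProcessingTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
import org.apache.flink.util.Collector;

import java.sql.Timestamp;

public class SlidingWindowsBaseProcessingTime {

    public static void main(String[] args) throws Exception {

        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(1);

        /*
            Sliding Windows

                1. Windows overlap
                2. A window fires every slide interval; elements shared with later windows are kept
                3. An element is discarded once the last window that contains it has fired

         */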


        SingleOutputStreamOperator<Tuple2<String, Long>> withWatermarkStream = env
                .socketTextStream("10.199.241.213", 9099)
                .map(new MapFunction<String, Tuple2<String, Long>>() {
                    @Override
                    public Tuple2<String, Long> map(String val) throws Exception {
                        String[] elems = val.split(" ");
                        return Tuple2.of(elems[0], Long.parseLong(elems[1]) * 1000L);
                    }
                })
                .assignTimestampsAndWatermarks(
                        WatermarkStrategy.forMonotonousTimestamps()
                );

        SingleOutputStreamOperator<String> countWinProcessStream = withWatermarkStream
                .keyBy(r -> r.f0)
                // Window size 5 s, slide 3 s
                .window(SlidingProcessingTimeWindows.of(Time.seconds(5), Time.seconds(3)))
                .process(new ProcessWindowFunction<Tuple2<String, Long>, String, String, TimeWindow>() {
                    @Override
                    public void process(String key, Context context, Iterable<Tuple2<String, Long>> elements, Collector<String> out) throws Exception {

                        System.out.println("current watermark = " + context.currentWatermark() + " --> " + new Timestamp(context.currentWatermark()) + " ::: current time = " + new Timestamp(System.currentTimeMillis()));

                        Timestamp winStart = new Timestamp(context.window().getStart());
                        Timestamp winEnd = new Timestamp(context.window().getEnd());

                        long cnt = elements.spliterator().getExactSizeIfKnown();

                        out.collect("key = " + key + "\t window [ " + winStart + " - " + winEnd + " ) has " + cnt + " element(s)");


                    }
                });

        countWinProcessStream.print();


        env.execute();


    }

}

SlidingEventTimeWindows

# org.apache.flink.streaming.api.windowing.assigners.SlidingEventTimeWindows

    // Constructor
    protected SlidingEventTimeWindows(long size, long slide, long offset) {
        if (Math.abs(offset) >= slide || size <= 0) {
            throw new IllegalArgumentException(
                    "SlidingEventTimeWindows parameters must satisfy "
                            + "abs(offset) < slide and size > 0");
        }

        this.size = size;
        this.slide = slide;
        this.offset = offset;
    }

    // Assign windows
    public Collection<TimeWindow> assignWindows(
            Object element, long timestamp, WindowAssignerContext context) {
        if (timestamp > Long.MIN_VALUE) {
            List<TimeWindow> windows = new ArrayList<>((int) (size / slide));
            long lastStart = TimeWindow.getWindowStartWithOffset(timestamp, offset, slide);
            for (long start = lastStart; start > timestamp - size; start -= slide) {
                windows.add(new TimeWindow(start, start + size));
            }
            return windows;
        } else {
            throw new RuntimeException(
                    "Record has Long.MIN_VALUE timestamp (= no timestamp marker). "
                            + "Is the time characteristic set to 'ProcessingTime', or did you forget to call "
                            + "'DataStream.assignTimestampsAndWatermarks(...)'?");
        }
    }

    // Default trigger
    public Trigger<Object, TimeWindow> getDefaultTrigger(StreamExecutionEnvironment env) {
        return EventTimeTrigger.create();
    }

org.apache.flink.streaming.api.windowing.triggers.EventTimeTrigger
    public TriggerResult onElement(
            Object element, long timestamp, TimeWindow window, TriggerContext ctx)
            throws Exception {
        if (window.maxTimestamp() <= ctx.getCurrentWatermark()) {
            // if the watermark is already past the window fire immediately
            return TriggerResult.FIRE;
        } else {
            ctx.registerEventTimeTimer(window.maxTimestamp());
            return TriggerResult.CONTINUE;
        }
    }

    public TriggerResult onEventTime(long time, TimeWindow window, TriggerContext ctx) {
        return time == window.maxTimestamp() ? TriggerResult.FIRE : TriggerResult.CONTINUE;
    }

SlidingProcessingTimeWindows

# org.apache.flink.streaming.api.windowing.assigners.SlidingProcessingTimeWindows
    // Constructor
    private SlidingProcessingTimeWindows(long size, long slide, long offset) {
        if (Math.abs(offset) >= slide || size <= 0) {
            throw new IllegalArgumentException(
                    "SlidingProcessingTimeWindows parameters must satisfy "
                            + "abs(offset) < slide and size > 0");
        }

        this.size = size;
        this.slide = slide;
        this.offset = offset;
    }

    // Assign windows
    public Collection<TimeWindow> assignWindows(
            Object element, long timestamp, WindowAssignerContext context) {
        timestamp = context.getCurrentProcessingTime();
        List<TimeWindow> windows = new ArrayList<>((int) (size / slide));
        long lastStart = TimeWindow.getWindowStartWithOffset(timestamp, offset, slide);
        for (long start = lastStart; start > timestamp - size; start -= slide) {
            windows.add(new TimeWindow(start, start + size));
        }
        return windows;
    }

    // Default trigger
    public Trigger<Object, TimeWindow> getDefaultTrigger(StreamExecutionEnvironment env) {
        return ProcessingTimeTrigger.create();
    }

org.apache.flink.streaming.api.windowing.triggers.ProcessingTimeTrigger
    
    public TriggerResult onElement(
            Object element, long timestamp, TimeWindow window, TriggerContext ctx) {
        ctx.registerProcessingTimeTimer(window.maxTimestamp());
        return TriggerResult.CONTINUE;
    }
    public TriggerResult onProcessingTime(long time, TimeWindow window, TriggerContext ctx) {
        return TriggerResult.FIRE;
    }

Global Windows

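The global window assigner assigns all elements with the same key to a single global window. This windowing scheme is only useful if you also specify a custom trigger. Otherwise no computation is performed, because a global window has no natural end at which the accumulated elements could be processed.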

Taking a sliding count window as an example:

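import org.apache.flink.api.common.eventtime.SerializableTimestampAssigner;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.windowing.ProcessWindowFunction;
import org.apache.flink.streaming.api.windowing.windows.GlobalWindow;
import org.apache.flink.util.Collector;

import java.sql.Timestamp;
import java.time.Duration;

public class CountWindowsBaseEventTime {

    public static void main(String[] args) throws Exception {

        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        env.setParallelism(1);

        /*
            Count Window

                countWindow(size, slide) fires every `slide` elements and evaluates at most
                the latest `size` elements; older elements are evicted.
                A tumbling count window, countWindow(size), fires every `size` elements.

         */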

        SingleOutputStreamOperator<Tuple2<String, Long>> withWatermarkStream = env
                .socketTextStream("10.199.241.213", 9099)
                .map(new MapFunction<String, Tuple2<String, Long>>() {
                    @Override
                    public Tuple2<String, Long> map(String val) throws Exception {
                        String[] elems = val.split(" ");
                        return Tuple2.of(elems[0], Long.parseLong(elems[1]) * 1000L);
                    }
                })
                .assignTimestampsAndWatermarks(
                        WatermarkStrategy
                                .<Tuple2<String, Long>>forBoundedOutOfOrderness(Duration.ofSeconds(0))
                                .withTimestampAssigner(new SerializableTimestampAssigner<Tuple2<String, Long>>() {
                                    @Override
                                    public long extractTimestamp(Tuple2<String, Long> elem, long l) {
                                        return elem.f1;
                                    }
                                })
                );

        SingleOutputStreamOperator<String> countWinProcessStream = withWatermarkStream
                .keyBy(r -> r.f0)
                .countWindow(10, 5)
                .process(new ProcessWindowFunction<Tuple2<String, Long>, String, String, GlobalWindow>() {
                    @Override
                    public void process(String key, Context context, Iterable<Tuple2<String, Long>> elements, Collector<String> out) throws Exception {
                        System.out.println("current watermark = " + context.currentWatermark() + " --> " + new Timestamp(context.currentWatermark()) + " ::: current time = " + new Timestamp(System.currentTimeMillis()));

                        // A GlobalWindow has no start or end, so the current watermark is printed instead
                        Timestamp win = new Timestamp(context.currentWatermark());

                        long cnt = elements.spliterator().getExactSizeIfKnown();

                        out.collect("key = " + key + "\t window [ " + win + " ) has " + cnt + " element(s)");
                    }
                });

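        // Print the collected results; without a sink they would never be shown
        countWinProcessStream.print();

        env.execute();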


    }

}

GlobalWindows

# org.apache.flink.streaming.api.windowing.assigners.GlobalWindows

    // Assign windows
    public Collection<GlobalWindow> assignWindows(
            Object element, long timestamp, WindowAssignerContext context) {
        return Collections.singletonList(GlobalWindow.get());
    }

    // Default trigger
    public Trigger<Object, GlobalWindow> getDefaultTrigger(StreamExecutionEnvironment env) {
        return new NeverTrigger();
    }

CountEvictor

# org.apache.flink.streaming.api.windowing.evictors.CountEvictor

    // Evictor
    @Override
    public void evictBefore(
            Iterable<TimestampedValue<Object>> elements, int size, W window, EvictorContext ctx) {
        if (!doEvictAfter) {
            evict(elements, size, ctx);
        }
    }

    @Override
    public void evictAfter(
            Iterable<TimestampedValue<Object>> elements, int size, W window, EvictorContext ctx) {
        if (doEvictAfter) {
            evict(elements, size, ctx);
        }
    }

    private void evict(Iterable<TimestampedValue<Object>> elements, int size, EvictorContext ctx) {
        if (size <= maxCount) {
            return;
        } else {
            int evictedCount = 0;
            for (Iterator<TimestampedValue<Object>> iterator = elements.iterator();
                    iterator.hasNext(); ) {
                iterator.next();
                evictedCount++;
                if (evictedCount > size - maxCount) {
                    break;
                } else {
                    iterator.remove();
                }
            }
        }
    }

CountTrigger

# org.apache.flink.streaming.api.windowing.triggers.CountTrigger
    // Trigger
    @Override
    public TriggerResult onElement(Object element, long timestamp, W window, TriggerContext ctx)
            throws Exception {
        ReducingState<Long> count = ctx.getPartitionedState(stateDesc);
        count.add(1L);
        if (count.get() >= maxCount) {
            count.clear();
            return TriggerResult.FIRE;
        }
        return TriggerResult.CONTINUE;
    }

    @Override
    public TriggerResult onEventTime(long time, W window, TriggerContext ctx) {
        return TriggerResult.CONTINUE;
    }

    @Override
    public TriggerResult onProcessingTime(long time, W window, TriggerContext ctx)
            throws Exception {
        return TriggerResult.CONTINUE;
    }
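
Putting the two pieces together, countWindow(10, 5) from the example above combines a count trigger that fires every 5 elements with a count evictor that keeps the latest 10. The following plain-Java sketch simulates that combination for a single key; CountWindowSketch and run are hypothetical names, and the sketch assumes eviction happens before the window function is applied.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

final class CountWindowSketch {

    /** Feeds elements one by one and returns the window contents seen at each firing. */
    static List<List<Integer>> run(int[] elements, int size, int slide) {
        Deque<Integer> buffer = new ArrayDeque<>(); // contents of the single global window
        List<List<Integer>> firings = new ArrayList<>();
        int sinceLastFire = 0;

        for (int element : elements) {
            buffer.addLast(element);
            sinceLastFire++;
            if (sinceLastFire >= slide) {
                // Count-trigger rule: fire and reset the counter
                sinceLastFire = 0;
                // Count-evictor rule: drop the oldest elements beyond `size`
                while (buffer.size() > size) {
                    buffer.removeFirst();
                }
                firings.add(new ArrayList<>(buffer));
            }
        }
        return firings;
    }

    public static void main(String[] args) {
        int[] input = new int[15];
        for (int i = 0; i < input.length; i++) {
            input[i] = i + 1;
        }
        for (List<Integer> window : run(input, 10, 5)) {
            System.out.println(window.size() + " elements, first = " + window.get(0));
        }
    }
}
```

For the inputs 1 to 15, the three firings see 5 elements, then 10 elements, then the 10 elements 6 to 15, because the five oldest are evicted before the third evaluation.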

Session Windows

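The session window assigner groups elements by sessions of activity. Unlike tumbling and sliding windows, session windows do not overlap and have no fixed start or end time. A session window closes when it has received no elements for a certain period, that is, after a gap of inactivity. The assigner can be configured with a static session gap or with a session gap extractor function that defines dynamically how long the period of inactivity is. When this period expires, the current session closes and subsequent elements are assigned to a new session window.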


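import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.windowing.ProcessWindowFunction;
import org.apache.flink.streaming.api.windowing.assigners.ProcessingTimeSessionWindows;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
import org.apache.flink.util.Collector;

import java.sql.Timestamp;
import java.util.Iterator;

public class SessionWindows {

    public static void main(String[] args) throws Exception {


        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(1);


        /*

            Session Windows

                1. If no element arrives within the gap, the window fires and its elements are cleared


                nc -lk 9099
                q
                w
                e
                r
                t
                y
                [waiting at least 5s]
                1
                w
                e
                r
                t
                ...

         */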

        env
                .socketTextStream("127.0.0.1", 9099)
                .keyBy(event -> 1)
                // Fire the window after 5 s without new elements
                .window(ProcessingTimeSessionWindows.withGap(Time.seconds(5)))
                .process(
                        // ProcessWindowFunction<IN, OUT, KEY, W extends Window>
                        new ProcessWindowFunction<String, String, Integer, TimeWindow>() {
                            @Override
                            public void process(Integer key, Context context, Iterable<String> elements, Collector<String> out) throws Exception {

                                Timestamp winStart = new Timestamp(context.window().getStart());
                                Timestamp winEnd = new Timestamp(context.window().getEnd());

                                int count = 0;

                                Iterator<String> iterator = elements.iterator();
                                while (iterator.hasNext()) {
                                    iterator.next();
                                    count++;
                                }

                                String result = String.format("window [ %s - %s ] contains %d element(s)", winStart, winEnd, count);

                                out.collect(result);

                            }
                        })
                .print();


        env.execute();


    }

}

ProcessingTimeSessionWindows

# org.apache.flink.streaming.api.windowing.assigners.ProcessingTimeSessionWindows

    // Assign windows
    @Override
    public Collection<TimeWindow> assignWindows(
            Object element, long timestamp, WindowAssignerContext context) {
        long currentProcessingTime = context.getCurrentProcessingTime();
        return Collections.singletonList(
                new TimeWindow(currentProcessingTime, currentProcessingTime + sessionTimeout));
    }

    // Default trigger
    @Override
    public Trigger<Object, TimeWindow> getDefaultTrigger(StreamExecutionEnvironment env) {
        return ProcessingTimeTrigger.create();
    }

    // Factory method for a session assigner with a dynamic gap
    public static <T> DynamicProcessingTimeSessionWindows<T> withDynamicGap(
            SessionWindowTimeGapExtractor<T> sessionWindowTimeGapExtractor) {
        return new DynamicProcessingTimeSessionWindows<>(sessionWindowTimeGapExtractor);
    }

DynamicProcessingTimeSessionWindows

# org.apache.flink.streaming.api.windowing.assigners.DynamicProcessingTimeSessionWindows

    // Assign windows
    public Collection<TimeWindow> assignWindows(
            T element, long timestamp, WindowAssignerContext context) {
        long currentProcessingTime = context.getCurrentProcessingTime();
        long sessionTimeout = sessionWindowTimeGapExtractor.extract(element);
        if (sessionTimeout <= 0) {
            throw new IllegalArgumentException("Dynamic session time gap must satisfy 0 < gap");
        }
        return Collections.singletonList(
                new TimeWindow(currentProcessingTime, currentProcessingTime + sessionTimeout));
    }

    // Default trigger
    public Trigger<T, TimeWindow> getDefaultTrigger(StreamExecutionEnvironment env) {
        return (Trigger<T, TimeWindow>) ProcessingTimeTrigger.create();
    }

    // Merge windows
    public void mergeWindows(Collection<TimeWindow> windows, MergeCallback<TimeWindow> c) {
        TimeWindow.mergeWindows(windows, c);
    }

org.apache.flink.streaming.api.windowing.windows.TimeWindow#mergeWindows

    public static void mergeWindows(
            Collection<TimeWindow> windows, MergingWindowAssigner.MergeCallback<TimeWindow> c) {

        // sort the windows by the start time and then merge overlapping windows

        List<TimeWindow> sortedWindows = new ArrayList<>(windows);

        Collections.sort(
                sortedWindows,
                new Comparator<TimeWindow>() {
                    @Override
                    public int compare(TimeWindow o1, TimeWindow o2) {
                        return Long.compare(o1.getStart(), o2.getStart());
                    }
                });

        List<Tuple2<TimeWindow, Set<TimeWindow>>> merged = new ArrayList<>();
        Tuple2<TimeWindow, Set<TimeWindow>> currentMerge = null;

        for (TimeWindow candidate : sortedWindows) {
            if (currentMerge == null) {
                currentMerge = new Tuple2<>();
                currentMerge.f0 = candidate;
                currentMerge.f1 = new HashSet<>();
                currentMerge.f1.add(candidate);
            } else if (currentMerge.f0.intersects(candidate)) {
                currentMerge.f0 = currentMerge.f0.cover(candidate);
                currentMerge.f1.add(candidate);
            } else {
                merged.add(currentMerge);
                currentMerge = new Tuple2<>();
                currentMerge.f0 = candidate;
                currentMerge.f1 = new HashSet<>();
                currentMerge.f1.add(candidate);
            }
        }

        if (currentMerge != null) {
            merged.add(currentMerge);
        }

        for (Tuple2<TimeWindow, Set<TimeWindow>> m : merged) {
            if (m.f1.size() > 1) {
                c.merge(m.f1, m.f0);
            }
        }
    }
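
The effect of this merge step on session windows can be sketched in plain Java: every element first gets its own window [t, t + gap), and windows that intersect are then merged into one session. SessionMergeSketch and sessions are hypothetical names; unlike the Flink method, which only reports groups of more than one window through the callback, this sketch simply returns all resulting sessions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class SessionMergeSketch {

    /** Returns merged {start, end} sessions for the given timestamps and gap (milliseconds). */
    static List<long[]> sessions(long[] timestamps, long gap) {
        long[] sorted = timestamps.clone();
        Arrays.sort(sorted);

        List<long[]> merged = new ArrayList<>();
        long[] current = null;
        for (long ts : sorted) {
            long[] candidate = {ts, ts + gap}; // one window per element
            boolean intersects =
                    current != null && current[0] <= candidate[1] && current[1] >= candidate[0];
            if (intersects) {
                // Grow the current session so that it covers the candidate as well
                current[0] = Math.min(current[0], candidate[0]);
                current[1] = Math.max(current[1], candidate[1]);
            } else {
                if (current != null) {
                    merged.add(current);
                }
                current = candidate;
            }
        }
        if (current != null) {
            merged.add(current);
        }
        return merged;
    }

    public static void main(String[] args) {
        long[] arrivals = {0L, 1_000L, 3_000L, 9_000L, 10_000L};
        for (long[] s : sessions(arrivals, 5_000L)) {
            System.out.println("[" + s[0] + ", " + s[1] + ")");
        }
    }
}
```

With a 5 s gap, elements arriving at 0 s, 1 s and 3 s form the session [0, 8000), and the elements at 9 s and 10 s form a second session [9000, 15000).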

References
https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/dev/datastream/operators/windows/
