Flink SQL窗口(TUMBLE、HOP、SESSION、CUMULATE、OVER)

前言

Flink SQL窗口需要有时间,时间是为窗口服务的。

Flink建表语句

CREATE TABLE source (
    id VARCHAR,
    price BIGINT,
    --获取系统时间
    event_time AS CAST(CURRENT_TIMESTAMP AS TIMESTAMP(3)),
    --设置watermark
    watermark FOR event_time AS event_time - INTERVAL '0' SECOND
) WITH (
    --with 关键字,定义连接器相关信息
    'connector' = 'datagen',
    'rows-per-second' = '5',
    'fields.id.length' = '2',  
    'fields.price.min' = '1',
    'fields.price.max' = '100'
);
    • 语法有两种
    • Group Window Aggregation
    • Windowing TVF
    • 如果你的业务需求比较简单,只需要进行基本的窗口聚合操作,且希望使用传统的 SQL 语法,那么 Group Window Aggregation 是更合适的选择。
    • 如果你的业务需求比较复杂,需要在窗口操作之后进行更复杂的查询和处理,或者需要同时处理多个窗口,那么 Windowing TVF 会更加灵活和强大。

    1.滚动窗口(TUMBLE)

    滚动窗口:将每条数据指定给具体的窗口。滚动窗口窗口大小固定,且数据没有重复计算和丢失。例如,指定一个大小为 5 秒钟的滚动窗口。在这种情况下,Flink 将每隔 5 秒钟开启一个新的窗口,其中每一条数都会划分到唯一的一个 5 秒钟的窗口中,如下图所示。

    语法1:tumble(事件时间列,窗口大小) Group Window Aggregation

    select 
    	id,
    	count(*) as counts,
    	sum(price) as sum_price,
    	UNIX_TIMESTAMP(CAST(tumble_start(event_time, interval '5' second ) as STRING))*1000 as window_start,
    	UNIX_TIMESTAMP(CAST(tumble_end(event_time, interval '5' second ) as STRING))*1000 as window_end
    from source
    group by
    	id,
    	tumble(event_time, interval '5' second );

    语法2:table(TUMBLE(table 表名,DESCRIPTOR(事件时间列),窗口大小) Windowing TVF

    select 
    	id,
    	count(*) as counts,
    	sum(price) as sum_price,
    	UNIX_TIMESTAMP(CAST (window_start as STRING))*1000 as window_start,
    	UNIX_TIMESTAMP(CAST (window_end as STRING))*1000 as window_end
    from TABLE
    (
    	TUMBLE(
    	TABLE source,
    	DESCRIPTOR(event_time),
    	INTERVAL '5' SECOND 
    	)
    )
    group by
    	id,
    	window_start,
    	window_end;
    • ⭐ 应用场景:例如,电商平台需要实时统计每 15 分钟内的订单数量、订单总金额等。

    2滑动窗口(HOP)

    滑动窗口:也是将数据指定给固定长度的窗口。与滚动窗口功能一样,也有窗口大小的概念。区别在于,滑动窗口有另一个参数控制窗口计算的频率(滑动窗口滑动的步长)。因此,如果滑动的步长小于窗口大小,则滑动窗口之间每个窗口是可以重叠。在这种情况下,一条数据就会分配到多个窗口当中。举例,有 5秒钟大小的窗口,滑动步长为 2 秒钟。这样,每 2秒钟会产生一个窗口,这个窗口包含的数据是过去 5秒钟的数据,如下图所示。

    语法1:hop(事件时间列,窗口滑动步长,窗口大小)

    select 
    	id,
    	count(*) as counts,
    	sum(price) as sum_price,
    	UNIX_TIMESTAMP(CAST(hop_start(event_time, interval '2' second,interval '5' second ) as STRING))*1000 as window_start,
    	UNIX_TIMESTAMP(CAST(hop_end(event_time, interval '2' second,interval '5' second ) as STRING))*1000 as window_end
    from source
    group by
    	id,
    	hop(event_time, interval '2' second,interval '5' second );

     语法2:table(hop(table 表名,DESCRIPTOR(事件时间列),窗口滑动步长,窗口大小)

    select 
    	id,
    	count(*) as counts,
    	sum(price) as sum_price,
    	UNIX_TIMESTAMP(CAST(window_start as STRING))*1000 as window_start,
    	UNIX_TIMESTAMP(CAST(window_end as STRING))*1000 as window_end
    from table(hop(table source,DESCRIPTOR(event_time),interval '2' second,interval '5' second))
    group by
    	id,
    	window_start,
    	window_end;
    • ⭐ 应用场景:例如,设置一个 1 小时的滑动窗口,每 10 分钟滑动一次,分析商品在不同时间段的销售情况,帮助商家及时调整营销策略,如调整商品价格、进行促销活动等。

     3.会话窗口(SESSION)

    Session窗口 :和滚动、滑动窗口不一样,其没有固定的持续时间,窗口大小不固定,如果在定义的间隔期(Session Gap)内没有新的数据出现,则产生新窗口。

    语法:session(事件时间列,窗口时间 ) 暂不支持Window TVF

    select 
    	id,
    	count(*) as counts,
    	sum(price) as sum_price,
    	UNIX_TIMESTAMP(CAST(session_start(event_time, interval '2' second,interval '5' second ) as STRING))*1000 as window_start,
    	UNIX_TIMESTAMP(CAST(session_end(event_time, interval '2' second,interval '5' second ) as STRING))*1000 as window_end
    from source
    group by
    	id,
    	session(event_time,interval '5' second );

     ⭐ 应用场景:例如,设定会话间隙为 1 小时,当用户在看到广告后的 1 小时内进行了点击和购买操作,这些行为会被归为同一个会话。

    4.渐进式窗口(CUMULATE)

    渐进式窗口:属于累积窗口。与滚动窗口(TUMBLE)、滑动窗口(HOP)不同,累积窗口会在窗口不断向前滑动的过程中,持续累积之前窗口的数据。也就是说,累积窗口会随着时间的推移,将新到来的数据添加到之前已经处理过的数据集合中进行聚合计算,其窗口的结束时间会不断向后推移,而起始时间保持不变,直到达到最大窗口大小。

     语法1: CUMULATE(事件时间列,时间间隔, 窗口大小);

    select 
        id,
    	count(*) as counts,
    	sum(price) as sum_price,
    	window_start,
    	window_end
    from source
    group by
        id,
        CUMULATE(event_time, INTERVAL '1' HOUR, INTERVAL '24' HOUR);
    	window_start,
    	window_end;	

    语法2:table (cumulate(table 表名,DESCRIPTOR(事件时间列),窗口滑动步长,窗口最大大小 )) 

    select 
        id,
    	count(*) as counts,
    	sum(price) as sum_price,
    	window_start,
    	window_end
    from table (cumulate(table source,DESCRIPTOR(event_time),interval '5' second ,interval '30' second ))
    group by
        id,
    	window_start,
    	window_end;	

      ⭐ 应用场景:在电商或零售行业,需要实时统计某商品最近24小时的累计销售额、销售量等信息。通过 CUMULATE 窗口,可以不断累积每小时销售数据,从而得到从销售开始到当前时刻的累计销售情况。

    5.聚合窗口(OVER)

    OVER 窗口:是一种用于在结果集中对数据进行分组和排序,并在每个分组内进行计算的窗口类型。与常规的时间窗口(如滚动窗口、滑动窗口)不同,OVER 窗口不基于时间边界来划分数据,而是基于排序后的行进行计算,通常用于计算每行数据相对于其前后行的聚合值,例如计算移动平均值、累计和等。

    语法:sum/max/min/count/avg(聚合字段) /row_number()/rank()   (PARTITION by 分组字段 order by 排序字段 range/rows BETWEEN and )

    SELECT 
        id,
        price,
        event_time,
        SUM(price) OVER (PARTITION BY id ORDER BY event_time RANGE BETWEEN INTERVAL '3' SECOND PRECEDING AND CURRENT ROW) AS range_1,
        COUNT(price) OVER (PARTITION BY id ORDER BY event_time RANGE BETWEEN INTERVAL '3' SECOND PRECEDING AND CURRENT ROW) AS range_2
    FROM source;
    
    SELECT 
        id,
        price,
        event_time,
        SUM(price) OVER (PARTITION BY id ORDER BY event_time ROWS BETWEEN 5 PRECEDING AND CURRENT ROW) AS rows_1,
        COUNT(price) OVER (PARTITION BY id ORDER BY event_time ROWS BETWEEN 5 PRECEDING AND CURRENT ROW) AS rows_2
    FROM source;

       ⭐ 应用场景:移动平均值计算,累计和计算,排名计算

      评论
      添加红包

      请填写红包祝福语或标题

      红包个数最小为10个

      红包金额最低5元

      当前余额3.43前往充值 >
      需支付:10.00
      成就一亿技术人!
      领取后你会自动成为博主和红包主的粉丝 规则
      hope_wisdom
      发出的红包
      实付
      使用余额支付
      点击重新获取
      扫码支付
      钱包余额 0

      抵扣说明:

      1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
      2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

      余额充值