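1 The concept of a window
Flink is a stream processing framework, and in real applications a stream usually has no boundary. So how do we process an unbounded stream? The usual approach is to slice it into a series of bounded streams, and a window is one way of doing that slicing: it distributes the stream's data into buckets of finite size for analysis.
2 Window types
2.1 Time windows (Time Window)
2.1.1 Tumbling time windows
(1) Data is sliced according to a fixed window size, and each window starts exactly where the previous one ends.
(2) Windows are time-aligned, have a fixed length, and do not overlap.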
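To make the two properties above concrete, here is a minimal sketch in plain Scala (no Flink dependency) of how a timestamp could be mapped to its tumbling window. The names `TumblingWindowSketch`, `Window` and `assign` are made up for this illustration, and the sketch assumes windows are aligned to the epoch with no offset; it is not Flink's own assigner.

```scala
object TumblingWindowSketch {
  // Half-open window [start, end), in epoch milliseconds.
  final case class Window(start: Long, end: Long)

  // Map a timestamp to the single tumbling window that contains it.
  // Assumption: windows are aligned to the epoch (offset 0).
  def assign(timestampMs: Long, sizeMs: Long): Window = {
    require(sizeMs > 0, "window size must be positive")
    // floorMod keeps the remainder non-negative, also for negative timestamps.
    val start = timestampMs - Math.floorMod(timestampMs, sizeMs)
    Window(start, start + sizeMs)
  }

  def main(args: Array[String]): Unit = {
    val sizeMs = 5000L
    Seq(0L, 4999L, 5000L, 12345L).foreach { ts =>
      println(s"$ts -> ${assign(ts, sizeMs)}")
    }
  }
}
```

Because the start is always a multiple of the window size, every timestamp lands in exactly one window, and consecutive windows meet end to start without overlapping.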
2.1.2 Example: minimum and maximum temperature in each 5-second window
package test3

import test2.{SensorReading, SensorSource}
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.scala.function.ProcessWindowFunction
import org.apache.flink.streaming.api.windowing.time.Time
import org.apache.flink.streaming.api.windowing.windows.TimeWindow
import org.apache.flink.util.Collector

object MinMaxTempPerWindow {

  // Result record: sensor id, min/max temperature, and the window end timestamp.
  case class MinMaxTemp(id: String,
                        min: Double,
                        max: Double,
                        endTs: Long)

  /**
   * Compute the maximum and minimum temperature within each 5-second window.
   * @param args
   */
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    env.setParallelism(1)

    val stream = env
      .addSource(new SensorSource)

    stream
      .keyBy(_.id)
      // Tumbling window of 5 seconds.
      .timeWindow(Time.seconds(5))
      .process(new HighAndLowTempPerWindow)
      .print()

    env.execute()
  }

  class HighAndLowTempPerWindow extends ProcessWindowFunction[SensorReading, MinMaxTemp, String, TimeWindow] {
    // Called once per key and window with all elements buffered for that window.
    override def process(key: String, context: Context, elements: Iterable[SensorReading], out: Collector[MinMaxTemp]): Unit = {
      val temps = elements.map(_.temperature)
      val windowEnd = context.window.getEnd
      out.collect(MinMaxTemp(key, temps.min, temps.max, windowEnd))
    }
  }
}
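2.1.3 Sliding time windows
(1) A sliding window is a more general form of the fixed (tumbling) window. It is defined by a fixed window length and a slide interval.
(2) The window length is fixed, and windows may overlap.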
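Overlap means a single element can belong to several windows at once. The following plain-Scala sketch (no Flink dependency) lists every window that contains a given timestamp. `SlidingWindowSketch`, `Window` and `assign` are hypothetical names for this illustration, and the sketch assumes epoch-aligned windows with a slide no larger than the window length.

```scala
object SlidingWindowSketch {
  // Half-open window [start, end), in epoch milliseconds.
  final case class Window(start: Long, end: Long)

  // Return every sliding window that contains the timestamp, newest first.
  // Assumptions: epoch alignment (offset 0) and 0 < slide <= size.
  def assign(timestampMs: Long, sizeMs: Long, slideMs: Long): List[Window] = {
    require(slideMs > 0 && sizeMs >= slideMs, "expects 0 < slide <= size")
    // Start of the most recent window that contains the timestamp.
    val lastStart = timestampMs - Math.floorMod(timestampMs, slideMs)
    // Step back one slide at a time while the window still covers the timestamp.
    Iterator
      .iterate(lastStart)(_ - slideMs)
      .takeWhile(start => start > timestampMs - sizeMs)
      .map(start => Window(start, start + sizeMs))
      .toList
  }

  def main(args: Array[String]): Unit = {
    // Length 10 s, slide 5 s: each timestamp falls into two windows.
    assign(12000L, 10000L, 5000L).foreach(println)
  }
}
```

With a length of 10 seconds and a slide of 5 seconds, each element is counted in two windows. Setting the slide equal to the length yields exactly one window per element, which is the tumbling case.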
package org.example.windowfunc

import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.time.Time
import org.example.source.self.SensorSource

/**
 * Every 5 seconds, compute the minimum temperature over the last 10 seconds.
 */
object MinTempPerWindow {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    env.setParallelism(1)

    val stream = env.addSource(new SensorSource)

    stream.map(r => (r.id, r.temperature))
      .keyBy(_._1)
      // Sliding window: length 10 seconds, slide interval 5 seconds.
      .timeWindow(Time.seconds(10), Time.seconds(5))
      // Keep the sensor id and the smaller of the two temperatures.
      .reduce((r1, r2) => (r1._1, r1._2.min(r2._2)))
      .print()

    env.execute()
  }
}
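2.1.4 Session windows
(1) A session window consists of a series of events followed by a timeout gap of a specified length. In other words, if no new data arrives for that length of time, the current window closes and the next event starts a new window.
(2) Characteristic: windows are not time-aligned.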
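The gap rule can be illustrated without Flink by grouping a list of event timestamps into sessions. This is only a sketch: `SessionWindowSketch` and `sessions` are names invented here, the input is sorted up front rather than processed as a stream, and the sketch assumes that two events stay in the same session when they are at most `gapMs` apart. How Flink treats events that are exactly one gap apart is not claimed here.

```scala
object SessionWindowSketch {
  // Group event timestamps (epoch milliseconds) into sessions.
  // Assumption: a new session starts when the distance to the previous
  // event is strictly greater than gapMs.
  def sessions(timestampsMs: Seq[Long], gapMs: Long): List[List[Long]] = {
    require(gapMs > 0, "gap must be positive")
    timestampsMs.sorted
      .foldLeft(List.empty[List[Long]]) {
        // Close enough to the latest event: extend the current session.
        case (current :: done, ts) if ts - current.head <= gapMs =>
          (ts :: current) :: done
        // First event, or the gap was exceeded: open a new session.
        case (acc, ts) =>
          List(ts) :: acc
      }
      // Sessions and their events were built newest first; restore time order.
      .map(_.reverse)
      .reverse
  }

  def main(args: Array[String]): Unit = {
    sessions(Seq(1000L, 2000L, 2500L, 9000L, 9500L), 3000L).foreach(println)
  }
}
```

Session boundaries here depend only on when the events arrive, not on a fixed grid of start times, which is why session windows are not time-aligned.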
package org.example.windowfunc
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.assigners.EventTimeSessionWindows
import org.apache.flink.streaming.api.windowing.time.Time
import org.example.source.self.SensorSource