Overview
Stream processing engines are designed to work on unbounded datasets: datasets that grow continuously and are, in principle, infinite. A window is the mechanism that slices such an unbounded stream into finite chunks for processing.
Windows are therefore central to unbounded stream processing: a window splits an infinite stream into finite-size "buckets" on which we can run computations.
Which notions of time does Flink support?
Time in Flink is not necessarily the wall-clock time of the real world; Flink distinguishes three notions: event time, ingestion time, and processing time.
1. A time window defined on EventTime is an EventTimeWindow; this requires every record to carry its own event timestamp.
2. A time window defined on IngestionTime is an IngestionTimeWindow, based on the system time of the source.
3. A time window defined on ProcessingTime is a ProcessingTimeWindow, based on the system time of the operator processing the record.
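The time semantics are selected per job on the execution environment. A minimal sketch (the class name TimeSemantics is hypothetical; TimeCharacteristic is Flink's pre-1.12 API, which all examples in this post use):

import org.apache.flink.streaming.api.TimeCharacteristic;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class TimeSemantics {
    public static void main(String[] args) {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Choose exactly one notion of time for the whole job:
        env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);        // timestamp carried by each record
        // env.setStreamTimeCharacteristic(TimeCharacteristic.IngestionTime); // assigned when the record enters the source
        // env.setStreamTimeCharacteristic(TimeCharacteristic.ProcessingTime);// default: operator wall-clock time
    }
}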
Window types
Time windows: created based on time
Tumbling time window
Sliding time window
Session window
Count windows: created after a specified number of records, independent of time
Tumbling count window
Sliding count window
1. Tumbling windows
Slice the data by a fixed window length.
Time-aligned, fixed window length, no overlap.

2. Sliding windows
Slide forward by a fixed distance, with a fixed window length.
A sliding window is defined by a fixed window length plus a slide interval.
Windows may overlap (whether they do depends on the slide distance relative to the window length).
A tumbling window can be viewed as a special sliding window whose size equals its slide distance.

3. Session windows
Composed of a series of events followed by a timeout gap of a specified length: whenever no new data has been received for that period, the current window closes and the next element opens a new one. The corresponding window assigners are sketched below.
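Each of these window types corresponds to a window assigner that can be passed to window() on a keyed stream. A minimal sketch, assuming a keyed stream such as the one produced by keyBy in the examples below (AssignerSketch and demo are hypothetical names; the assigner classes are Flink's own):

import org.apache.flink.streaming.api.datastream.KeyedStream;
import org.apache.flink.streaming.api.windowing.assigners.ProcessingTimeSessionWindows;
import org.apache.flink.streaming.api.windowing.assigners.SlidingProcessingTimeWindows;
import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

public class AssignerSketch {
    static <T> void demo(KeyedStream<T, ?> keyedStream) {
        // Tumbling: fixed 15 s buckets, no overlap.
        keyedStream.window(TumblingProcessingTimeWindows.of(Time.seconds(15)));
        // Sliding: 15 s long, evaluated every 5 s, so consecutive windows overlap.
        keyedStream.window(SlidingProcessingTimeWindows.of(Time.seconds(15), Time.seconds(5)));
        // Session: the window closes after 15 s without a new element for the key.
        keyedStream.window(ProcessingTimeSessionWindows.withGap(Time.seconds(15)));
    }
}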

Window API
1. timeWindow, countWindow, window
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer011;
import org.apache.kafka.clients.consumer.ConsumerConfig;

import java.util.Properties;

public class Window1 {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        Properties prop = new Properties();
        prop.setProperty(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "192.168.232.211:9092");
        prop.setProperty(ConsumerConfig.GROUP_ID_CONFIG, "group_2");
        prop.setProperty(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringDeserializer");
        prop.setProperty(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringDeserializer");
        prop.setProperty(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "latest");
        DataStreamSource<String> inputStream = env.addSource(new FlinkKafkaConsumer011<String>("sensor",
                new SimpleStringSchema(), prop));
        SingleOutputStreamOperator<SensorReading> mapStream = inputStream.map(new MapFunction<String, SensorReading>() {
            @Override
            public SensorReading map(String s) throws Exception {
                String[] split = s.split(",");
                return new SensorReading(split[0], Long.parseLong(split[1]), Double.parseDouble(split[2]));
            }
        });
        SingleOutputStreamOperator<SensorReading> maxStream = mapStream.keyBy("id")
                // Tumbling time window: splits the stream into non-overlapping windows;
                // each event belongs to exactly one window.
                .timeWindow(Time.seconds(15))
                // Sliding time window: overlapping windows for smooth rolling aggregates.
                // Every 5 seconds, compute the max temperature of the last 15 seconds:
                // .timeWindow(Time.seconds(15), Time.seconds(5))
                // Tumbling count window: max temperature over every 15 events:
                // .countWindow(15)
                // Sliding count window: every 2 events, max over the last 6 events:
                // .countWindow(6, 2)
                // Session window: closes when no element has arrived for the (static) gap,
                // i.e. when a period of inactivity occurs:
                // .window(EventTimeSessionWindows.withGap(Time.seconds(15)))
                .max("temperature");
        maxStream.print("max");
        env.execute("max_1");
    }
}
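All of the examples parse Kafka messages such as sensor_1,1547718199,35.8 into a SensorReading POJO. The original post does not show this class; a minimal sketch consistent with how it is used (the field names must match the strings passed to keyBy and max):

public class SensorReading {
    private String id;
    private Long timestamp;
    private Double temperature;

    // A public no-arg constructor plus getters/setters makes this a valid Flink POJO,
    // which is what allows keyBy("id") and max("temperature") to work by field name.
    public SensorReading() {}

    public SensorReading(String id, Long timestamp, Double temperature) {
        this.id = id;
        this.timestamp = timestamp;
        this.temperature = temperature;
    }

    public String getId() { return id; }
    public void setId(String id) { this.id = id; }
    public Long getTimestamp() { return timestamp; }
    public void setTimestamp(Long timestamp) { this.timestamp = timestamp; }
    public Double getTemperature() { return temperature; }
    public void setTemperature(Double temperature) { this.temperature = temperature; }

    @Override
    public String toString() {
        return "SensorReading{id='" + id + "', timestamp=" + timestamp + ", temperature=" + temperature + "}";
    }
}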
2. reduce
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.common.functions.ReduceFunction;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer011;
import org.apache.kafka.clients.consumer.ConsumerConfig;

import java.util.Properties;

public class WindowReduce {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        Properties prop = new Properties();
        prop.setProperty(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "192.168.232.211:9092");
        prop.setProperty(ConsumerConfig.GROUP_ID_CONFIG, "group_3");
        prop.setProperty(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringDeserializer");
        prop.setProperty(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringDeserializer");
        prop.setProperty(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "latest");
        DataStreamSource<String> inputStream = env.addSource(new FlinkKafkaConsumer011<String>("sensor",
                new SimpleStringSchema(), prop));
        SingleOutputStreamOperator<SensorReading> mapStream = inputStream.map(new MapFunction<String, SensorReading>() {
            @Override
            public SensorReading map(String s) throws Exception {
                String[] split = s.split(",");
                return new SensorReading(split[0], Long.parseLong(split[1]), Double.parseDouble(split[2]));
            }
        });
        SingleOutputStreamOperator<SensorReading> resultStream = mapStream.keyBy("id")
                .countWindow(6, 2)
                // Sum the temperatures in the window, keeping the first id and timestamp.
                .reduce(new ReduceFunction<SensorReading>() {
                    @Override
                    public SensorReading reduce(SensorReading sensorReading, SensorReading t1) throws Exception {
                        return new SensorReading(sensorReading.getId(),
                                sensorReading.getTimestamp(),
                                sensorReading.getTemperature() + t1.getTemperature());
                    }
                });
        resultStream.print("sum");
        env.execute("reduce_1");
    }
}
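Because ReduceFunction has a single abstract method, the same logic can also be written as a lambda; a sketch, reusing mapStream from the example above:

SingleOutputStreamOperator<SensorReading> resultStream = mapStream.keyBy("id")
        .countWindow(6, 2)
        // Same reduction: keep the first id/timestamp, sum the temperatures.
        .reduce((r1, r2) -> new SensorReading(r1.getId(), r1.getTimestamp(),
                r1.getTemperature() + r2.getTemperature()));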


3. aggregate
import org.apache.flink.api.common.functions.AggregateFunction;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer011;
import org.apache.kafka.clients.consumer.ConsumerConfig;

import java.util.Properties;

public class WindowAgg {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        Properties prop = new Properties();
        prop.setProperty(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "192.168.232.211:9092");
        prop.setProperty(ConsumerConfig.GROUP_ID_CONFIG, "group_3");
        prop.setProperty(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringDeserializer");
        prop.setProperty(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringDeserializer");
        prop.setProperty(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "latest");
        DataStreamSource<String> inputStream = env.addSource(new FlinkKafkaConsumer011<String>("sensor",
                new SimpleStringSchema(), prop));
        SingleOutputStreamOperator<SensorReading> mapStream = inputStream.map(new MapFunction<String, SensorReading>() {
            @Override
            public SensorReading map(String s) throws Exception {
                String[] split = s.split(",");
                return new SensorReading(split[0], Long.parseLong(split[1]), Double.parseDouble(split[2]));
            }
        });
        SingleOutputStreamOperator<Double> aggStream = mapStream.keyBy("id")
                .countWindow(6, 2)
                .aggregate(new AggregateFunction<SensorReading, Tuple2<Double, Integer>, Double>() {
                    @Override
                    public Tuple2<Double, Integer> createAccumulator() {
                        // Initial accumulator: (temperature sum, element count).
                        return new Tuple2<>(0.0, 0);
                    }

                    @Override
                    public Tuple2<Double, Integer> add(SensorReading sensorReading, Tuple2<Double, Integer> acc) {
                        // Add the temperature to the running sum and increment the count.
                        return new Tuple2<>(acc.f0 + sensorReading.getTemperature(), acc.f1 + 1);
                    }

                    @Override
                    public Double getResult(Tuple2<Double, Integer> acc) {
                        // Average = sum / count.
                        return acc.f0 / acc.f1;
                    }

                    @Override
                    public Tuple2<Double, Integer> merge(Tuple2<Double, Integer> acc, Tuple2<Double, Integer> acc1) {
                        return new Tuple2<>(acc.f0 + acc1.f0, acc.f1 + acc1.f1);
                    }
                });
        aggStream.print("avg");
        env.execute("agg_1");
    }
}
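A note on how countWindow(6, 2) fires: the window is evaluated every 2 elements over at most the last 6 elements for the key, so early firings see fewer than 6 readings. For example, if the first two readings of a sensor have temperatures 30.0 and 32.0, the first firing prints (30.0 + 32.0) / 2 = 31.0.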


4. allowedLateness
allowedLateness keeps a window's state alive for an extra grace period after the watermark passes the end of the window, so that late records arriving within that period still update the window's result.
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.TimeCharacteristic;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.timestamps.BoundedOutOfOrdernessTimestampExtractor;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer011;
import org.apache.kafka.clients.consumer.ConsumerConfig;

import java.util.Properties;

public class Window3 {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(2);
        // Use event time (the time at which the event actually occurred).
        env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
        Properties prop = new Properties();
        prop.setProperty(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "192.168.232.211:9092");
        prop.setProperty(ConsumerConfig.GROUP_ID_CONFIG, "sensor_group2");
        prop.setProperty(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringDeserializer");
        prop.setProperty(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringDeserializer");
        prop.setProperty(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "latest");
        DataStreamSource<String> inputStream = env.addSource(new FlinkKafkaConsumer011<String>(
                "sensor", new SimpleStringSchema(), prop));
        SingleOutputStreamOperator<SensorReading> mapStream = inputStream.map(new MapFunction<String, SensorReading>() {
            @Override
            public SensorReading map(String s) throws Exception {
                String[] split = s.split(",");
                return new SensorReading(split[0], Long.parseLong(split[1]), Double.parseDouble(split[2]));
            }
            // BoundedOutOfOrdernessTimestampExtractor handles out-of-order events;
            // Time.seconds(0) means the watermark is not delayed at all.
        }).assignTimestampsAndWatermarks(new BoundedOutOfOrdernessTimestampExtractor<SensorReading>(Time.seconds(0)) {
            @Override
            public long extractTimestamp(SensorReading sensorReading) {
                // Input timestamps are in seconds; Flink expects milliseconds.
                return sensorReading.getTimestamp() * 1000L;
            }
        });
        SingleOutputStreamOperator<SensorReading> maxResultStream = mapStream.keyBy("id")
                .timeWindow(Time.seconds(15))
                // Keep window state for 30 extra seconds; late records within
                // that period trigger updated results.
                .allowedLateness(Time.seconds(30))
                .max("temperature");
        maxResultStream.print("max");
        env.execute("window3");
    }
}
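To make the firing behavior concrete: with a 15-second tumbling event-time window covering [0, 15) and allowedLateness(Time.seconds(30)), the window first fires when the watermark passes 15 s. A record with a timestamp inside [0, 15) that arrives while the watermark is still below 45 s triggers another, updated firing; once the watermark passes 45 s, the window state is dropped and such records are discarded, or, as in the next example, redirected to a side output.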



5. sideOutputLateData
sideOutputLateData redirects records that would otherwise be dropped, because they arrive after the allowed lateness has expired, into a side output identified by an OutputTag.
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.TimeCharacteristic;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.timestamps.BoundedOutOfOrdernessTimestampExtractor;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer011;
import org.apache.flink.util.OutputTag;
import org.apache.kafka.clients.consumer.ConsumerConfig;

import java.util.Properties;

public class Window3 {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(2);
        // Use event time (the time at which the event actually occurred).
        env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
        Properties prop = new Properties();
        prop.setProperty(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "192.168.232.211:9092");
        prop.setProperty(ConsumerConfig.GROUP_ID_CONFIG, "sensor_group2");
        prop.setProperty(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringDeserializer");
        prop.setProperty(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringDeserializer");
        prop.setProperty(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "latest");
        DataStreamSource<String> inputStream = env.addSource(new FlinkKafkaConsumer011<String>(
                "sensor", new SimpleStringSchema(), prop));
        SingleOutputStreamOperator<SensorReading> mapStream = inputStream.map(new MapFunction<String, SensorReading>() {
            @Override
            public SensorReading map(String s) throws Exception {
                String[] split = s.split(",");
                return new SensorReading(split[0], Long.parseLong(split[1]), Double.parseDouble(split[2]));
            }
            // BoundedOutOfOrdernessTimestampExtractor handles out-of-order events;
            // Time.seconds(0) means the watermark is not delayed at all.
        }).assignTimestampsAndWatermarks(new BoundedOutOfOrdernessTimestampExtractor<SensorReading>(Time.seconds(0)) {
            @Override
            public long extractTimestamp(SensorReading sensorReading) {
                // Input timestamps are in seconds; Flink expects milliseconds.
                return sensorReading.getTimestamp() * 1000L;
            }
        });
        OutputTag<SensorReading> outputTag = new OutputTag<SensorReading>("late11111"){};
        SingleOutputStreamOperator<SensorReading> maxResultStream = mapStream.keyBy("id")
                .timeWindow(Time.seconds(15))
                .allowedLateness(Time.seconds(30))
                // Records later than window end + allowed lateness go to the side output.
                .sideOutputLateData(outputTag)
                .max("temperature");
        maxResultStream.print("max");
        DataStream<SensorReading> sideOutput = maxResultStream.getSideOutput(outputTag);
        sideOutput.print("sideout");
        env.execute("window");
    }
}
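Note the trailing {} when constructing the OutputTag: it creates an anonymous subclass so that Flink can capture the element type from the generic parameter. Without it, Flink cannot extract the type and the job fails at runtime.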


This post introduced Flink's notions of time and its windowing mechanism: the differences between event time, ingestion time, and processing time, and the concrete use of tumbling, sliding, and session windows through the window API.