Flink Transform
Below are some of the transforms most commonly used in Flink.
map
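import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

// Create the execution environment
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
// Load the data source
DataStream<String> datas = env.readTextFile("data/words.txt");
// map transform: one output per input, here each line becomes its length
DataStream<Integer> map = datas.map(new MapFunction<String, Integer>() {
    @Override
    public Integer map(String value) throws Exception {
        return value.length();
    }
});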
flatMap
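import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;

// Create the execution environment
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
// Load the data source
DataStream<String> datas = env.readTextFile("data/words.txt");
// flatMap transform: zero or more outputs per input, here each line is split into words
DataStream<String> flatmap = datas.flatMap(new FlatMapFunction<String, String>() {
    @Override
    public void flatMap(String value, Collector<String> out) throws Exception {
        String[] split = value.split(" ");
        for (String res : split) {
            out.collect(res);
        }
    }
});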
filter
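import org.apache.flink.api.common.functions.FilterFunction;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

// Create the execution environment
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
// Load the data source
DataStream<String> datas = env.readTextFile("data/words.txt");
// filter transform: an element is kept only when the function returns true
DataStream<String> filter = datas.filter(new FilterFunction<String>() {
    @Override
    public boolean filter(String value) throws Exception {
        // Keep only the lines that start with "1"
        return value.startsWith("1");
    }
});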
keyBy (for details on using the keyBy transform, see this blog post)
https://blog.youkuaiyun.com/qq_29342297/article/details/112978727
Rolling aggregation operators (available once a DataStream has been converted to a KeyedStream with keyBy):
* sum
* min
* max
* minBy
* maxBy
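The difference between min and minBy is easy to miss, so here is a minimal plain-Java sketch (no Flink dependency) that simulates both on a keyed stream. The `Reading` class and the `rollingMin` / `rollingMinBy` helpers are hypothetical names made up for this illustration. The behavior being modeled is an assumption worth checking against your Flink version: min only updates the aggregated field and leaves the other fields as they were in the first record seen for the key, while minBy keeps the whole record that holds the minimum.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java simulation, not Flink API: rolling min vs. minBy per key.
class RollingMinSketch {

    // Hypothetical record: sensor id, timestamp, temperature.
    static final class Reading {
        final String id;
        final long ts;
        final double temp;

        Reading(String id, long ts, double temp) {
            this.id = id;
            this.ts = ts;
            this.temp = temp;
        }

        @Override
        public String toString() {
            return id + "@" + ts + ":" + temp;
        }
    }

    // Simulates min on the temperature field: only that field is updated,
    // the remaining fields stay those of the first record for the key (assumed).
    static List<Reading> rollingMin(List<Reading> input) {
        Map<String, Reading> state = new HashMap<>();
        List<Reading> out = new ArrayList<>();
        for (Reading r : input) {
            Reading cur = state.get(r.id);
            Reading next = (cur == null)
                    ? r
                    : new Reading(cur.id, cur.ts, Math.min(cur.temp, r.temp));
            state.put(r.id, next);
            out.add(next); // one result is emitted for every input record
        }
        return out;
    }

    // Simulates minBy on the temperature field: the whole record with the minimum is kept.
    static List<Reading> rollingMinBy(List<Reading> input) {
        Map<String, Reading> state = new HashMap<>();
        List<Reading> out = new ArrayList<>();
        for (Reading r : input) {
            Reading cur = state.get(r.id);
            Reading next = (cur == null || r.temp < cur.temp) ? r : cur;
            state.put(r.id, next);
            out.add(next);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Reading> input = Arrays.asList(
                new Reading("s1", 1L, 30.0),
                new Reading("s1", 2L, 25.0),
                new Reading("s1", 3L, 28.0));
        System.out.println(rollingMin(input));   // [s1@1:30.0, s1@1:25.0, s1@1:25.0]
        System.out.println(rollingMinBy(input)); // [s1@1:30.0, s1@2:25.0, s1@2:25.0]
    }
}
```

Note how the timestamp stays at 1 in the min output but moves to 2 in the minBy output. If you need the complete record that produced the extreme value, minBy / maxBy is the one to use.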
Reduce
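Reduce can only be applied to a KeyedStream.
import org.apache.flink.api.common.functions.ReduceFunction;
import org.apache.flink.streaming.api.datastream.DataStream;

// Goal: keep the latest timestamp and the maximum temperature per key
// reduce: combine the current reduced value (value1) with the new element (value2)
DataStream<TempInfo> reduce = tempkeyedStream.reduce(new ReduceFunction<TempInfo>() {
    @Override
    public TempInfo reduce(TempInfo value1, TempInfo value2) throws Exception {
        return new TempInfo(value1.getId(), value2.getTimeStamp(),
                Math.max(value1.getTemppera(), value2.getTemppera()));
    }
});
// The same reduce written as a lambda
tempkeyedStream.reduce((currData, newData) ->
        new TempInfo(currData.getId(), newData.getTimeStamp(),
                Math.max(currData.getTemppera(), newData.getTemppera())));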
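To make the rolling behavior of reduce concrete, here is a minimal plain-Java sketch (no Flink dependency) that simulates it: state is kept per key, and one result is emitted for every incoming element. The `Temp` class is a hypothetical stand-in for `TempInfo`, and `rollingReduce` and `sampleInput` are made-up helper names; only the reduce logic (latest timestamp, maximum temperature) is taken from the code above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;

// Plain-Java simulation, not Flink API: a rolling reduce over a keyed stream.
class RollingReduceSketch {

    // Hypothetical stand-in for TempInfo.
    static final class Temp {
        final String id;
        final long timeStamp;
        final double temperature;

        Temp(String id, long timeStamp, double temperature) {
            this.id = id;
            this.timeStamp = timeStamp;
            this.temperature = temperature;
        }

        @Override
        public String toString() {
            return id + "@" + timeStamp + ":" + temperature;
        }
    }

    // Same logic as the reduce function above: latest timestamp, maximum temperature.
    static final BinaryOperator<Temp> LATEST_TS_MAX_TEMP = (cur, next) ->
            new Temp(cur.id, next.timeStamp, Math.max(cur.temperature, next.temperature));

    // Keeps one reduced value per key and emits the updated value for each input.
    static List<Temp> rollingReduce(List<Temp> input, BinaryOperator<Temp> reducer) {
        Map<String, Temp> state = new HashMap<>();
        List<Temp> out = new ArrayList<>();
        for (Temp t : input) {
            Temp cur = state.get(t.id);
            // The first element of a key is emitted as-is; the reducer is not called for it.
            Temp merged = (cur == null) ? t : reducer.apply(cur, t);
            state.put(t.id, merged);
            out.add(merged);
        }
        return out;
    }

    static List<Temp> sampleInput() {
        return Arrays.asList(
                new Temp("a", 1L, 20.0),
                new Temp("b", 1L, 35.0),
                new Temp("a", 2L, 18.0),
                new Temp("a", 3L, 22.0));
    }

    public static void main(String[] args) {
        // prints [a@1:20.0, b@1:35.0, a@2:20.0, a@3:22.0]
        System.out.println(rollingReduce(sampleInput(), LATEST_TS_MAX_TEMP));
    }
}
```

The third output shows the point of the combined logic: the timestamp advances to 2 while the temperature stays at the earlier maximum of 20.0, and key "b" is reduced independently of key "a".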
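Split (deprecated) and Select: split one stream into several. The split is only logical: elements are tagged, then selected by tag. Split was already deprecated in 1.10 and no longer exists in 1.12, where it was removed; side outputs are the replacement.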
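Connect and CoMap: combine two streams. Unlike union, the two connected streams may have different element types, and a CoMap then applies a separate function to each side. The demo below only shows connect and is not very meaningful.
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.streaming.api.datastream.ConnectedStreams;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.KeyedStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
DataStream<String> filedata = env.readTextFile("data/temps.txt");
DataStream<TempInfo> mapDataStream = filedata.map(new MapFunction<String, TempInfo>() {
    @Override
    public TempInfo map(String value) throws Exception {
        // Placeholder: parse `value` into a populated TempInfo here,
        // otherwise getId() below has nothing to key on
        return new TempInfo();
    }
});
KeyedStream<TempInfo, String> keyedStream1 = mapDataStream.keyBy(data -> data.getId());
KeyedStream<TempInfo, String> keyedStream2 = mapDataStream.keyBy(data -> data.getId());
// connect keeps the two inputs separate inside one ConnectedStreams object
ConnectedStreams<TempInfo, TempInfo> connect = keyedStream1.connect(keyedStream2);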
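Since the demo stops at connect, here is a minimal plain-Java sketch (no Flink dependency) of what a CoMap step does afterwards: each side gets its own mapping function, and both produce the same output type. The `CoMapper` interface is a hypothetical stand-in modeled on the shape of Flink's `CoMapFunction` (which has `map1` and `map2`); `connectAndMap` and `demo` are made-up helper names. The sketch processes the first input before the second purely for simplicity; in a real job the interleaving of the two inputs is not guaranteed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java simulation, not Flink API: connect two differently typed inputs, then co-map them.
class CoMapSketch {

    // Hypothetical interface shaped like Flink's CoMapFunction: one method per input.
    interface CoMapper<IN1, IN2, OUT> {
        OUT map1(IN1 value);

        OUT map2(IN2 value);
    }

    // Applies map1 to elements of the first input and map2 to elements of the second.
    static <A, B, R> List<R> connectAndMap(List<A> first, List<B> second, CoMapper<A, B, R> mapper) {
        List<R> out = new ArrayList<>();
        for (A a : first) {
            out.add(mapper.map1(a));
        }
        for (B b : second) {
            out.add(mapper.map2(b));
        }
        return out;
    }

    // Temperatures (Double) and alert names (String) are mapped to a common String output.
    static List<String> demo() {
        List<Double> temps = Arrays.asList(36.5, 41.0);
        List<String> alerts = Arrays.asList("smoke");
        return connectAndMap(temps, alerts, new CoMapper<Double, String, String>() {
            @Override
            public String map1(Double t) {
                return (t > 40.0 ? "HIGH:" : "ok:") + t;
            }

            @Override
            public String map2(String alert) {
                return "ALERT:" + alert;
            }
        });
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [ok:36.5, HIGH:41.0, ALERT:smoke]
    }
}
```

This is where connect becomes useful compared with union: the two inputs keep their own types until the co-map step turns them into one common type.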
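Union: merge several streams into one. All of the streams must have the same element type.
// Continues from the connect example; the result is a plain DataStream<TempInfo>
keyedStream1.union(keyedStream2);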