DataStream Transformations
Map
DataStream -> DataStream
Takes one element and produces exactly one element. Example: a map function that doubles the values of the input stream:
DataStream<Integer> dataStream = //...
dataStream.map(new MapFunction<Integer, Integer>() {
    @Override
    public Integer map(Integer value) throws Exception {
        return 2 * value;
    }
});
FlatMap
DataStream -> DataStream
Takes one element and produces zero, one, or more elements. Example: a flatMap function that splits sentences into words:
dataStream.flatMap(new FlatMapFunction<String, String>() {
    @Override
    public void flatMap(String value, Collector<String> out)
        throws Exception {
        for (String word : value.split(" ")) {
            out.collect(word);
        }
    }
});
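The `split(" ")` call in the example emits an empty string for every extra space between words, for a leading space, and for an empty line. Below is a plain-Java sketch of the same one-to-many logic (no Flink dependency) that splits on runs of whitespace and skips empty tokens. The `List` is an assumption standing in for Flink's `Collector`, so each `add(...)` corresponds to `out.collect(word)` inside a real `FlatMapFunction`; `splitWords` is a hypothetical helper name.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java sketch (no Flink) of the flatMap logic above.
// The List stands in for Flink's Collector: each add() corresponds to out.collect(word).
class FlatMapSketch {

    // One input sentence -> zero, one, or many words.
    static List<String> splitWords(String sentence) {
        List<String> out = new ArrayList<>();
        // "\\s+" treats any run of whitespace as a single separator
        for (String word : sentence.trim().split("\\s+")) {
            // "".split(...) returns [""], so skip empty tokens (e.g. blank lines)
            if (!word.isEmpty()) {
                out.add(word);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(splitWords("hello  flink world")); // [hello, flink, world]
        System.out.println(splitWords("   "));                 // []
        // For contrast: split(" ") keeps empty tokens -> ["", "a", "", "b"]
        System.out.println(" a  b".split(" ").length);         // 4
    }
}
```

In a real job the loop body would call `out.collect(word)` instead of `out.add(word)`.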
Filter
DataStream -> DataStream
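Evaluates a boolean function for each element and keeps the elements for which the function returns true. Example: a filter that removes zero values: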
dataStream.filter(new FilterFunction<Integer>() {
    @Override
    public boolean filter(Integer value) throws Exception {
        return value != 0;
    }
});
KeyBy
DataStream -> KeyedStream
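Logically partitions the stream by the specified key: all records with the same key are assigned to the same partition.
// The key is a field name, or the position of a field in a Tuple
// (basic types cannot be keyed this way)
dataStream.keyBy("someKey") // Key by field "someKey"
dataStream.keyBy(0) // Key by the first element of a Tuple
In newer Flink versions these field-name and position variants are deprecated in favor of keyBy(KeySelector).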
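Note: the following types cannot be used as keys:
1. A POJO (entity class) that does not override hashCode() and relies on the Object.hashCode() implementation
2. An array of any type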
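Why those two types cannot serve as keys: keyBy routes each record by the hash code of its key. A class that relies on `Object.hashCode()` and any array both use identity-based hash codes, so two logically equal keys usually get different hash codes and may end up in different partitions. The plain-Java sketch below uses a deliberately simplified partitioner, `Math.floorMod(key.hashCode(), parallelism)`. Flink actually applies a murmur hash to `key.hashCode()` to pick a key group and then maps key groups to subtasks, but both depend only on `hashCode()`. `WordKey` and `partitionOf` are hypothetical names.

```java
import java.util.Arrays;
import java.util.Objects;

class KeySketch {

    // Hypothetical key class that overrides equals()/hashCode() based on its fields,
    // so two instances with the same content always hash the same.
    static final class WordKey {
        final String word;
        final int lang;

        WordKey(String word, int lang) {
            this.word = word;
            this.lang = lang;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof WordKey)) return false;
            WordKey other = (WordKey) o;
            return lang == other.lang && word.equals(other.word);
        }

        @Override
        public int hashCode() {
            return Objects.hash(word, lang);
        }
    }

    // Simplified partitioner (assumption, not Flink's real key-group assignment):
    // the chosen partition depends only on key.hashCode().
    static int partitionOf(Object key, int parallelism) {
        return Math.floorMod(key.hashCode(), parallelism);
    }

    public static void main(String[] args) {
        // Equal content -> same hashCode -> same partition, every time.
        System.out.println(partitionOf(new WordKey("flink", 1), 4)
                == partitionOf(new WordKey("flink", 1), 4)); // true

        int[] a = {1, 2, 3};
        int[] b = {1, 2, 3};
        // Only Arrays.hashCode() looks at the contents of an array ...
        System.out.println(Arrays.hashCode(a) == Arrays.hashCode(b)); // true
        // ... while a.hashCode() is identity-based: not guaranteed to match b.hashCode().
        System.out.println(a.hashCode() == b.hashCode());
    }
}
```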
Reduce
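KeyedStream -> DataStream
A rolling reduce on a keyed data stream: combines the current element with the last reduced value for the same key and emits the new value. Example: a reduce function that creates a stream of partial sums: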
keyedStream.reduce(new ReduceFunction<Integer>() {
    @Override
    public Integer reduce(Integer value1, Integer value2)
        throws Exception {
        return value1 + value2;
    }
});
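Because reduce on a KeyedStream is a rolling reduce, it emits one result per input element rather than a single final value. The plain-Java model below (no Flink dependency) shows this with the partial-sum function from the example. The `HashMap` is an assumption standing in for Flink's per-key state, and `rollingReduce` is a hypothetical helper name.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;

// Plain-Java model (no Flink) of a rolling reduce on a keyed stream.
// The HashMap plays the role of Flink's per-key state.
class RollingReduceSketch {

    // Each input is a (key, value) pair; returns the value emitted after each element.
    static List<Integer> rollingReduce(List<Map.Entry<String, Integer>> input,
                                       BinaryOperator<Integer> reducer) {
        Map<String, Integer> lastReduced = new HashMap<>();
        List<Integer> emitted = new ArrayList<>();
        for (Map.Entry<String, Integer> e : input) {
            // merge(): the first value of a key is stored as-is, later ones are combined
            Integer updated = lastReduced.merge(e.getKey(), e.getValue(), reducer);
            emitted.add(updated); // one output per input element
        }
        return emitted;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> input = List.of(
                Map.entry("a", 1), Map.entry("b", 10), Map.entry("a", 2), Map.entry("a", 3));
        System.out.println(rollingReduce(input, Integer::sum)); // [1, 10, 3, 6]
    }
}
```

Key "a" emits 1, 3, 6 (its running sum), while key "b" is summed independently.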
Aggregations
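KeyedStream -> DataStream
Rolling aggregations on a keyed data stream; the argument is either the position of a Tuple field or a field name. min returns the minimum value, whereas minBy returns the element that has the minimum value in this field (max and maxBy behave the same way).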
keyedStream.sum(0);
keyedStream.sum("key");
keyedStream.min(0);
keyedStream.min("key");
keyedStream.max(0);
keyedStream.max("key");
keyedStream.minBy(0);
keyedStream.minBy("key");
keyedStream.maxBy(0);
keyedStream.maxBy("key");
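To make the min/minBy difference concrete, here is a plain-Java model without any Flink dependency; `Row` is a hypothetical record type. The model assumes that min keeps the non-aggregated fields of the first element seen for a key, so treat those fields as unreliable in real Flink code. For minBy, ties keep the earlier element, matching the default of Flink's `minBy(field, first)` overload with `first = true`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model (no Flink) contrasting a rolling min(...) with a rolling minBy(...)
// on rows keyed by 'key' and aggregated on 'value'.
class MinVsMinBySketch {

    // Hypothetical row type: key field, aggregated field, and one extra payload field.
    static final class Row {
        final String key;
        final int value;
        final String tag;

        Row(String key, int value, String tag) {
            this.key = key;
            this.value = value;
            this.tag = tag;
        }

        @Override
        public String toString() {
            return "(" + key + "," + value + "," + tag + ")";
        }
    }

    static List<Row> sampleInput() {
        return List.of(new Row("k", 5, "x"), new Row("k", 3, "y"), new Row("k", 4, "z"));
    }

    // min: only the aggregated field changes; in this model the other fields
    // stay as they were in the first row seen for the key.
    static List<String> rollingMin(List<Row> input) {
        Map<String, Row> state = new HashMap<>();
        List<String> out = new ArrayList<>();
        for (Row r : input) {
            Row cur = state.get(r.key);
            Row next = (cur == null) ? r : new Row(cur.key, Math.min(cur.value, r.value), cur.tag);
            state.put(r.key, next);
            out.add(next.toString());
        }
        return out;
    }

    // minBy: the whole row holding the minimum value is emitted; ties keep the earlier row.
    static List<String> rollingMinBy(List<Row> input) {
        Map<String, Row> state = new HashMap<>();
        List<String> out = new ArrayList<>();
        for (Row r : input) {
            Row cur = state.get(r.key);
            Row next = (cur == null || r.value < cur.value) ? r : cur;
            state.put(r.key, next);
            out.add(next.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(rollingMin(sampleInput()));   // [(k,5,x), (k,3,x), (k,3,x)]
        System.out.println(rollingMinBy(sampleInput())); // [(k,5,x), (k,3,y), (k,3,y)]
    }
}
```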
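Union
DataStream* -> DataStream
Union of two or more data streams, creating a new stream that contains all the elements from all the streams. All input streams must have the same element type. Note: if you union a data stream with itself, every element appears twice in the resulting stream.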
dataStream.union(otherStream1, otherStream2, ...);
Example:
// Get the Flink execution environment
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
// Create the sources (MyNoParalleSource is a user-defined source that is not shown here)
DataStreamSource<Long> text1 = env.addSource(new MyNoParalleSource()).setParallelism(1);
DataStreamSource<Long> text2 = env.addSource(new MyNoParalleSource()).setParallelism(1);
// Combine text1 and text2 into a single stream
DataStream<Long> text = text1.union(text2);
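A plain-Java model of union (no Flink dependency), using lists as stand-ins for streams. Flink does not guarantee how elements of the inputs interleave, so the sketch checks element counts rather than order; it also reproduces the self-union note, where each element appears twice. `union` and `counts` are hypothetical helper names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java model (no Flink) of union: the result holds every element of every input.
// Interleaving order is not guaranteed in Flink, so compare counts, not order.
class UnionSketch {

    @SafeVarargs
    static <T> List<T> union(List<T> first, List<T>... others) {
        List<T> all = new ArrayList<>(first);
        for (List<T> other : others) {
            all.addAll(other);
        }
        return all;
    }

    // Element -> number of occurrences (TreeMap only for a stable printout)
    static <T extends Comparable<T>> Map<T, Integer> counts(List<T> elements) {
        Map<T, Integer> c = new TreeMap<>();
        for (T e : elements) {
            c.merge(e, 1, Integer::sum);
        }
        return c;
    }

    public static void main(String[] args) {
        List<Long> text1 = List.of(1L, 2L);
        List<Long> text2 = List.of(3L);
        System.out.println(counts(union(text1, text2))); // {1=1, 2=1, 3=1}
        System.out.println(counts(union(text1, text1))); // {1=2, 2=2}: self-union duplicates
    }
}
```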
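Connect
DataStream, DataStream -> ConnectedStreams
Similar to union, but it connects exactly two streams, and the two streams may have different types; a different processing function is applied to the data of each stream.
CoMap, CoFlatMap (ConnectedStreams -> DataStream): the functions used on ConnectedStreams. They work like map and flatMap, with one method per input stream.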
// Create the sources
DataStreamSource<Long> text1 = env.addSource(new MyNoParalleSource()).setParallelism(1);
DataStreamSource<Long> text2 = env.addSource(new MyNoParalleSource()).setParallelism(1);
// Turn the second stream into Strings so that the two inputs have different types
SingleOutputStreamOperator<String> text2_str = text2.map(new MapFunction<Long, String>() {
    @Override
    public String map(Long value) throws Exception {
        return "str_" + value;
    }
});
// Connect the Long stream with the String stream
ConnectedStreams<Long, String> connectStream = text1.connect(text2_str);
// map1 handles elements of text1, map2 handles elements of text2_str
SingleOutputStreamOperator<Object> result = connectStream.map(new CoMapFunction<Long, String, Object>() {
    @Override
    public Object map1(Long value) throws Exception {
        return value;
    }
    @Override
    public Object map2(String value) throws Exception {
        return value;
    }
});
// Print the result
result.print().setParallelism(1);
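A plain-Java model of connect plus a CoMap (no Flink dependency). The `CoMap` interface here is hypothetical; it only mirrors the shape of Flink's `CoMapFunction` (map1 for the first input, map2 for the second, one shared output type). It picks String as the common output type rather than Object, which Flink can only treat as a generic type. As a simplification, it processes all first-input elements before the second input, whereas Flink interleaves them in arrival order.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java model (no Flink) of connect + CoMap.
// CoMap is a hypothetical stand-in mirroring the shape of Flink's CoMapFunction.
class ConnectSketch {

    interface CoMap<IN1, IN2, OUT> {
        OUT map1(IN1 value); // called for elements of the first stream
        OUT map2(IN2 value); // called for elements of the second stream
    }

    // Simplification: real Flink interleaves both inputs in arrival order;
    // here all first-stream elements are processed before the second stream.
    static <IN1, IN2, OUT> List<OUT> coMap(List<IN1> first, List<IN2> second,
                                           CoMap<IN1, IN2, OUT> fn) {
        List<OUT> out = new ArrayList<>();
        for (IN1 v : first) out.add(fn.map1(v));
        for (IN2 v : second) out.add(fn.map2(v));
        return out;
    }

    // Similar to the example, but with String as the common output type instead of Object.
    static final CoMap<Long, String, String> TO_STRING = new CoMap<Long, String, String>() {
        @Override public String map1(Long value) { return String.valueOf(value); }
        @Override public String map2(String value) { return value; }
    };

    public static void main(String[] args) {
        System.out.println(coMap(List.of(1L, 2L), List.of("str_1"), TO_STRING)); // [1, 2, str_1]
    }
}
```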
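Split: splits one data stream into multiple streams according to a rule
Select: used together with split to select one or more of the split streams
Note: split/select was deprecated and then removed in Flink 1.12; side outputs are the recommended replacement.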
// Split the stream by the parity of each value
SplitStream<Long> splitStream = text.split(new OutputSelector<Long>() {
    @Override
    public Iterable<String> select(Long value) {
        ArrayList<String> output = new ArrayList<>();
        if (value % 2 == 0) {
            output.add("even"); // even number
        } else {
            output.add("odd"); // odd number
        }
        return output;
    }
});
// Select one or more of the split streams
DataStream<Long> evenStream = splitStream.select("even");
DataStream<Long> oddStream = splitStream.select("odd");
DataStream<Long> mixStream = splitStream.select("odd", "even");
// Print the result
mixStream.print().setParallelism(1);
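A plain-Java model of the split/select semantics (no Flink dependency): the selector assigns each element one or more output names, as `OutputSelector.select` does, and `select(...)` keeps the elements that carry any of the requested names. `parity` and `select` are hypothetical helpers modelled on the example, not Flink API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Plain-Java model (no Flink) of split/select: a selector assigns each element
// one or more output names, and select(...) keeps elements carrying any requested name.
class SplitSelectSketch {

    // Same rule as the OutputSelector in the example: "even" or "odd".
    static List<String> parity(Long value) {
        List<String> names = new ArrayList<>();
        names.add(value % 2 == 0 ? "even" : "odd");
        return names;
    }

    static List<Long> select(List<Long> input, Function<Long, List<String>> selector,
                             String... names) {
        List<String> wanted = Arrays.asList(names);
        List<Long> out = new ArrayList<>();
        for (Long v : input) {
            // in this model an element is emitted once even if several of its names match
            for (String name : selector.apply(v)) {
                if (wanted.contains(name)) {
                    out.add(v);
                    break;
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Long> text = List.of(1L, 2L, 3L, 4L);
        System.out.println(select(text, SplitSelectSketch::parity, "even"));        // [2, 4]
        System.out.println(select(text, SplitSelectSketch::parity, "odd"));         // [1, 3]
        System.out.println(select(text, SplitSelectSketch::parity, "odd", "even")); // [1, 2, 3, 4]
    }
}
```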