flink groupby keyby区别

1.groupby与keyby区别

spark中我们经常使用groupby算子对数据进行聚合。flink中,不仅有groupby算法,还有keyby算子,那么这两者的区别在哪里?
直接说结论:
groupby是用在DataSet系列API中,Table/SQL等操作也是使用groupby。
keyby是用在DataStream系列API中。

2.groupby简单实例

import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.util.Collector;

public class GroupBy {
    public static void groupbycode() throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(1);
        DataSet<String> text = env.fromElements("java python java python python c");
        DataSet<Tuple2<String, Integer>> dataSet = text.flatMap(new FlatMapFunction<String, Tuple2<String, Integer>>() {
            @Override
            public void flatMap(String value, Collector<Tuple2<String, Integer>> out) throws Exception {
                for(String word: value.split(" ")) {
                    out.collect(new Tuple2<>(word, 1));
                }
            }
        });
        dataSet = dataSet.groupBy(0)
                .sum(1);
        dataSet.print();
    }

    public static void main(String[] args) throws Exception {
        groupbycode();
    }
}

上面可以认为是batch版的wordcount操作,对于DataSet使用的就是groupBy操作。

3.keyby简单实例

import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.util.Collector;

public class StreamWordCount {

    public static final class Splitter implements FlatMapFunction<String, Tuple2<String, Integer>> {
        @Override
        public void flatMap(String s, Collector<Tuple2<String, Integer>> collector) throws Exception {
            for(String word: s.split(" ")) {
                collector.collect(new Tuple2<String, Integer>(word, 1));
            }
        }
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        DataStream<Tuple2<String, Integer>> dataStream = env
                .socketTextStream("localhost", 9999)
                .flatMap(new Splitter())
                .keyBy(value -> value.f0)
                .window(TumblingProcessingTimeWindows.of(Time.seconds(1)))
                .sum(1);

        dataStream.print();
        env.execute("Window WordCount");
    }
}

上面是stream版的wordcount操作,对于DataStream数据,使用的则是keyby算子。

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值