Flink (6): DataSet API (2): transformation, parallel, and sink

Author: 代码届彭于晏

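Copyright notice: This is an original article by the author, released under the CC 4.0 BY-SA license. When reposting, include a link to the original source and this notice.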

Article link: https://blog.youkuaiyun.com/m0_37139189/article/details/91501664

Transformations

Transformation	Description
Map	`data.map(new MapFunction<String, Integer>() { public Integer map(String value) { return Integer.parseInt(value); } });`
FlatMap	Same usage as in the stream (DataStream) version. `data.flatMap(new FlatMapFunction<String, String>() { public void flatMap(String value, Collector<String> out) { for (String s : value.split(" ")) { out.collect(s); } } });`
MapPartition	Recommended when the map logic has to talk to a third-party data source: the function is invoked once per partition instead of once per element, so a connection can be opened and closed once per partition. `ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment(); List<String> list = Lists.newArrayList("hello you", "hello me"); DataSource<String> source = env.fromCollection(list); final DataSet<String> mapPartitionOperator = source.mapPartition(new MapPartitionFunction<String, String>() { @Override public void mapPartition(Iterable<String> iterable, Collector<String> collector) throws Exception { /* open the database connection here */ Iterator<String> it = iterable.iterator(); while (it.hasNext()) { String next = it.next(); String[] split = next.split("\\W+"); for (String word : split) { collector.collect(word); } } /* close the connection here */ } }); mapPartitionOperator.print();`
Filter	Evaluates a boolean function for each element and retains those for which the function returns true. IMPORTANT: The system assumes that the function does not modify the elements on which the predicate is applied. Violating this assumption can lead to incorrect results. `data.filter(new FilterFunction<Integer>() { public boolean filter(Integer value) { return value > 1000; } });`
Aggregate	Aggregates a group of values into a single value. Aggregation functions can be thought of as built-in reduce functions. Aggregate may be applied on a full data set, or on a grouped data set. `DataSet<Tuple3<Integer, String, Double>> input = // [...] DataSet<Tuple3<Integer, String, Double>> output = input.aggregate(SUM, 0).and(MIN, 2);` You can also use short-hand syntax for minimum, maximum, and sum aggregations. `DataSet<Tuple3<Integer, String, Double>> input = // [...] DataSet<Tuple3<Integer, String, Double>> output = input.sum(0).andMin(2);`
Distinct	Removes duplicate elements from the data set. `data.distinct();`
Join	Joins two data sets on equal key fields (an inner join). `ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment(); List<Tuple2<Integer, String>> list = Lists.newArrayList(new Tuple2<>(1, "beijing"), new Tuple2<>(2, "shanghai"), new Tuple2<>(3, "guangzhou")); List<Tuple2<Integer, String>> list2 = Lists.newArrayList(new Tuple2<>(1, "zs"), new Tuple2<>(2, "ls"), new Tuple2<>(3, "ww")); DataSource<Tuple2<Integer, String>> text1 = env.fromCollection(list); DataSource<Tuple2<Integer, String>> text2 = env.fromCollection(list2); final DataSet<Tuple3<Integer, String, String>> with = text1.join(text2).where(0) /* join key: field 0 of the first data set */ .equalTo(0) /* field index in the second data set to compare against */ .with(new JoinFunction<Tuple2<Integer, String>, Tuple2<Integer, String>, Tuple3<Integer, String, String>>() { @Override public Tuple3<Integer, String, String> join(Tuple2<Integer, String> first, Tuple2<Integer, String> second) throws Exception { return new Tuple3<>(first.f0, first.f1, second.f1); } }); with.print();`
OuterJoin	Works much like an outer join in MySQL: elements without a match on the other side are still emitted, and the missing side is passed to the join function as null. `ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment(); List<Tuple2<Integer, String>> list = Lists.newArrayList(new Tuple2<>(1, "beijing"), new Tuple2<>(2, "shanghai"), new Tuple2<>(3, "guangzhou")); List<Tuple2<Integer, String>> list2 = Lists.newArrayList(new Tuple2<>(1, "zs"), new Tuple2<>(2, "ls"), new Tuple2<>(4, "ww")); DataSource<Tuple2<Integer, String>> text1 = env.fromCollection(list); DataSource<Tuple2<Integer, String>> text2 = env.fromCollection(list2); final DataSet<Tuple3<Integer, String, String>> with = text1.fullOuterJoin(text2).where(0) /* join key: field 0 of the first data set */ .equalTo(0) /* field index in the second data set to compare against */ .with(new JoinFunction<Tuple2<Integer, String>, Tuple2<Integer, String>, Tuple3<Integer, String, String>>() { @Override public Tuple3<Integer, String, String> join(Tuple2<Integer, String> first, Tuple2<Integer, String> second) throws Exception { if (second == null) { return new Tuple3<>(first.f0, first.f1, "null"); } else if (first == null) { return new Tuple3<>(second.f0, "null", second.f1); } else { return new Tuple3<>(first.f0, first.f1, second.f1); } } }); with.print();`
Cross	Builds the Cartesian product of two data sets. `ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment(); List<Tuple2<Integer, String>> list = Lists.newArrayList(new Tuple2<>(1, "beijing"), new Tuple2<>(2, "shanghai"), new Tuple2<>(3, "guangzhou")); List<Tuple2<Integer, String>> list2 = Lists.newArrayList(new Tuple2<>(1, "zs"), new Tuple2<>(2, "ls"), new Tuple2<>(4, "ww")); DataSource<Tuple2<Integer, String>> text1 = env.fromCollection(list); DataSource<Tuple2<Integer, String>> text2 = env.fromCollection(list2); final CrossOperator.DefaultCross<Tuple2<Integer, String>, Tuple2<Integer, String>> cross = text1.cross(text2); cross.print();`
Union	Produces the union of two data sets of the same type; duplicates are kept. `ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment(); List<Tuple2<Integer, String>> list = Lists.newArrayList(new Tuple2<>(1, "beijing"), new Tuple2<>(2, "shanghai"), new Tuple2<>(3, "guangzhou")); List<Tuple2<Integer, String>> list2 = Lists.newArrayList(new Tuple2<>(1, "zs"), new Tuple2<>(2, "ls"), new Tuple2<>(4, "ww")); DataSource<Tuple2<Integer, String>> text1 = env.fromCollection(list); DataSource<Tuple2<Integer, String>> text2 = env.fromCollection(list2); final UnionOperator<Tuple2<Integer, String>> union = text1.union(text2); union.print();`
Sort Partition	Locally sorts every partition of the data set. `text1.sortPartition(0, Order.ASCENDING).sortPartition(1, Order.DESCENDING).print(); // sort by the first field ascending, then by the second field descending`
First-n	Returns the first n elements. `DataSet<Tuple2<String,Integer>> in = // [...] /* regular data set: returns three elements of the set */ DataSet<Tuple2<String,Integer>> result1 = in.first(3); /* grouped data set: returns the first three elements of each group */ DataSet<Tuple2<String,Integer>> result2 = in.groupBy(0).first(3); /* grouped-sorted data set: sorts each group by the second field, then takes the first three */ DataSet<Tuple2<String,Integer>> result3 = in.groupBy(0).sortGroup(1, Order.ASCENDING).first(3);`
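
The MapPartition row recommends that operator when the function has to reach an external data source. The reason can be illustrated without Flink: the sketch below is plain Java, not Flink code, and models partitions as nested lists. `FakeConnection`, `perElement`, and `perPartition` are hypothetical names invented for this illustration, and it assumes the connection is opened inside the function body.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java illustration only; this is not Flink code.
final class MapPartitionSketch {

    // Hypothetical stand-in for an expensive external resource such as a database connection.
    static final class FakeConnection {
        static int opened = 0;               // counts how many connections were created
        FakeConnection() { opened++; }
        String lookup(String word) { return word.toUpperCase(); }
        void close() { /* nothing to release in this sketch */ }
    }

    // map-style: the function body runs once per element, so it opens one connection per element.
    static List<String> perElement(List<List<String>> partitions) {
        List<String> out = new ArrayList<>();
        for (List<String> partition : partitions) {
            for (String word : partition) {
                FakeConnection conn = new FakeConnection();
                out.add(conn.lookup(word));
                conn.close();
            }
        }
        return out;
    }

    // mapPartition-style: the function body runs once per partition and iterates over its elements.
    static List<String> perPartition(List<List<String>> partitions) {
        List<String> out = new ArrayList<>();
        for (List<String> partition : partitions) {
            FakeConnection conn = new FakeConnection();   // opened once for the whole partition
            for (String word : partition) {
                out.add(conn.lookup(word));
            }
            conn.close();                                 // closed after the last element
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<String>> partitions = Arrays.asList(
                Arrays.asList("hello", "you"),
                Arrays.asList("hello", "me"));

        FakeConnection.opened = 0;
        perElement(partitions);
        System.out.println("per element:   " + FakeConnection.opened + " connections");

        FakeConnection.opened = 0;
        perPartition(partitions);
        System.out.println("per partition: " + FakeConnection.opened + " connections");
    }
}
```

With four words spread over two partitions, the per-element variant opens four connections and the per-partition variant opens two, while both produce the same output elements.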

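To make the null handling in the OuterJoin row concrete, here is a minimal plain-Java sketch of full outer join semantics. It is not Flink code: `FullOuterJoinSketch` and `fullOuterJoin` are hypothetical names, each side is assumed to have at most one value per key, and the rows are sorted by key only to keep the output stable (Flink itself gives no such ordering guarantee).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Plain-Java illustration only; this is not Flink code.
final class FullOuterJoinSketch {

    // Joins two key -> value maps on the key and keeps unmatched rows from both sides.
    static List<String> fullOuterJoin(Map<Integer, String> left, Map<Integer, String> right) {
        // Collect every key that appears on either side, sorted for stable output.
        TreeSet<Integer> keys = new TreeSet<>(left.keySet());
        keys.addAll(right.keySet());

        List<String> rows = new ArrayList<>();
        for (Integer key : keys) {
            String l = left.get(key);    // null when the key exists only on the right
            String r = right.get(key);   // null when the key exists only on the left
            // Same substitution as the JoinFunction in the table: use the text "null".
            rows.add("(" + key + "," + (l == null ? "null" : l) + "," + (r == null ? "null" : r) + ")");
        }
        return rows;
    }

    public static void main(String[] args) {
        Map<Integer, String> cities = new LinkedHashMap<>();
        cities.put(1, "beijing");
        cities.put(2, "shanghai");
        cities.put(3, "guangzhou");

        Map<Integer, String> names = new LinkedHashMap<>();
        names.put(1, "zs");
        names.put(2, "ls");
        names.put(4, "ww");

        for (String row : fullOuterJoin(cities, names)) {
            System.out.println(row);
        }
    }
}
```

For the same input as the OuterJoin row, key 3 exists only on the left and key 4 only on the right, so the sketch yields `(3,guangzhou,null)` and `(4,null,ww)` alongside the two matched rows.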