Transform

Transformation | Description |
---|---|
Map | Takes one element and produces one element. |
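FlatMap | Works the same way as the FlatMap transformation in the DataStream (streaming) API. |
MapPartition | Recommended when the map logic needs to connect to a third-party data source, because the function is called once per partition rather than once per element. |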
Filter | Evaluates a boolean function for each element and retains those for which the function returns true. |
Aggregate | Aggregates a group of values into a single value. Aggregation functions can be thought of as built-in reduce functions. Aggregate may be applied on a full data set, or on a grouped data set. You can also use short-hand syntax for minimum, maximum, and sum aggregations. |
Distinct | Removes duplicate elements from the data set. |
Join | Joins two data sets by pairing elements with equal keys. `ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();` |
OuterJoin | Works much like an outer join in MySQL. `ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment(); List<Tuple2<Integer, String>> list = Lists.newArrayList(new Tuple2<>(1, "beijing"), new Tuple2<>(2, "shanghai"), new Tuple2<>(3, "guangzhou")); List<Tuple2<Integer, String>> list2 = Lists.newArrayList(new Tuple2<>(1, "zs"), new Tuple2<>(2, "ls"), new Tuple2<>(4, "ww")); DataSource<Tuple2<Integer, String>> text1 = env.fromCollection(list); DataSource<Tuple2<Integer, String>> text2 = env.fromCollection(list2); final DataSet<Tuple3<Integer, String, String>> joined = text1.fullOuterJoin(text2).where(0) /* join on the first field of text1 */ .equalTo(0) /* index of the field in the second data set to compare against */ .with(new JoinFunction<Tuple2<Integer, String>, Tuple2<Integer, String>, Tuple3<Integer, String, String>>() { @Override public Tuple3<Integer, String, String> join(Tuple2<Integer, String> first, Tuple2<Integer, String> second) throws Exception { if (second == null) { /* no match in text2 */ return new Tuple3<>(first.f0, first.f1, "null"); } else if (first == null) { /* no match in text1 */ return new Tuple3<>(second.f0, "null", second.f1); } else { return new Tuple3<>(first.f0, first.f1, second.f1); } } }); joined.print();` |
Cross | Creates the Cartesian product of two data sets. `ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment(); List<Tuple2<Integer, String>> list = Lists.newArrayList(new Tuple2<>(1, "beijing"), new Tuple2<>(2, "shanghai"), new Tuple2<>(3, "guangzhou")); List<Tuple2<Integer, String>> list2 = Lists.newArrayList(new Tuple2<>(1, "zs"), new Tuple2<>(2, "ls"), new Tuple2<>(4, "ww")); DataSource<Tuple2<Integer, String>> text1 = env.fromCollection(list); DataSource<Tuple2<Integer, String>> text2 = env.fromCollection(list2); final CrossOperator.DefaultCross<Tuple2<Integer, String>, Tuple2<Integer, String>> cross = text1.cross(text2); cross.print();` |
Union | Produces the union of two data sets of the same type. `ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment(); List<Tuple2<Integer, String>> list = Lists.newArrayList(new Tuple2<>(1, "beijing"), new Tuple2<>(2, "shanghai"), new Tuple2<>(3, "guangzhou")); List<Tuple2<Integer, String>> list2 = Lists.newArrayList(new Tuple2<>(1, "zs"), new Tuple2<>(2, "ls"), new Tuple2<>(4, "ww")); DataSource<Tuple2<Integer, String>> text1 = env.fromCollection(list); DataSource<Tuple2<Integer, String>> text2 = env.fromCollection(list2); final UnionOperator<Tuple2<Integer, String>> union = text1.union(text2); union.print();` |
Sort Partition | Sorts the data set locally, within each partition. |
First-n | Returns the first n elements of the data set. |
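
To make the OuterJoin row easier to follow, here is a minimal plain-Java sketch of what a full outer join on the first field yields for the same sample data. It is not Flink code: the class name `OuterJoinSketch` and the method `fullOuterJoin` are hypothetical, maps stand in for the two data sets, and the sketch assumes each key appears at most once per side (a real join would emit one row per matching pair). A missing side is rendered as the string "null", mirroring the JoinFunction in the table.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

public class OuterJoinSketch {

    // Plain-Java stand-in for a full outer join on an integer key (hypothetical helper, not a Flink API).
    public static List<String> fullOuterJoin(Map<Integer, String> cities, Map<Integer, String> names) {
        // Collect every key that appears on either side, in ascending order.
        TreeSet<Integer> keys = new TreeSet<>(cities.keySet());
        keys.addAll(names.keySet());

        List<String> rows = new ArrayList<>();
        for (Integer key : keys) {
            // A side with no match is filled with the string "null".
            String city = cities.getOrDefault(key, "null");
            String name = names.getOrDefault(key, "null");
            rows.add("(" + key + "," + city + "," + name + ")");
        }
        return rows;
    }

    public static void main(String[] args) {
        Map<Integer, String> cities = new LinkedHashMap<>();
        cities.put(1, "beijing");
        cities.put(2, "shanghai");
        cities.put(3, "guangzhou");

        Map<Integer, String> names = new LinkedHashMap<>();
        names.put(1, "zs");
        names.put(2, "ls");
        names.put(4, "ww");

        // Prints:
        // (1,beijing,zs)
        // (2,shanghai,ls)
        // (3,guangzhou,null)
        // (4,null,ww)
        for (String row : fullOuterJoin(cities, names)) {
            System.out.println(row);
        }
    }
}
```

Keys 3 and 4 each exist on only one side, so they still appear in the result with the other side filled in. The sketch sorts by key only to keep the output deterministic; Flink does not guarantee the order of printed rows.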