Flink (6): DataSet API (2): transformations, parallelism, and sinks

 

Transformations

Transformation / Description

Map

 

 

data.map(new MapFunction<String, Integer>() {
  public Integer map(String value) { return Integer.parseInt(value); }
});
FlatMap

Works the same way as flatMap in the DataStream API.

data.flatMap(new FlatMapFunction<String, String>() {
  public void flatMap(String value, Collector<String> out) {
    for (String s : value.split(" ")) {
      out.collect(s);
    }
  }
});
MapPartition

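If the map logic needs to talk to a third-party data source such as a database, prefer mapPartition. The function is called once per partition rather than once per element, so a connection can be opened once, reused for every element in the partition, and then closed.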

    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        List<String> list = Lists.newArrayList("hello you", "hello me");
        DataSource<String> source = env.fromCollection(list);
        DataSet<String> words = source.mapPartition(new MapPartitionFunction<String, String>() {
            @Override
            public void mapPartition(Iterable<String> values, Collector<String> out) throws Exception {
                // Open the database connection here, once for the whole partition.
                for (String line : values) {
                    // Split each line on non-word characters and emit every word.
                    for (String word : line.split("\\W+")) {
                        out.collect(word);
                    }
                }
                // Close the connection here, after the whole partition has been processed.
            }
        });

        words.print();
    }
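
To make the difference between per-element and per-partition setup concrete, here is a minimal sketch in plain Java with no Flink dependency. It only models the calling pattern: `FakeConnection`, `perElement`, and `perPartition` are hypothetical names invented for this illustration, and the nested lists stand in for partitions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class MapPartitionSketch {
    // Counts how many times the hypothetical connection was opened.
    static int opened = 0;

    // Hypothetical stand-in for a database connection.
    static final class FakeConnection implements AutoCloseable {
        FakeConnection() { opened++; }
        @Override public void close() { }
    }

    // map-style: the connection is opened and closed for every element.
    static List<String> perElement(List<List<String>> partitions) {
        List<String> out = new ArrayList<>();
        for (List<String> partition : partitions) {
            for (String line : partition) {
                try (FakeConnection conn = new FakeConnection()) {
                    out.addAll(Arrays.asList(line.split("\\W+")));
                }
            }
        }
        return out;
    }

    // mapPartition-style: the connection is opened once per partition.
    static List<String> perPartition(List<List<String>> partitions) {
        List<String> out = new ArrayList<>();
        for (List<String> partition : partitions) {
            try (FakeConnection conn = new FakeConnection()) {
                for (String line : partition) {
                    out.addAll(Arrays.asList(line.split("\\W+")));
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<String>> partitions = Arrays.asList(
                Arrays.asList("hello you", "hello me"),
                Arrays.asList("hello flink"));

        opened = 0;
        List<String> a = perElement(partitions);
        int perElementOpens = opened;

        opened = 0;
        List<String> b = perPartition(partitions);
        int perPartitionOpens = opened;

        System.out.println("same output: " + a.equals(b));
        System.out.println("opens per element: " + perElementOpens);
        System.out.println("opens per partition: " + perPartitionOpens);
    }
}
```

Both variants emit the same words. With three lines spread over two partitions, the per-element variant opens three connections and the per-partition variant opens two; the gap grows with the number of elements per partition.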

 

Filter

Evaluates a boolean function for each element and retains those for which the function returns true.
IMPORTANT: The system assumes that the function does not modify the elements on which the predicate is applied. Violating this assumption can lead to incorrect results.

data.filter(new FilterFunction<Integer>() {
  public boolean filter(Integer value) { return value > 1000; }
});
Aggregate

Aggregates a group of values into a single value. Aggregation functions can be thought of as built-in reduce functions. Aggregate may be applied on a full data set, or on a grouped data set.

DataSet<Tuple3<Integer, String, Double>> input = // [...]
DataSet<Tuple3<Integer, String, Double>> output = input.aggregate(SUM, 0).and(MIN, 2);

You can also use short-hand syntax for minimum, maximum, and sum aggregations.

DataSet<Tuple3<Integer, String, Double>> input = // [...]
DataSet<Tuple3<Integer, String, Double>> output = input.sum(0).andMin(2);
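
As a rough illustration of what sum on field 0 combined with min on field 2 computes over a whole data set, the following plain-Java sketch folds a list of rows into one result. `Row` is a hypothetical stand-in for `Tuple3<Integer, String, Double>` and `sumAndMin` is an invented helper; the String field is not aggregated, so this model simply sets it to null and makes no claim about what Flink puts there.

```java
import java.util.Arrays;
import java.util.List;

class AggregateSketch {
    // Hypothetical stand-in for Tuple3<Integer, String, Double>.
    static final class Row {
        final int f0;
        final String f1;
        final double f2;

        Row(int f0, String f1, double f2) {
            this.f0 = f0;
            this.f1 = f1;
            this.f2 = f2;
        }
    }

    // Folds all rows into one: f0 holds the sum, f2 holds the minimum.
    // f1 is not aggregated and is left null in this model.
    // For an empty list, f2 stays at positive infinity.
    static Row sumAndMin(List<Row> rows) {
        int sum = 0;
        double min = Double.POSITIVE_INFINITY;
        for (Row r : rows) {
            sum += r.f0;
            min = Math.min(min, r.f2);
        }
        return new Row(sum, null, min);
    }

    public static void main(String[] args) {
        List<Row> rows = Arrays.asList(
                new Row(1, "a", 3.5),
                new Row(2, "b", 1.5),
                new Row(3, "c", 2.5));
        Row result = sumAndMin(rows);
        System.out.println("sum(0) = " + result.f0 + ", min(2) = " + result.f2);
        // prints: sum(0) = 6, min(2) = 1.5
    }
}
```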
	
Distinct

Removes duplicate elements from the data set.

data.distinct();

 

Join

ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
List<Tuple2<Integer,String>> list = Lists.newArrayList(new Tuple2<>(1,"beijing"),new Tuple2<>(2,"shanghai"),new Tuple2<>(3,"guangzhou"));
List<Tuple2<Integer,String>> list2 = Lists.newArrayList(new Tuple2<>(1,"zs"),new Tuple2<>(2,"ls"),new Tuple2<>(3,"ww"));
DataSource<Tuple2<Integer,String>> text1 = env.fromCollection(list);
DataSource<Tuple2<Integer,String>> text2 = env.fromCollection(list2);
final DataSet<Tuple3<Integer, String, String>>
    joined = text1.join(text2).where(0) // join key of the first data set: field 0
    .equalTo(0) // field index in the second data set to compare against
    .with(
        new JoinFunction<Tuple2<Integer, String>, Tuple2<Integer, String>, Tuple3<Integer, String, String>>() {
            @Override
            public Tuple3<Integer, String, String> join(Tuple2<Integer, String> first,
                                                        Tuple2<Integer, String> second)
                throws Exception {
                // Emit (id, city, name) for every pair of records with the same id.
                return new Tuple3<>(first.f0, first.f1, second.f1);
            }
        });
joined.print();
OuterJoin
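This works much like an outer join in MySQL: records without a match on the other side are still emitted, with the missing side passed to the join function as null.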
ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
List<Tuple2<Integer,String>> list = Lists.newArrayList(new Tuple2<>(1,"beijing"),new Tuple2<>(2,"shanghai"),new Tuple2<>(3,"guangzhou"));
List<Tuple2<Integer,String>> list2 = Lists.newArrayList(new Tuple2<>(1,"zs"),new Tuple2<>(2,"ls"),new Tuple2<>(4,"ww"));
DataSource<Tuple2<Integer,String>> text1 = env.fromCollection(list);
DataSource<Tuple2<Integer,String>> text2 = env.fromCollection(list2);
final DataSet<Tuple3<Integer, String, String>>
    joined = text1.fullOuterJoin(text2).where(0) // join key of the first data set: field 0
    .equalTo(0) // field index in the second data set to compare against
    .with(
        new JoinFunction<Tuple2<Integer, String>, Tuple2<Integer, String>, Tuple3<Integer, String, String>>() {
            @Override
            public Tuple3<Integer, String, String> join(Tuple2<Integer, String> first,
                                                        Tuple2<Integer, String> second)
                throws Exception {
                if (second == null) {
                    // No match in the second data set.
                    return new Tuple3<>(first.f0, first.f1, "null");
                } else if (first == null) {
                    // No match in the first data set.
                    return new Tuple3<>(second.f0, "null", second.f1);
                } else {
                    return new Tuple3<>(first.f0, first.f1, second.f1);
                }
            }
        });
joined.print();
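
To show which rows a full outer join yields for the sample data above, here is a minimal plain-Java model with no Flink dependency. It assumes each key appears at most once per side (true for this data), represents each side as a map, and sorts by key only to make the output deterministic; Flink itself gives no such ordering guarantee. `fullOuterJoin` is a hypothetical helper written for this illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

class FullOuterJoinSketch {
    // Models a full outer join on the map key; a missing side becomes "null".
    static List<String> fullOuterJoin(Map<Integer, String> first, Map<Integer, String> second) {
        // Union of the keys from both sides, sorted for deterministic output.
        TreeSet<Integer> keys = new TreeSet<>(first.keySet());
        keys.addAll(second.keySet());

        List<String> out = new ArrayList<>();
        for (Integer key : keys) {
            String left = first.getOrDefault(key, "null");
            String right = second.getOrDefault(key, "null");
            out.add("(" + key + "," + left + "," + right + ")");
        }
        return out;
    }

    public static void main(String[] args) {
        Map<Integer, String> cities = new HashMap<>();
        cities.put(1, "beijing");
        cities.put(2, "shanghai");
        cities.put(3, "guangzhou");

        Map<Integer, String> names = new HashMap<>();
        names.put(1, "zs");
        names.put(2, "ls");
        names.put(4, "ww");

        for (String row : fullOuterJoin(cities, names)) {
            System.out.println(row);
        }
        // prints:
        // (1,beijing,zs)
        // (2,shanghai,ls)
        // (3,guangzhou,null)
        // (4,null,ww)
    }
}
```

Keys 3 and 4 each exist on one side only, which is exactly the case the two null checks in the join function handle.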
Cross

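Builds the Cartesian product of two data sets, pairing every element of the first with every element of the second.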

ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
List<Tuple2<Integer, String>> list = Lists.newArrayList(new Tuple2<>(1, "beijing"), new Tuple2<>(2, "shanghai"),
    new Tuple2<>(3, "guangzhou"));
List<Tuple2<Integer, String>> list2 = Lists.newArrayList(new Tuple2<>(1, "zs"), new Tuple2<>(2, "ls"),
    new Tuple2<>(4, "ww"));
DataSource<Tuple2<Integer, String>> text1 = env.fromCollection(list);
DataSource<Tuple2<Integer, String>> text2 = env.fromCollection(list2);
final CrossOperator.DefaultCross<Tuple2<Integer, String>, Tuple2<Integer, String>> cross = text1.cross(text2);
cross.print();
Union

ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
List<Tuple2<Integer, String>> list = Lists.newArrayList(new Tuple2<>(1, "beijing"), new Tuple2<>(2, "shanghai"),
    new Tuple2<>(3, "guangzhou"));
List<Tuple2<Integer, String>> list2 = Lists.newArrayList(new Tuple2<>(1, "zs"), new Tuple2<>(2, "ls"),
    new Tuple2<>(4, "ww"));
DataSource<Tuple2<Integer, String>> text1 = env.fromCollection(list);
DataSource<Tuple2<Integer, String>> text2 = env.fromCollection(list2);
final UnionOperator<Tuple2<Integer, String>> union = text1.union(text2);
union.print();

Sort Partition

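Sorts each partition of the data set locally. No global order across partitions is established.

// Sort by the first field ascending, then by the second field descending.
text1.sortPartition(0, Order.ASCENDING).sortPartition(1, Order.DESCENDING).print();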
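
The distinction between a local and a global sort is easy to miss, so here is a small plain-Java model of it with no Flink dependency. Nested lists stand in for partitions and `sortEachPartition` is a hypothetical helper; it uses a single integer key for brevity rather than the two-field ordering shown above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class SortPartitionSketch {
    // Sorts every partition on its own; partitions are never merged.
    static List<List<Integer>> sortEachPartition(List<List<Integer>> partitions) {
        List<List<Integer>> out = new ArrayList<>();
        for (List<Integer> partition : partitions) {
            List<Integer> sorted = new ArrayList<>(partition);
            Collections.sort(sorted);
            out.add(sorted);
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<Integer>> partitions = Arrays.asList(
                Arrays.asList(5, 1, 3),
                Arrays.asList(4, 2));
        System.out.println(sortEachPartition(partitions));
        // prints: [[1, 3, 5], [2, 4]]
    }
}
```

Each partition is ordered internally, but reading the partitions one after another gives 1, 3, 5, 2, 4, which is not globally sorted. In this model a total order would only fall out if all data sat in a single partition.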
First-n

Returns the first n elements of a data set.

DataSet<Tuple2<String,Integer>> in = // [...]
// regular data set: returns three elements of the data set
DataSet<Tuple2<String,Integer>> result1 = in.first(3);
// grouped data set: returns the first three elements of each group
DataSet<Tuple2<String,Integer>> result2 = in.groupBy(0)
                                            .first(3);
// grouped-sorted data set: sorts each group by the second field, then takes the first three
DataSet<Tuple2<String,Integer>> result3 = in.groupBy(0)
                                            .sortGroup(1, Order.ASCENDING)
                                            .first(3);
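
The grouped-sorted variant is the least obvious of the three, so the following plain-Java sketch models it without any Flink dependency: group by the first field, sort each group ascending by the second field, and keep at most n values per group. `row` and `firstNPerGroup` are hypothetical helpers, and a `TreeMap` is used only to make the printed order deterministic.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class FirstNSketch {
    // Hypothetical stand-in for Tuple2<String, Integer>.
    static Map.Entry<String, Integer> row(String key, int value) {
        return new AbstractMap.SimpleEntry<>(key, value);
    }

    // Models groupBy(0).sortGroup(1, ASCENDING).first(n).
    static Map<String, List<Integer>> firstNPerGroup(List<Map.Entry<String, Integer>> rows, int n) {
        // Group values by key.
        Map<String, List<Integer>> groups = new TreeMap<>();
        for (Map.Entry<String, Integer> r : rows) {
            groups.computeIfAbsent(r.getKey(), k -> new ArrayList<>()).add(r.getValue());
        }
        // Sort each group ascending and keep at most n values.
        Map<String, List<Integer>> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> g : groups.entrySet()) {
            List<Integer> sorted = new ArrayList<>(g.getValue());
            Collections.sort(sorted);
            int end = Math.min(n, sorted.size());
            result.put(g.getKey(), new ArrayList<>(sorted.subList(0, end)));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> rows = Arrays.asList(
                row("a", 5), row("a", 1), row("a", 3), row("a", 2),
                row("b", 9), row("b", 7));
        System.out.println(firstNPerGroup(rows, 3));
        // prints: {a=[1, 2, 3], b=[7, 9]}
    }
}
```

Group "b" has only two elements, so it returns both; first(n) is an upper bound rather than a guarantee of n results.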


 
