Flink API - State


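Flink has two kinds of state: keyed state and operator state. This post focuses on keyed state and on how to declare it through the RuntimeContext.
User-defined functions use keyed state to store and access state that is scoped to a key. Flink maintains one state instance per key, and the state instances of an operator are distributed across all of that operator's parallel tasks. Each parallel task of a function therefore holds the state instances of only its own subset of the keys, which makes keyed state behave much like a distributed key-value map.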
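The comparison with a distributed key-value map can be made concrete with a small plain-Java model. This is only a sketch under simplifying assumptions: `KeyedStateSketch` and its methods are hypothetical names, not Flink API, and the routing uses `hashCode` modulo the parallelism, whereas Flink itself assigns keys to parallel tasks through key groups.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of keyed state: one counter per key, spread over parallel subtasks. */
class KeyedStateSketch {
    private final int parallelism;
    // One local key -> value map per parallel subtask.
    private final List<Map<String, Long>> subtaskState = new ArrayList<>();

    KeyedStateSketch(int parallelism) {
        this.parallelism = parallelism;
        for (int i = 0; i < parallelism; i++) {
            subtaskState.add(new HashMap<>());
        }
    }

    /** Simplified routing (an assumption of this sketch): key hash modulo parallelism. */
    int subtaskFor(String key) {
        return Math.floorMod(key.hashCode(), parallelism);
    }

    /** Increments the counter of the key on the one subtask that owns it. */
    long increment(String key) {
        return subtaskState.get(subtaskFor(key)).merge(key, 1L, Long::sum);
    }

    /** Number of keys whose state lives on the given subtask. */
    int keysOnSubtask(int subtask) {
        return subtaskState.get(subtask).size();
    }

    public static void main(String[] args) {
        KeyedStateSketch state = new KeyedStateSketch(4);
        for (String user : new String[] {"Mary", "Bob", "Mary", "Alice", "Mary"}) {
            long pv = state.increment(user);
            System.out.println(user + " -> subtask " + state.subtaskFor(user) + ", pv = " + pv);
        }
    }
}
```

Every key is owned by exactly one subtask, so the counters of different users never interfere, and no subtask needs to know about keys it does not own.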
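Keyed state is only available on a KeyedStream. Flink supports the following state primitives:
● ValueState<T> holds a single value of type T.
○ get: ValueState.value()
○ set: ValueState.update(T value)
● ListState<T> holds a list of elements of type T. Basic operations:
○ ListState.add(T value)
○ ListState.addAll(List<T> values)
○ ListState.get(), which returns an Iterable<T>
○ ListState.update(List<T> values)
● MapState<K, V> holds key-value pairs.
○ MapState.get(K key)
○ MapState.put(K key, V value)
○ MapState.contains(K key)
○ MapState.remove(K key)
● ReducingState<T> holds a single value that aggregates every value added to it with a ReduceFunction; the result has the same type as the inputs.
● AggregatingState<I, O> holds a single value that aggregates every value added to it with an AggregateFunction; the output type O may differ from the input type I.
Keyed state for different keys is isolated: a function only sees the state of the key of the element it is currently processing.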
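The difference between ReducingState and AggregatingState is easiest to see side by side. The following plain-Java model is a sketch, not Flink code: `AggregationSketch`, `Reducing` and `Averaging` are hypothetical names chosen for illustration, and the model covers a single key only.

```java
import java.util.function.BinaryOperator;

class AggregationSketch {

    /** Models ReducingState<T>: input, stored value and result share one type. */
    static final class Reducing<T> {
        private final BinaryOperator<T> reduce;
        private T value; // null means "no state yet"

        Reducing(BinaryOperator<T> reduce) {
            this.reduce = reduce;
        }

        void add(T in) {
            value = (value == null) ? in : reduce.apply(value, in);
        }

        T get() {
            return value;
        }

        void clear() {
            value = null;
        }
    }

    /** Models AggregatingState<Integer, Double>: int inputs, (sum, count) accumulator, average as result. */
    static final class Averaging {
        private long sum;
        private long count;

        void add(int in) {
            sum += in;
            count++;
        }

        Double get() {
            return count == 0 ? null : (double) sum / count;
        }

        void clear() {
            sum = 0;
            count = 0;
        }
    }

    public static void main(String[] args) {
        Reducing<Integer> max = new Reducing<>(Math::max);
        Averaging avg = new Averaging();
        for (int v : new int[] {3, 7, 5}) {
            max.add(v);
            avg.add(v);
        }
        System.out.println("max = " + max.get() + ", avg = " + avg.get());
    }
}
```

In both cases only the running aggregate is stored, never the individual elements, which is what distinguishes these two primitives from ListState.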

ValueState

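import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.state.StateTtlConfig;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.common.time.Time;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

// ClickSource and Event are the author's own classes and are not shown in this post;
// the examples read the Event fields user, url and timestamp.
public class ValueStates {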

    public static void main(String[] args) throws Exception {

        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        env.setParallelism(1);


        env
                .addSource(new ClickSource())
                .assignTimestampsAndWatermarks(
                        WatermarkStrategy.<Event>forMonotonousTimestamps()
                                .withTimestampAssigner((Event element, long recordTimestamp) -> element.timestamp)
                )
                .keyBy(event -> event.user)
                .process(

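                        // Every 5 s of event time, emit the running pv count of each user
                        // (the timer below uses 5 s, and pvState is never reset).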
                        new KeyedProcessFunction<String, Event, String>() {

                            private ValueState<Long> pvState;
                            private ValueState<Long> timerTs;

                            @Override
                            public void open(Configuration parameters) throws Exception {
                                super.open(parameters);


                                StateTtlConfig ttlConfig = StateTtlConfig
                                        // TTL of 10 s
                                        .newBuilder(Time.seconds(10))
                                        // Refresh the TTL only when the state is created or written
                                        .setUpdateType(StateTtlConfig.UpdateType.OnCreateAndWrite)
                                        // Never return expired values
                                        .setStateVisibility(StateTtlConfig.StateVisibility.NeverReturnExpired)
                                        // Turn off background cleanup of expired entries
                                        .disableCleanupInBackground()
                                        .build();

                                ValueStateDescriptor<Long> pvStateDescriptor = new ValueStateDescriptor<>("pv", Types.LONG);
                                pvStateDescriptor.enableTimeToLive(ttlConfig);
                                pvState = getRuntimeContext().getState(pvStateDescriptor);




                                timerTs = getRuntimeContext().getState(new ValueStateDescriptor<Long>("timer-ts", Types.LONG));
                            }

                            @Override
                            public void processElement(Event value, Context ctx, Collector<String> out) throws Exception {

                                if (pvState.value() == null) {
                                    pvState.update(1L);
                                } else {
                                    pvState.update(pvState.value() + 1);
                                }

                                if (timerTs.value() == null) {
                                    timerTs.update(value.timestamp + 5 * 1000L);
                                    ctx.timerService().registerEventTimeTimer(value.timestamp + 5 * 1000L);
                                }

                            }

                            @Override
                            public void onTimer(long timestamp, OnTimerContext ctx, Collector<String> out) throws Exception {
                                super.onTimer(timestamp, ctx, out);
                                out.collect("key = " + ctx.getCurrentKey() + " 5s pv = " + pvState.value());
                                timerTs.clear();
                            }
                        })
                .print();


        env.execute();


    }

}
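
The TTL settings in the example (a 10 s TTL, OnCreateAndWrite, NeverReturnExpired) can be reasoned about with a small plain-Java model. This is a sketch under stated assumptions: `TtlSketch` is a hypothetical class, the clock is passed in explicitly instead of being read from processing time, and treating a value as expired exactly when its age reaches the TTL is an assumption about the boundary.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of a value with TTL: writes restart the timeout, reads do not. */
class TtlSketch {
    private final long ttlMillis;
    private final Map<String, Long> values = new HashMap<>();
    private final Map<String, Long> lastWrite = new HashMap<>();

    TtlSketch(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    /** Write: stores the value and restarts the TTL (OnCreateAndWrite). */
    void update(String key, long value, long now) {
        values.put(key, value);
        lastWrite.put(key, now);
    }

    /** Read: hides expired values (NeverReturnExpired) and leaves the TTL untouched. */
    Long value(String key, long now) {
        Long written = lastWrite.get(key);
        if (written == null || now - written >= ttlMillis) {
            return null; // the entry may still be stored, but it is no longer visible
        }
        return values.get(key);
    }

    public static void main(String[] args) {
        TtlSketch pv = new TtlSketch(10_000L);
        pv.update("Mary", 1L, 0L);
        System.out.println("at 9.9 s: " + pv.value("Mary", 9_900L));
        System.out.println("at 10 s: " + pv.value("Mary", 10_000L));
    }
}
```

For the example above this means that a user who keeps clicking never loses the pv count, because every update restarts the timeout, while a user who stays silent for longer than the TTL can make onTimer read null from pvState.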

ListStates

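import java.util.ArrayList;
import java.util.List;

import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.state.ListState;
import org.apache.flink.api.common.state.ListStateDescriptor;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

public class ListStates {

    public static void main(String[] args) throws Exception {

        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        env.setParallelism(1);

        // Use ListState to implement a sliding count window.
        env
                .addSource(new ClickSource())
                .assignTimestampsAndWatermarks(WatermarkStrategy.forMonotonousTimestamps())
                // Route every event to the same key so that one window sees all elements.
                .keyBy(e -> true)
                .process(
                        new KeyedProcessFunction<Boolean, Event, String>() {

                            private ListState<Event> windowAllElement;

                            private final int COUNT = 50; // window size
                            private final int SLIDE = 20; // slide

                            @Override
                            public void open(Configuration parameters) throws Exception {
                                super.open(parameters);
                                windowAllElement = getRuntimeContext().getListState(new ListStateDescriptor<Event>("window-all-element", Types.POJO(Event.class)));
                            }

                            // Copy the stored elements into a list. The size of the Iterable returned by
                            // get() is not always known, so it is counted on the copy instead.
                            private List<Event> currentWindow() throws Exception {
                                List<Event> elements = new ArrayList<>();
                                Iterable<Event> stored = windowAllElement.get();
                                if (stored != null) {
                                    for (Event e : stored) {
                                        elements.add(e);
                                    }
                                }
                                return elements;
                            }

                            @Override
                            public void processElement(Event value, Context ctx, Collector<String> out) throws Exception {

                                windowAllElement.add(value);

                                long count = currentWindow().size();

                                // Fire on every SLIDE-th element ...
                                if (count % SLIDE == 0) {
                                    ctx.timerService().registerProcessingTimeTimer(ctx.timerService().currentProcessingTime() + 1L);
                                }

                                // ... and once more when the window first reaches COUNT elements.
                                if (count == COUNT) {
                                    ctx.timerService().registerProcessingTimeTimer(ctx.timerService().currentProcessingTime() + 1L);
                                }
                            }

                            @Override
                            public void onTimer(long timestamp, OnTimerContext ctx, Collector<String> out) throws Exception {
                                super.onTimer(timestamp, ctx, out);

                                List<Event> elements = currentWindow();
                                long count = elements.size();

                                if (count % SLIDE == 0) {

                                    // Triggered by the slide (every 20 elements).
                                    out.collect("slide trigger, before eviction: window holds " + count + " elements");

                                    if (count > COUNT) {
                                        // Evict the oldest elements so that only the latest COUNT remain.
                                        // Removing through the iterator returned by get() is not guaranteed to be
                                        // written back to the state (for example with the RocksDB state backend),
                                        // so the trimmed list is stored with update().
                                        int evictCount = (int) (count - COUNT);
                                        for (Event removed : elements.subList(0, evictCount)) {
                                            System.out.println("remove: " + removed);
                                        }
                                        elements = new ArrayList<>(elements.subList(evictCount, elements.size()));
                                        windowAllElement.update(elements);
                                    }

                                    out.collect("slide trigger, after eviction: window holds " + elements.size() + " elements");

                                } else {

                                    // Triggered because the window reached 50 elements.
                                    out.collect("size trigger: window holds " + count + " elements");
                                }
                            }
                        }
                )
                .print();

        env.execute();
    }

}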
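
The window logic that the ListState example aims for can be checked in isolation with a plain-Java model. This is a sketch, not the example's implementation: `SlidingCountWindowSketch` is a hypothetical class without timers or Flink state, and it assumes the usual sliding count-window behaviour of firing on every `slide`-th element with at most the latest `size` elements.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Optional;

/** Toy sliding count window: fires every `slide` elements with the latest `size` elements. */
class SlidingCountWindowSketch<T> {
    private final int size;
    private final int slide;
    private final Deque<T> buffer = new ArrayDeque<>();
    private long seen;

    SlidingCountWindowSketch(int size, int slide) {
        this.size = size;
        this.slide = slide;
    }

    /** Adds an element and returns the window contents if the window fires on it. */
    Optional<List<T>> add(T element) {
        buffer.addLast(element);
        if (buffer.size() > size) {
            buffer.removeFirst(); // evict the oldest element
        }
        seen++;
        if (seen % slide == 0) {
            return Optional.of(new ArrayList<>(buffer));
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        SlidingCountWindowSketch<Integer> window = new SlidingCountWindowSketch<>(50, 20);
        for (int i = 1; i <= 100; i++) {
            final int n = i;
            window.add(i).ifPresent(w ->
                    System.out.println("element " + n + ": window holds " + w.size() + " elements"));
        }
    }
}
```

With a size of 50 and a slide of 20, the window fires on elements 20, 40, 60, 80 and 100 and holds 20, 40, 50, 50 and 50 elements respectively.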

MapStates

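import java.time.Duration;
import java.util.Map;

import org.apache.flink.api.common.eventtime.SerializableTimestampAssigner;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.state.MapState;
import org.apache.flink.api.common.state.MapStateDescriptor;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

public class MapStates {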

    public static void main(String[] args) throws Exception {

        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        env.setParallelism(1);


        env
                .addSource(new ClickSource())
                .assignTimestampsAndWatermarks(
                        WatermarkStrategy
                                .<Event>forBoundedOutOfOrderness(Duration.ofSeconds(0L))
                                .withTimestampAssigner(
                                        new SerializableTimestampAssigner<Event>() {
                                            @Override
                                            public long extractTimestamp(Event element, long recordTimestamp) {
                                                return element.timestamp;
                                            }
                                        }
                                )
                )
                .keyBy(event -> event.user)
                .process(

                        // Every 10 s of event time, emit the pv of each page for each user.

                        new KeyedProcessFunction<String, Event, String>() {

                            private ValueState<Long> timerT;
                            private MapState<String, Long> avgPagePv;

                            @Override
                            public void open(Configuration parameters) throws Exception {
                                super.open(parameters);
                                timerT = getRuntimeContext().getState(new ValueStateDescriptor<Long>("timer-ts", Types.LONG));
                                avgPagePv = getRuntimeContext().getMapState(new MapStateDescriptor<String, Long>("avg-page-pv", Types.STRING, Types.LONG));
                            }

                            @Override
                            public void processElement(Event value, Context ctx, Collector<String> out) throws Exception {

                                if (!avgPagePv.contains(value.url)) {
                                    avgPagePv.put(value.url, 1L);
                                } else {
                                    avgPagePv.put(value.url, avgPagePv.get(value.url) + 1L);
                                }

                                if (timerT.value() == null) {
                                    long ts = value.timestamp + 10 * 1000L;
                                    ctx.timerService().registerEventTimeTimer(ts);
                                    timerT.update(ts);
                                }

                            }

                            @Override
                            public void onTimer(long timestamp, OnTimerContext ctx, Collector<String> out) throws Exception {
                                super.onTimer(timestamp, ctx, out);

                                Iterable<Map.Entry<String, Long>> entries = avgPagePv.entries();

                                String key = ctx.getCurrentKey();


                                entries.forEach(elem -> out.collect("key = " + key + "\turl:" + elem.getKey() + "\tpv:" + elem.getValue()));


                                timerT.clear();


                            }
                        })
                .print();


        env.execute();


    }

}

ReducingStates & AggregatingStates

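import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

import org.apache.flink.api.common.functions.AggregateFunction;
import org.apache.flink.api.common.functions.RichFlatMapFunction;
import org.apache.flink.api.common.state.AggregatingState;
import org.apache.flink.api.common.state.AggregatingStateDescriptor;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.source.SourceFunction;
import org.apache.flink.util.Collector;

public class AggregatingStates {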

    public static void main(String[] args) throws Exception {


        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        env.setParallelism(1);

        /*
            AggregatingState<IN, OUT>
                This keeps a single value that represents the aggregation of all values added to the state.
                Contrary to ReducingState, the aggregate type may be different from the type of elements that are added to the state.
                The interface is the same as for ListState but elements added using add(IN) are aggregated using a specified AggregateFunction.
         */


        env
                .addSource(new SourceFunction<Tuple2<String, Integer>>() {

                    private List<Character> letters = new ArrayList<>(Arrays.asList('a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z'));

                    private Random random = new Random();

                    // volatile: cancel() is called from a different thread than run()
                    private volatile boolean running = true;

                    @Override
                    public void run(SourceContext<Tuple2<String, Integer>> ctx) throws Exception {

                        while (running) {

                            ctx.collect(Tuple2.of(String.valueOf(letters.get(random.nextInt(26))), random.nextInt(30)));

                            Thread.sleep(10);

                        }

                    }

                    @Override
                    public void cancel() {
                        running = false;
                    }
                })
                .keyBy(e -> e.f0)
                .flatMap(
                        new RichFlatMapFunction<Tuple2<String, Integer>, String>() {

                            private int count = 0;

                            private transient AggregatingState<Tuple2<String, Integer>, Integer> aggState;

                            @Override
                            public void open(Configuration parameters) throws Exception {
                                super.open(parameters);

                                AggregatingStateDescriptor<Tuple2<String, Integer>, Integer, Integer> aggStateDescriptor
                                        = new AggregatingStateDescriptor<>(
                                        "agg-state",
                                        new AggregateFunction<Tuple2<String, Integer>, Integer, Integer>() {
                                            @Override
                                            public Integer createAccumulator() {
                                                return 0;
                                            }

                                            @Override
                                            public Integer add(Tuple2<String, Integer> value, Integer accumulator) {
                                                return accumulator + value.f1;
                                            }

                                            @Override
                                            public Integer getResult(Integer accumulator) {
                                                return accumulator;
                                            }

                                            @Override
                                            public Integer merge(Integer a, Integer b) {
                                                return a + b;
                                            }
                                        },
                                        Types.INT);

                                aggState = getRuntimeContext().getAggregatingState(aggStateDescriptor);


                            }

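                            @Override
                            public void flatMap(Tuple2<String, Integer> value, Collector<String> out) throws Exception {

                                // Incrementally update the AggregatingState: every element adds f1 to the accumulator.
                                // Adding before the check keeps the 50th element in the sum and ensures get() is not null.
                                aggState.add(value);

                                // count is a plain field, not keyed state: it counts every element seen by this
                                // subtask across all keys and is not restored from checkpoints.
                                count++;

                                if (count % 50 == 0) {
                                    out.collect("key = " + value.f0 + "\tsum = " + aggState.get());
                                    aggState.clear();
                                }
                            }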
                        }
                )
                .print();

        env.execute();

    }

}

BroadcastStates

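import java.util.Map;

import org.apache.flink.api.common.state.MapStateDescriptor;
import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.BroadcastStream;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.datastream.KeyedStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.co.KeyedBroadcastProcessFunction;
import org.apache.flink.util.Collector;

public class BroadcastStates {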

    public static void main(String[] args) throws Exception {

        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(1);


        DataStreamSource<Tuple2<String, Integer>> ruleStream = env
                .fromElements(
                        Tuple2.of("Mary", 100),
                        Tuple2.of("Bob", 98),
                        Tuple2.of("other", 0)

                );


        MapStateDescriptor<String, Tuple2<String, Integer>> ruleStateDescriptor = new MapStateDescriptor<>(
                "rule-state",
                BasicTypeInfo.STRING_TYPE_INFO,
                Types.TUPLE(Types.STRING, Types.INT)
        );

        BroadcastStream<Tuple2<String, Integer>> broadcastStream = ruleStream.broadcast(ruleStateDescriptor);


        /*
            Which function to pass to process() depends on the non-broadcast stream:
                a keyed stream takes a KeyedBroadcastProcessFunction,
                a non-keyed stream takes a BroadcastProcessFunction.
         */


        KeyedStream<Event, String> clickKeyStream = env.addSource(new ClickSource()).keyBy(event -> event.user);

        SingleOutputStreamOperator<String> result = clickKeyStream
                .connect(broadcastStream)
                .process(new KeyedBroadcastProcessFunction<String, Event, Tuple2<String, Integer>, String>() {
                    @Override
                    public void processElement(Event event, ReadOnlyContext ctx, Collector<String> out) throws Exception {

                        Iterable<Map.Entry<String, Tuple2<String, Integer>>> entryIterable = ctx.getBroadcastState(ruleStateDescriptor).immutableEntries();

                        for (Map.Entry<String, Tuple2<String, Integer>> entry : entryIterable) {

                            String key = entry.getKey();
                            Tuple2<String, Integer> val = entry.getValue();


                            if (key.equals(event.user)) {
                                out.collect(event + " -> " + val);
                            } else {
                                out.collect(event + " -> -> " + val);
                            }

                        }

                    }

                    @Override
                    public void processBroadcastElement(Tuple2<String, Integer> value, Context ctx, Collector<String> out) throws Exception {

                        ctx.getBroadcastState(ruleStateDescriptor).put(value.f0, value);

                    }
                });


        result.print();


        env.execute();


    }

}
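
What makes broadcast state different from keyed state is that every parallel task holds its own full copy of it. The plain-Java model below is a sketch of that idea only: `BroadcastStateSketch` is a hypothetical class, and falling back to the "other" rule when a user has no rule of its own is an assumption of this sketch, not something the example above does.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of broadcast state: every subtask keeps a full copy of the rules. */
class BroadcastStateSketch {
    // One private copy of the rule map per parallel subtask.
    private final List<Map<String, Integer>> rulesPerSubtask = new ArrayList<>();

    BroadcastStateSketch(int parallelism) {
        for (int i = 0; i < parallelism; i++) {
            rulesPerSubtask.add(new HashMap<>());
        }
    }

    /** A broadcast element reaches every subtask, so every copy receives the same update. */
    void processBroadcastElement(String name, int score) {
        for (Map<String, Integer> rules : rulesPerSubtask) {
            rules.put(name, score);
        }
    }

    /** A regular element reaches one subtask and only reads that subtask's copy. */
    Integer processElement(String user, int subtask) {
        Map<String, Integer> rules = rulesPerSubtask.get(subtask);
        // Fallback to the "other" rule is an assumption of this sketch.
        return rules.containsKey(user) ? rules.get(user) : rules.get("other");
    }

    public static void main(String[] args) {
        BroadcastStateSketch state = new BroadcastStateSketch(3);
        state.processBroadcastElement("Mary", 100);
        state.processBroadcastElement("other", 0);
        System.out.println("Mary on subtask 2: " + state.processElement("Mary", 2));
        System.out.println("Alice on subtask 0: " + state.processElement("Alice", 0));
    }
}
```

Because all copies receive the same updates, the result of a lookup does not depend on which subtask happens to process the event, which is why the data side is only given read access to the broadcast state.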

References
https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/concepts/stateful-stream-processing/
https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/dev/datastream/fault-tolerance/state/
https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/libs/state_processor_api/
