Flink UV and PV Statistics (Three Approaches)

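I have written plenty of Flink statistics jobs, yet I realized I had never written a UV/PV one (a shared platform handles real-time UV and PV for us), so I recently gathered some material and worked through it as practice.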

        public static final DateTimeFormatter TIME_FORMAT_YYYY_MM_DD_HHMMSS = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");
        ...
        Properties propsConsumer = ...// Kafka configuration;
        env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
        FlinkKafkaConsumer011<String> detailLog = new FlinkKafkaConsumer011<String>("test-topic", new SimpleStringSchema(), propsConsumer);
        detailLog.setStartFromLatest();
        DataStream<String> detailStream = env.addSource(detailLog).name("uv-pv_log").disableChaining();
        detailStream.print();
        DataStream<Tuple2<UMessage,Integer>> detail = detailStream.map(new MapFunction<String, Tuple2<UMessage,Integer>>() {
            @Override
            public Tuple2<UMessage,Integer> map(String value) throws Exception {
                try {
                    // Parse the JSON text into a UMessage and pair it with a count of 1
                    UMessage uMessage = JSON.parseObject(value, UMessage.class);
                    return Tuple2.of(uMessage,1);
                } catch (Exception e) {
                    e.printStackTrace();
                }
                // Unparseable records become a null-filled tuple, which the filter below drops
                return Tuple2.of(null,null);
            }
        }).filter(s->s!=null&&s.f0!=null).assignTimestampsAndWatermarks( new AscendingTimestampExtractor<Tuple2<UMessage,Integer>>() {
            @Override
            public long extractAscendingTimestamp(Tuple2<UMessage, Integer> element) {
                // Parse as LocalDateTime so the time of day is kept;
                // LocalDate plus atStartOfDay would truncate every event to midnight
                LocalDateTime localDateTime=LocalDateTime.parse(element.f0.getCreateTime(),TIME_FORMAT_YYYY_MM_DD_HHMMSS);
                return localDateTime.atZone(ZoneId.systemDefault()).toInstant().toEpochMilli();
            }
        });

The job uses event time. propsConsumer holds the Kafka configuration (omitted here). UMessage is a POJO class:

import lombok.Builder;
import lombok.Data;

@Data
@Builder
public class UMessage {

    private String uid;

    private String createTime;

    public UMessage() {
    }

    public UMessage(String uid, String createTime) {
        this.uid = uid;
        this.createTime = createTime;
    }
}

The code above reads from the data source, parses the JSON text, and then assigns the event time.
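
To make the event-time step concrete, here is a minimal standalone sketch of turning a createTime string into epoch milliseconds. The class name EventTimeParser and the fixed Asia/Shanghai zone are assumptions for illustration only; the job itself uses ZoneId.systemDefault(). Parsing as LocalDateTime keeps the time of day, which matters if windows shorter than one day are ever used.

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

// Hypothetical helper for illustration; not part of the original job.
class EventTimeParser {
    static final DateTimeFormatter FORMAT = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    // Converts a "yyyy-MM-dd HH:mm:ss" string to epoch milliseconds in the given zone.
    static long toEpochMillis(String createTime, ZoneId zone) {
        return LocalDateTime.parse(createTime, FORMAT).atZone(zone).toInstant().toEpochMilli();
    }

    public static void main(String[] args) {
        ZoneId zone = ZoneId.of("Asia/Shanghai"); // assumed zone (UTC+8)
        long morning = toEpochMillis("2020-01-01 08:00:00", zone);
        long evening = toEpochMillis("2020-01-01 20:30:00", zone);
        System.out.println(morning);           // 1577836800000
        System.out.println(evening - morning); // 45000000 (12.5 hours)
    }
}
```

The two timestamps differ by 12.5 hours, so the time of day survives the conversion.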


Computing PV and UV
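
Before looking at the Flink versions, it may help to see the two metrics in plain Java: PV counts every event, while UV counts each distinct uid once. This is only a sketch of the idea; the class PvUvSketch and the sample uids are made up for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;

// Hypothetical illustration of the two metrics; not Flink code.
class PvUvSketch {
    // PV (page views): every event counts once.
    static int pv(List<String> uids) {
        return uids.size();
    }

    // UV (unique visitors): each distinct uid counts once.
    static int uv(List<String> uids) {
        return new HashSet<>(uids).size();
    }

    public static void main(String[] args) {
        List<String> uids = Arrays.asList("u1", "u2", "u1", "u3", "u2");
        System.out.println("pv=" + pv(uids)); // pv=5
        System.out.println("uv=" + uv(uids)); // uv=3
    }
}
```

The Flink job has to do the same thing, scoped to a one-day window and updated as events arrive.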

  • Approach 1:
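        // One-day tumbling window; the -8 hour offset aligns day boundaries with midnight in UTC+8
        DataStream<Tuple2<String,Integer>> statsResult=detail.windowAll(TumblingEventTimeWindows.of(Time.days(1), Time.hours(-8)))
//               .trigger(ContinuousProcessingTimeTrigger.of(Time.seconds(10)))
                // Fire on every incoming element so the result is refreshed continuously
                .trigger(CountTrigger.of(1))
                .process(new ProcessAllWindowFunction<Tuple2<UMessage, Integer>, Tuple2<String,Integer>, TimeWindow>() {
                    @Override
                    public void process(Context context, Iterable<Tuple2<UMessage, Integer>> elements, Collector<Tuple2< String, Integer>> out) throws Exception {
                        Set<String> uvNameSet=new HashSet<String>();
                        Integer pv=0;
                        Iterator<Tuple2<UMessage,Integer>> mapIterator=elements.iterator();
                        while(mapIterator.hasNext()){
                            pv+=1;
                            String uvName=mapIterator.next().f0.getUid();
                            // The listing breaks off at this point; the remaining lines are an assumed completion
                            uvNameSet.add(uvName);
                        }
                        // Assumed output: one record per metric, keyed "uv" and "pv"
                        out.collect(Tuple2.of("uv",uvNameSet.size()));
                        out.collect(Tuple2.of("pv",pv));
                    }
                });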
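
The Time.hours(-8) offset passed to TumblingEventTimeWindows.of in the listing above is easy to misread, so here is a small sketch of the window-start arithmetic. The formula mirrors what I understand Flink's TimeWindow.getWindowStartWithOffset to compute; treat that as an assumption and check it against your Flink version. The class WindowOffsetSketch and the sample timestamp are hypothetical.

```java
// Hypothetical illustration of tumbling-window alignment; not Flink code.
class WindowOffsetSketch {
    static final long HOUR = 3_600_000L;
    static final long DAY = 24 * HOUR;

    // Start of the tumbling window that contains the timestamp (all values in milliseconds).
    static long windowStart(long timestamp, long offset, long windowSize) {
        return timestamp - (timestamp - offset + windowSize) % windowSize;
    }

    public static void main(String[] args) {
        long ts = 1577844000000L; // 2020-01-01 10:00:00 in UTC+8 (02:00:00 UTC)

        // Without an offset, the day window starts at 00:00 UTC, which is 08:00 in UTC+8.
        System.out.println(windowStart(ts, 0L, DAY));        // 1577836800000

        // With a -8 hour offset, it starts at 16:00 UTC the previous day, which is 00:00 in UTC+8.
        System.out.println(windowStart(ts, -8 * HOUR, DAY)); // 1577808000000
    }
}
```

With the offset, a one-day window covers a local calendar day in UTC+8, which is what a daily UV/PV report normally needs.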
                