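I have written plenty of Flink statistics jobs, yet I realized I had never written a UV/PV count myself (our shared platform already provides real-time UV and PV statistics). I recently gathered some material and used it as practice.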
public static final DateTimeFormatter TIME_FORMAT_YYYY_MM_DD_HHMMSS = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");
...
Properties propsConsumer = ...; // Kafka configuration (omitted)
env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
FlinkKafkaConsumer011<String> detailLog = new FlinkKafkaConsumer011<String>("test-topic", new SimpleStringSchema(), propsConsumer);
detailLog.setStartFromLatest();
DataStream<String> detailStream = env.addSource(detailLog).name("uv-pv_log").disableChaining();
detailStream.print();
DataStream<Tuple2<UMessage,Integer>> detail = detailStream.map(new MapFunction<String, Tuple2<UMessage,Integer>>() {
@Override
public Tuple2<UMessage,Integer> map(String value) throws Exception {
try {
UMessage uMessage = JSON.parseObject(value, UMessage.class);
return Tuple2.of(uMessage,1);
} catch (Exception e) {
e.printStackTrace();
}
// Parse failure: emit an empty tuple, which the filter below drops
return Tuple2.of(null,null);
}
}).filter(s->s!=null&&s.f0!=null).assignTimestampsAndWatermarks( new AscendingTimestampExtractor<Tuple2<UMessage,Integer>>() {
@Override
public long extractAscendingTimestamp(Tuple2<UMessage, Integer> element) {
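// Parse as LocalDateTime so the time of day is kept; LocalDate would truncate every event to midnight
LocalDateTime localDateTime=LocalDateTime.parse(element.f0.getCreateTime(),TIME_FORMAT_YYYY_MM_DD_HHMMSS);
long timestamp = localDateTime.atZone(ZoneId.systemDefault()).toInstant().toEpochMilli();
return timestamp;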
}
});
The job uses event time. propsConsumer holds the Kafka configuration (omitted here). The UMessage POJO class:
import lombok.Builder;
import lombok.Data;
@Data
@Builder
public class UMessage {
private String uid;
private String createTime;
public UMessage() {
}
public UMessage(String uid, String createTime) {
this.uid = uid;
this.createTime = createTime;
}
}
The code above reads from the source, parses the JSON text, and then assigns the event time.
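The event-time step boils down to turning the createTime string into epoch milliseconds. Below is a minimal standalone sketch of that conversion, assuming createTime is formatted as yyyy-MM-dd HH:mm:ss. The class EventTimeSketch, its method names, and the sample timestamp are hypothetical and only for illustration. It also shows that parsing into a LocalDate keeps only the date, so every event of a day would get the midnight timestamp.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not part of the original job
class EventTimeSketch {
    static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    // Keeps the time of day: the exact instant of the event
    static long toEpochMillis(String createTime, ZoneId zone) {
        return LocalDateTime.parse(createTime, FORMAT)
                .atZone(zone).toInstant().toEpochMilli();
    }

    // Keeps only the date: midnight of the event's day in the given zone
    static long toStartOfDayMillis(String createTime, ZoneId zone) {
        return LocalDate.parse(createTime, FORMAT)
                .atStartOfDay(zone).toInstant().toEpochMilli();
    }

    public static void main(String[] args) {
        ZoneId zone = ZoneId.of("Asia/Shanghai"); // assumed zone for the example
        String createTime = "2019-01-01 10:30:00"; // made-up sample value
        System.out.println(toEpochMillis(createTime, zone));
        System.out.println(toStartOfDayMillis(createTime, zone));
    }
}
```

For a one-day window both variants place the event in the same window, but only the first preserves the ordering of events within the day.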
Computing PV and UV
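The approach that follows uses a one-day tumbling window with an offset of Time.hours(-8). Flink aligns tumbling windows to the epoch in UTC by default, so the offset is what moves the daily boundary to midnight in UTC+8. The sketch below illustrates the alignment arithmetic in plain Java. It is a re-implementation of the idea, not Flink's source, and the class WindowAlignmentSketch and the sample event time are hypothetical.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;

// Hypothetical illustration of window alignment, not Flink code
class WindowAlignmentSketch {
    // Start of the window of length windowSize (ms) that contains timestamp (ms),
    // with window boundaries shifted by offset (ms)
    static long windowStart(long timestamp, long offset, long windowSize) {
        return timestamp - Math.floorMod(timestamp - offset, windowSize);
    }

    public static void main(String[] args) {
        long day = Duration.ofDays(1).toMillis();
        long minus8h = Duration.ofHours(-8).toMillis();
        ZoneOffset utc8 = ZoneOffset.ofHours(8);

        // Made-up event at 10:30 local time in UTC+8
        long event = ZonedDateTime.of(2019, 1, 1, 10, 30, 0, 0, utc8)
                .toInstant().toEpochMilli();

        // No offset: the window starts at 00:00 UTC, i.e. 08:00 local time
        System.out.println(Instant.ofEpochMilli(windowStart(event, 0, day)).atOffset(utc8));
        // Offset of -8h: the window starts at 00:00 local time
        System.out.println(Instant.ofEpochMilli(windowStart(event, minus8h, day)).atOffset(utc8));
    }
}
```

Without the offset, a "daily" count for users in UTC+8 would run from 08:00 to 08:00 local time, which is rarely what a PV/UV report wants.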
- Approach 1:
DataStream<Tuple2<String,Integer>> statsResult=detail.windowAll(TumblingEventTimeWindows.of(Time.days(1), Time.hours(-8)))
// .trigger(ContinuousProcessingTimeTrigger.of(Time.seconds(10)))
.trigger(CountTrigger.of(1))
.process(new ProcessAllWindowFunction<Tuple2<UMessage, Integer>, Tuple2<String,Integer>, TimeWindow>() {
@Override
public void process(Context context, Iterable<Tuple2<UMessage, Integer>> elements, Collector<Tuple2< String, Integer>> out) throws Exception {
Set<String> uvNameSet=new HashSet<String>();
Integer pv=0;
Iterator<Tuple2<UMessage,Integer>> mapIterator=elements.iterator();
while(mapIterator.hasNext()){
pv+=1;
String uvName=mapIterator.next().f0.getUid();