Page Ad Analysis: Page Ad Click Count Statistics
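Among the marketing metrics of an e-commerce site, besides promotion of its own app, there is also the advertising placed on its pages (both for products the site sells itself and for other websites). Statistics on advertising are therefore an important marketing metric as well.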
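The simplest and most important ad statistic is the page ad click count. Sites typically use it to set pricing strategy and adjust how they promote, and it also lets them collect information about user preferences. A more concrete application is to segment users by geographic location and summarize which ads users in each province prefer, which helps target ads more precisely.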
Key the stream by province with keyBy, open a one-hour sliding time window with a 5-second slide, and count the click events that fall into each window.
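To make the window sizes concrete, the following is a minimal sketch in plain Scala (no Flink dependency) of how a one-hour window sliding every 5 seconds covers a single event. The object and function names are hypothetical, and the arithmetic only mirrors the usual sliding-window assignment for non-negative timestamps with no window offset; it is not Flink's own code.

```scala
object SlidingWindowSketch {
  val sizeMs: Long = 60 * 60 * 1000L // window length: 1 hour
  val slideMs: Long = 5 * 1000L      // slide distance: 5 seconds

  // Returns (start, end) of every sliding window that contains ts.
  // Windows are half-open: start <= ts < end. Assumes ts >= 0 and no offset.
  def windowsFor(ts: Long, size: Long = sizeMs, slide: Long = slideMs): Seq[(Long, Long)] = {
    val lastStart = ts - (ts % slide) // latest window start that is <= ts
    Iterator.iterate(lastStart)(_ - slide)
      .takeWhile(_ > ts - size)       // stop once the window would end at or before ts
      .map(start => (start, start + size))
      .toVector
  }

  def main(args: Array[String]): Unit = {
    val ts = 1511658120L * 1000L      // sample event time in milliseconds
    val ws = windowsFor(ts)
    println(s"windows containing the event: ${ws.size}")
    println(s"earliest window end: ${ws.map(_._2).min}")
  }
}
```

Each event belongs to size / slide = 720 windows, so a single click contributes to 720 consecutive results, the first of which closes at most 5 seconds after the event.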
Blacklist filtering
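Repeated clicks by the same user are all counted. In practice the same user may well open the same ad repeatedly, which indicates stronger interest in that ad. But if a user clicks ads very frequently within some period, that is clearly not normal behavior and suggests click fraud. We can therefore constrain a user's click behavior over a period (say, one day): if the clicks on the same ad exceed a limit (say, 100), the user should be added to a blacklist and a warning raised, and that user's subsequent clicks should no longer be counted.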
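The rule can be sketched in plain Scala before it is wired into Flink state. This is only an illustration under stated assumptions: the names `Click` and `filterClicks` are hypothetical, the counters live in an in-memory map rather than keyed state, and the daily reset is left out.

```scala
import scala.collection.mutable

object BlackListSketch {
  final case class Click(userId: Long, adId: Long)

  // Returns (clicks that still count, (userId, adId) pairs reported exactly once).
  def filterClicks(clicks: Seq[Click], maxCount: Long): (Seq[Click], Seq[(Long, Long)]) = {
    val counts = mutable.Map.empty[(Long, Long), Long].withDefaultValue(0L)
    val reported = mutable.Set.empty[(Long, Long)]
    val passed = Vector.newBuilder[Click]
    val warnings = Vector.newBuilder[(Long, Long)]

    for (c <- clicks) {
      val key = (c.userId, c.adId)
      if (counts(key) >= maxCount) {
        // Over the limit: drop the click, and warn only the first time.
        if (reported.add(key)) warnings += key
      } else {
        counts(key) += 1
        passed += c
      }
    }
    (passed.result(), warnings.result())
  }

  def main(args: Array[String]): Unit = {
    val clicks = Seq.fill(5)(Click(1L, 10L)) :+ Click(2L, 10L)
    val (passed, warnings) = filterClicks(clicks, maxCount = 3)
    println(s"passed=${passed.size}, warnings=$warnings")
  }
}
```

Counting per (userId, adId) pair rather than per user means a user is only blocked for the specific ad they are over-clicking.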
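Input data, fields in order: userId, adId, province, city, timestamp (in seconds). The code below reads these records from AdClickLog.csv and splits each line on commas.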
561558 3611281 guangdong shenzhen 1511658120
package UserBehaviorAnalysis.MarketAnalysis
import java.sql.Timestamp
import org.apache.flink.api.common.functions.AggregateFunction
import org.apache.flink.api.common.state.{ValueState, ValueStateDescriptor}
import org.apache.flink.streaming.api.TimeCharacteristic
import org.apache.flink.streaming.api.functions.KeyedProcessFunction
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.scala.function.WindowFunction
import org.apache.flink.streaming.api.windowing.time.Time
import org.apache.flink.streaming.api.windowing.windows.TimeWindow
import org.apache.flink.util.Collector
// Input case class
case class AdClickLog(userId: Long, adId: Long, province: String, city: String, timestamp: Long)
// Output case class
case class AdClickCountByProvince(windowEnd: String, province: String, count: Long)
// Case class for blacklist warnings emitted to the side output
case class BlackListUserWarning(userId: Long, adId: Long, msg: String)
object AdClickAnalysis {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)
    env.setParallelism(1)

    val inputStream = env.readTextFile("D:\\Mywork\\workspace\\Project_idea\\flink-2021\\src\\main\\resources\\AdClickLog.csv")
    val adLogStream = inputStream.map { data =>
      val arr = data.split(",")
      // The timestamp is in seconds; parse it as Long to match the AdClickLog field type
      AdClickLog(arr(0).toLong, arr(1).toLong, arr(2), arr(3), arr(4).toLong)
    }.assignAscendingTimestamps(_.timestamp * 1000L) // event time in milliseconds

    // Insert a filtering step first, and send users with click-fraud behavior
    // to the side output (blacklist warning)
    val filterBlackListUserStream = adLogStream
      .keyBy(data => (data.userId, data.adId))
      .process(new FliterBlackListUserResult(100))

    // Count the remaining clicks per province in a 1-hour window sliding every 5 seconds
    val adCountResultStream = filterBlackListUserStream
      .keyBy(_.province)
      .timeWindow(Time.hours(1), Time.seconds(5))
      .aggregate(new AdCountAgg(), new AdCountWindowResult())

    filterBlackListUserStream.getSideOutput(new OutputTag[BlackListUserWarning]("warning")).print("warning")
    adCountResultStream.print("count result")
    env.execute()
  }
}
// Incremental aggregation: count the clicks in a window one element at a time
class AdCountAgg() extends AggregateFunction[AdClickLog, Long, Long]{
  override def createAccumulator(): Long = 0L
  override def add(value: AdClickLog, accumulator: Long): Long = accumulator + 1
  override def getResult(accumulator: Long): Long = accumulator
  override def merge(a: Long, b: Long): Long = a + b
}
// Wrap the pre-aggregated count with the province key and the window end time
class AdCountWindowResult() extends WindowFunction[Long, AdClickCountByProvince, String, TimeWindow]{
  override def apply(key: String, window: TimeWindow, input: Iterable[Long], out: Collector[AdClickCountByProvince]): Unit = {
    val end = new Timestamp(window.getEnd).toString
    // input holds exactly one element: the result of AdCountAgg
    out.collect(AdClickCountByProvince(end, key, input.head))
  }
}
// Custom KeyedProcessFunction that filters out blacklisted users
class FliterBlackListUserResult(maxCount: Long) extends KeyedProcessFunction[(Long, Long), AdClickLog, AdClickLog]{
  // State: the user's click count for this ad, the timestamp of the timer that clears
  // the state at midnight each day, and a flag marking whether the user is already blacklisted
  lazy val countState: ValueState[Long] = getRuntimeContext.getState(new ValueStateDescriptor[Long]("Adcount", classOf[Long]))
  lazy val resetTimerTsState: ValueState[Long] = getRuntimeContext.getState(new ValueStateDescriptor[Long]("reset-ts", classOf[Long]))
  lazy val isBlackState: ValueState[Boolean] = getRuntimeContext.getState(new ValueStateDescriptor[Boolean]("is-black", classOf[Boolean]))
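  override def processElement(i: AdClickLog, context: KeyedProcessFunction[(Long, Long), AdClickLog, AdClickLog]#Context, collector: Collector[AdClickLog]): Unit = {
    val curCount = countState.value()
    // On the first record for this key, register the timer that clears the state at midnight
    if (curCount == 0) {
      // Next midnight in UTC+8, as a UTC epoch timestamp in milliseconds. Shift into local
      // time before rounding down to a day boundary so the timer is never in the past.
      val dayMs = 24 * 60 * 60 * 1000L
      val offsetMs = 8 * 60 * 60 * 1000L
      val ts = ((context.timerService().currentProcessingTime() + offsetMs) / dayMs + 1) * dayMs - offsetMs
      resetTimerTsState.update(ts)
      context.timerService().registerProcessingTimeTimer(ts)
    }
    // If the count has reached the threshold, report the user to the blacklist
    if (curCount >= maxCount) {
      // Emit to the side output only if the user is not already blacklisted
      if (!isBlackState.value()) {
        isBlackState.update(true)
        context.output(new OutputTag[BlackListUserWarning]("warning"), BlackListUserWarning(i.userId, i.adId, "Click ad over " + maxCount + " times today."))
      }
      // Drop the click so it is not counted downstream
      return
    }
    // Normal case: increment the count and forward the record unchanged
    countState.update(curCount + 1)
    collector.collect(i)
  }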
  override def onTimer(timestamp: Long, ctx: KeyedProcessFunction[(Long, Long), AdClickLog, AdClickLog]#OnTimerContext, out: Collector[AdClickLog]): Unit = {
    // When the midnight timer fires, clear all state so counting starts over for the new day
    if (timestamp == resetTimerTsState.value()) {
      isBlackState.clear()
      countState.clear()
      resetTimerTsState.clear()
    }
  }
}
// Output data
warning> BlackListUserWarning(937166,1715,Click ad over 100 times today.)
count result> AdClickCountByProvince(2017-11-26 09:00:05.0,beijing,1)
count result> AdClickCountByProvince(2017-11-26 09:00:10.0,beijing,1)
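The daily reset timer registered in processElement targets midnight in UTC+8. As a standalone illustration of that arithmetic, here is a minimal sketch of computing the next local midnight from a UTC epoch timestamp. The object name, the function name, and the `offsetHours` parameter are hypothetical and not part of the program above.

```scala
object MidnightSketch {
  val DayMs: Long = 24 * 60 * 60 * 1000L

  // Next local midnight as a UTC epoch timestamp in milliseconds.
  // offsetHours is the local zone's offset from UTC (8 for UTC+8).
  def nextLocalMidnight(nowUtcMs: Long, offsetHours: Int = 8): Long = {
    val offsetMs = offsetHours * 60 * 60 * 1000L
    // Shift into local time, round down to the local day, step to the next day,
    // then shift back to UTC.
    ((nowUtcMs + offsetMs) / DayMs + 1) * DayMs - offsetMs
  }

  def main(args: Array[String]): Unit = {
    // 2017-11-26 09:02:00 in UTC+8 (the sample record's timestamp)
    println(nextLocalMidnight(1511658120000L))
    // 2017-11-27 01:00:00 in UTC+8, i.e. shortly after a local midnight
    println(nextLocalMidnight(1511715600000L))
  }
}
```

Adding the offset before the integer division matters between 00:00 and 08:00 local time: without it, the computed timestamp is the midnight that has just passed, and a timer registered for it would fire immediately.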