里面用了一些自定义的日期函数,使用起来更加方便
当然hive也有一些日期函数
from_unixtime(unix时间戳) 可以转换成日期时间,后面多参 可以加上格式
unix_timestamp("2019-10-10 11:11:11") 可以转换成unix时间戳 后面多参 可以加上格式
unix 时间戳是10位的 到秒 和13位的不一样 可以转换
cast(13位时间戳/1000 as bigint) 可以转换成unix时间戳
weekofyear("2019-10-10 11:11:11") 可以获取第几周
datediff(“2019-12-11”,"2020-11-11") 差值
to_date("2019-10-10 11:11:11") 2019-10-10
简单的优化:
因为设置了分区表,所以可以进行先过滤一下。
CREATE external TABLE ext_startup_logs(
userId string,
appPlatform string,
appId string,
startTimeInMs bigint,
activeTimeInMs bigint,
appVersion string,
city string)
PARTITIONED BY (y string, m string,d string)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
STORED AS TEXTFILE;
函数:
getdaybegin getweekbegin getmonthbegin formattime
今天appid为app00001的不同版本的活跃用户数
select
appVersion,
count(distinct userId) c
from ext_startup_logs
where appId = "app00001"
and concat (y,m,d) = formattime(getdaybegin(),"yyyyMMdd")
group by appVersion
本周内每天各版本日活跃数
select
appVersion,
formattime(startTimeInMs,"yyyyMMdd") day,
count(distinct userId) c
from ext_startup_logs
where appId = 'app00001'
and concat (y,m,d) >= formattime(getweekbegin(),"yyyyMMdd")
group by appVersion,formattime(startTimeInMs,"yyyyMMdd")
select
formattime(startTimeInMs,'yyyyMMdd'),appVersion , count(distinct userId)
from ext_startup_logs
where appId = 'app00001'
and concat(y,m,d) >= formattime(getweekbegin(),'yyyyMMdd')
group by formattime(startTimeInMs,'yyyyMMdd'), appVersion;
5.3.10 留存分析统计
1. 本周回流用户统计
本周回流用户:上周没有启动过,本周启动过
1. 任意日新增用户
1) 今天新增用户
//选出用户的最小登陆时间大于今天的起始时间
select
userId,
min(startTimeInMs) minStartTime
from ext_startup_logs
group by userId
having
minStartTime >= getdaybegin()
and minStartTime < getdaybegin(1)
2)昨日新增用户
//最小的登陆时间大于昨天的起始时间,小于今天的起始时间
select
userId,
min(startTimeInMs) minStartTime
from ext_startup_logs
group by userId
having
minStartTime >= getdaybegin(-1)
and minStartTime < getdaybegin()
3)指定日期的 2019/1/1 10:10:10
select
userId,
min(startTimeInMs) minStartTime
from ext_startup_logs
group by userId
having
minStartTime >= getdaybegin("2019/1/1 10:10:10")
and minStartTime < getdaybegin("2019/1/1 10:10:10",1)
2. 任意周新增用户
1) 本周新增用户
select
userId,
min(startTimeInMs) minStartTime
from ext_startup_logs
group by userId
having
minStartTime >= getweekbegin()
and minStartTime < getdaybegin(1)
活跃用户统计
1) 日活跃用户数
//启动记录有很多条,每天都有重复的
//求今天的活跃用户,
//求启动时间 >今天开始毫秒值 并 <明天开始毫秒值
//然后去重,count 一下
select
count(distinct userId)
from ext_startup_logs
where startTimeInMs > getdaybegin()
and startTimeInMs < getdaybegin(1)
//周活跃人数
select
count(distinct userId)
from ext_startup_logs
where startTimeInMs > getweekbegin()
and startTimeInMs < getweekbegin(1)
1) 一次查询出一周内,每天的日活跃数
//startTimeInMs 要大于本周的起始时间 小于下周的起始时间
//按日期分组 count(distinct userId)
select
count(distinct userId),
formattime(startTimeInMs,"yyyy/MM/dd") day
from ext_startup_logs
where startTimeInMs >getweekbegin()
and startTimeInMs < getweekbegin(1)
group by formattime(startTimeInMs,"yyyy/MM/dd")
1553145326514
1489029488
2) 一次查询出过去的5周,每周的周活跃数
from_unixtime(1323308943) 这种是unix的时间戳 用不了 只精确到毫秒
weekofyear('2016-12-08 10:03:01')
select
count(distinct userId),
weekofyear(formattime(startTimeInMs,"yyyy-MM-dd HH:mm:ss")) week
from ext_startup_logs
where startTimeInMs >getweekbegin(-4)
and startTimeInMs < getweekbegin(1)
group by weekofyear(formattime(startTimeInMs,"yyyy-MM-dd HH:mm:ss"))
//不用自定义的函数
select
count(distinct userId),
weekofyear(from_unixtime(cast (startTimeInMs/1000 as bigint) ) ) week
from ext_startup_logs
where startTimeInMs >getweekbegin(-4)
and startTimeInMs < getweekbegin(1)
group by weekofyear(from_unixtime(cast (startTimeInMs/1000 as bigint)) )
3) 一次查询出过去的三个月内,每月的月活跃数
select
count(distinct userId),
formattime(startTimeInMs,"yyyy-MM",0) month
from ext_startup_logs
where startTimeInMs >getmonthbegin(-3)
and startTimeInMs < getmonthbegin()
group by formattime(startTimeInMs,"yyyy-MM",0) month
select
count(distinct userId),
formattime(startTimeInMs,"yyyy-MM",0) month
from ext_startup_logs
where startTimeInMs >getmonthbegin(-3)
and startTimeInMs < getmonthbegin()
group by formattime(startTimeInMs,"yyyy-MM",0)
优化 ,根据分区表先进行过滤
//过去的五周(包含本周)某个app每周的周活跃用户数
select
count(distinct userId),
weekofyear(formattime(startTimeInMs,"yyyy-MM-dd HH:mm:ss")) week
from ext_startup_logs
where concat(y,m,d) >= formattime(getweekbegin(-4),"yyyyMMdd")
and concat(y,m,d) < formattime(getweekbegin(1),"yyyyMMdd")
group by weekofyear(formattime(startTimeInMs,"yyyy-MM-dd HH:mm:ss"))
连续活跃3周用户统计
select
userId,
count(distinct weekofyear(from_unixtime(cast(startTimeInMs/1000 as bigint)))) c
from ext_startup_logs
where concat(y,m,d) >= formattime(getweekbegin(-2),"yyyyMMdd")
group by userId
having c>=3
沉默用户统计
//只启动了一次 之后不再启动 对应只有一条日志
//刚安装了两天的不算沉默用户
select
userId,
count(userId) c
from ext_startup_logs
where concat(y,m,d) < formattime(getdaybegin(-1),"yyyyMMdd")
group by userId
having c = 1
今天启动次数统计