1.案例一
需求:求同一个用户连续登录三天及以上的用户有哪些
1.1 数据准备
guid01,2018-03-01
guid01,2018-02-28
guid01,2018-03-01
guid01,2018-03-04
guid01,2018-03-02
guid01,2018-03-05
guid01,2018-03-06
guid01,2018-03-07
guid02,2018-03-01
guid02,2018-03-02
guid02,2018-03-03
guid02,2018-03-06字段说明:
uid: 用户id 示例(guid01)
login_time: 用户登陆时间 示例(2018-03-01)
1.2 sql实现:
第一步:
--第一步:应为同一个用户在一天之内可能登陆好几次,所以先去重(这里采用的是分组(uid+login_time)去重的方法),然后在按照uid在组内排序,并且使用row_number的方法开窗口打行号 select uid , login_time, row_number()over(partition by uid order by uid login_time) rw from v_guid group by uid , login_time;
第一步结果展示:
+------+----------+---+
| uid|login_time| rw|
+------+----------+---+
|guid02|2018-03-01| 1|
|guid02|2018-03-02| 2|
|guid02|2018-03-03| 3|
|guid02|2018-03-06| 4|
|guid01|2018-02-28| 1|
|guid01|2018-03-01| 2|
|guid01|2018-03-02| 3|
|guid01|2018-03-04| 4|
|guid01|2018-03-05| 5|
|guid01|2018-03-06| 6|
|guid01|2018-03-07| 7|
+------+----------+---+第二步:
-- 第二步:求出login_time和rw(行号)的时间差,如果是连续登陆的天数,那么所得到的时间差就是一样的, --举例说明(假如某用户,2018-03-01,2018-03-02,2018-03-03这三天是连续登陆的,打行号按照顺序为:1,2,3,那么所得的结果就都是2018-02-28,可以说明是连续登陆 --如果登陆时间为2018-03-01,2018-03-02,2018-03-03 ,2018-03-06 那么按照打行号为:1,2,3,4,那么前三天所得的结果都是:2018-02-28,但是2018-03-06所得结果就是2018-03-02,可以得知这一天和前三天是不连续的) select t1.uid, t1.login_time, t1.rw, date_sub(t1.login_time,t1.rw) sub_num from (select uid , login_time, row_number()over(partition by uid order by login_time asc) rw from v_guid group by uid ,login_time)t1
第二步结果展示:
+------+----------+---+----------+
|guid02|2018-03-01| 1|2018-02-28|
|guid02|2018-03-02| 2|2018-02-28|
|guid02|2018-03-03| 3|2018-02-28|
|guid02|2018-03-06| 4|2018-03-02|
|guid01|2018-02-28| 1|2018-02-27|
|guid01|2018-03-01| 2|2018-02-27|
|guid01|2018-03-02| 3|2018-02-27|
|guid01|2018-03-04| 4|2018-02-28|
|guid01|2018-03-05| 5|2018-02-28|
|guid01|2018-03-06| 6|2018-02-28|
|guid01|2018-03-07| 7|2018-02-28|
+------+----------+---+----------+最后一步:
--最后一步 按照用户id和时间差进行分组,然后,在组内进行聚合,求出起始时间和结束时间,并求出连续登陆的天数 select t2.uid, min(t2.login_time) start_time,--起始时间 max(t2.login_time) end_time,--结束时间 count(1) total_day --连续登陆的天数 from (select t1.uid,t1.login_time,t1.rw, date_sub(t1.login_time,t1.rw) sub_num from (select uid ,login_time, row_number()over(partition by uid order by login_time asc) rw from v_guid group by uid ,login_time)t1)t2 group by t2.uid,t2.sub_num having total_day>=3;
最终结果展示:
+------+----------+----------+---------+
| uid|start_time| end_time|total_day|
+------+----------+----------+---------+
|guid02|2018-03-01|2018-03-03| 3|
|guid01|2018-02-28|2018-03-02| 3|
|guid01|2018-03-04|2018-03-07| 4|
+------+----------+----------+---------+
2. 案例二
需求:求出同一个店铺当前月以及之前月金额的累加总和
2.1 数据准备
shop1,2019-1-18,500
shop1,2019-2-10,500
shop1,2019-2-10,200
shop1,2019-2-11,600
shop1,2019-2-12,400
shop1,2019-2-13,200
shop1,2019-2-15,100
shop1, 2019-3-5,180
shop1, 2019-4-5,280
shop1, 2019-4-6,220
shop2,2019-2-10,100
shop2,2019-2-11,100
shop2,2019-2-13,100
shop2,2019-3-15,100
shop2,2019-4-15,100字段说明:
shop1为id 店铺id
2019-1-18为dt 成交时间
500 为money 成交金额
2.2 sql实现
--第一步,先用字符串截取函数,截取出dt字段中的年和月
select id, substr(dt,0,6) sub_dt, money from v_shop
第一步结果展示:
+-----+------+-----+ | id|sub_dt|money| +-----+------+-----+ |shop1|2019/1| 500| |shop1|2019/2| 500| |shop1|2019/2| 200| |shop1|2019/2| 600| |shop1|2019/2| 400| |shop1|2019/2| 200| |shop1|2019/2| 100| |shop1|2019/3| 180| |shop1|2019/4| 280| |shop1|2019/4| 220| |shop2|2019/2| 100| |shop2|2019/2| 100| |shop2|2019/2| 100| |shop2|2019/3| 100| |shop2|2019/4| 100| +-----+------+-----+
第二步:按照id和sub_dt字段进行分组排序
select t1.id,t1.sub_dt, sum(money) total_money from (select id, substr(dt,0,6) sub_dt, money from v_shop)t1 group by t1.id,t1.sub_dt order by t1.id,t1.sub_dt
第二步结果展示
+-----+------+-----------+ | id|sub_dt|total_money| +-----+------+-----------+ |shop1|2019/1| 500.0| |shop1|2019/2| 2000.0| |shop1|2019/3| 180.0| |shop1|2019/4| 500.0| |shop2|2019/2| 300.0| |shop2|2019/3| 100.0| |shop2|2019/4| 100.0| +-----+------+-----------+
最后一步:实现当前月以及前几个月的累加,使用开窗函数,以id分组,sub__dt排序
select t2.id,t2.sub_dt,t2.total_money, sum(total_money)over(partition by t2.id order by t2.id,t2.sub_dt ) sum_money from (select t1.id,t1.sub_dt, sum(money) total_money from (select id, substr(dt,0,6) sub_dt, money from v_shop)t1 group by t1.id,t1.sub_dt order by t1.id,t1.sub_dt)t2
最终结果展示
+-----+------+-----------+---------+ | id|sub_dt|total_money|sum_money| +-----+------+-----------+---------+ |shop2|2019/2| 300.0| 300.0| |shop2|2019/3| 100.0| 400.0| |shop2|2019/4| 100.0| 500.0| |shop1|2019/1| 500.0| 500.0| |shop1|2019/2| 2000.0| 2500.0| |shop1|2019/3| 180.0| 2680.0| |shop1|2019/4| 500.0| 3180.0| +-----+------+-----------+---------+
3.案例三(压轴案例经典中的经典)
需求:当同一个用户id的begin_time减去上一条数据的end_time大于10min,再分下一组,如果小于10分钟,就将两条数据合并
3.1 数据准备
1,2020-2-18 14:20,2020-2-18 14:46,20
1,2020-2-18 14:47,2020-2-18 15:20,30
1,2020-2-18 15:37,2020-2-18 16:05,40
1,2020-2-18 16:06,2020-2-18 17:20,50
1,2020-2-18 17:21,2020-2-18 18:03,60
2,2020-2-18 14:18,2020-2-18 15:01,20
2,2020-2-18 15:20,2020-2-18 15:30,30
2,2020-2-18 16:01,2020-2-18 16:40,40
2,20