题目
牛客网SQL160——国庆期间每类视频点赞量和转发量
题目描述
用户-视频互动表tb_user_video_log
id | uid | video_id | start_time | end_time | if_follow | if_like | if_retweet | comment_id |
---|---|---|---|---|---|---|---|---|
1 | 101 | 2001 | 2021-09-24 10:00:00 | 2021-09-24 10:00:20 | 1 | 1 | 0 | NULL |
2 | 105 | 2002 | 2021-09-25 11:00:00 | 2021-09-25 11:00:30 | 0 | 0 | 1 | NULL |
3 | 102 | 2002 | 2021-09-25 11:00:00 | 2021-09-25 11:00:30 | 1 | 1 | 1 | NULL |
4 | 101 | 2002 | 2021-09-26 11:00:00 | 2021-09-26 11:00:30 | 1 | 0 | 1 | NULL |
5 | 101 | 2002 | 2021-09-27 11:00:00 | 2021-09-27 11:00:30 | 1 | 1 | 0 | NULL |
6 | 102 | 2002 | 2021-09-28 11:00:00 | 2021-09-28 11:00:30 | 1 | 0 | 1 | NULL |
7 | 103 | 2002 | 2021-09-29 11:00:00 | 2021-10-02 11:00:30 | 1 | 0 | 1 | NULL |
8 | 102 | 2002 | 2021-09-30 11:00:00 | 2021-09-30 11:00:30 | 1 | 1 | 1 | NULL |
9 | 101 | 2001 | 2021-10-01 10:00:00 | 2021-10-01 10:00:20 | 1 | 1 | 0 | NULL |
10 | 102 | 2001 | 2021-10-01 10:00:00 | 2021-10-01 10:00:15 | 0 | 0 | 1 | NULL |
11 | 103 | 2001 | 2021-10-01 11:00:50 | 2021-10-01 11:01:15 | 1 | 1 | 0 | 1732526 |
12 | 106 | 2002 | 2021-10-02 10:59:05 | 2021-10-02 11:00:05 | 2 | 0 | 1 | NULL |
13 | 107 | 2002 | 2021-10-02 10:59:05 | 2021-10-02 11:00:05 | 1 | 0 | 1 | NULL |
14 | 108 | 2002 | 2021-10-02 10:59:05 | 2021-10-02 11:00:05 | 1 | 1 | 1 | NULL |
15 | 109 | 2002 | 2021-10-03 10:59:05 | 2021-10-03 11:00:05 | 0 | 1 | 0 | NULL |
(uid-用户ID, video_id-视频ID, start_time-开始观看时间, end_time-结束观看时间, if_follow-是否关注, if_like-是否点赞, if_retweet-是否转发, comment_id-评论ID)
短视频信息表tb_video_info
(video_id-视频ID, author-创作者ID, tag-类别标签, duration-视频时长, release_time-发布时间)
问题:统计2021年国庆头3天每类视频每天的近一周总点赞量和一周内最大单天转发量,结果按视频类别降序、日期升序排序。假设数据库中数据足够多,至少每个类别下国庆头3天及之前一周的每天都有播放记录。
输出示例:
tag | dt | sum_like_cnt_7d | max_retweet_cnt_7d |
---|---|---|---|
旅游 | 2021-10-01 | 5 | 2 |
旅游 | 2021-10-02 | 5 | 3 |
旅游 | 2021-10-03 | 6 | 3 |
求解思路
题目需要解出国庆节前三天的7天内,我们不能先筛选国庆节前三天这个条件,不然就会丢失7天内的数据。因此正确的思路就是先截取出2021-10-03之前9天(7+3-1=9)时间范围内的数据。此时需要用到 datediff 函数 配合 date_format 函数。
datediff('2021-10-03', date_format(start_time, '%Y-%m-%d'))<9
之后在规定日期期限内计算每日点赞量和转发量
SELECT
tag,
DATE_FORMAT(start_time,'%Y-%m-%d') as dt,
SUM(if_like) as like_cnt,
SUM(if_retweet) as ret_cnt
FROM tb_user_video_log JOIN tb_video_info USING(video_id)
WHERE DATEDIFF('2021-10-03', DATE_FORMAT(start_time,'%Y-%m-%d'))<9
GROUP BY dt, tag;
到此我们会得到2021-09-25至2021-10-03每个类别每个日期的点赞量和转发量的数据。之后我们使用窗口函数来限定7日窗口以进行累计点赞量和最大转发量。注意,这里是6行,即当前行+6=7行。
select
tag,
dt,
sum(like_cnt) over(partition by tag order by dt rows 6 preceding) as sum_like_cnt_7d,
max(ret_cnt) over(partition by tag order by dt rows 6 preceding) as max_retweet_cnt_7d
from
(
SELECT
tag,
DATE_FORMAT(start_time,'%Y-%m-%d') as dt,
SUM(if_like) as like_cnt,
SUM(if_retweet) as ret_cnt
FROM tb_user_video_log JOIN tb_video_info USING(video_id)
WHERE DATEDIFF('2021-10-03', DATE_FORMAT(start_time,'%Y-%m-%d'))<9
GROUP BY dt, tag
) t1
之后在补充日期限定、类别日期排序等细节
select
tag,
dt,
sum(like_cnt) over(partition by tag order by dt rows 6 preceding) as sum_like_cnt_7d,
max(ret_cnt) over(partition by tag order by dt rows 6 preceding) as max_retweet_cnt_7d
from
(
SELECT
tag,
DATE_FORMAT(start_time,'%Y-%m-%d') as dt,
SUM(if_like) as like_cnt,
SUM(if_retweet) as ret_cnt
FROM tb_user_video_log JOIN tb_video_info USING(video_id)
WHERE DATEDIFF('2021-10-03', DATE_FORMAT(start_time,'%Y-%m-%d'))<9
GROUP BY dt, tag
) t1
where dt between '2021-10-01' and '2021-10-03'
order by tag desc, dt asc
done