问题:计算2021年里有播放记录的每个视频的完播率(结果保留三位小数),并按完播率降序排序
SELECT
a.video_id,
round(sum(if(a.end_time-a.start_time>=b.duration,1,0))/count(a.video_id),3) as wbl
FROM
tb_user_video_log a
JOIN tb_video_info b ON a.video_id = b.video_id
WHERE YEAR(a.start_time) = 2021
GROUP BY
a.video_id
ORDER BY
wbl DESC;
问题:计算各类视频的平均播放进度,将进度大于60%的类别输出。
- 关联用户-视频互动记录和短视频信息表:JOIN tb_video_info USING(video_id);
- 按视频类别分组:GROUP BY tag
- 计算每个类别的平均播放进度:
- 播放进度=播放时长÷视频时长*100%
- 播放时长=TIMESTAMPDIFF(SECOND, start_time, end_time);特殊情况:播放时长大于视频时长时,播放进度为100%(加个IF判断)
- 平均进度=AVG(每个进度)
- 结果保留2位小数:ROUND(x, 2)
- 百分比格式化:CONCAT(x, '%')
- 计算公式:
1 2 3 4 |
|
- 筛选播放进度>60%的视频类别:HAVING avg_play_progress > 60
注意坑:
1.一定要先*100再保留两位小数
2.时间相减直接用减号【a.end_time-a.start_time】有效性差,容易出错,最好用函数,如timestampdif(天(DAY)、小时(HOUR),分钟(MINUTE)和秒(SECOND),早的时间点,晚的时间点)
3.要加一个if
if(
timestampdiff(second, a.start_time, a.end_time) > duration,
1,
timestampdiff(second, a.start_time, a.end_time) / duration
)
不能直接用 timestampdiff(second, a.start_time, a.end_time) / duration,容易失真
SELECT
c.tag,
CONCAT(c.avg_dura, "%")
FROM
(
SELECT
b.tag,
round(
avg(
if(
timestampdiff(second, a.start_time, a.end_time) > duration,
1,
timestampdiff(second, a.start_time, a.end_time) / duration
)
) * 100,
2
) as avg_dura
FROM
tb_user_video_log a
join tb_video_info b on a.video_id = b.video_id
group by
b.tag
HAVING
avg_dura > 60
order by
avg_dura desc
) c;
问题:SQL158 每类视频近一个月的转发量/率
问题:统计在有用户互动的最近一个月(按包含当天在内的近30天算,比如10月31日的近30天为10.2~10.31之间的数据)中,每类视频的转发量和转发率(保留3位小数)。
注:转发率=转发量÷播放量。结果按转发率降序排序。
分析:
统计在有用户互动的最近一个月:
where DATEDIFF( DATE( ( SELECT MAX(start_time) FROM tb_user_video_log)), DATE(a.start_time)) <= 29
每类视频:group by tag
转发量:SUM(if_retweet) as retweet_cut
转发率:SUM(if_retweet) /count(*) as retweet_rate
保留3位小数:round( retweet_rate,3) as retweet_rate_round
结果按转发率降序排序: order by retweet_rate_round desc
SELECT
b.tag,
sum(if_retweet) AS retweet_cut,
round(sum(if_retweet) / count(*), 3) AS retweet_rate
FROM
tb_user_video_log a
LEFT JOIN tb_video_info b ON a.video_id = b.video_id
WHERE
DATEDIFF(
DATE(
(
SELECT
MAX(start_time)
FROM
tb_user_video_log
)
),
DATE(a.start_time)
) <= 29
GROUP BY
b.tag
ORDER BY
retweet_rate DESC
问题:SQL159 每个创作者每月的涨粉率及截止当前的总粉丝量
SELECT
author,
date_format(start_time,'%Y-%m') month,
round(sum(case when if_follow=1 then 1
when if_follow=2 then -1
else 0 end)/count(author),3) fans_growth_rate,
sum(sum(case when if_follow=1 then 1
when if_follow=2 then -1
else 0 end)) over(partition by author order by date_format(start_time,'%Y-%m')) total_fans
FROM tb_user_video_log log
left join tb_video_info info on log.video_id=info.video_id
where year(start_time)=2021
group by author,month
order by author,total_fans
SQL 数据分组累加sum() over (partition by ... order by ...)
问题:SQL160 国庆期间每类视频点赞量和转发量
问题:统计2021年国庆头3天每类视频每天的近一周总点赞量和一周内最大单天转发量,结果按视频类别降序、日期升序排序。假设数据库中数据足够多,至少每个类别下国庆头3天及之前一周的每天都有播放记录。
题解:
1.2021年国庆头3天:
where DATE_FORMAT(start_time, "%Y-%m-%d") BETWEEN '2021-10-01' and '2021-10-03'
2.每类视频每天的近一周总点赞量:
SUM(sum(if_like)) over (PARTITION BY tag ORDER BY DATE_FORMAT(start_time, "%Y-%m-%d") rows 6 preceding) as sum_like_7day
3.每类视频每天的近一周内最大转发量:
MAX(sum(if_retweet)) over (PARTITION BY tag ORDER BY DATE_FORMAT(start_time, "%Y-%m-%d") rows 6 preceding) as max_retweet_7day
4.每类视频每天
GROUP BY tag, dt
WITH t1 AS (
SELECT
tag,
DATE_FORMAT(start_time, '%Y-%m-%d') dt,
SUM(SUM(if_like))OVER(PARTITION BY tag ORDER BY DATE_FORMAT(start_time,'%Y-%m-%d') rows 6 preceding) AS sum_like_cnt_7d ,
MAX(SUM(if_retweet))OVER(PARTITION BY tag ORDER BY DATE_FORMAT(start_time,'%Y-%m-%d') rows 6 preceding) AS max_retweet_cnt_7d
FROM tb_user_video_log JOIN tb_video_info USING(video_id)
WHERE DATEDIFF('2021-10-03', DATE_FORMAT(start_time, '%Y-%m-%d'))<9
GROUP BY dt, tag
)
SELECT * FROM t1 WHERE dt BETWEEN '2021-10-01' AND '2021-10-03' ORDER BY tag DESC,dt ASC;