Hive笔试题实战
短视频
题目一:计算各个视频的平均完播率
有用户-视频互动表tb_user_video_log:
id |
uid |
video_id |
start_time |
end_time |
if_follow |
if_like |
if_retweet |
comment_id |
1 |
101 |
2001 |
2021-10-01 10:00:00 |
2021-10-01 10:00:30 |
0 |
1 |
1 |
NULL |
2 |
102 |
2001 |
2021-10-01 10:00:00 |
2021-10-01 10:00:24 |
0 |
0 |
1 |
NULL |
3 |
103 |
2001 |
2021-10-01 11:00:00 |
2021-10-01 11:00:34 |
0 |
1 |
0 |
1732526 |
4 |
101 |
2002 |
2021-09-01 10:00:00 |
2021-9-01 10:00:42 |
1 |
0 |
1 |
NULL |
5 |
102 |
2002 |
2021-10-01 11:00:00 |
2021-10-01 10:00:30 |
1 |
0 |
1 |
NULL |
uid-用户ID,video_id-视频ID,start_time-开始观看时间,end_time-结束观看时间,if_follow-是否关注,if_like-是否点赞,if_retweet-是否转发,comment_id-评论ID。
有短视频信息表tb_video_info:
id |
video_id |
author |
tag |
duration |
release_time |
1 |
2001 |
901 |
影视 |
30 |
2021-01-01 07:00:00 |
2 |
2002 |
901 |
美食 |
60 |
2021-01-01 07:00:00 |
3 |
2003 |
902 |
旅游 |
90 |
2021-01-01 07:00:00 |
video_id-视频ID,author-创作者ID,tag-类别标签,duration-视频时长(秒),release_time-发布时间。
问题:计算2021年里有播放记录的每个视频的完播率(结果保留三位小数),并按完播率降序排序。输出结果如下:
video_id |
avg_comp_play_rate |
2001 |
0.667 |
2002 |
0.000 |
注:视频完播率是指完成播放次数占总播放次数的比例。简单起见,结束观看时间与开始播放时间的差≥视频时长时,视为完成播放。
-- 建立用户-视频互动表
drop table if exists tb_user_video_log;
create table tb_user_video_log (
id int comment '自增ID',
uid int comment '用户ID',
video_id int comment '视频ID',
start_time string COMMENT '开始观看时间',
end_time string COMMENT '结束观看时间',
if_follow int comment '是否关注',
if_like int comment '是否点赞',
if_retweet int comment '是否转发',
comment_id int comment '评论ID'
) comment '用户-视频互动表'
row format delimited fields terminated by ',';
-- 建立短视频信息表
drop table if exists tb_video_info;
create table tb_video_info (
id int comment '自增ID',
video_id int comment '视频ID',
author int comment '创作者ID',
tag string comment '类别标签',
duration int comment '视频时长(秒数)',
release_time string comment '发布时间'
) comment '短视频信息表'
row format delimited fields terminated by ',';
-- 插入数据
insert into tb_user_video_log
values (1, 101, 2001, '2021-10-01 10:00:00', '2021-10-01 10:00:30', 0, 1, 1, null),
(2, 102, 2001, '2021-10-01 10:00:00', '2021-10-01 10:00:24', 0, 0, 1, null),
(3, 103, 2001, '2021-10-01 11:00:00', '2021-10-01 11:00:34', 0, 1, 0, 1732526),
(4, 101, 2002, '2021-09-01 10:00:00', '2021-09-01 10:00:42', 1, 0, 1, null),
(5, 102, 2002, '2021-10-01 11:00:00', '2021-10-01 11:00:30', 1, 0, 1, null);
insert into tb_video_info
values (1, 2001, 901, '影视', 30, '2021-01-01 7:00:00'),
(2, 2002, 901, '美食', 60, '2021-01-01 7:00:00'),
(3, 2003, 902, '旅游', 90, '2021-01-01 7:00:00');
参考答案:
-- 第一步:找出2021年有过播放的视频
select * from tb_user_video_log where year(start_time) = 2021;
-- 第二步:计算(每一个视频的)完播次数。完播:结束时间-起始时间>=视频时长
select a.video_id as video_id,
sum(if(unix_timestamp(a.end_time) - unix_timestamp(a.start_time) >= b.duration, 1, 0))
from (
select * from tb_user_video_log where year(start_time) = 2021
) a left join tb_video_info b on a.video_id = b.video_id
group by a.video_id;
-- 第三步:计算完播率。完播次数/总的播放次数
select a.video_id as video_id,
sum(if(unix_timestamp(a.end_time) - unix_timestamp(a.start_time) >= b.duration, 1, 0)) / count(*)
from (
select * from tb_user_video_log where year(start_time) = 2021
) a left join tb_video_info b on a.video_id = b.video_id
group by a.video_id;
-- 第四步:保留三位小数,还需要降序排序
select a.video_id as video_id,
round(sum(if(unix_timestamp(a.end_time) - unix_timestamp(a.start_time) >= b.duration, 1, 0)) / count(*), 3) as avg_comp_play_rate
from (
select * from tb_user_video_log where year(start_time) = 2021
) a left join tb_video_info b on a.video_id = b.video_id
group by a.video_id
order by avg_comp_play_rate desc;
题目二:平均播放进度大于60%的视频类别
有用户-视频互动表tb_user_video_log:
id |
uid |
video_id |
start_time |
end_time |
if_follow |
if_like |
if_retweet |
comment_id |
1 |
101 |
2001 |
2021-10-01 10:00:00 |
2021-10-01 10:00:30 |
0 |
1 |
1 |
NULL |
2 |
102 |
2001 |
2021-10-01 10:00:00 |
2021-10-01 10:00:21 |
0 |
0 |
1 |
NULL |
3 |
103 |
2001 |
2021-10-01 11:00:50 |
2021-10-01 11:01:20 |
0 |
1 |
0 |
1732526 |
4 |
102 |
2002 |
2021-10-01 11:00:00 |
2021-10-01 11:00:30 |
1 |
0 |
1 |
NULL |
5 |
103 |
2002 |
2021-10-01 10:59:05 |
2021-10-01 11:00:05 |
1 |
0 |
1 |
NULL |
uid-用户ID,video_id-视频ID,start_time-开始观看时间,end_time-结束观看时间,if_follow-是否关注,if_like-是否点赞,if_retweet-是否转发,comment_id-评论ID。
有短视频信息表tb_video_info:
id |
video_id |
author |
tag |
duration |
release_time |
1 |
2001 |
901 |
影视 |
30 |
2021-01-01 07:00:00 |
2 |
2002 |
901 |
美食 |
60 |
2021-01-01 07:00:00 |
3 |
2003 |
902 |
旅游 |
90 |
2021-01-01 07:00:00 |
video_id-视频ID,author-创作者ID,tag-类别标签,duration-视频时长(秒),release_time-发布时间。
问题:计算各类视频的平均播放进度,将进度大于60%的类别输出(结果保留两位小数,并按播放进度倒序排序)。示例数据的输出结果如下:
tag |
avg_play_progress |
影视 |
90.00% |
美食 |
75.00% |
注:播放进度=播放时长÷视频时长*100%,当播放时长大于视频时长时,播放进度均记为100%。