数据准备
create table scores
(
sid int,
score int,
cid int
)row format delimited
fields terminated by '\t';
insert into scores values
(1, 90, 1),
(1, 59, 2),
(1, 67, 3),
(2, 20, 1),
(2, 30, 2),
(2, 40, 3),
(3, 14, 1),
(3, 13, 2),
(3, 15, 3),
(4, 90, 1),
(4, 90, 2),
(4, 87, 3)
;
hive> select * from scores;
OK
1 90 1
1 59 2
1 67 3
2 20 1
2 30 2
2 40 3
3 14 1
3 13 2
3 15 3
4 90 1
4 90 2
4 87 3
Time taken: 0.131 seconds, Fetched: 12 row(s)
数据分析
SELECT t1.sid,avg_score,rk
FROM
(
SELECT sid,avg_score,
DENSE_RANK ()over(ORDER BY avg_score DESC) AS rk
FROM
(
SELECT sid,
avg(score) avg_score
FROM scores
GROUP BY sid
)a
)t1
INNER JOIN
(
SELECT sid
FROM scores
GROUP BY sid
HAVING sum(IF(score<60,1,0)) >=2
)t2
ON t1.sid = t2.sid;
思路总结
t2表比较妙!品一品,聚合和having可以简化到这个地步!
表a本来也想使用avg窗口函数,但是数据最后还得去重,不如在这一步先去重
参考:SQL 不及格课程数大于2的学生的平均成绩及其排名-HQL面试题47【拼多多】_sql 不及格人数超过两人-优快云博客