题目复盘——多表查询
题源牛客网——记录一下自己的学习
SQL20 月均完成试卷数不小于3的用户爱作答的类别
问题分析
(1)月均完成试卷数————总完成次数/完成时所在月份数量(有争议)>=3;
(2)完成时所在月份数————count(distinct date_format(submit_time,’%Y%m’));
(3)总完成次数————count(submit_time);
(4)作答次数————count(start_time);
(5)爱作答类别————group by tag;left join examination_info;
(6)排序————order by;
答案重写
SELECT tag,COUNT(start_time) tag_cnt FROM exam_record e_r
LEFT JOIN examination_info e_i ON e_r.exam_id=e_i.exam_id
WHERE uid in(
SELECT uid FROM exam_record GROUP BY uid
HAVING COUNT(submit_time)/COUNT(DISTINCT DATE_FORMAT(submit_time,'%Y%m'))>=3)
GROUP BY tag
ORDER BY tag_cnt DESC;
细节问题
这个题对于其中条件的解读有点争议,尤其是那个月均,还有当月之类的含糊词语,容易理解错。
SQL21 试卷发布当天作答人数和平均分
问题分析
(1)5级以上用户————where level >5;
(2)试卷发放当天作答人数————count(distinct uid);date_format(release_time,’%Y%m%d’)=date_format(submit_time,’%Y%m%d’);
(3)平均分————sum()/count();avg();
(4)保留一位小数————round(,1);
答案重写
SELECT exam_id,count(distinct e_r.uid) uv,ROUND(avg(score),1) avg_score FROM exam_record e_r
LEFT JOIN user_info u_i ON u_i.uid=e_r.uid
LEFT JOIN examination_info e_i ON e_i.exam_id=e_r.exam_id
WHERE level>5 AND tag='SQL' AND DATE_FORMAT(e_r.release_time,'%Y%m%d')=DATE_FORMAT(submit_time,'%Y%m%d')
GROUP BY exam_id
ORDER BY uv DESC,avg_score ASC;
SQL22 作答试卷得分大于80的人的用户等级分布
问题分析
(1)SQL类别试卷得分大于80———tag=‘SQL’,score>80;
(2)等级分布————group by level;
(3)每个等级人数————count(distinct uid);
(4)数量降序————order by level_cnt desc;
答案重写
(1)多表联结法
SELECT level,COUNT(distinct e_r.uid) level_cnt FROM exam_record e_r
JOIN examination_info e_i ON e_i.exam_id=e_r.exam_id
JOIN user_info u_i ON u_i.uid=e_r.uid
WHERE score>80 AND tag='SQL'
GROUP BY level
ORDER BY level_cnt DESC,level DESC;
(2)嵌套子查询法
SELECT level,count(distinct uid) level_cnt
FROM user_info u_i
WHERE uid in(
SELECT uid FROM exam_record WHERE exam_id in
(SELECT exam_id FROM examination_info WHERE tag='SQL')
AND score>80)
GROUP BY level
ORDER BY level_cnt DESC,level DESC;
相比连接查询使用in运算符思路更清晰
1.找出SQL试卷的exam_id
2.根据exam_id找到做了SQL试卷且得分大于80的uid
3.根据uid找到对应的用户,使用相关聚合函数完成查询
牛客92009084号
细节问题
一定要弄清楚每个条件的具体定义究竟是什么。
SQL23 每个题目和每份试卷被作答的人数和次数
问题分析
(1)试卷和题目分别位于两张表,考虑分别查询出符合条件的结果用UNION ALL连结起来,UNION ALL不会去除重复项;
(2)被作答人数————COUNT(DISTINCT uid)
(3)被作答次数————COUNT(start_time)
(4)GROUP BY
(5)降序————ORDER BY DESC
答案重写
SELECT exam_id tid,COUNT(DISTINCT uid) uv,COUNT(start_time) pv
FROM exam_record e_r
GROUP BY exam_id
UNION ALL
select question_id tid,COUNT(DISTINCT uid) uv,COUNT(submit_time) pv
FROM practice_record
GROUP BY question_id
ORDER BY LEFT(tid,1) desc,UV DESC,PV DESC;
细节问题
(1)union可以使用任何selcet语句,但order by子句只能在最后一次使用。
SQL24 分别满足两个活动的人
问题分析
(1)注意到示例里面,同时满足两个活动的用户作为两条记录,而不是在activity列一条记录里显示两个活动,这就降低了难度,可以分别查询出符合活动的用户,再用UNION ALL即可;
(2)activity1每次试卷得分都能到85————score>=85
(3)activity2至少有一次用了一半时间就完成高难度试卷且分数大于80————TIMESTAMPDIFF(second,start_time,submit_time)<=duration*60/2 and score>80;LEFT JOIN e_i ON …difficulty=‘hard’;
(4)按用户id排序输出————order by uid;
答案重写
select uid,'activity1' activity
FROM exam_record e_r
WHERE YEAR(submit_time)=2021
AND uid NOT IN(SELECT uid FROM exam_record WHERE score<85)
GROUP BY uid
UNION ALL
SELECT uid,'activity2' activity FROM exam_record e_r
LEFT JOIN examination_info e_i ON e_i.exam_id=e_r.exam_id
WHERE difficulty='hard'
AND TIMESTAMPDIFF(second,start_time,submit_time)<=duration*60/2
AND score>80
AND YEAR(submit_time)=2021
GROUP BY uid
ORDER BY uid,RIGHT(activity,1);
细节问题
(1)填充文本为新的一列时,要加单引号‘activity1’不然会报错找不到这一列;
(2)TIMESTAMPDIFF后面可以用天(day)、小时(hour)、分钟(minute)、秒(second),datediff函数,返回值是相差的天数,不能定位到小时、分钟和秒;
(3)本题一定要用秒作为计算时间差的单位时长,首先所给时间都是到秒的,而且小于等于一般的时间意味着超过一秒都不算,如果单位是分钟就会筛出答题时长超过几秒但不足一分钟的用户,这样是不对的;
(4)不要忘记年份限制;
SQL25 满足条件的用户的试卷完成数和题目练习数
问题分析
(1)高难度SQL试卷得分平均值大于80——tag=‘SQL’ and difficulty=‘hard’;
(2)七级用户————left join user_info on…;level=7;
(3)请注意以上条件只是对用户也就是uid的限制,而要统计次数的试卷和题目是没有以上限制条件的!包括所有类型和难度!
(4)试卷总完成次数——count(distinct if(e_r.score is not null,e_r.exam_id,null)
(5)题目总练习次数——
(6)2021年————year()=2021;
(7)需要left join 两表,可以用子查询筛选uid;
(8)排序————ORDER BY;
答案重写
select n_e_r.uid uid,exam_cnt,if(n_e_r.uid=n_p_r.uid,question_cnt,0) question_cnt
from (select e_r.uid uid,count(distinct if(e_r.score is not null,e_r.exam_id,null)) exam_cnt
from exam_record e_r
where e_r.uid in
(select e_r.uid from exam_record e_r left join user_info u_i on u_i.uid=e_r.uid
left join examination_info e_i on e_i.exam_id=e_r.exam_id
where level>=7 and difficulty='hard' and tag='SQL' and year(e_r.submit_time)=2021
GROUP BY e_r.uid
HAVING avg(e_r.score)>80)
GROUP BY e_r.uid) n_e_r
left JOIN (select p_r.uid uid,count(distinct if(score is not null,submit_time,null)) question_cnt
from practice_record p_r where year(p_r.submit_time)=2021
group by p_r.uid) n_p_r
ON n_p_r.uid=n_e_r.uid
order by exam_cnt ASC,question_cnt DESC;
细节问题
1、为什么要用left join? 因为一个人可能做了试卷 exam,却没有做习题practice,如果 join exam_record和
practice_record on uid,那么由于这个人的 uid 在 practice_record
中不存在,uid就会被排除,不符合题意。
2、为什么明明题目问的是总完成次数,却要 count distinct? 因为 left join
导致出现了重复行,也就是笛卡尔积,所以,本题目的distinct不是对 exam_id或者
question_id去重,而是为了解决多出来的重复行的问题。
id:大力不失眠
关于所取数据的定义评论区有很多讨论,感觉题目设置数据也有点问题,这个id的评论再仔细看一下。
SQL26 每个6/7级用户活跃情况
问题分析
(1)总活跃月份和活跃天数是包含两个表的活跃度,但是!如果分别count再加起来就错了!因为要考虑到做题和做试卷可能在同一天,这样算出来次数会多!
(2)可以用UNION ALL,从两个表中select出字段重命名并预处理一下,将两个表合并,再从合并后的大表中select所需字段。
(3)总活跃月份数没有年份限制,而其余三者有年份限制,那么此时就不能在where后用,可以用case when或者if。
答案重写
select user_info.uid uid,
count(distinct act_month) as act_month_total,
count(distinct case when year(act_time)='2021'then act_day end) as act_days_2021,
count(distinct case when year(act_time)='2021' and tag='exam' then act_day end) as act_days_2021_exam,
count(distinct case when year(act_time)='2021' and tag='question'then act_day end) as act_days_2021_question
from (SELECT uid,exam_id as ans_id,start_time as act_time,date_format(start_time,'%Y%m') as act_month, date_format(start_time,'%Y%m%d') as act_day,'exam' as tag from exam_record
UNION ALL
select uid,question_id as ans_id,submit_time as act_time,date_format(submit_time,'%Y%m') as act_month, date_format(submit_time,'%Y%m%d') as act_day,'question' as tag from practice_record ) total
right join user_info on total.uid=user_info.uid
where user_info.level in (6,7)
group by user_info.uid
order by act_month_total desc,act_days_2021 desc
如果用if部分:
SELECT ui.uid,
COUNT(DISTINCT DATE_FORMAT(act_time,'%Y%m')) as act_month_total,
COUNT(DISTINCT IF(YEAR(act_time)=2021,DATE(act_time),null)) as act_days_2021,
COUNT(DISTINCT IF(YEAR(act_time)=2021 and tag='exam',act_time,null)) as act_days_2021_exam,
COUNT(DISTINCT IF(YEAR(act_time)=2021 and tag='question',act_time,null)) as act_days_2021_question
细节问题
(1)then后面的字段,只能直接获得,不能通过间接获得,比如上面的一个相加的值,要想那些写,只能在包一层,才能获得使用。
(2)一上来直接踩两个相加的坑了