题型1:最大同时在线人数
SQL163
思路:
1,进入时刻记为1,出来时间记为-1.连接起来
2,SUM窗口函数,按文章id维度,统计按时间戳升序的观看人数变化情况
3,最外层SELECT按artical_id聚合,通过MAX(cnt)取出瞬时观看最大值max_uv,并排序。
同时在线人数的题型都是这个步骤!SQL189同题型
题型2:union的使用
不是连接,是直接上下摞起来,用union
SQL 135
select u_i.uid as uid,
count(distinct act_month) as act_month_total,
count(distinct case
when year(act_time) = 2021
then act_day
end) as act_days_2021,
count(distinct case
when year(act_time) = 2021
and tag = 'exam'
then act_day
end) as act_days_2021_exam,
count(distinct case
when year(act_time) = 2021
and tag = 'question'
then act_day
end) as act_days_2021_question
from user_info u_i
left join (select uid,
start_time as act_time,
date_format(start_time, '%Y%m') as act_month,
date_format(start_time, '%Y%m%d') as act_day,
'exam' as tag
from exam_record
union all
select uid,
submit_time as act_time,
date_format(submit_time, '%Y%m') as act_month,
date_format(submit_time, '%Y%m%d') as act_day,
'question' as tag
from practice_record
) exam_and_practice
on exam_and_practice.uid = u_i.uid
where u_i.level >= 6
group by uid
order by act_month_total desc, act_days_2021 desc
1,试卷数据与题目数据用union all 连接起来,且用tag标注是来自哪张表
2,count(distinct case when 条件 then 取的结论数据列 end)
题型3:如何找到每个月的新增用户
给原表加一个标记新用户的tag列,为1 则是新用户,0为旧用户
IF(start_time=MIN(start_time)OVER(PARTITION BY uid),1,0) AS tag
这样sum(tag) 能直接计算得新增用户数
例题:
SQL144月活用户数、新增用户数、截止当月的单月最大新增用户数、截止当月的累积用户数
select *,max(month_add_uv)over( order by ym range between unbounded preceding and current row),
sum(month_add_uv)over( order by ym range between unbounded preceding and current row)
from
(select ym,count(distinct uid) as mau,sum(tag) as month_add_uv
from
(select *,date_format(start_time,'%Y%m')as ym,
if(start_time=min(start_time)over(partition by uid),1,0) as tag
from exam_record ) a
group by ym ) b
月活用户数 — count(distinct uid)
新增用户数 —sum(tag)新增列求和
截止当月的单月最大新增用户数 — max(新增用户数)窗口函数range between unbounded preceding and current row
截止当月的累积用户数 — sum(新增用户数)窗口函数range between unbounded preceding and current row
4,Exists的使用。如果满足条件输出数据1,如果不满足条件则输出另一部分数据2
SQL149
with t as (
select uid,ui.level,
count(start_time) - count(submit_time) as incomplete_cnt,
round(ifnull(1-count(submit_time)/count(start_time),0),3) as incomplete_rate,
count(start_time) as total_cnt
from user_info ui left join exam_record using(uid)
group by uid)
select uid,incomplete_cnt,incomplete_rate
from t
where exists(select uid from t where t.level = 0 and incomplete_cnt>2)
and level =0
union all
select uid,incomplete_cnt,incomplete_rate
from t
where not exists(select uid from t where t.level = 0 and incomplete_cnt>2)
and total_cnt > 0
order by incomplete_rate;
with t as () —把要的数据字段列表示出来备用
select 数据1 where exists(条件)
union all
select 数据2 where not exists(条件)
5,取排前几次的数据的方法:
1)排序后limit n
2)row_number()over()窗口函数排序成cn,再取cn <= n
6,select from (select **** ) a
from (新建表中) a 要另取名字啊
7,日期函数
datediff(时间1,时间2):计算两个日期之间间隔的天数,单位为日
timestampdiff(时间单位,开始时间,结束时间):两个日期的时间差,返回的时间差形式由时间单位决定(日,周,月,年)
date_add(日期,interval n 时间单位) :返回加上n个时间单位后的日期
date_sub(日期,interval n 时间单位 ):返回减去n个时间单位后的日期
date_format(时间,‘%Y-%m-%d’):强制转换时间为所需要的格式
8,SQL138连续两次作答试卷的最大时间窗
with t2 as (
select uid,count(start_time) total, -- 用户2021年作答的次数
datediff(max(start_time),min(start_time))+1 diff_time, -- 头尾作答时间窗
max(datediff(next_time,start_time))+1 days_window -- 最大间隔天数
from (select uid,start_time,lead(start_time,1)over(partition by uid order by start_time) as next_time
from exam_record
where year(start_time) = 2021
) t1
group by uid
)
select uid,days_window,round(total*days_window/diff_time,2) avg_exam_cnt
from t2
where diff_time >1
order by days_window desc,avg_exam_cnt desc ;
1)要求同个用户的前后时间间隔,用到lead(start_time,1)over(partition by uid order by start_time) as next_time ,然后datediff()做差
2)用with t as ()做备用,避免多次嵌套
9,排序函数
rank() over() 1 2 2 4 4 6 (计数排名,跳过相同的几个,eg.没有3没有5)
row_number() over() 1 2 3 4 5 6 (赋予唯一排名)
dense_rank() over() 1 2 2 3 3 4 (不跳过排名,可以理解为对类别进行计数)
percent_rank() over() 按照数字所在的位置进行百分位分段
ntile(n)over() 将数字按照大小平均分成n段
lead(字段名,n)over()把字段数据向前移n个单元格
lag(字段名,n)over()把字段数据向后移n个单元格
SQL140 、SQL141、SQL164(计算次日留存率)
10,ifnull()函数
IFNULL(expression, alt_value)
如果第一个参数的表达式 expression 为 NULL,则返回第二个参数的备用值。如果不为 NULL 则返回第一个参数的值。