1.1.5你知道的排名函数有哪些?说一说它们之间的区别?
排名函数:
row_number() over() : 排名函数,没有并列名次,名次连续,如:1,2,3. 适合于生成主键或者不并列排名,
rank() over() : 排名函数,有并列名次,名次不连续。如:1,1,3. dense_rank() over() : 排名函数,有并列名次,名次连续。如:1,1,2.
区别:
1、(共同点)三者都可以用于分组排名,名次都是递增。
2、(不同点)row_number()没有并列名次,名次连续,如:1,2,3; rank()有并列名次,名次不连续, 如:1,1,3。dense_rank()有并列名次,名次连续。如:1,1,2。
1.6 score-nvl 列做减法
1.1.6编写sql语句实现每班前三名,分数一样并列,同时求出前三名按名次排序的一次的分差:
数据:
stu表
Stu_no class score
1 1901 90
2 1901 90
3 1901 83
4 1901 60
5 1902 66
6 1902 23
7 1902 99
8 1902 67
9 1902 87
结果数据:
班级 stu_no score rn rn1 rn_diff
1901 1 90 1 1 90
1901 2 90 2 1 0
1901 3 83 3 1 -7
1902 7 99 1 1 99
1902 9 87 2 2 -12
1902 8 67 3 3 -20
select *,
score-nvl(lag(score)over(partition by class order by l2),0) l3
from
(
select * from
(
select *,
dense_rank()over(partition by class order by score desc) l1,
row_number()over(partition by class order by score desc) l2 from stu
) b where b.l2 < 4
) c;
1.1.7每个店铺的当月销售额和累计到当月的总销售额
数据:
店铺,月份,金额
a,01,150
a,01,200
b,01,1000
b,01,800
c,01,250
c,01,220
b,01,6000
a,02,2000
a,02,3000
b,02,1000
b,02,1500
c,02,350
c,02,280
a,03,350
a,03,250
编写Hive的HQL语句求出每个店铺的当月销售额和累计到当月的总销售额?
参考答案
看sale表
select t.id,t.month,t.sum,
max(t.sum) over(partition by t.id order by t.month) max,
sum(t.sum) over(partition by t.id order by t.month) summ
from
(select id,month,sum(money) sum from sale group by id,month) t
1.1.9订单及订单类型行列互换
t1表:
order_id order_type order_time
111 N 10:00
111 A 10:05
111 B 10:10
是用hql获取结果如下:
order_id order_type_1 order_type_2 order_time_1 order_time_2
111 N A 10:00 10:05
111 A B 10:05 10:10
参考答案
select * from order_type where type<>‘B’ union all
select * from order_type where type<>‘N’
窗口函数 就是你想显示谁 就显示谁 可以独自开一列进行显示 而是还有特性
select * from(
select
id,
type,
lead(type) over (partition by id order by time) a,
time,
lead(time) over (partition by id order by time) b
from order_table) c where c.type<>‘B’
1.1.10某APP每天访问数据存放在表access_log里面,包含日期字段ds,用户类型字段user_type,用户账号user_id,用户访问时间log_time,请使用hive的hql语句实现如下需求:
(1)、每天整体的访问UV、PV?
(2)、每天每个类型的访问UV、PV?
(3)、每天每个类型中最早访问时间和最晚访问时间?
(4)、每天每个类型中访问次数最高的10个用户?
(1)、每天整体的访问UV、PV?
select
count(1) as pv, count(distinct user_id) as uv from access_log
group by ds
;
(2)、每天每个类型的访问UV、PV? select
user_type, count(1) as pv,
count(distinct user_id) as uv from access_log
group by user_type,ds
;
(3)、每天每个类型中最早访问时间和最晚访问时间? select
first_value(log_time) over(distribute by user_type,ds sort by log_time), last_value(log_time) over(distribute by user_type,ds sort by log_time) from access_log
;
(4)、每天每个类型中访问次数最高的10个用户?
select * from (select user_type, ds,
row_number() over(distribute by user_type,ds sort by log_time) rm from access_log) tmp
where rm < 10
;
1.1.11每个用户连续登陆的最大天数?
数 据 : login表uid,date
1,2019-08-01
1,2019-08-02
1,2019-08-03
2,2019-08-01
2,2019-08-02
3,2019-08-01
3,2019-08-03
4,2019-07-28
4,2019-07-29
4,2019-08-01
4,2019-08-02
4,2019-08-03
结果如下: uid cnt_days 1 3
2 2
3 1
4 3
create table lg_cnt( uid int,
dt string
)
row format delimited fields terminated by ‘,’
;
load data local inpath ‘/hivedata/lg_cnt.txt’ overwrite into table lg_cnt;
select uid, max(cn) from (select uid,
count(*) cn from (select
uid,
date_sub(dt,row_number() over(distribute by uid sort by dt)) dt
from lg_cnt) t1
group by uid,dt) t2 group by uid