目录
0. 概述
hive中分组取topN,有3中类型的函数可以用:
- row_number() over (partition by col01 order by col02 desc )
- rank() over (partition by col01 order by col02 desc )
- dense_rank() over (partition by col01 order by col02 desc )
1. 准备数据和表结构
create table dws_db.employee (empid int ,deptid int ,salary decimal(10,2)); insert into dws_db.employee values(1,10,5500.00); insert into dws_db.employee values(2,10,4500.00); insert into dws_db.employee values(3,20,1900.00); insert into dws_db.employee values(4,20,4800.00); insert into dws_db.employee values(5,40,6500.00); insert into dws_db.employee values(6,40,14500.00); insert into dws_db.employee values(7,40,44500.00); insert into dws_db.employee values(8,50,6500.00); insert into dws_db.employee values(9,50,7500.00); |
2. 几种不同分组取TOPN情况说明
2.1 分组-组内排序
对deptid分组,组内根据salary排序
select *, row_number() over(partition by deptid order by salary) as rank from employee
结果:
值相同,先出现的排名靠前
2.2 分区-组内排序-取topN
对deptid分组,组内根据salary降序排序,取每组内的前 1 名。
select * from (select *, row_number() over(partition by deptid order by salary desc) as salary_rank from employee) t where salary_rank <= 1
2.3 不分组-只对某个字段排序
所有数据进行排名,不用分组,只用 partition by 1即可
select *, row_number() over(partition by 1 order by salary desc) as salary_rank from employee
2.4 rank() over() 的使用
rank() over(partition by deptid order by salary desc )
select *, rank() over(partition by deptid order by salary desc) as salary_rank from employee
值相同是,排名会并列,下一个移动N,N为值相同的个数。
结果:
2.5 dense_rank()的使用
select *, dense_rank() over(partition by deptid order by salary desc) as salary_rank from employee
3. 参考资料
https://my.oschina.net/u/2000675/blog/1604035
https://blog.youkuaiyun.com/wiborgite/article/details/80521593
https://blog.youkuaiyun.com/longshenlmj/article/details/50525385