hive之grouping sets , grouping_id, cube,rollup

本文详细解析了Hive中的groupingsets、grouping_id、cube和rollup四个分组函数的区别与用法,通过创建表格、插入数据并执行查询语句,展示了不同函数在数据聚合上的特点和应用场景。

hive中grouping sets , grouping_id, cube,rollup的区别

grouping sets

–建表

create table tmp.garret_test(
name string,
age int,
score int,
subject string
)

–插入数据

insert into table tmp.garret_test values('lina',18,65,'语文')
insert into table tmp.garret_test values('lina',18,78,'数学')
insert into table tmp.garret_test values('Sheldon',17,60,'语文')
insert into table tmp.garret_test values('Sheldon',17,90,'数学')
insert into table tmp.garret_test values('Andy',17,89,'数学')
insert into table tmp.garret_test values('Andy',17,88,'语文')

在一个GROUP BY查询中,根据不同的维度组合进行聚合,等价于将不同维度的GROUP BY结果集进行UNION ALL
GROUPING__ID,表示结果属于哪一个分组集合。

select
  name,
  age,
  count(distinct score) as cnt,
  GROUPING__ID
from tmp.garret_test
group by name,age
grouping sets (name,age)
order by GROUPING__ID;
--结果为
lina       null    2      1
Sheldon    null    2      1
Andy       null    2      1
null	   17      4	  2
null       18      4      2
                 

–等价于

SELECT name,NULL,COUNT(DISTINCT score) AS total_score,1 AS GROUPING__ID FROM tmp.garret_test GROUP BY name
UNION ALL
SELECT NULL,age,COUNT(DISTINCT score) AS total_score,2 AS GROUPING__ID FROM tmp.garret_test GROUP BY age
cube 根据group By的所有的维度进行汇总
SELECT  name, age,
COUNT(DISTINCT score) AS uv,
GROUPING__ID
FROM tmp.garret_test
GROUP BY name,age
WITH CUBE
ORDER BY GROUPING__ID;

–结果

null 	   null    6      0
lina       null    2      1
Sheldon    null    2      1
Andy       null    2      1
null	   17      4	  2
null       18      4      2
lina       18      2      1
Sheldon    17      2      1
Andy       17      2      1

–等价于

SELECT NULL,NULL,COUNT(DISTINCT score) AS uv,0 AS GROUPING__ID FROM tmp.garret_test
UNION ALL
SELECT name,NULL,COUNT(DISTINCT score) AS uv,1 AS GROUPING__ID FROM tmp.garret_test GROUP BY name
UNION ALL
SELECT NULL,age,COUNT(DISTINCT score) AS uv,2 AS GROUPING__ID FROM tmp.garret_test GROUP BY age
UNION ALL
SELECT name,age,COUNT(DISTINCT score) AS uv,3 AS GROUPING__ID FROM tmp.garret_test GROUP BY name,age
rollup 是CUBE的子集,以最左侧的维度为主,从该维度进行层级聚合
SELECT  name, age, COUNT(DISTINCT score) AS uv, GROUPING__ID 
FROM tmp.garret_test
GROUP BY name,age WITH ROLLUP  ORDER BY GROUPING__ID;
null 	   null    6      0
lina       null    2      1
Sheldon    null    2      1
Andy       null    2      1
lina       18      2      1
Sheldon    17      2      1
Andy       17      2      1
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值