hive之grouping sets , grouping_id, cube,rollup

最新推荐文章于 2022-12-12 18:39:51 发布

原创最新推荐文章于 2022-12-12 18:39:51 发布 · 228 阅读

1 ·

CC 4.0 BY-SA版权

文章标签：

#hive #数据库

hive 专栏收录该内容

10 篇文章

订阅专栏

本文详细解析了Hive中的groupingsets、grouping_id、cube和rollup四个分组函数的区别与用法，通过创建表格、插入数据并执行查询语句，展示了不同函数在数据聚合上的特点和应用场景。

hive中grouping sets , grouping_id, cube,rollup的区别

grouping sets

–建表

create table tmp.garret_test(
name string,
age int,
score int,
subject string
)

–插入数据

insert into table tmp.garret_test values('lina',18,65,'语文')
insert into table tmp.garret_test values('lina',18,78,'数学')
insert into table tmp.garret_test values('Sheldon',17,60,'语文')
insert into table tmp.garret_test values('Sheldon',17,90,'数学')
insert into table tmp.garret_test values('Andy',17,89,'数学')
insert into table tmp.garret_test values('Andy',17,88,'语文')

在一个GROUP BY查询中，根据不同的维度组合进行聚合，等价于将不同维度的GROUP BY结果集进行UNION ALL
GROUPING__ID，表示结果属于哪一个分组集合。

select
  name,
  age,
  count(distinct score) as cnt,
  GROUPING__ID
from tmp.garret_test
group by name,age
grouping sets (name,age)
order by GROUPING__ID;
--结果为
lina       null    2      1
Sheldon    null    2      1
Andy       null    2      1
null	   17      4	  2
null       18      4      2

–等价于

SELECT name,NULL,COUNT(DISTINCT score) AS total_score,1 AS GROUPING__ID FROM tmp.garret_test GROUP BY name
UNION ALL
SELECT NULL,age,COUNT(DISTINCT score) AS total_score,2 AS GROUPING__ID FROM tmp.garret_test GROUP BY age

cube 根据group By的所有的维度进行汇总

SELECT  name, age,
COUNT(DISTINCT score) AS uv,
GROUPING__ID
FROM tmp.garret_test
GROUP BY name,age
WITH CUBE
ORDER BY GROUPING__ID;

–结果

null 	   null    6      0
lina       null    2      1
Sheldon    null    2      1
Andy       null    2      1
null	   17      4	  2
null       18      4      2
lina       18      2      1
Sheldon    17      2      1
Andy       17      2      1

–等价于

SELECT NULL,NULL,COUNT(DISTINCT score) AS uv,0 AS GROUPING__ID FROM tmp.garret_test
UNION ALL
SELECT name,NULL,COUNT(DISTINCT score) AS uv,1 AS GROUPING__ID FROM tmp.garret_test GROUP BY name
UNION ALL
SELECT NULL,age,COUNT(DISTINCT score) AS uv,2 AS GROUPING__ID FROM tmp.garret_test GROUP BY age
UNION ALL
SELECT name,age,COUNT(DISTINCT score) AS uv,3 AS GROUPING__ID FROM tmp.garret_test GROUP BY name,age

rollup 是CUBE的子集，以最左侧的维度为主，从该维度进行层级聚合

SELECT  name, age, COUNT(DISTINCT score) AS uv, GROUPING__ID 
FROM tmp.garret_test
GROUP BY name,age WITH ROLLUP  ORDER BY GROUPING__ID;

null 	   null    6      0
lina       null    2      1
Sheldon    null    2      1
Andy       null    2      1
lina       18      2      1
Sheldon    17      2      1
Andy       17      2      1