Using WITH CUBE, WITH ROLLUP and GROUPING SETS in Hive

Table structure

CREATE TABLE test (f1 string,
                   f2 string,
                   f3 string,
                   cnt int) ROW FORMAT delimited FIELDS TERMINATED BY '\t' stored AS textfile;
LOAD DATA LOCAL inpath '/data/logs/suiyingli/tmp/test.data' overwrite INTO TABLE test;

Raw data (tab-separated columns f1, f2, f3, cnt)

A       A       B      1
B       B       A      1
A       A       A      2

WITH CUBE query

SELECT f1,
       f2,
       f3,
       sum(cnt),
       GROUPING__ID,
       rpad(reverse(bin(cast(GROUPING__ID AS bigint))),3,'0')
FROM test
GROUP BY f1,
         f2,
         f3 WITH CUBE;

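The last column of these queries turns GROUPING__ID into a string of per-column flags. The Python sketch below mirrors `rpad(reverse(bin(...)), 3, '0')` so you can read off the string for each grouping set. It assumes the pre-Hive-2.3 layout the expression was written for: a bit is 1 when that GROUP BY column is kept, and f1 is the lowest bit. Hive 2.3.0 changed GROUPING__ID to follow the SQL standard (1 means the column was aggregated away, and the first column is the highest bit), so on newer versions the same expression yields different strings. The function names are hypothetical.

```python
def grouping_id_pre_2_3(cols, kept):
    """Hypothetical helper: old-style GROUPING__ID for one grouping set.

    Bit i is 1 when cols[i] is kept (not rolled up); cols[0] is the lowest bit.
    """
    return sum(1 << i for i, c in enumerate(cols) if c in kept)


def decode_grouping_id_pre_2_3(gid, n_cols):
    """Mirror rpad(reverse(bin(gid)), n_cols, '0') from the queries above."""
    bits = format(gid, "b")      # like Hive bin(): most significant bit first
    flipped = bits[::-1]         # like reverse(): now the f1 bit comes first
    return flipped.ljust(n_cols, "0")[:n_cols]  # like rpad(): pad or truncate


COLS = ("f1", "f2", "f3")
for kept in [("f1", "f2", "f3"), ("f1", "f2"), ("f1",), ("f3",), ()]:
    gid = grouping_id_pre_2_3(COLS, kept)
    print(kept, gid, decode_grouping_id_pre_2_3(gid, len(COLS)))
```

Read each string left to right as f1 f2 f3. For example, `110` marks the (f1, f2) subtotal rows, `001` the rows grouped by f3 alone, and `000` the grand total.
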
[Figure: WITH CUBE sample result]

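The sketch below shows what WITH CUBE computes for the three sample rows. It is a small Python model of the semantics: one aggregation per subset of (f1, f2, f3), with rolled-up columns shown as None, i.e. NULL. It is an illustration under those assumptions, not how Hive executes the query. The helper names are hypothetical, and Hive does not guarantee the row order shown here.

```python
from itertools import combinations

# Sample rows from the article's test.data: (f1, f2, f3, cnt)
ROWS = [("A", "A", "B", 1), ("B", "B", "A", 1), ("A", "A", "A", 2)]
COLS = ("f1", "f2", "f3")


def cube_sets(cols):
    """WITH CUBE = every subset of the GROUP BY columns: 2**n grouping sets."""
    return [c for k in range(len(cols), -1, -1) for c in combinations(cols, k)]


def run_grouping_sets(rows, cols, grouping_sets):
    """Simulate GROUP BY ... GROUPING SETS: aggregate once per set,
    replace rolled-up columns with None (NULL), and concatenate the results."""
    result = []
    for kept in grouping_sets:
        sums = {}
        for *keys, cnt in rows:
            key = tuple(v if c in kept else None for c, v in zip(cols, keys))
            sums[key] = sums.get(key, 0) + cnt
        result.extend((*key, total) for key, total in sums.items())
    return result


cube = run_grouping_sets(ROWS, COLS, cube_sets(COLS))
for row in cube:
    print(row)
```

For this data the model returns 18 rows from 8 grouping sets. The last row is the grand total (None, None, None, 4).
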

WITH ROLLUP query

SELECT f1,
       f2,
       f3,
       sum(cnt),
       GROUPING__ID,
       rpad(reverse(bin(cast(GROUPING__ID AS bigint))),3,'0')
FROM test
GROUP BY f1,
         f2,
         f3 WITH ROLLUP;

[Figure: WITH ROLLUP sample result]

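Modeling WITH ROLLUP only needs a different list of grouping sets: the prefixes of the GROUP BY list. The Python sketch below uses hypothetical names as a stand-in for Hive. It also checks the rule stated in the summary, which holds here because the sample data contains no genuine NULLs.

```python
# Sample rows from the article's test.data: (f1, f2, f3, cnt)
ROWS = [("A", "A", "B", 1), ("B", "B", "A", 1), ("A", "A", "A", 2)]
COLS = ("f1", "f2", "f3")


def rollup_sets(cols):
    """WITH ROLLUP = the n+1 prefixes of the GROUP BY list, longest first."""
    return [cols[:k] for k in range(len(cols), -1, -1)]


def run_grouping_sets(rows, cols, grouping_sets):
    """Aggregate once per grouping set; rolled-up columns become None (NULL)."""
    result = []
    for kept in grouping_sets:
        sums = {}
        for *keys, cnt in rows:
            key = tuple(v if c in kept else None for c, v in zip(cols, keys))
            sums[key] = sums.get(key, 0) + cnt
        result.extend((*key, total) for key, total in sums.items())
    return result


rollup = run_grouping_sets(ROWS, COLS, rollup_sets(COLS))
for row in rollup:
    print(row)

# Once a column is rolled up (None), every column after it is rolled up too.
for row in rollup:
    keys = row[:-1]
    first_null = next((i for i, v in enumerate(keys) if v is None), len(keys))
    assert all(v is None for v in keys[first_null:])
```

The model yields 8 rows: the 3 detail rows, 2 (f1, f2) subtotals, 2 f1 subtotals and the grand total.
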

GROUPING SETS query

SELECT f1,
       f2,
       f3,
       sum(cnt),
       GROUPING__ID,
       rpad(reverse(bin(cast(GROUPING__ID AS bigint))),3,'0')
FROM test
GROUP BY f1,
         f2,
         f3
GROUPING SETS ((f1), (f1, f2));

[Figure: GROUPING SETS sample result]

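With an explicit GROUPING SETS clause, only the listed combinations are aggregated. The Python sketch below models `GROUPING SETS ((f1), (f1, f2))` on the sample rows, using a hypothetical helper rather than Hive code. It shows that f3 comes back as NULL in every row: f3 is listed in GROUP BY but belongs to no grouping set.

```python
# Sample rows from the article's test.data: (f1, f2, f3, cnt)
ROWS = [("A", "A", "B", 1), ("B", "B", "A", 1), ("A", "A", "A", 2)]
COLS = ("f1", "f2", "f3")


def run_grouping_sets(rows, cols, grouping_sets):
    """Aggregate once per grouping set; columns outside the set become None."""
    result = []
    for kept in grouping_sets:
        sums = {}
        for *keys, cnt in rows:
            key = tuple(v if c in kept else None for c, v in zip(cols, keys))
            sums[key] = sums.get(key, 0) + cnt
        result.extend((*key, total) for key, total in sums.items())
    return result


# GROUPING SETS ((f1), (f1, f2)): only these two combinations are aggregated.
gs = run_grouping_sets(ROWS, COLS, [("f1",), ("f1", "f2")])
for row in gs:
    print(row)
```

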
Summary

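CUBE produces the most complete set of groupings. It covers every combination of the dimensions, with each dimension either kept or rolled up to NULL: 2^n grouping sets for n dimensions, so 8 for f1, f2, f3.
ROLLUP only produces combinations in which every later dimension is NULL once an earlier one is NULL. While a dimension is non-NULL, the next one may be either. For f1, f2, f3 that means (f1, f2, f3), (f1, f2), (f1) and the grand total ().
GROUPING SETS lets you list exactly the dimension combinations you need.
PS: GROUPING SETS can simplify SQL. It usually performs better than running a separate GROUP BY for each combination and combining the results with UNION.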
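
The Python sketch below makes the counts in the summary and the UNION comparison concrete. `cube_sets` and `rollup_sets` enumerate the grouping sets (2^n and n+1 of them). The hypothetical `union_all_branches` spells out the separate GROUP BY queries that a single GROUPING SETS clause replaces. The generated HiveQL is illustrative only and has not been run against a Hive cluster.

```python
from itertools import combinations


def cube_sets(cols):
    """All subsets of the columns: 2**n grouping sets."""
    return [c for k in range(len(cols), -1, -1) for c in combinations(cols, k)]


def rollup_sets(cols):
    """Prefixes of the column list: n+1 grouping sets."""
    return [tuple(cols[:k]) for k in range(len(cols), -1, -1)]


def union_all_branches(table, cols, sets, measure="cnt"):
    """Hypothetical helper: write one GROUP BY query per grouping set,
    selecting NULL for rolled-up columns, joined with UNION ALL."""
    branches = []
    for kept in sets:
        select = ", ".join(c if c in kept else f"CAST(NULL AS STRING) AS {c}" for c in cols)
        group = f" GROUP BY {', '.join(kept)}" if kept else ""
        branches.append(f"SELECT {select}, SUM({measure}) FROM {table}{group}")
    return "\nUNION ALL\n".join(branches)


cols = ("f1", "f2", "f3")
print(len(cube_sets(cols)), len(rollup_sets(cols)))  # 8 4
print(union_all_branches("test", cols, [("f1",), ("f1", "f2")]))
```

Each UNION ALL branch is typically a separate aggregation over `test`. Hive can instead evaluate a GROUPING SETS query in one aggregation by emitting each input row once per grouping set, which is the usual reason for the performance difference noted above.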

### Hive SQL Grouping Sets Usage and Examples

In Hive, `GROUPING SETS` computes aggregates at several levels of detail in a single query, which goes beyond what a plain `GROUP BY` offers. For instance, consider a table named `sales_data` that holds sales amounts by region and product category:

```sql
CREATE TABLE IF NOT EXISTS sales_data (
  region   STRING,
  category STRING,
  amount   DOUBLE
);
```

`GROUPING SETS` returns total sales per region plus an overall total, with no need to write and combine separate queries[^1]:

```sql
SELECT COALESCE(region, 'Total') AS region,
       SUM(amount) AS total_sales
FROM sales_data
GROUP BY region
GROUPING SETS ((region), ())
ORDER BY region;
```

The result contains one row per region plus one extra row holding the grand total of all sales.

Another common report combines sales for each region-and-category pair with subtotals per region, subtotals per category and a grand total. `GROUPING SETS` handles these multi-level aggregations as well[^2]:

```sql
SELECT COALESCE(region, 'All Regions') AS region,
       COALESCE(category, 'All Categories') AS category,
       SUM(amount) AS total_sales
FROM sales_data
GROUP BY region, category
GROUPING SETS ((region, category), (region), (category), ())
ORDER BY region NULLS LAST, category NULLS LAST;
```

This returns the detailed breakdown and the higher-level summaries in one result set.

Some queries span several dimensions, such as time periods combined with geographic areas. For these, the `CUBE` and `ROLLUP` operators, which Hive treats as shorthand for particular grouping sets, keep the syntax shorter and easier to read[^3].
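
Labeling subtotal rows with COALESCE has one caveat. A region that is genuinely NULL in the data gets the same label as the rolled-up total row. Hive's documentation introduces GROUPING__ID precisely to tell a NULL produced by aggregation apart from a NULL in the data. The Python sketch below models the first query on hypothetical `sales_data` rows. It uses a per-row "rolled up" flag as a stand-in for what GROUPING__ID provides in Hive.

```python
# Hypothetical sales_data rows: (region, category, amount); one region is genuinely NULL.
SALES = [("East", "Books", 10.0), ("West", "Books", 5.0), (None, "Toys", 2.0)]


def by_region_with_total(rows):
    """Model GROUP BY region GROUPING SETS ((region), ()).

    Each output row carries a flag saying whether region was rolled up,
    which plays the role of GROUPING__ID in Hive."""
    out = []
    for kept in (True, False):
        sums = {}
        for region, _category, amount in rows:
            key = region if kept else None
            sums[key] = sums.get(key, 0.0) + amount
        out.extend((key, total, not kept) for key, total in sums.items())
    return out


rows = by_region_with_total(SALES)
coalesced = [(r if r is not None else "Total", s) for r, s, _ in rows]
flagged = [("Total" if rolled_up else r, s) for r, s, rolled_up in rows]
print(coalesced)  # two rows labeled 'Total'
print(flagged)    # only the real grand total is labeled 'Total'
```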