Grouping_ID函数
当我们没有统计某一列时,它的值显示为null,这可能与列本身就有null值冲突,这就需要一种方法区分是没有统计还是值本来就是null。(写一个排列组合的算法,就马上理解了,grouping_id其实就是所统计各列二进制和)
| Column1 (key) |
Column2 (value) |
|---|---|
| 1 | NULL |
|
1 |
1 |
|
2 |
2 |
|
3 |
3 |
|
3 |
NULL |
|
4 |
5 |
hql统计:
SELECT key, value, GROUPING__ID, count(*) from T1 GROUP BY key, value WITH ROLLUP
统计结果如下:
| NULL | NULL | 0 00 | 6 |
| 1 | NULL | 1 10 | 2 |
| 1 | NULL | 3 11 | 1 |
| 1 | 1 | 3 11 | 1 |
| 2 | NULL | 1 10 | 1 |
| 2 | 2 | 3 11 | 1 |
| 3 | NULL | 1 10 | 2 |
| 3 | NULL | 3 11 | 1 |
| 3 | 3 | 3 11 | 1 |
| 4 | NULL | 1 10 | 1 |
| 4 | 5 | 3 11 | 1 |
如果列中没有有null值
SELECT fact_1_id,
fact_2_id,
SUM(sales_value) AS sales_value,
(case when fact_1_id is null then 1 else 0 end) as f1g,
(case when fact_2_id is null then 1 else 0 end) as f2
FROM dimension_tab
GROUP BY fact_1_id, fact_2_id WITH CUBE
ORDER BY fact_1_id, fact_2_id;
如果列中本来就有null值
SELECT fact_1_id,
fact_2_id,
SUM(sales_value) AS sales_value,
(case when (CAST (GROUPING__ID AS INT) & 1) = 0 then 1 else 0 end) as f1g,
(case when (CAST (GROUPING__ID AS INT) & 2) = 0 then 1 else 0 end) as f2g
FROM dimension_tab
GROUP BY fact_1_id, fact_2_id WITH CUBE
ORDER BY fact_1_id, fact_2_id;
如下
(case when (CAST (GROUPING__ID AS INT) & 1) = 1 then '1 ' else 0 end),
(case when (CAST (GROUPING__ID AS INT) & 2) = 2 then '2' else 0 end) ,
(case when (CAST (GROUPING__ID AS INT) & 4) = 4 then '3' else 0 end) ,
第一位如果和1与为1,证明第一位存在同时包括本来存在的null值,
用来判断每一个字段知否为聚合的字段中,null值本来的null还是grouping set的null值。
参考
http://stackoverflow.com/questions/29577887/grouping-in-hive
https://cwiki.apache.org/confluence/display/Hive/Enhanced+Aggregation,+Cube,+Grouping+and+Rollup
本文详细介绍了 Grouping_ID 函数在处理包含 null 值的数据集时的应用,通过具体的 SQL 示例展示了如何区分自然 null 值与聚合操作产生的 null 值,帮助读者更好地理解和使用该函数。
248

被折叠的 条评论
为什么被折叠?



