改写一行提速group by cube查询五倍-优快云博客

很多数据库都存在group by cube效率低于等价多个非cube group by 语句union all的现象，以duckdb为例。
我们基于内存先建立一个一亿行的测试表，包含5个列，分别填充不大于2、100、30、60、5的整数，模拟实际工作中不同规模的分组列。

C:\d>duckdb130
DuckDB v1.3.0 (Ossivalis) 71c5c07cdd
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
D .timer on
D create temp table ren as select
((random()*1000)::int%2+1)::int c1,
((random()*1000)::int%100+1)::int c2,
((random()*1000)::int%30+1)::int c3,
((random()*1000)::int%60+1)::int c4,
((random()*1000)::int%5+1)::int c5
from generate_series(1,100000000::int);
Run Time (s): real 10.711 user 10.250000 sys 0.406250

代码1：直接对原始表group by cube, 为精简输出，将结果存入另一个表，用时8秒

D create temp table rc1c2c3c4 as select c1,c2,c3,c4,sum(1)cnt from ren group by cube(c1,c2,c3,c4);
Run Time (s): real 8.233 user 117.093750 sys 1.484375

代码2：直接对原始表group by再union all, 将结果存入另一个表，用时3秒，注意未参与分组的列要用null填充

D create temp table rc1c2c3c4m as
select c1,c2,c3,c4,sum(1)cnt from ren group by c1,c2,c3,c4 union all
select c1,null,null,null,sum(1)cnt from ren group by c1 union all
select null,c2,null,null,sum(1)cnt from ren group by c2 union all
select null,null,c3,null,sum(1)cnt from ren group by c3 union all
select null,null,null,c4,sum(1)cnt from ren group by c4 union all
select c1,c2,null,null,sum(1)cnt from ren group by c1,c2 union all
select c1,null,c3,null,sum(1)cnt from ren group by c1,c3 union all
select c1,null,null,c4,sum(1)cnt from ren group by c1,c4 union all
select null,c2,c3,null,sum(1)cnt from ren group by c2,c3 union all
select null,c2,null,c4,sum(1)cnt from ren group by c2,c4 union all
select null,null,c3,c4,sum(1)cnt from ren group by c3,c4 union all
select c1,c2,c3,null,sum(1)cnt from ren group by c1,c2,c3 union all
select c1,c2,null,c4,sum(1)cnt from ren group by c1,c2,c4 union all
select c1,null,c3,c4,sum(1)cnt from ren group by c1,c3,c4 union all
select null,c2,c3,c4,sum(1)cnt from ren group by c2,c3,c4 union all
select null,null,null,null,sum(1)cnt from ren  ;
Run Time (s): real 3.230 user 42.203125 sys 0.515625

代码3：对原始表group by全部分组列后存入实体化的CTE，再对它group by再union all, 将结果存入另一个表，用时1.3秒，需要指出，这种等价改写有局限性，比如只能对计数和求和使用，而且不适用于有distinct的情况。

D create temp table rc1c2c3c4m2 as
with t0 as  materialized (select c1,c2,c3,c4,sum(1 )cnt from ren group by c1,c2,c3,c4 )
select c1,c2,c3,c4,cnt from t0 union all
select c1,null,null,null,sum(cnt)cnt from t0 group by c1 union all
select null,c2,null,null,sum(cnt)cnt from t0 group by c2 union all
select null,null,c3,null,sum(cnt)cnt from t0 group by c3 union all
select null,null,null,c4,sum(cnt)cnt from t0 group by c4 union all
select c1,c2,null,null,sum(cnt)cnt from t0 group by c1,c2 union all
select c1,null,c3,null,sum(cnt)cnt from t0 group by c1,c3 union all
select c1,null,null,c4,sum(cnt)cnt from t0 group by c1,c4 union all
select null,c2,c3,null,sum(cnt)cnt from t0 group by c2,c3 union all
select null,c2,null,c4,sum(cnt)cnt from t0 group by c2,c4 union all
select null,null,c3,c4,sum(cnt)cnt from t0 group by c3,c4 union all
select c1,c2,c3,null,sum(cnt)cnt from t0 group by c1,c2,c3 union all
select c1,c2,null,c4,sum(cnt)cnt from t0 group by c1,c2,c4 union all
select c1,null,c3,c4,sum(cnt)cnt from t0 group by c1,c3,c4 union all
select null,c2,c3,c4,sum(cnt)cnt from t0 group by c2,c3,c4 union all
select null,null,null,null,sum(cnt)cnt from t0  ;
Run Time (s): real 1.328 user 13.562500 sys 0.437500

以上两种改写的行数随分组列个数增加呈指数级增加，正确编写很费时费力，我为此还专门让DeepSeek帮我实现了一个将group by cube查询语句转为等价CTE的转换程序，现在看来也没有必要了，因为有下面第4种代码。
代码4：对原始表group by全部分组列后存入实体化的CTE，再对它group by cube, 将结果存入另一个表，用时1.5秒

D create temp table rc1c2c3c4m3 as
with t0 as  materialized (select c1,c2,c3,c4,sum(1 )cnt from ren group by c1,c2,c3,c4 )
select c1,c2,c3,c4,sum(cnt )cnt from t0 group by cube(c1,c2,c3,c4);
Run Time (s): real 1.477 user 13.421875 sys 0.265625

仅仅在原始group by cube代码前面加入一行，我们实现了加速5倍的效果，而且保留了cube结果标志列grouping_id()，便于区分不同的汇总层次。

同样的思路也可用于group by rollup, 但是由于这种分组的组合本来就少，提速效果就没那么明显。

D create temp table roc1c2c3c4 as select c1,c2,c3,c4,sum(1)cnt from ren group by rollup(c1,c2,c3,c4);
Run Time (s): real 2.073 user 24.796875 sys 0.375000

D create temp table roc1c2c3c4m3 as
with t0 as  materialized (select c1,c2,c3,c4,sum(1 )cnt from ren group by c1,c2,c3,c4 )
select c1,c2,c3,c4,sum(cnt )cnt from t0 group by rollup(c1,c2,c3,c4);
Run Time (s): real 1.315 user 13.406250 sys 0.156250

换用另一种分析型单机数据库glaredb做同样的测试，以验证这种方法的普适性。它没有windows版，所以在WSL中执行。

root@DESKTOP-59T6U68:/mnt/c/d# ./glaredb6
GlareDB Shell
v25.6.1
Enter .help for usage hints.
glaredb> .timer on
glaredb> create temp table ren as select
     ... ((random()*1000)::int%2+1)::int c1,
     ... ((random()*1000)::int%100+1)::int c2,
     ... ((random()*1000)::int%30+1)::int c3,
     ... ((random()*1000)::int%60+1)::int c4,
     ... ((random()*1000)::int%5+1)::int c5
     ... from generate_series(1,100000000::int);
┌───────────────┐
│ rows_inserted │
│ Int64         │
├───────────────┤
│     100000000 │
└───────────────┘
Execution duration (s): 6.74959

glaredb> create temp table rc1c2c3c4 as select c1,c2,c3,c4,sum(1)cnt from ren group by cube(c1,c2,c3,c4); --代码1
┌───────────────┐
│ rows_inserted │
│ Int64         │
├───────────────┤
│        572973 │
└───────────────┘
Execution duration (s): 15.61688

glaredb> create temp table rc1c2c3c4m as ...;   --代码2
┌───────────────┐
│ rows_inserted │
│ Int64         │
├───────────────┤
│        572973 │
└───────────────┘
Execution duration (s): 6.58844

glaredb> create temp table rc1c2c3c4m2 as ...;   --代码3

┌───────────────┐
│ rows_inserted │
│ Int64         │
├───────────────┤
│        572973 │
└───────────────┘
Execution duration (s): 1.52624

glaredb> create temp table rc1c2c3c4m3 as
     ... with t0 as  materialized (select c1,c2,c3,c4,sum(1 )cnt from ren group by c1,c2,c3,c4 )
     ... select c1,c2,c3,c4,sum(cnt )cnt from t0 group by cube(c1,c2,c3,c4);                       --代码4
┌───────────────┐
│ rows_inserted │
│ Int64         │
├───────────────┤
│        572973 │
└───────────────┘
Execution duration (s): 1.47236