这也是DuckDB CTE功能的开发者kryonix给出的参考链接
长话短说,这个SQL在DuckDB 1.4.2 版本上:
- 单线程快于多线程,相差达5倍。
- 处理行数和用时关系不成比例,400行,40行,4行分别5秒、4秒、3.6秒。
- 多线程用时不稳定,有时4行比40、400行还慢。
--默认多线程
memory D .read day19_400.sql
┌───────┬─────────────────┐
│ part1 │ part2 │
│ int64 │ int128 │
├───────┼─────────────────┤
│ 330 │ 950763269786650 │
└───────┴─────────────────┘
Run Time (s): real 34.610 user 43.304000 sys 16.960000
memory D .read day19_40.sql
┌───────┬────────────────┐
│ part1 │ part2 │
│ int64 │ int128 │
├───────┼────────────────┤
│ 35 │ 55962276290066 │
└───────┴────────────────┘
Run Time (s): real 18.690 user 27.800000 sys 12.432000
memory D .read day19_4.sql
┌───────┬───────────────┐
│ part1 │ part2 │
│ int64 │ int128 │
├───────┼───────────────┤
│ 2 │ 1849911000117 │
└───────┴───────────────┘
Run Time (s): real 36.742 user 45.564000 sys 14.936000
--改为单线程
memory D set threads=1;
memory D .read day19_4.sql
┌───────┬───────────────┐
│ part1 │ part2 │
│ int64 │ int128 │
├───────┼───────────────┤
│ 2 │ 1849911000117 │
└───────┴───────────────┘
Run Time (s): real 3.652 user 2.892000 sys 0.756000
memory D .read day19_40.sql
┌───────┬────────────────┐
│ part1 │ part2 │
│ int64 │ int128 │
├───────┼────────────────┤
│ 35 │ 55962276290066 │
└───────┴────────────────┘
Run Time (s): real 4.037 user 3.228000 sys 0.808000
memory D .read day19_400.sql
┌───────┬─────────────────┐
│ part1 │ part2 │
│ int64 │ int128 │
├───────┼─────────────────┤
│ 330 │ 950763269786650 │
└───────┴─────────────────┘
Run Time (s): real 5.193 user 4.292000 sys 0.900000
要复现这个问题,首先需要把原始链接示例字符串换成如下文件中的内容,然后第一行不变,不带逗号分隔的行随意删减。
另外,如果把临时表available和requested用create table命令建立物理表,然后只保留如下查询代码,在第3行用where rowid<n控制处理行数,性能有极大的提高,而且,用时和行数成比例了。不set threads=1也没有不好的影响了。
with recursive
match(pattern, step, pos, c) as (
select pattern, 1, 1, 1::bigint from requested --where rowid<40
union all
(with tmp as (select * from match),
agg as (select pattern, step, pos, sum(c) as c from tmp where pos>=step group by pattern, step, pos)
select pattern, step+1, pos, c::bigint from agg where pos>step
union all
select pattern, step+1, pos+length(ap), c::bigint from agg, available
where pos=step and pos+length(ap)<=length(pattern)+1 and substr(pattern, pos, length(ap))=ap)),
matched(pattern, c) as (
select pattern,sum(c) from match where pos=length(pattern)+1 and step=pos group by pattern)
select (select count(*) from matched where c>0) as part1,
(select sum(c) from matched) as part2
物理表测试结果
memory D set threads=1;
memory D .read day19_400.sql
┌───────┬─────────────────┐
│ part1 │ part2 │
│ int64 │ int128 │
├───────┼─────────────────┤
│ 330 │ 950763269786650 │
└───────┴─────────────────┘
Run Time (s): real 0.859 user 0.824000 sys 0.000000
memory D .read day19_400.sql
┌───────┬────────────────┐
│ part1 │ part2 │
│ int64 │ int128 │
├───────┼────────────────┤
│ 35 │ 55962276290066 │
└───────┴────────────────┘
Run Time (s): real 0.137 user 0.132000 sys 0.004000
memory D .read day19_400.sql
┌───────┬───────────────┐
│ part1 │ part2 │
│ int64 │ int128 │
├───────┼───────────────┤
│ 2 │ 1849911000117 │
└───────┴───────────────┘
Run Time (s): real 0.063 user 0.048000 sys 0.012000
memory D
--改回多线程
memory D set threads=4;
memory D .read day19_400.sql
┌───────┬─────────────────┐
│ part1 │ part2 │
│ int64 │ int128 │
├───────┼─────────────────┤
│ 330 │ 950763269786650 │
└───────┴─────────────────┘
Run Time (s): real 0.833 user 1.736000 sys 0.200000
memory D .read day19_400.sql
┌───────┬────────────────┐
│ part1 │ part2 │
│ int64 │ int128 │
├───────┼────────────────┤
│ 35 │ 55962276290066 │
└───────┴────────────────┘
Run Time (s): real 0.146 user 0.260000 sys 0.072000
memory D .read day19_400.sql
┌───────┬───────────────┐
│ part1 │ part2 │
│ int64 │ int128 │
├───────┼───────────────┤
│ 2 │ 1849911000117 │
└───────┴───────────────┘
Run Time (s): real 0.063 user 0.108000 sys 0.060000
原始代码抄录如下,以防无法访问。
with recursive aoc19_input(i) as (select '
r, wr, b, g, bwu, rb, gb, br
brwrr
bggr
gbbr
rrbgbr
ubwu
bwurrg
brgr
bbrgwb
'),
lines(y,line) as (
select 0, substr(i,1,position(E'\n' in i)-1), substr(i,position(E'\n' in i)+1)
from aoc19_input
union all
select y+1,substr(r,1,position(E'\n' in r)-1), substr(r,position(E'\n' in r)+1)
from lines l(y,l,r) where position(E'\n' in r)>0
),
available(ap) as (
select substr(line,1,position(', ' in line)-1) as ap, substr(line,position(', ' in line)+2) as r from lines where line like '%, %'
union all
select case when r like '%, %' then substr(r,1,position(', ' in r)-1) else r end, case when r like '%, %' then substr(r,position(', ' in r)+2) else '' end
from available where r<>''
),
requested(pattern) as (select line from lines where line<>'' and line not like '%, %'),
match(pattern, step, pos, c) as (
select pattern, 1, 1, 1::bigint from requested
union all
(with tmp as (select * from match),
agg as (select pattern, step, pos, sum(c) as c from tmp where pos>=step group by pattern, step, pos)
select pattern, step+1, pos, c::bigint from agg where pos>step
union all
select pattern, step+1, pos+length(ap), c::bigint from agg, available
where pos=step and pos+length(ap)<=length(pattern)+1 and substr(pattern, pos, length(ap))=ap)),
matched(pattern, c) as (
select pattern,sum(c) from match where pos=length(pattern)+1 and step=pos group by pattern)
select (select count(*) from matched where c>0) as part1,
(select sum(c) from matched) as part2
另外,上文中的tmp as (select * from match),似乎是多余的,去掉它,再把下一行from tmp改为from match,结果不变。
补记:
在available和requested 的定义处使用MATERIALIZED可以达到建物理表相同的效果,不知道默认MATERIALIZED的1.4.2版本为什么在这里没有生效。

4505

被折叠的 条评论
为什么被折叠?



