一个神奇的SQL:解决Advent of Code 2024第19题字符串组合方法欣赏

这也是DuckDB CTE功能的开发者kryonix给出的参考链接
长话短说,这个SQL在DuckDB 1.4.2 版本上:

  1. 单线程快于多线程,相差达5倍。
  2. 处理行数和用时关系不成比例,400行,40行,4行分别5秒、4秒、3.6秒。
  3. 多线程用时不稳定,有时4行比40、400行还慢。
--默认多线程
memory D .read day19_400.sql
┌───────┬─────────────────┐
│ part1 │      part2      │
│ int64 │     int128      │
├───────┼─────────────────┤
│   330950763269786650 │
└───────┴─────────────────┘
Run Time (s): real 34.610 user 43.304000 sys 16.960000
memory D .read day19_40.sql
┌───────┬────────────────┐
│ part1 │     part2      │
│ int64 │     int128     │
├───────┼────────────────┤
│    3555962276290066 │
└───────┴────────────────┘
Run Time (s): real 18.690 user 27.800000 sys 12.432000
memory D .read day19_4.sql
┌───────┬───────────────┐
│ part1 │     part2     │
│ int64 │    int128     │
├───────┼───────────────┤
│     21849911000117 │
└───────┴───────────────┘
Run Time (s): real 36.742 user 45.564000 sys 14.936000

--改为单线程
memory D set threads=1;

memory D .read day19_4.sql
┌───────┬───────────────┐
│ part1 │     part2     │
│ int64 │    int128     │
├───────┼───────────────┤
│     21849911000117 │
└───────┴───────────────┘
Run Time (s): real 3.652 user 2.892000 sys 0.756000

memory D .read day19_40.sql
┌───────┬────────────────┐
│ part1 │     part2      │
│ int64 │     int128     │
├───────┼────────────────┤
│    3555962276290066 │
└───────┴────────────────┘
Run Time (s): real 4.037 user 3.228000 sys 0.808000

memory D .read day19_400.sql
┌───────┬─────────────────┐
│ part1 │      part2      │
│ int64 │     int128      │
├───────┼─────────────────┤
│   330950763269786650 │
└───────┴─────────────────┘
Run Time (s): real 5.193 user 4.292000 sys 0.900000

要复现这个问题,首先需要把原始链接示例字符串换成如下文件中的内容,然后第一行不变,不带逗号分隔的行随意删减。

另外,如果把临时表available和requested用create table命令建立物理表,然后只保留如下查询代码,在第3行用where rowid<n控制处理行数,性能有极大的提高,而且,用时和行数成比例了。不set threads=1也没有不好的影响了。

with recursive
match(pattern, step, pos, c) as (
select pattern, 1, 1, 1::bigint from requested --where rowid<40
union all
(with tmp as (select * from match),
 agg as (select pattern, step, pos, sum(c) as c from tmp where pos>=step group by pattern, step, pos)
 select pattern, step+1, pos, c::bigint from agg where pos>step
 union all
 select pattern, step+1, pos+length(ap), c::bigint from agg, available
 where pos=step and pos+length(ap)<=length(pattern)+1 and substr(pattern, pos, length(ap))=ap)),
matched(pattern, c) as (
 select pattern,sum(c) from match where pos=length(pattern)+1 and step=pos group by pattern)
select (select count(*) from matched where c>0) as part1,
       (select sum(c) from matched) as part2

物理表测试结果

memory D set threads=1;
memory D .read day19_400.sql
┌───────┬─────────────────┐
│ part1 │      part2      │
│ int64 │     int128      │
├───────┼─────────────────┤
│   330950763269786650 │
└───────┴─────────────────┘
Run Time (s): real 0.859 user 0.824000 sys 0.000000
memory D .read day19_400.sql
┌───────┬────────────────┐
│ part1 │     part2      │
│ int64 │     int128     │
├───────┼────────────────┤
│    3555962276290066 │
└───────┴────────────────┘
Run Time (s): real 0.137 user 0.132000 sys 0.004000
memory D .read day19_400.sql
┌───────┬───────────────┐
│ part1 │     part2     │
│ int64 │    int128     │
├───────┼───────────────┤
│     21849911000117 │
└───────┴───────────────┘
Run Time (s): real 0.063 user 0.048000 sys 0.012000
memory D 

--改回多线程
memory D set threads=4;
memory D .read day19_400.sql
┌───────┬─────────────────┐
│ part1 │      part2      │
│ int64 │     int128      │
├───────┼─────────────────┤
│   330950763269786650 │
└───────┴─────────────────┘
Run Time (s): real 0.833 user 1.736000 sys 0.200000
memory D .read day19_400.sql
┌───────┬────────────────┐
│ part1 │     part2      │
│ int64 │     int128     │
├───────┼────────────────┤
│    3555962276290066 │
└───────┴────────────────┘
Run Time (s): real 0.146 user 0.260000 sys 0.072000
memory D .read day19_400.sql
┌───────┬───────────────┐
│ part1 │     part2     │
│ int64 │    int128     │
├───────┼───────────────┤
│     21849911000117 │
└───────┴───────────────┘
Run Time (s): real 0.063 user 0.108000 sys 0.060000

原始代码抄录如下,以防无法访问。

with recursive aoc19_input(i) as (select '
r, wr, b, g, bwu, rb, gb, br

brwrr
bggr
gbbr
rrbgbr
ubwu
bwurrg
brgr
bbrgwb
'),
lines(y,line) as (
   select 0, substr(i,1,position(E'\n' in i)-1), substr(i,position(E'\n' in i)+1)
   from aoc19_input
   union all
   select y+1,substr(r,1,position(E'\n' in r)-1), substr(r,position(E'\n' in r)+1)
   from lines l(y,l,r) where position(E'\n' in r)>0
),
available(ap) as (
   select substr(line,1,position(', ' in line)-1) as ap, substr(line,position(', ' in line)+2) as r from lines where line like '%, %'
   union all
   select case when r like '%, %' then substr(r,1,position(', ' in r)-1) else r end, case when r like '%, %' then substr(r,position(', ' in r)+2) else '' end
   from available where r<>''
),
requested(pattern) as (select line from lines where line<>'' and line not like '%, %'),
match(pattern, step, pos, c) as (
select pattern, 1, 1, 1::bigint from requested
union all
(with tmp as (select * from match),
 agg as (select pattern, step, pos, sum(c) as c from tmp where pos>=step group by pattern, step, pos)
 select pattern, step+1, pos, c::bigint from agg where pos>step
 union all
 select pattern, step+1, pos+length(ap), c::bigint from agg, available
 where pos=step and pos+length(ap)<=length(pattern)+1 and substr(pattern, pos, length(ap))=ap)),
matched(pattern, c) as (
 select pattern,sum(c) from match where pos=length(pattern)+1 and step=pos group by pattern)
select (select count(*) from matched where c>0) as part1,
       (select sum(c) from matched) as part2

另外,上文中的tmp as (select * from match),似乎是多余的,去掉它,再把下一行from tmp改为from match,结果不变。

补记:
在available和requested 的定义处使用MATERIALIZED可以达到建物理表相同的效果,不知道默认MATERIALIZED的1.4.2版本为什么在这里没有生效。

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值