前文说到,新版pg_mooncake服务端启动时,自动启动了pg_duckdb插件。而pg_duckdb插件存储库页面提到,只要输入一行set命令就可以启用pg_duckdb加速查询,我试用了确实如此。
启动服务端
docker run --name mooncake2 --rm -e POSTGRES_PASSWORD=password docker.1ms.run/mooncakelabs/pg_mooncake
...
server started
/usr/local/bin/docker-entrypoint.sh: running /docker-entrypoint-initdb.d/0001-install-pg_duckdb.sql
CREATE EXTENSION
/usr/local/bin/docker-entrypoint.sh: running /docker-entrypoint-initdb.d/0002-enable-md-pg_duckdb.sql
启动客户端, 启用pg_duckdb前,
docker exec -it mooncake2 psql -U postgres
psql (17.6 (Debian 17.6-1.pgdg12+1))
Type "help" for help.
postgres=# \timing on
Timing is on.
postgres=# create table t as select i from generate_series(1,10000000)t(i);
SELECT 10000000
Time: 3977.424 ms (00:03.977)
postgres=# select sum(i) from t group by round(log(i));
sum
----------------
6
4949999577625
494990550
4950617
49499924625
490
49590
45000005506497
(8 rows)
Time: 910.083 ms
postgres=# explain analyze select sum(i) from t group by round(log(i));
Time: 1108.077 ms (00:01.108)
postgres=# explain analyze select sum(i) from t group by round(log(i));
Time: 1131.565 ms (00:01.132)
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=860082.81..1132795.83 rows=10002082 width=16) (actual time=1129.568..1129.626 rows=8 loops=1)
Group Key: round(log((i)::double precision))
Planned Partitions: 128 Batches: 1 Memory Usage: 1561kB
-> Seq Scan on t (cost=0.00..219324.43 rows=10002082 width=12) (actual time=16.601..585.642 rows=10000000 loops=1)
Planning Time: 0.192 ms
JIT:
Functions: 6
Options: Inlining true, Optimization true, Expressions true, Deforming true
Timing: Generation 0.996 ms (Deform 0.256 ms), Inlining 2.268 ms, Optimization 7.852 ms, Emission 6.255 ms, Total 17.371 ms
Execution Time: 1130.900 ms
(10 rows)
启用pg_duckdb后,执行时间减少了一半,执行计划也变成了duckdb的计划
postgres=# SET duckdb.force_execution = true;
SET
Time: 2.390 ms
postgres=# select sum(i) from t group by round(log(i));
sum
----------------
6
49590
4950617
494990550
49499924625
45000005506497
490
4949999577625
(8 rows)
Time: 482.909 ms
postgres=# explain analyze select sum(i) from t group by round(log(i));
Time: 397.873 ms
QUERY PLAN
-----------------------------------------------------------------------------------------------------------
Custom Scan (DuckDBScan) (cost=0.00..0.00 rows=0 width=0) (actual time=0.001..0.001 rows=0 loops=1)
DuckDB Execution Plan:
┌─────────────────────────────────────┐
│┌───────────────────────────────────┐│
││ Query Profiling Information ││
│└───────────────────────────────────┘│
└─────────────────────────────────────┘
EXPLAIN ANALYZE SELECT sum(i) AS sum FROM pgduckdb.public.t GROUP BY (round(log((i)::double precision)))
┌────────────────────────────────────────────────┐
│┌──────────────────────────────────────────────┐│
││ Total Time: 0.392s ││
│└──────────────────────────────────────────────┘│
└────────────────────────────────────────────────┘
┌───────────────────────────┐
│ QUERY │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│ EXPLAIN_ANALYZE │
│ ──────────────────── │
│ 0 Rows │
│ (0.00s) │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│ PROJECTION │
│ ──────────────────── │
│ sum │
│ │
│ 8 Rows │
│ (0.00s) │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│ HASH_GROUP_BY │
│ ──────────────────── │
│ Groups: #0 │
│ Aggregates: sum(#1) │
│ │
│ 8 Rows │
│ (0.12s) │
duckdb特有的读入parquet文件功能也能用, 但是要采用特殊的语法,参考https://github.com/duckdb/pg_duckdb/blob/main/docs/functions.md#read-functions
postgres=# copy t to 't.parquet';
ERROR: (PGDuckDB/DuckdbUtilityHook_Cpp) Executor Error: (PGDuckDB/MakeDuckdbCopyQuery) relative path not allowed for COPY to file
Time: 6.413 ms
postgres=# copy t to '/tmp/t.parquet';
COPY 10000000
Time: 768.763 ms
postgres=# select * from read_parquet('/tmp/t.parquet') limit 4;
i
------
7233
7234
7235
7236
(4 rows)
postgres=# select sum(r['i']) from read_parquet('/tmp/t.parquet') r;
sum
----------------
50000005000000
(1 row)
Time: 45.564 ms
postgres=# select sum(r['i']) from read_parquet('/tmp/t.parquet') r group by round(log(r['i']::float));
sum
----------------
490
6
45000005506497
49499924625
494990550
49590
4950617
4949999577625
(8 rows)
Time: 47.362 ms
因为绕过了postgresql数据库,查询速度还要快一些。基本与duckdb看齐
duckdb140 test.db
D create table t1 as select i from generate_series(1,10000000)t(i);
Run Time (s): real 0.170 user 0.296875 sys 0.031250
D select sum(i) from t1 group by round(log(i));
┌────────────────┐
│ sum(i) │
│ int128 │
├────────────────┤
│ 6 │
│ 49590 │
│ 4950617 │
│ 490 │
│ 494990550 │
│ 45000005506497 │
│ 49499924625 │
│ 4949999577625 │
└────────────────┘
Run Time (s): real 0.053 user 0.187500 sys 0.234375

1074

被折叠的 条评论
为什么被折叠?



