在pg_mooncake容器中启用pg_duckdb加速查询

前文说到,新版pg_mooncake服务端启动时,自动启动了pg_duckdb插件。而pg_duckdb插件存储库页面提到,只要输入一行set命令就可以启用pg_duckdb加速查询,我试用了确实如此。

启动服务端

docker run --name mooncake2 --rm -e POSTGRES_PASSWORD=password docker.1ms.run/mooncakelabs/pg_mooncake
...
server started

/usr/local/bin/docker-entrypoint.sh: running /docker-entrypoint-initdb.d/0001-install-pg_duckdb.sql
CREATE EXTENSION


/usr/local/bin/docker-entrypoint.sh: running /docker-entrypoint-initdb.d/0002-enable-md-pg_duckdb.sql


启动客户端, 启用pg_duckdb前,

docker exec -it mooncake2 psql -U postgres
psql (17.6 (Debian 17.6-1.pgdg12+1))
Type "help" for help.

postgres=# \timing on
Timing is on.
postgres=# create table t as select i from generate_series(1,10000000)t(i);
SELECT 10000000
Time: 3977.424 ms (00:03.977)
postgres=# select sum(i) from t group by round(log(i));
      sum
----------------
              6
  4949999577625
      494990550
        4950617
    49499924625
            490
          49590
 45000005506497
(8 rows)

Time: 910.083 ms
postgres=# explain analyze select sum(i) from t group by round(log(i));
Time: 1108.077 ms (00:01.108)
postgres=# explain analyze select sum(i) from t group by round(log(i));
Time: 1131.565 ms (00:01.132)
                                                          QUERY PLAN                                                    
-------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=860082.81..1132795.83 rows=10002082 width=16) (actual time=1129.568..1129.626 rows=8 loops=1)
   Group Key: round(log((i)::double precision))
   Planned Partitions: 128  Batches: 1  Memory Usage: 1561kB
   ->  Seq Scan on t  (cost=0.00..219324.43 rows=10002082 width=12) (actual time=16.601..585.642 rows=10000000 loops=1)
 Planning Time: 0.192 ms
 JIT:
   Functions: 6
   Options: Inlining true, Optimization true, Expressions true, Deforming true
   Timing: Generation 0.996 ms (Deform 0.256 ms), Inlining 2.268 ms, Optimization 7.852 ms, Emission 6.255 ms, Total 17.371 ms
 Execution Time: 1130.900 ms
(10 rows)

启用pg_duckdb后,执行时间减少了一半,执行计划也变成了duckdb的计划

postgres=# SET duckdb.force_execution = true;
SET
Time: 2.390 ms
postgres=# select sum(i) from t group by round(log(i));
      sum
----------------
              6
          49590
        4950617
      494990550
    49499924625
 45000005506497
            490
  4949999577625
(8 rows)

Time: 482.909 ms
postgres=# explain analyze select sum(i) from t group by round(log(i));
Time: 397.873 ms


                                                QUERY PLAN
-----------------------------------------------------------------------------------------------------------
 Custom Scan (DuckDBScan)  (cost=0.00..0.00 rows=0 width=0) (actual time=0.001..0.001 rows=0 loops=1)
   DuckDB Execution Plan:

 ┌─────────────────────────────────────┐
 │┌───────────────────────────────────┐│
 ││    Query Profiling Information    ││
 │└───────────────────────────────────┘│
 └─────────────────────────────────────┘
 EXPLAIN ANALYZE  SELECT sum(i) AS sum FROM pgduckdb.public.t GROUP BY (round(log((i)::double precision)))
 ┌────────────────────────────────────────────────┐
 │┌──────────────────────────────────────────────┐│
 ││              Total Time: 0.392s              ││
 │└──────────────────────────────────────────────┘│
 └────────────────────────────────────────────────┘
 ┌───────────────────────────┐
 │           QUERY           │
 └─────────────┬─────────────┘
 ┌─────────────┴─────────────┐
 │      EXPLAIN_ANALYZE      │
 │    ────────────────────   │
 │           0 Rows          │
 │          (0.00s)          │
 └─────────────┬─────────────┘
 ┌─────────────┴─────────────┐
 │         PROJECTION        │
 │    ────────────────────   │
 │            sum            │
 │                           │
 │           8 Rows          │
 │          (0.00s)          │
 └─────────────┬─────────────┘
 ┌─────────────┴─────────────┐
 │       HASH_GROUP_BY       │
 │    ────────────────────   │
 │         Groups: #0        │
 │    Aggregates: sum(#1)    │
 │                           │
 │           8 Rows          │
 │          (0.12s)

duckdb特有的读入parquet文件功能也能用, 但是要采用特殊的语法,参考https://github.com/duckdb/pg_duckdb/blob/main/docs/functions.md#read-functions

postgres=# copy t to 't.parquet';
ERROR:  (PGDuckDB/DuckdbUtilityHook_Cpp) Executor Error: (PGDuckDB/MakeDuckdbCopyQuery) relative path not allowed for COPY to file
Time: 6.413 ms
postgres=# copy t to '/tmp/t.parquet';
COPY 10000000
Time: 768.763 ms
postgres=# select * from read_parquet('/tmp/t.parquet') limit 4;
  i
------
 7233
 7234
 7235
 7236
(4 rows)

postgres=# select sum(r['i']) from read_parquet('/tmp/t.parquet') r;
      sum
----------------
 50000005000000
(1 row)

Time: 45.564 ms
postgres=# select sum(r['i']) from read_parquet('/tmp/t.parquet') r group by round(log(r['i']::float));
      sum
----------------
            490
              6
 45000005506497
    49499924625
      494990550
          49590
        4950617
  4949999577625
(8 rows)

Time: 47.362 ms

因为绕过了postgresql数据库,查询速度还要快一些。基本与duckdb看齐

duckdb140 test.db
D create table t1 as select i from generate_series(1,10000000)t(i);
Run Time (s): real 0.170 user 0.296875 sys 0.031250
D select sum(i) from t1 group by round(log(i));
┌────────────────┐
│     sum(i)     │
│     int128     │
├────────────────┤
│              6 │
│          49590 │
│        4950617 │
│            490 │
│      494990550 │
│ 45000005506497 │
│    49499924625 │
│  4949999577625 │
└────────────────┘
Run Time (s): real 0.053 user 0.187500 sys 0.234375
评论
成就一亿技术人!
拼手气红包6.0元
还能输入1000个字符
 
红包 添加红包
表情包 插入表情
 条评论被折叠 查看
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值