最近,databricks收购mooncake,https://www.mooncake.dev/blog/mooncake-and-databricks,想看看pg_mooncake这个我几个月前就试用过的软件有什么变化,值得databricks花大钱收购。
按照存储库主页https://github.com/Mooncake-Labs/pg_mooncake 说明,初学者的最佳途径是使用docker,我上次就用的docker, 继续用它。
root@DESKTOP-59T6U68:/mnt/c/Users/lt# docker run --name mooncake --rm -e POSTGRES_PASSWORD=password mooncakelabs/pg_mooncake
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.
The database cluster will be initialized with locale "en_US.utf8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".
Data page checksums are disabled.
fixing permissions on existing directory /var/lib/postgresql/data ... ok
creating subdirectories ... ok
selecting dynamic shared memory implementation ... posix
selecting default "max_connections" ... 100
selecting default "shared_buffers" ... 128MB
selecting default time zone ... Etc/UTC
creating configuration files ... ok
running bootstrap script ... ok
performing post-bootstrap initialization ... ok
syncing data to disk ... ok
Success. You can now start the database server using:
pg_ctl -D /var/lib/postgresql/data -l logfile start
initdb: warning: enabling "trust" authentication for local connections
initdb: hint: You can change this by editing pg_hba.conf or using the option -A, or --auth-local and --auth-host, the next time you run initdb.
waiting for server to start....2025-10-04 00:25:42.173 UTC [43] LOG: starting PostgreSQL 17.2 (Debian 17.2-1.pgdg120+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 12.2.0-14) 12.2.0, 64-bit
2025-10-04 00:25:42.173 UTC [43] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2025-10-04 00:25:42.180 UTC [46] LOG: database system was shut down at 2025-10-04 00:25:41 UTC
2025-10-04 00:25:42.185 UTC [43] LOG: database system is ready to accept connections
done
server started
/usr/local/bin/docker-entrypoint.sh: ignoring /docker-entrypoint-initdb.d/*
waiting for server to shut down....2025-10-04 00:25:42.284 UTC [43] LOG: received fast shutdown request
2025-10-04 00:25:42.286 UTC [43] LOG: aborting any active transactions
2025-10-04 00:25:42.288 UTC [43] LOG: background worker "logical replication launcher" (PID 49) exited with exit code 1
2025-10-04 00:25:42.288 UTC [44] LOG: shutting down
2025-10-04 00:25:42.289 UTC [44] LOG: checkpoint starting: shutdown immediate
2025-10-04 00:25:42.297 UTC [44] LOG: checkpoint complete: wrote 3 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.003 s, sync=0.002 s, total=0.010 s; sync files=2, longest=0.001 s, average=0.001 s; distance=0 kB, estimate=0 kB; lsn=0/14E4FA0, redo lsn=0/14E4FA0
2025-10-04 00:25:42.301 UTC [43] LOG: database system is shut down
done
server stopped
PostgreSQL init process complete; ready for start up.
2025-10-04 00:25:42.428 UTC [1] LOG: starting PostgreSQL 17.2 (Debian 17.2-1.pgdg120+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 12.2.0-14) 12.2.0, 64-bit
2025-10-04 00:25:42.430 UTC [1] LOG: listening on IPv4 address "0.0.0.0", port 5432
2025-10-04 00:25:42.430 UTC [1] LOG: listening on IPv6 address "::", port 5432
2025-10-04 00:25:42.432 UTC [1] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2025-10-04 00:25:42.436 UTC [57] LOG: database system was shut down at 2025-10-04 00:25:42 UTC
2025-10-04 00:25:42.444 UTC [1] LOG: database system is ready to accept connections
另开一个终端,登录到上述docker容器
root@DESKTOP-59T6U68:/mnt/c/Users/lt# docker exec -it mooncake psql -U postgres
psql (17.2 (Debian 17.2-1.pgdg120+1))
Type "help" for help.
postgres=# CREATE TABLE trades(
id bigint PRIMARY KEY,
symbol text,
time timestamp,
price real
);
CREATE TABLE
postgres=# CALL mooncake.create_table('trades_iceberg', 'trades');
ERROR: schema "mooncake" does not exist
LINE 1: CALL mooncake.create_table('trades_iceberg', 'trades');
^
postgres=# CREATE EXTENSION pg_mooncake CASCADE;
CREATE EXTENSION
postgres=# CALL mooncake.create_table('trades_iceberg', 'trades');
ERROR: procedure mooncake.create_table(unknown, unknown) does not exist
LINE 1: CALL mooncake.create_table('trades_iceberg', 'trades');
^
HINT: No procedure matches the given name and argument types. You might need to add explicit type casts.
postgres=# \q
页面中的测试用例跑不通,应该是我的版本太老,当时的语法是
postgres=# CREATE EXTENSION pg_mooncake;
CREATE EXTENSION
postgres=# CREATE TABLE user_activity(
user_id BIGINT,
activity_type TEXT,
activity_timestamp TIMESTAMP,
duration INT
) USING columnstore;
CREATE TABLE
postgres=# INSERT INTO user_activity VALUES
(1, 'login', '2024-01-01 08:00:00', 120),
(2, 'page_view', '2024-01-01 08:05:00', 30),
(3, 'logout', '2024-01-01 08:30:00', 60),
(4, 'error', '2024-01-01 08:13:00', 60);
INSERT 0 4
postgres=# SELECT * from user_activity;
user_id | activity_type | activity_timestamp | duration
---------+---------------+---------------------+----------
1 | login | 2024-01-01 08:00:00 | 120
2 | page_view | 2024-01-01 08:05:00 | 30
3 | logout | 2024-01-01 08:30:00 | 60
4 | error | 2024-01-01 08:13:00 | 60
(4 rows)
postgres=# SELECT * FROM mooncake.columnstore_tables;
table_name | path
---------------+---------------------------------------------------------------------------------------
user_activity | /var/lib/postgresql/data/mooncake_local_tables/mooncake_postgres_user_activity_16441/
(1 row)
postgres=# \q
新的例子采用了不同的参数。如下日志也说明了这一点,关闭这个容器,可见镜像是7-8月前的版本。
2025-10-04 00:28:04.392 UTC [64] ERROR: schema "mooncake" does not exist at character 6
2025-10-04 00:28:04.392 UTC [64] STATEMENT: CALL mooncake.create_table('trades_iceberg', 'trades');
2025-10-04 00:28:18.415 UTC [64] ERROR: procedure mooncake.create_table(unknown, unknown) does not exist at character 6
2025-10-04 00:28:18.415 UTC [64] HINT: No procedure matches the given name and argument types. You might need to add explicit type casts.
2025-10-04 00:28:18.415 UTC [64] STATEMENT: CALL mooncake.create_table('trades_iceberg', 'trades');
^C2025-10-04 00:30:33.389 UTC [1] LOG: received fast shutdown request
2025-10-04 00:30:33.393 UTC [1] LOG: aborting any active transactions
2025-10-04 00:30:33.394 UTC [1] LOG: background worker "logical replication launcher" (PID 60) exited with exit code 1
2025-10-04 00:30:33.395 UTC [55] LOG: shutting down
2025-10-04 00:30:33.397 UTC [55] LOG: checkpoint starting: shutdown immediate
2025-10-04 00:30:33.420 UTC [55] LOG: checkpoint complete: wrote 154 buffers (0.9%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.004 s, sync=0.017 s, total=0.025 s; sync files=95, longest=0.003 s, average=0.001 s; distance=710 kB, estimate=710 kB; lsn=0/15967A8, redo lsn=0/15967A8
2025-10-04 00:30:33.425 UTC [1] LOG: database system is shut down
root@DESKTOP-59T6U68:/mnt/c/Users/lt# docker image list -a
REPOSITORY TAG IMAGE ID CREATED SIZE
docker.1ms.run/mooncakelabs/pg_mooncake latest a2c2ae695fb4 7 months ago 630 MB
<none> <none> 34093c9a105b 8 months ago 630 MB
<none> <none> 8d085fd5b274 8 months ago 630 MB
拉取新镜像
root@DESKTOP-59T6U68:/mnt/c/Users/lt# docker pull docker.1ms.run/mooncakelabs/pg_mooncake
Trying to pull docker.1ms.run/mooncakelabs/pg_mooncake:latest...
Getting image source signatures
Copying blob 6e6162b82c96 done
Copying blob b1badc6e5066 done
Copying blob 3166755df2cd done
Copying blob c1499316ffcf done
Copying blob 93c86b7657ed done
Copying blob 678fdefd6cc3 done
Copying blob 7960bd43e7e7 done
Copying blob 1eef446a2d11 done
Copying blob 717379dd4557 done
Copying blob 1752a03d130b done
Copying blob ee33dce80c14 done
Copying blob 5332a007b185 done
Copying blob 55772d575d05 done
Copying blob 5a24baf7c78c done
Copying blob b8a4c3d0b14f done
Copying blob 0b55fb0850d3 done
Copying blob 8d47236cbefc done
Copying blob 1599b6547893 done
Copying blob df1e3605e5e7 done
Copying blob 71b3885e1493 done
Copying config 35e6ebcc02 done
Writing manifest to image destination
Storing signatures
35e6ebcc02c0dda7c93cf9f88f3c7781d7d0bb7df07da45ab034a786fd43cf83
root@DESKTOP-59T6U68:/mnt/c/Users/lt# docker image list -a |grep pg_mooncake
docker.1ms.run/mooncakelabs/pg_mooncake latest 35e6ebcc02c0 2 weeks ago 836 MB
镜像大小增加了200MB,看来是增加了不少内容。再次运行新镜像容器。
root@DESKTOP-59T6U68:/mnt/c/Users/lt# docker run --name mooncake2 --rm -e POSTGRES_PASSWORD=password docker.1ms.run/moon
cakelabs/pg_mooncake
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.
The database cluster will be initialized with locale "en_US.utf8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".
Data page checksums are disabled.
fixing permissions on existing directory /var/lib/postgresql/data ... ok
creating subdirectories ... ok
selecting dynamic shared memory implementation ... posix
selecting default "max_connections" ... 100
selecting default "shared_buffers" ... 128MB
selecting default time zone ... Etc/UTC
creating configuration files ... ok
running bootstrap script ... ok
performing post-bootstrap initialization ... ok
syncing data to disk ... ok
Success. You can now start the database server using:
pg_ctl -D /var/lib/postgresql/data -l logfile start
initdb: warning: enabling "trust" authentication for local connections
initdb: hint: You can change this by editing pg_hba.conf or using the option -A, or --auth-local and --auth-host, the next time you run initdb.
waiting for server to start....2025-10-04 00:37:28.452 UTC [30] LOG: starting PostgreSQL 17.6 (Debian 17.6-1.pgdg12+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 12.2.0-14+deb12u1) 12.2.0, 64-bit
2025-10-04 00:37:28.452 UTC [30] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2025-10-04 00:37:28.457 UTC [33] LOG: database system was shut down at 2025-10-04 00:37:28 UTC
2025-10-04 00:37:28.463 UTC [30] LOG: database system is ready to accept connections
2025-10-04 00:37:28+00:00 INFO moonlink/src/moonlink_service/src/rpc_server.rs:56: Moonlink RPC server listening on Unix socket: "pg_mooncake/moonlink.sock"
2025-10-04 00:37:28+00:00 INFO moonlink/src/moonlink_service/src/lib.rs:178: Moonlink service started successfully
done
server started
/usr/local/bin/docker-entrypoint.sh: running /docker-entrypoint-initdb.d/0001-install-pg_duckdb.sql
CREATE EXTENSION
/usr/local/bin/docker-entrypoint.sh: running /docker-entrypoint-initdb.d/0002-enable-md-pg_duckdb.sql
waiting for server to shut down...2025-10-04 00:37:28.632 UTC [30] LOG: received fast shutdown request
.2025-10-04 00:37:28.634 UTC [30] LOG: aborting any active transactions
2025-10-04 00:37:28+00:00 INFO moonlink/src/moonlink_service/src/lib.rs:182: Received SIGTERM, shutting down...
2025-10-04 00:37:28+00:00 INFO moonlink/src/moonlink_service/src/lib.rs:207: Moonlink service shut down complete
2025-10-04 00:37:28.637 UTC [30] LOG: background worker "logical replication launcher" (PID 37) exited with exit code 1
2025-10-04 00:37:28.640 UTC [31] LOG: shutting down
2025-10-04 00:37:28.641 UTC [31] LOG: checkpoint starting: shutdown immediate
2025-10-04 00:37:28.670 UTC [31] LOG: checkpoint complete: wrote 234 buffers (1.4%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.006 s, sync=0.020 s, total=0.030 s; sync files=111, longest=0.002 s, average=0.001 s; distance=1336 kB, estimate=1336 kB; lsn=0/16FCCC8, redo lsn=0/16FCCC8
2025-10-04 00:37:28.674 UTC [30] LOG: database system is shut down
done
server stopped
PostgreSQL init process complete; ready for start up.
2025-10-04 00:37:28.779 UTC [1] LOG: starting PostgreSQL 17.6 (Debian 17.6-1.pgdg12+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 12.2.0-14+deb12u1) 12.2.0, 64-bit
2025-10-04 00:37:28.780 UTC [1] LOG: listening on IPv4 address "0.0.0.0", port 5432
2025-10-04 00:37:28.780 UTC [1] LOG: listening on IPv6 address "::", port 5432
2025-10-04 00:37:28.782 UTC [1] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2025-10-04 00:37:28.786 UTC [67] LOG: database system was shut down at 2025-10-04 00:37:28 UTC
2025-10-04 00:37:28.793 UTC [1] LOG: database system is ready to accept connections
2025-10-04 00:37:28+00:00 INFO moonlink/src/moonlink_service/src/lib.rs:178: Moonlink service started successfully
2025-10-04 00:37:28+00:00 INFO moonlink/src/moonlink_service/src/rpc_server.rs:56: Moonlink RPC server listening on Unix socket: "pg_mooncake/moonlink.sock"
2025-10-04 00:38:48+00:00 INFO postgres_connection_monitor: /root/.cargo/git/checkouts/rust-postgres-72e3bb4aed4f8d49/14c8e59/tokio-postgres/src/connection.rs:358: NOTICE: publication "moonlink_pub" does not exist, skipping
2025-10-04 00:38:48.156 UTC [112] LOG: logical decoding found consistent point at 0/1798EB0
2025-10-04 00:38:48.156 UTC [112] DETAIL: There are no running transactions.
2025-10-04 00:38:48.156 UTC [112] STATEMENT: CREATE_REPLICATION_SLOT moonlink_slot_postgres LOGICAL pgoutput USE_SNAPSHOT
2025-10-04 00:38:48.198 UTC [113] LOG: starting logical decoding for slot "moonlink_slot_postgres"
2025-10-04 00:38:48.198 UTC [113] DETAIL: Streaming transactions committing after 0/1798EE8, reading WAL from 0/1798EB0.
2025-10-04 00:38:48.198 UTC [113] STATEMENT: START_REPLICATION SLOT moonlink_slot_postgres LOGICAL 0/0 ("proto_version" '2', "publication_names" 'moonlink_pub', "streaming" 'on')
2025-10-04 00:38:48.198 UTC [113] LOG: logical decoding found consistent point at 0/1798EB0
2025-10-04 00:38:48.198 UTC [113] DETAIL: There are no running transactions.
2025-10-04 00:38:48.198 UTC [113] STATEMENT: START_REPLICATION SLOT moonlink_slot_postgres LOGICAL 0/0 ("proto_version" '2', "publication_names" 'moonlink_pub', "streaming" 'on')
2025-10-04 00:38:48+00:00 WARN replication_event_loop: moonlink/src/moonlink_connectors/src/replication_state.rs:40: failed to send replication state for lsn 24743656 error=SendError { .. }
2025-10-04 00:38:48+00:00 WARN replication_event_loop: moonlink/src/moonlink_connectors/src/replication_state.rs:40: failed to send replication state for lsn 24744096 error=SendError { .. }
2025-10-04 00:42:28.864 UTC [65] LOG: checkpoint starting: time
2025-10-04 00:42:46.520 UTC [65] LOG: checkpoint complete: wrote 178 buffers (1.1%); 0 WAL file(s) added, 0 removed, 0 recycled; write=17.632 s, sync=0.013 s, total=17.657 s; sync files=77, longest=0.003 s, average=0.001 s; distance=651 kB, estimate=651 kB; lsn=0/179FD20, redo lsn=0/179FC90
从日志可以观察到,新版本包含了安装pg_duckdb插件的过程。并且又使用了Moonlink作为数据复制服务。客户端的例子也能跑通了。
root@DESKTOP-59T6U68:/mnt/c/Users/lt# docker exec -it mooncake2 psql -U postgres
psql (17.6 (Debian 17.6-1.pgdg12+1))
Type "help" for help.
postgres=# CREATE EXTENSION pg_mooncake CASCADE;
CREATE EXTENSION
postgres=# CREATE TABLE trades(
id bigint PRIMARY KEY,
symbol text,
time timestamp,
price real
);
CREATE TABLE
postgres=# CALL mooncake.create_table('trades_iceberg', 'trades');
CALL
postgres=# INSERT INTO trades VALUES
(1, 'AMD', '2024-06-05 10:00:00', 119),
(2, 'AMZN', '2024-06-05 10:05:00', 207),
(3, 'AAPL', '2024-06-05 10:10:00', 203),
(4, 'AMZN', '2024-06-05 10:15:00', 210);
INSERT 0 4
postgres=# SELECT avg(price) FROM trades_iceberg WHERE symbol = 'AMZN';
avg
-------
208.5
(1 row)
postgres=#
1024

被折叠的 条评论
为什么被折叠?



