我们选择DuckDB、GlareDB、clickhouse、datafusion-cli来测试。
本来还想测试国产的Databend
https://repo.databend.com/databend/v1.2.755-nightly/databend-v1.2.755-nightly-aarch64-unknown-linux-gnu.tar.gz
https://repo.databend.com/bendsql/v0.27.5/bendsql-aarch64-unknown-linux-musl.tar.gz
,它需要建stage,步骤比较复杂。
和闭源的cedardb
https://download.cedardb.com/latest/cedar-current-arm64.tar.xz
但还不支持对parquet文件直接查询,暂时没测。
1.利用tpchgen-rs工具生成parquet格式TPCH数据
首先需要一个rust语言编译环境,可参考这篇文章,
再下载源代码包,然后在解压缩后的目录下,运行cargo build --release编译。
选择一个目录存放文件,我选的是/par/tpch/,然后运行如下命令行生成sf4规模的数据集。
time /par/tpchgen-rs-main/target/release/tpchgen-cli -s4 --output-dir sf4-parquet --format=parquet
real 0m15.582s
user 1m2.044s
sys 0m2.060s
ls -l /par/tpch/sf4-parquet
total 1456872
-rw-r--r-- 1 root root 53985688 Jun 27 05:54 customer.parquet
-rw-r--r-- 1 root root 975758540 Jun 27 05:54 lineitem.parquet
-rw-r--r-- 1 root root 2966 Jun 27 05:54 nation.parquet
-rw-r--r-- 1 root root 262169281 Jun 27 05:54 orders.parquet
-rw-r--r-- 1 root root 27412078 Jun 27 05:54 part.parquet
-rw-r--r-- 1 root root 168886901 Jun 27 05:54 partsupp.parquet
-rw-r--r-- 1 root root 1474 Jun 27 05:54 region.parquet
-rw-r--r-- 1 root root 3581714 Jun 27 05:54 supplier.parquet
2.进入数据库各自的CLI界面,都打开计时功能,分别复制粘贴如下两个查询语句。
SELECT
--Query01
l_returnflag,
l_linestatus,
SUM(l_quantity) AS sum_qty,
SUM(l_extendedprice) AS sum_base_price,
SUM(l_extendedprice * (1 - l_discount)) AS sum_disc_price,
SUM(l_extendedprice * (1 - l_discount) * (1 + l_tax)) AS sum_charge,
AVG(l_quantity) AS avg_qty,
AVG(l_extendedprice) AS avg_price,
AVG(l_discount) AS avg_disc,
COUNT(*) AS count_order
FROM
'/par/tpch/sf4-parquet/lineitem.parquet'
WHERE
l_shipdate <= CAST('1998-09-02' AS date)
GROUP BY
l_returnflag,
l_linestatus
ORDER BY
l_returnflag,
l_linestatus;
select
--Query03
l_orderkey, sum(l_extendedprice * (1 - l_discount)) as revenue, o_orderdate, o_shippriority from
'/par/tpch/sf4-parquet/customer.parquet' c,
'/par/tpch/sf4-parquet/orders.parquet' o,
'/par/tpch/sf4-parquet/lineitem.parquet' l
where c_mktsegment = 'BUILDING' and c_custkey = o_custkey and l_orderkey = o_orderkey and o_orderdate < date'1995-03-15' and l_shipdate > date'1995-03-15' group by l_orderkey, o_orderdate, o_shippriority order by revenue desc, o_orderdate limit 10;
3.计算结果比较
查询1
--duckdb
┌──────────────┬──────────────┬───────────────┬─────────────────┬───────────────────┬─────────────────────┬────────────────────┬────────────────────┬──────────────────────┬─────────────┐
│ l_returnflag │ l_linestatus │ sum_qty │ sum_base_price │ sum_disc_price │ sum_charge │ avg_qty │ avg_price │ avg_disc │ count_order │
│ varchar │ varchar │ decimal(38,2) │ decimal(38,2) │ decimal(38,4) │ decimal(38,6) │ double │ double │ double │ int64 │
├──────────────┼──────────────┼───────────────┼─────────────────┼───────────────────┼─────────────────────┼────────────────────┼────────────────────┼──────────────────────┼─────────────┤
│ A │ F │ 151026307.00 │ 226509091035.52 │ 215184111589.9063 │ 223794237456.679790 │ 25.503385786855176 │ 38249.95027461518 │ 0.0499973943794925 │ 5921814 │
│ N │ F │ 3931226.00 │ 5887988103.96 │ 5593393984.5160 │ 5817216592.906474 │ 25.5383865812621 │ 38250.0818789871 │ 0.05005768706068835 │ 153934 │
│ N │ O │ 297323932.00 │ 445900037882.87 │ 423604954474.6505 │ 440557228781.029205 │ 25.500207853277455 │ 38242.947923201406 │ 0.049996762343212715 │ 11659667 │
│ R │ F │ 151132549.00 │ 226643425460.80 │ 215313759563.1301 │ 223924615047.433519 │ 25.512903862388963 │ 38260.00397058022 │ 0.04999442922234138 │ 5923769 │
└──────────────┴──────────────┴───────────────┴─────────────────┴───────────────────┴─────────────────────┴────────────────────┴────────────────────┴──────────────────────┴─────────────┘
Run Time (s): real 1.596 user 3.024000 sys 0.228000
--glaredb
┌──────────────┬──────────────┬──────────────────┬──────────────────┬───────────────────┬─────────────────────┬────────────────────┬────────────────────┬──────────────────────┬─────────────┐
│ l_returnflag │ l_linestatus │ sum_qty │ sum_base_price │ sum_disc_price │ sum_charge │ avg_qty │ avg_price │ avg_disc │ count_order │
│ Utf8 │ Utf8 │ Decimal128(38,2) │ Decimal128(38,2) │ Decimal128(38,4) │ Decimal128(38,6) │ Float64 │ Float64 │ Float64 │ Int64 │
├──────────────┼──────────────┼──────────────────┼──────────────────┼───────────────────┼─────────────────────┼────────────────────┼────────────────────┼──────────────────────┼─────────────┤
│ A │ F │ 151026307.00 │ 226509091035.52 │ 215184111589.9063 │ 223794237456.679790 │ 25.503385786855176 │ 38249.95027461518 │ 0.0499973943794925 │ 5921814 │
│ N │ F │ 3931226.00 │ 5887988103.96 │ 5593393984.5160 │ 5817216592.906474 │ 25.5383865812621 │ 38250.0818789871 │ 0.05005768706068835 │ 153934 │
│ N │ O │ 297323932.00 │ 445900037882.87 │ 423604954474.6505 │ 440557228781.029205 │ 25.500207853277455 │ 38242.947923201406 │ 0.049996762343212715 │ 11659667 │
│ R │ F │ 151132549.00 │ 226643425460.80 │ 215313759563.1301 │ 223924615047.433519 │ 25.512903862388963 │ 38260.00397058022 │ 0.04999442922234138 │ 5923769 │
└──────────────┴──────────────┴──────────────────┴──────────────────┴───────────────────┴─────────────────────┴────────────────────┴────────────────────┴──────────────────────┴─────────────┘
Execution duration (s): 2.75739
--clickhouse
┌─l_returnflag─┬─l_linestatus─┬───sum_qty─┬──sum_base_price─┬────sum_disc_price─┬──────────sum_charge─┬────────────avg_qty─┬──────────avg_price─┬─────────────avg_disc─┬─count_order─┐
1. │ A │ F │ 151026307 │ 226509091035.52 │ 215184111589.9063 │ 223794237456.67979 │ 25.503385786855176 │ 38249.95027461518 │ 0.0499973943794925 │ 5921814 │
2. │ N │ F │ 3931226 │ 5887988103.96 │ 5593393984.516 │ 5817216592.906474 │ 25.5383865812621 │ 38250.0818789871 │ 0.05005768706068835 │ 153934 │
3. │ N │ O │ 297323932 │ 445900037882.87 │ 423604954474.6505 │ 440557228781.029205 │ 25.500207853277455 │ 38242.947923201406 │ 0.049996762343212715 │ 11659667 │
4. │ R │ F │ 151132549 │ 226643425460.8 │ 215313759563.1301 │ 223924615047.433519 │ 25.512903862388963 │ 38260.00397058021 │ 0.04999442922234139 │ 5923769 │
└──────────────┴──────────────┴───────────┴─────────────────┴───────────────────┴─────────────────────┴────────────────────┴────────────────────┴──────────────────────┴─────────────┘
4 rows in set. Elapsed: 2.432 sec. Processed 24.00 million rows, 974.66 MB (9.87 million rows/s., 400.81 MB/s.)
Peak memory usage: 147.13 MiB.
--datafusion-cli
+--------------+--------------+--------------+-----------------+-------------------+---------------------+-----------+--------------+----------+-------------+
| l_returnflag | l_linestatus | sum_qty | sum_base_price | sum_disc_price | sum_charge | avg_qty | avg_price | avg_disc | count_order |
+--------------+--------------+--------------+-----------------+-------------------+---------------------+-----------+--------------+----------+-------------+
| A | F | 151026307.00 | 226509091035.52 | 215184111589.9063 | 223794237456.679790 | 25.503385 | 38249.950274 | 0.049997 | 5921814 |
| N | F | 3931226.00 | 5887988103.96 | 5593393984.5160 | 5817216592.906474 | 25.538386 | 38250.081878 | 0.050057 | 153934 |
| N | O | 297323932.00 | 445900037882.87 | 423604954474.6505 | 440557228781.029205 | 25.500207 | 38242.947923 | 0.049996 | 11659667 |
| R | F | 151132549.00 | 226643425460.80 | 215313759563.1301 | 223924615047.433519 | 25.512903 | 38260.003970 | 0.049994 | 5923769 |
+--------------+--------------+--------------+-----------------+-------------------+---------------------+-----------+--------------+----------+-------------+
4 row(s) fetched.
Elapsed 2.364 seconds.
查询3
--duckdb
┌────────────┬───────────────┬─────────────┬────────────────┐
│ l_orderkey │ revenue │ o_orderdate │ o_shippriority │
│ int64 │ decimal(38,4) │ date │ int32 │
├────────────┼───────────────┼─────────────┼────────────────┤
│ 21679558 │ 442394.3966 │ 1995-01-17 │ 0 │
│ 8443557 │ 429378.6581 │ 1995-03-12 │ 0 │
│ 10631491 │ 423595.9502 │ 1995-03-14 │ 0 │
│ 20841824 │ 421258.0424 │ 1995-03-08 │ 0 │
│ 11568359 │ 418888.5591 │ 1995-03-13 │ 0 │
│ 10725895 │ 399668.4227 │ 1995-03-12 │ 0 │
│ 20811973 │ 399662.4710 │ 1995-02-13 │ 0 │
│ 17937728 │ 396726.3289 │ 1995-03-12 │ 0 │
│ 6128295 │ 393717.7231 │ 1995-03-05 │ 0 │
│ 21236614 │ 392019.3225 │ 1995-03-14 │ 0 │
├────────────┴───────────────┴─────────────┴────────────────┤
│ 10 rows 4 columns │
└───────────────────────────────────────────────────────────┘
Run Time (s): real 0.425 user 1.832000 sys 0.296000
--glaredb
┌────────────┬──────────────────┬─────────────┬────────────────┐
│ l_orderkey │ revenue │ o_orderdate │ o_shippriority │
│ Int64 │ Decimal128(38,4) │ Date32 │ Int32 │
├────────────┼──────────────────┼─────────────┼────────────────┤
│ 21679558 │ 442394.3966 │ 1995-01-17 │ 0 │
│ 8443557 │ 429378.6581 │ 1995-03-12 │ 0 │
│ 10631491 │ 423595.9502 │ 1995-03-14 │ 0 │
│ 20841824 │ 421258.0424 │ 1995-03-08 │ 0 │
│ 11568359 │ 418888.5591 │ 1995-03-13 │ 0 │
│ 10725895 │ 399668.4227 │ 1995-03-12 │ 0 │
│ 20811973 │ 399662.4710 │ 1995-02-13 │ 0 │
│ 17937728 │ 396726.3289 │ 1995-03-12 │ 0 │
│ 6128295 │ 393717.7231 │ 1995-03-05 │ 0 │
│ 21236614 │ 392019.3225 │ 1995-03-14 │ 0 │
└────────────┴──────────────────┴─────────────┴────────────────┘
Execution duration (s): 1.08639
--clickhouse
┌─l_orderkey─┬─────revenue─┬─o_orderdate─┬─o_shippriority─┐
1. │ 21679558 │ 442394.3966 │ 1995-01-17 │ 0 │
2. │ 8443557 │ 429378.6581 │ 1995-03-12 │ 0 │
3. │ 10631491 │ 423595.9502 │ 1995-03-14 │ 0 │
4. │ 20841824 │ 421258.0424 │ 1995-03-08 │ 0 │
5. │ 11568359 │ 418888.5591 │ 1995-03-13 │ 0 │
6. │ 10725895 │ 399668.4227 │ 1995-03-12 │ 0 │
7. │ 20811973 │ 399662.471 │ 1995-02-13 │ 0 │
8. │ 17937728 │ 396726.3289 │ 1995-03-12 │ 0 │
9. │ 6128295 │ 393717.7231 │ 1995-03-05 │ 0 │
10. │ 21236614 │ 392019.3225 │ 1995-03-14 │ 0 │
└────────────┴─────────────┴─────────────┴────────────────┘
10 rows in set. Elapsed: 3.736 sec. Processed 30.60 million rows, 1.29 GB (8.19 million rows/s., 345.50 MB/s.)
Peak memory usage: 1.33 GiB.
--datafusion-cli
+------------+-------------+-------------+----------------+
| l_orderkey | revenue | o_orderdate | o_shippriority |
+------------+-------------+-------------+----------------+
| 21679558 | 442394.3966 | 1995-01-17 | 0 |
| 8443557 | 429378.6581 | 1995-03-12 | 0 |
| 10631491 | 423595.9502 | 1995-03-14 | 0 |
| 20841824 | 421258.0424 | 1995-03-08 | 0 |
| 11568359 | 418888.5591 | 1995-03-13 | 0 |
| 10725895 | 399668.4227 | 1995-03-12 | 0 |
| 20811973 | 399662.4710 | 1995-02-13 | 0 |
| 17937728 | 396726.3289 | 1995-03-12 | 0 |
| 6128295 | 393717.7231 | 1995-03-05 | 0 |
| 21236614 | 392019.3225 | 1995-03-14 | 0 |
+------------+-------------+-------------+----------------+
10 row(s) fetched.
Elapsed 1.233 seconds.
4.总结
从SQL兼容性看,DuckDB最好,多个表连接时不需要别名也能解析成功。为了其他数据库也能执行,我统一加上了表别名。
从表格显示看,DuckDB、GlareDB、clickhouse都能做到文本类型左对齐,数字类型右对齐,前两个还能显示数据类型。datafusion-cli统一左对齐。clickhouse自动删除末尾的0。
从执行速度看,duckdb最快,其他3种数据库都在同一水平。
1211

被折叠的 条评论
为什么被折叠?



