在sf=0.1时测试fireducks、duckdb、polars的tpch

首先,从https://github.1git.de/fireducks-dev/polars-tpch下载源代码包,将其解压缩到/par/fire目录。
然后进入此目录,运行
SCALE_FACTOR=0.1 ./run-fireducks.sh,脚本会首先安装所需的包,编译tpch的数据生成器,然后按照sf=0.1生成tbl文件,再转化为parquet格式,最后执行。
如下所示:

root@DESKTOP-59T6U68:/par/fire# SCALE_FACTOR=0.1 ./run-fireducks.sh
Looking in indexes: https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple
Requirement already satisfied: pyarrow in ./.venv/lib/python3.13/site-packages (20.0.0)
Requirement already satisfied: pydantic in ./.venv/lib/python3.13/site-packages (2.11.7)
Requirement already satisfied: pydantic_settings in ./.venv/lib/python3.13/site-packages (2.10.1)
Requirement already satisfied: linetimer in ./.venv/lib/python3.13/site-packages (0.1.5)
Requirement already satisfied: annotated-types>=0.6.0 in ./.venv/lib/python3.13/site-packages (from pydantic) (0.7.0)
Requirement already satisfied: pydantic-core==2.33.2 in ./.venv/lib/python3.13/site-packages (from pydantic) (2.33.2)
Requirement already satisfied: typing-extensions>=4.12.2 in ./.venv/lib/python3.13/site-packages (from pydantic) (4.14.0)
Requirement already satisfied: typing-inspection>=0.4.0 in ./.venv/lib/python3.13/site-packages (from pydantic) (0.4.1)
Requirement already satisfied: python-dotenv>=0.21.0 in ./.venv/lib/python3.13/site-packages (from pydantic_settings) (1.1.1)
make -C tpch-dbgen dbgen
make[1]: Entering directory '/par/fire/tpch-dbgen'
make[1]: 'dbgen' is up to date.
make[1]: Leaving directory '/par/fire/tpch-dbgen'
cd tpch-dbgen && ./dbgen -vf -s 0.1 && cd ..
TPC-H Population Generator (Version 2.17.2)
Copyright Transaction Processing Performance Council 1994 - 2010
Generating data for suppliers table/
Preloading text ... 100%
done.
Generating data for customers tabledone.
Generating data for orders/lineitem tablesdone.
Generating data for part/partsupplier tablesdone.
Generating data for nation tabledone.
Generating data for region tabledone.
mkdir -p "data/tables_pyarrow/scale-0.1"
mv tpch-dbgen/*.tbl data/tables_pyarrow/scale-0.1/
.venv/bin/python -m scripts.prepare_data_pyarrow
Processing table: customer
Processing table: lineitem
Processing table: nation
Processing table: orders
Processing table: part
Processing table: partsupp
Processing table: region
Processing table: supplier
rm -rf data/tables_pyarrow/scale-0.1/*.tbl
PATH_TABLES=data/tables_pyarrow .venv-fireducks/bin/python -m queries.fireducks
{"scale_factor":0.1,"large_string_comment":false,"paths":{"answers":"data/answers","tables":"data/tables_pyarrow","timings":"output/run","timings_filename":"timings.csv","plots":"output/plot"},"plot":{"show":false,"n_queries":7,"y_limit":null},"run":{"io_type":"skip","log_timings":true,"show_results":false,"check_results":false,"polars_show_plan":false,"polars_eager":false,"polars_streaming":false,"polars_new_streaming":false,"polars_gpu":false,"polars_gpu_device":0,"use_rmm_mr":"cuda-async","modin_memory":8000000000,"spark_driver_memory":"2g","spark_executor_memory":"1g","spark_log_level":"ERROR","include_io":false},"dataset_base_dir":"data/tables_pyarrow/scale-0.1"}
Code block 'Run fireducks query 1' took: 0.20121 s
Code block 'Run fireducks query 2' took: 0.52730 s
Code block 'Run fireducks query 3' took: 0.15594 s
Code block 'Run fireducks query 4' took: 0.15536 s
Code block 'Run fireducks query 5' took: 0.23419 s
Code block 'Run fireducks query 6' took: 0.11777 s
Code block 'Run fireducks query 7' took: 0.27936 s
Code block 'Run fireducks query 8' took: 0.22832 s
Code block 'Run fireducks query 9' took: 0.18384 s
Code block 'Run fireducks query 10' took: 0.33037 s
Code block 'Run fireducks query 11' took: 0.16605 s
Code block 'Run fireducks query 12' took: 0.16841 s
Code block 'Run fireducks query 13' took: 0.14314 s
Code block 'Run fireducks query 14' took: 0.13404 s
Code block 'Run fireducks query 15' took: 0.14402 s
Code block 'Run fireducks query 16' took: 0.20629 s
Code block 'Run fireducks query 17' took: 0.15346 s
Code block 'Run fireducks query 18' took: 0.19930 s
Code block 'Run fireducks query 19' took: 0.20121 s
Code block 'Run fireducks query 20' took: 0.27538 s
Code block 'Run fireducks query 21' took: 0.30119 s
Code block 'Run fireducks query 22' took: 0.22134 s
Code block 'Overall execution of ALL fireducks queries' took: 130.80006 s

如果要和其他工具的性能比较,queries目录下有duckdb、polars等的脚本,调用方法如下:

PATH_TABLES=data/tables_pyarrow SCALE_FACTOR=0.1 .venv/bin/python -m queries.duckdb
Code block 'Run duckdb query 1' took: 2.36939 s
...
Code block 'Overall execution of ALL duckdb queries' took: 88.98257 s

PATH_TABLES=data/tables_pyarrow SCALE_FACTOR=0.1 .venv/bin/python -m queries.polars
Code block 'Run polars query 1' took: 0.34880 s
...
Code block 'Overall execution of ALL polars queries' took: 61.85478 s

fireducks的这个脚本是从polars那里fork的,不知做了什么加工,单个查询duckdb比polars和fireducks慢很多,相差10倍,难以置信。直接用如下语句测试,明明不到1秒

import duckdb
q1="""
SELECT
    l_returnflag,
    l_linestatus,
    SUM(l_quantity) AS sum_qty,
    SUM(l_extendedprice) AS sum_base_price,
    SUM(l_extendedprice * (1 - l_discount)) AS sum_disc_price,
    SUM(l_extendedprice * (1 - l_discount) * (1 + l_tax)) AS sum_charge,
    AVG(l_quantity) AS avg_qty,
    AVG(l_extendedprice) AS avg_price,
    AVG(l_discount) AS avg_disc,
    COUNT(*) AS count_order
FROM
    'data/tables_pyarrow/scale-0.1/lineitem.parquet' l
WHERE
    l_shipdate <= CAST('1998-09-02' AS date)
GROUP BY
    l_returnflag,
    l_linestatus
ORDER BY
    l_returnflag,
    l_linestatus;
"""
import time
t=time.time();df = duckdb.sql(q1);df.show();print(time.time()-t)
 .venv/bin/python /par/duckdbq1.py
┌──────────────┬──────────────┬─────────┬────────────────────┬───────────────────┬────────────────────┬────────────────────┬───────────────────┬─────────────────────┬─────────────┐
│ l_returnflag │ l_linestatus │ sum_qty │   sum_base_price   │  sum_disc_price   │     sum_charge     │      avg_qty       │     avg_price     │      avg_disc       │ count_order │
│   varchar    │   varchar    │ int128  │       double       │      double       │       double       │       double       │      double       │       double        │    int64    │
├──────────────┼──────────────┼─────────┼────────────────────┼───────────────────┼────────────────────┼────────────────────┼───────────────────┼─────────────────────┼─────────────┤
│ A            │ F            │ 37742005320753880.689985054096266.6828355256751331.44926725.53758711685499736002.1238290140.05014459706345448147790 │
│ N            │ F            │   95257133737795.83999994127132372.6512132286291.2294447325.3006640106241735521.326916334650.04939442231075733765 │
│ N            │ O            │ 745929710512270008.899929986238338.38476610385578376.58547625.54553767123287536000.924688013420.05009595890418491292000 │
│ R            │ F            │ 37855235337950526.46987155071818532.9421015274405503.04936625.525943857425135994.029214030060.04998927856189752148301 │
└──────────────┴──────────────┴─────────┴────────────────────┴───────────────────┴────────────────────┴────────────────────┴───────────────────┴─────────────────────┴─────────────┘

0.6631364822387695
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值