首先,从https://github.1git.de/fireducks-dev/polars-tpch下载源代码包,将其解压缩到/par/fire目录。
然后进入此目录,运行
SCALE_FACTOR=0.1 ./run-fireducks.sh,脚本会首先安装所需的包,编译tpch的数据生成器,然后按照sf=0.1生成tbl文件,再转化为parquet格式,最后执行。
如下所示:
root@DESKTOP-59T6U68:/par/fire# SCALE_FACTOR=0.1 ./run-fireducks.sh
Looking in indexes: https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple
Requirement already satisfied: pyarrow in ./.venv/lib/python3.13/site-packages (20.0.0)
Requirement already satisfied: pydantic in ./.venv/lib/python3.13/site-packages (2.11.7)
Requirement already satisfied: pydantic_settings in ./.venv/lib/python3.13/site-packages (2.10.1)
Requirement already satisfied: linetimer in ./.venv/lib/python3.13/site-packages (0.1.5)
Requirement already satisfied: annotated-types>=0.6.0 in ./.venv/lib/python3.13/site-packages (from pydantic) (0.7.0)
Requirement already satisfied: pydantic-core==2.33.2 in ./.venv/lib/python3.13/site-packages (from pydantic) (2.33.2)
Requirement already satisfied: typing-extensions>=4.12.2 in ./.venv/lib/python3.13/site-packages (from pydantic) (4.14.0)
Requirement already satisfied: typing-inspection>=0.4.0 in ./.venv/lib/python3.13/site-packages (from pydantic) (0.4.1)
Requirement already satisfied: python-dotenv>=0.21.0 in ./.venv/lib/python3.13/site-packages (from pydantic_settings) (1.1.1)
make -C tpch-dbgen dbgen
make[1]: Entering directory '/par/fire/tpch-dbgen'
make[1]: 'dbgen' is up to date.
make[1]: Leaving directory '/par/fire/tpch-dbgen'
cd tpch-dbgen && ./dbgen -vf -s 0.1 && cd ..
TPC-H Population Generator (Version 2.17.2)
Copyright Transaction Processing Performance Council 1994 - 2010
Generating data for suppliers table/
Preloading text ... 100%
done.
Generating data for customers tabledone.
Generating data for orders/lineitem tablesdone.
Generating data for part/partsupplier tablesdone.
Generating data for nation tabledone.
Generating data for region tabledone.
mkdir -p "data/tables_pyarrow/scale-0.1"
mv tpch-dbgen/*.tbl data/tables_pyarrow/scale-0.1/
.venv/bin/python -m scripts.prepare_data_pyarrow
Processing table: customer
Processing table: lineitem
Processing table: nation
Processing table: orders
Processing table: part
Processing table: partsupp
Processing table: region
Processing table: supplier
rm -rf data/tables_pyarrow/scale-0.1/*.tbl
PATH_TABLES=data/tables_pyarrow .venv-fireducks/bin/python -m queries.fireducks
{"scale_factor":0.1,"large_string_comment":false,"paths":{"answers":"data/answers","tables":"data/tables_pyarrow","timings":"output/run","timings_filename":"timings.csv","plots":"output/plot"},"plot":{"show":false,"n_queries":7,"y_limit":null},"run":{"io_type":"skip","log_timings":true,"show_results":false,"check_results":false,"polars_show_plan":false,"polars_eager":false,"polars_streaming":false,"polars_new_streaming":false,"polars_gpu":false,"polars_gpu_device":0,"use_rmm_mr":"cuda-async","modin_memory":8000000000,"spark_driver_memory":"2g","spark_executor_memory":"1g","spark_log_level":"ERROR","include_io":false},"dataset_base_dir":"data/tables_pyarrow/scale-0.1"}
Code block 'Run fireducks query 1' took: 0.20121 s
Code block 'Run fireducks query 2' took: 0.52730 s
Code block 'Run fireducks query 3' took: 0.15594 s
Code block 'Run fireducks query 4' took: 0.15536 s
Code block 'Run fireducks query 5' took: 0.23419 s
Code block 'Run fireducks query 6' took: 0.11777 s
Code block 'Run fireducks query 7' took: 0.27936 s
Code block 'Run fireducks query 8' took: 0.22832 s
Code block 'Run fireducks query 9' took: 0.18384 s
Code block 'Run fireducks query 10' took: 0.33037 s
Code block 'Run fireducks query 11' took: 0.16605 s
Code block 'Run fireducks query 12' took: 0.16841 s
Code block 'Run fireducks query 13' took: 0.14314 s
Code block 'Run fireducks query 14' took: 0.13404 s
Code block 'Run fireducks query 15' took: 0.14402 s
Code block 'Run fireducks query 16' took: 0.20629 s
Code block 'Run fireducks query 17' took: 0.15346 s
Code block 'Run fireducks query 18' took: 0.19930 s
Code block 'Run fireducks query 19' took: 0.20121 s
Code block 'Run fireducks query 20' took: 0.27538 s
Code block 'Run fireducks query 21' took: 0.30119 s
Code block 'Run fireducks query 22' took: 0.22134 s
Code block 'Overall execution of ALL fireducks queries' took: 130.80006 s
如果要和其他工具的性能比较,queries目录下有duckdb、polars等的脚本,调用方法如下:
PATH_TABLES=data/tables_pyarrow SCALE_FACTOR=0.1 .venv/bin/python -m queries.duckdb
Code block 'Run duckdb query 1' took: 2.36939 s
...
Code block 'Overall execution of ALL duckdb queries' took: 88.98257 s
PATH_TABLES=data/tables_pyarrow SCALE_FACTOR=0.1 .venv/bin/python -m queries.polars
Code block 'Run polars query 1' took: 0.34880 s
...
Code block 'Overall execution of ALL polars queries' took: 61.85478 s
fireducks的这个脚本是从polars那里fork的,不知做了什么加工,单个查询duckdb比polars和fireducks慢很多,相差10倍,难以置信。直接用如下语句测试,明明不到1秒
import duckdb
q1="""
SELECT
l_returnflag,
l_linestatus,
SUM(l_quantity) AS sum_qty,
SUM(l_extendedprice) AS sum_base_price,
SUM(l_extendedprice * (1 - l_discount)) AS sum_disc_price,
SUM(l_extendedprice * (1 - l_discount) * (1 + l_tax)) AS sum_charge,
AVG(l_quantity) AS avg_qty,
AVG(l_extendedprice) AS avg_price,
AVG(l_discount) AS avg_disc,
COUNT(*) AS count_order
FROM
'data/tables_pyarrow/scale-0.1/lineitem.parquet' l
WHERE
l_shipdate <= CAST('1998-09-02' AS date)
GROUP BY
l_returnflag,
l_linestatus
ORDER BY
l_returnflag,
l_linestatus;
"""
import time
t=time.time();df = duckdb.sql(q1);df.show();print(time.time()-t)
.venv/bin/python /par/duckdbq1.py
┌──────────────┬──────────────┬─────────┬────────────────────┬───────────────────┬────────────────────┬────────────────────┬───────────────────┬─────────────────────┬─────────────┐
│ l_returnflag │ l_linestatus │ sum_qty │ sum_base_price │ sum_disc_price │ sum_charge │ avg_qty │ avg_price │ avg_disc │ count_order │
│ varchar │ varchar │ int128 │ double │ double │ double │ double │ double │ double │ int64 │
├──────────────┼──────────────┼─────────┼────────────────────┼───────────────────┼────────────────────┼────────────────────┼───────────────────┼─────────────────────┼─────────────┤
│ A │ F │ 3774200 │ 5320753880.68998 │ 5054096266.682835 │ 5256751331.449267 │ 25.537587116854997 │ 36002.123829014 │ 0.05014459706345448 │ 147790 │
│ N │ F │ 95257 │ 133737795.83999994 │ 127132372.6512 │ 132286291.22944473 │ 25.30066401062417 │ 35521.32691633465 │ 0.0493944223107573 │ 3765 │
│ N │ O │ 7459297 │ 10512270008.89992 │ 9986238338.384766 │ 10385578376.585476 │ 25.545537671232875 │ 36000.92468801342 │ 0.05009595890418491 │ 292000 │
│ R │ F │ 3785523 │ 5337950526.4698715 │ 5071818532.942101 │ 5274405503.049366 │ 25.5259438574251 │ 35994.02921403006 │ 0.04998927856189752 │ 148301 │
└──────────────┴──────────────┴─────────┴────────────────────┴───────────────────┴────────────────────┴────────────────────┴───────────────────┴─────────────────────┴─────────────┘
0.6631364822387695