Comparing CSV export methods in the DuckDB Python module

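Data source: the lineitem table of a TPC-H database at sf=0.1.
The methods below are ordered from fastest to slowest, counting the conversion step (where there is one) plus the CSV write.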

import duckdb
import pandas
import polars
import pyarrow
from pyarrow import csv

import time
con = duckdb.connect("tpch_duckdb")

#sql copy to
t=time.time();con.sql("copy lineitem to 'copyto.csv'");print(time.time()-t)
0.7691347599029541
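
The one-line pattern t=time.time(); ...; print(time.time()-t) is repeated for every measurement in this post. As a minimal sketch, it could be wrapped in a small helper. The name timed and the results dictionary are my own illustration, not part of DuckDB, and time.perf_counter is swapped in for time.time because it is a monotonic clock intended for measuring intervals.

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(label, results=None):
    """Time the enclosed block and print the elapsed seconds.

    `timed` and `results` are hypothetical names used for illustration only.
    """
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        if results is not None:
            results[label] = elapsed  # keep the number for later comparison
        print(f"{label}: {elapsed:.3f}s")

# Stand-in workload; in the benchmark this would be one of the export calls.
results = {}
with timed("sleep 50 ms", results):
    time.sleep(0.05)
```

With this helper, a measurement would read: with timed("copy to"): con.sql("copy lineitem to 'copyto.csv'").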

# pyarrow.csv.write_csv
t=time.time();ar=con.sql("from lineitem").arrow();print(time.time()-t)
0.28513407707214355

t=time.time();pyarrow.csv.write_csv(ar,"arout.csv");print(time.time()-t)
0.591275691986084

#sql write_csv
t=time.time();con.sql("from lineitem").write_csv("out.csv");print(time.time()-t)
0.9410018920898438


# polars write_csv
t=time.time();pl=con.sql("from lineitem").pl();print(time.time()-t)
0.5547764301300049

t=time.time();pl.write_csv("pl.csv");print(time.time()-t)
0.6685678958892822

# pandas to_csv
t=time.time();df=con.sql("from lineitem").df();print(time.time()-t)
0.8767569065093994

t=time.time();df.to_csv("df.csv",index=None);print(time.time()-t)
6.194610118865967
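
To make the ordering explicit, the sketch below adds the conversion time and the write time for each method, using the numbers printed above rounded to four decimals. The dictionary labels are mine, and these are single runs, so small differences should not be over-interpreted.

```python
# Seconds from the run above: (conversion out of DuckDB, CSV write).
# Methods that write directly from DuckDB have no conversion step.
timings = {
    "sql copy to": (0.0, 0.7691),
    "pyarrow.csv.write_csv": (0.2851, 0.5913),
    "relation write_csv": (0.0, 0.9410),
    "polars write_csv": (0.5548, 0.6686),
    "pandas to_csv": (0.8768, 6.1946),
}

# Total cost per method = conversion + write.
totals = {name: convert + write for name, (convert, write) in timings.items()}

# Rank methods from fastest to slowest by total time.
ranking = sorted(totals, key=totals.get)
for name in ranking:
    print(f"{name:24s} {totals[name]:.3f}s")
```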

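I had assumed the fast tools were fast mainly because they use multiple threads. But with DuckDB limited to a single thread, most steps ran just as fast or faster: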

con = duckdb.connect("tpch_duckdb",config = {'threads': 1})
t=time.time();con.sql("copy lineitem to 'copyto.csv'");print(time.time()-t)
0.7604503631591797
t=time.time();con.sql("from lineitem").write_csv("out.csv");print(time.time()-t)
0.6541898250579834
t=time.time();ar=con.sql("from lineitem").arrow();print(time.time()-t)
0.14452648162841797
t=time.time();pyarrow.csv.write_csv(ar,"arout.csv");print(time.time()-t)
0.5979902744293213
t=time.time();pl=con.sql("from lineitem").pl();print(time.time()-t)
0.1778249740600586
t=time.time();pl.write_csv("pl.csv");print(time.time()-t)
0.37970709800720215
t=time.time();df=con.sql("from lineitem").df();print(time.time()-t)
0.43950891494750977
t=time.time();df.to_csv("df.csv",index=None);print(time.time()-t)
6.099819183349609
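
The two runs can be compared step by step. This sketch divides each single-thread time by the corresponding default-thread time, again using the printed numbers rounded to four decimals, so a ratio below 1 means the single-thread run was faster. The labels are mine. Note that threads=1 is a DuckDB setting, so changes in the pyarrow, polars, and pandas write steps may just be run-to-run noise.

```python
# Seconds per step: (default threads, threads=1), copied from the two runs above.
steps = {
    "copy to": (0.7691, 0.7605),
    "relation write_csv": (0.9410, 0.6542),
    "arrow()": (0.2851, 0.1445),
    "pyarrow write_csv": (0.5913, 0.5980),
    "pl()": (0.5548, 0.1778),
    "polars write_csv": (0.6686, 0.3797),
    "df()": (0.8768, 0.4395),
    "pandas to_csv": (6.1946, 6.0998),
}

# Ratio < 1 means the single-thread run was faster for that step.
ratios = {name: single / default for name, (default, single) in steps.items()}

for name, ratio in sorted(ratios.items(), key=lambda item: item[1]):
    print(f"{name:20s} {ratio:.2f}")

# Step with the largest relative improvement under threads=1.
biggest_gain = min(ratios, key=ratios.get)
```

In these numbers the conversion steps (arrow(), pl(), df()) improve the most, while copy to, pyarrow write_csv, and pandas to_csv are essentially unchanged.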

Addendum:
FireDucks supports multithreading and is compatible with pandas, but there is no arm64 Linux build, so I tested it on amd64 and compared it with pandas.

import time
import duckdb
con = duckdb.connect("/par/tpch_duckdb")

t=time.time();con.sql("copy lineitem to 'copyto.csv'");print(time.time()-t)
0.5453505516052246

import fireducks.pandas
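# FireDucks evaluates lazily, so this read_csv timing probably excludes most of the parsing; that work is likely deferred to the to_csv call below
t=time.time();fd=fireducks.pandas.read_csv('copyto.csv');print(time.time()-t)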
0.014633893966674805

t=time.time();fd.to_csv('fireduck.csv');print(time.time()-t)
1.0539100170135498


import pandas
t=time.time();pd=pandas.read_csv('copyto.csv');print(time.time()-t)
2.0135209560394287

t=time.time();pd.to_csv('pandas.csv');print(time.time()-t)
4.81012487411499