在databend python模块中测试tpch

前面测试了数种单机版分析型数据库,
databend是rust编写的分析型数据库,它不支持访问指定parquet文件,而要用stage访问系统生成的文件。如下所示,有些繁琐。

create stage lake;	 	 	 
copy into @lake from(select i::int i from range(1,1000000000)t(i))file_format=(type=parquet); 
SELECT avg(i) FROM @lake (pattern => '.*parquet') group by round(log10(i)) ;

所以改用databend python模块来测试。它可以把指定的parquet文件注册成一个别名,以后都用此别名来访问。
编写/par/config2.toml.txt,内容如下:

[meta]
embedded_dir = "/par/"

# Storage config.
[storage]
# fs | s3 | azblob | obs | oss
type = "fs"
allow_insecure = true

[storage.fs]
data_path = "/par/"
allow_insecure = true

pip install databend安装,然后编写python脚本q1.py, 如下:(注册部分只能运行一次,否则报错RuntimeError: DataFrame collect error: ViewAlreadyExists. Code: 2306, Text = 'lineitem' as view Already Exists.

import databend
databend.init_service( config = "/par/config2.toml.txt" )
from databend import SessionContext
ctx = SessionContext()
ctx.register_parquet("lineitem", "/par/fire/data/tables_pyarrow/scale-10.0/", pattern = "lineitem.parquet") #only once
#df = ctx.sql("select count(*) from lineitem")
#df.show()


q1="""
SELECT
    l_returnflag,
    l_linestatus,
    SUM(l_quantity) AS sum_qty,
    SUM(l_extendedprice) AS sum_base_price,
    SUM(l_extendedprice * (1 - l_discount)) AS sum_disc_price,
    SUM(l_extendedprice * (1 - l_discount) * (1 + l_tax)) AS sum_charge,
    AVG(l_quantity) AS avg_qty,
    AVG(l_extendedprice) AS avg_price,
    AVG(l_discount) AS avg_disc,
    COUNT(*) AS count_order
FROM
    lineitem
WHERE
    l_shipdate <= CAST('1998-09-02' AS date)
GROUP BY
    l_returnflag,
    l_linestatus
ORDER BY
    l_returnflag,
    l_linestatus;
"""
import time
t=time.time();df = ctx.sql(q1);df.show();print(time.time()-t)


q3="""
select 
l_orderkey, sum(l_extendedprice * (1 - l_discount)) as revenue, o_orderdate, o_shippriority from 
customer  c, 
orders  o, 
lineitem  l
where c_mktsegment = 'BUILDING' and c_custkey = o_custkey and l_orderkey = o_orderkey and o_orderdate < date'1995-03-15' 
and l_shipdate > date'1995-03-15' group by l_orderkey, o_orderdate, o_shippriority order by revenue desc, o_orderdate limit 10;
"""
ctx.register_parquet("customer", "/par/fire/data/tables_pyarrow/scale-10.0/", pattern = "customer.parquet") #only once
ctx.register_parquet("orders", "/par/fire/data/tables_pyarrow/scale-10.0/", pattern = "orders.parquet") #only once


t=time.time();df = ctx.sql(q3);df.show();print(time.time()-t)

然后这样执行

 RUST_BACKTRACE=1 python /par/q1.py
┌─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ l_returnflag │ l_linestatus │       sum_qty       │    sum_base_price   │    sum_disc_price   │      sum_charge     │       avg_qty       │      avg_price      │       avg_disc      │ count_order │
│    String    │    String    │ Decimal(18, 2) NULL │ Decimal(18, 2) NULL │ Decimal(38, 4) NULL │ Decimal(38, 6) NULL │ Decimal(24, 8) NULL │ Decimal(24, 8) NULL │ Decimal(24, 8) NULL │    UInt64   │
├──────────────┼──────────────┼─────────────────────┼─────────────────────┼─────────────────────┼─────────────────────┼─────────────────────┼─────────────────────┼─────────────────────┼─────────────┤
│ 'A''F'113243256.00169777698133.48161288698452.0115167740893756.43934925.5047495738237.488276350.050001614440085 │
│ 'N''F'2953690.004424757845.204203689870.81004372096041.07241525.5392423938258.911097850.04997985115653 │
│ 'N''O'222980543.00334387558404.46317670374370.3902330383821608.06234525.5000263038240.518297750.049990518744326 │
│ 'R''F'113357470.00170002054868.07161505271120.8840167964089577.05231425.5110068238258.824767940.049988784443473 │
└─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
0.8978915214538574
┌──────────────────────────────────────────────────────────────────┐
│ l_orderkey │       revenue       │  o_orderdate │ o_shippriority │
│    Int64   │ Decimal(38, 4) NULL │     Date     │      Int32     │
├────────────┼─────────────────────┼──────────────┼────────────────┤
│    2113221453119.3884'1995-02-19'0 │
│    9210272444593.0049'1995-02-18'0 │
│    8443557435105.0808'1995-03-12'0 │
│   15619200431264.0142'1995-02-23'0 │
│   12255266424725.6722'1995-02-28'0 │
│   11568359417991.5783'1995-03-13'0 │
│    8823430411455.5585'1995-03-11'0 │
│   12096992404105.0420'1995-03-12'0 │
│    8058883403450.3528'1995-03-05'0 │
│    7862021401060.8511'1995-03-09'0 │
└──────────────────────────────────────────────────────────────────┘
1.282097339630127

我的数据实际上是sf=3,lineitem约1800万行。

┌──────────┐
│ COUNT(*) │
│  UInt64  │
├──────────┤
│ 17996609 │
└──────────┘

这个速度和前几种数据库处于同一水平。

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值