三种DuckDB电子表格插件的union all查询性能对比

我选取了最稳定、兼容性最好的三种:官方excel对应函数read_xlsx()、官方spatial对应函数st_read()、rusty_sheet对应函数read_sheet。
1.建立两个包含前50万和后54万的xlsx文件,用于比较。利用官方excel的copy()to进行。

D copy (from v1 order by l_orderkey,l_partkey,l_suppkey limit 500000)to '50_1.xlsx'WITH (HEADER 1);
Run Time (s): real 22.406 user 24.578125 sys 0.609375
D select count(*) from read_sheet('50_1.xlsx');
┌──────────────┐
│ count_star() │
│    int64     │
├──────────────┤
│    500000    │
└──────────────┘
Run Time (s): real 6.725 user 6.359375 sys 0.343750
D copy (from v1 order by l_orderkey,l_partkey,l_suppkey offset 500000)to '50_2.xlsx';
Run Time (s): real 22.259 user 24.625000 sys 0.484375
D select count(*) from read_sheet('50_2.xlsx',header=0);
┌──────────────┐
│ count_star() │
│    int64     │
├──────────────┤
│    548575    │
└──────────────┘
Run Time (s): real 7.590 user 7.296875 sys 0.250000

2.分别比较不同读取函数的单文件、双文件union all、更改列数、识别错误的st_read()更改数据类型的执行时间。

D select sum(l_orderkey) from(from read_sheet('50_1.xlsx') union all from read_sheet('50_2.xlsx',header=0));
┌──────────────────┐
│ sum(l_orderkey)  │
│      int128      │
├──────────────────┤
│   549457354624   │
│ (549.46 billion) │
└──────────────────┘
Run Time (s): real 13.159 user 13.875000 sys 0.531250
D select sum(l_orderkey) from read_sheet('exli2.xlsx');
┌──────────────────┐
│ sum(l_orderkey)  │
│      int128      │
├──────────────────┤
│   549457354624   │
│ (549.46 billion) │
└──────────────────┘
Run Time (s): real 15.224 user 14.453125 sys 0.609375

D select sum(l_orderkey) from(from st_read('50_1.xlsx') union all from st_read('50_2.xlsx'));
┌──────────────────┐
│ sum(l_orderkey)  │
│      int128      │
├──────────────────┤
│   549457354624   │
│ (549.46 billion) │
└──────────────────┘
Run Time (s): real 13.869 user 17.875000 sys 0.343750
D select sum(l_orderkey) from st_read('exli2.xlsx' );
┌──────────────────┐
│ sum(l_orderkey)  │
│      int128      │
├──────────────────┤
│   549457354624   │
│ (549.46 billion) │
└──────────────────┘
Run Time (s): real 19.012 user 18.750000 sys 0.250000


D select sum(l_orderkey) from read_xlsx('exli2.xlsx' );
┌──────────────────┐
│ sum(l_orderkey)  │
│      double      │
├──────────────────┤
│  549457354624.0  │
│ (549.46 billion) │
└──────────────────┘
Run Time (s): real 7.982 user 7.593750 sys 0.375000
D select sum(l_orderkey) from(from read_xlsx('50_1.xlsx') union all from read_xlsx('50_2.xlsx'));
┌──────────────────┐
│ sum(l_orderkey)  │
│      double      │
├──────────────────┤
│  549457354624.0  │
│ (549.46 billion) │
└──────────────────┘
Run Time (s): real 7.190 user 6.656250 sys 0.515625
D select sum(l_orderkey::int64) from(from read_xlsx('50_1.xlsx') union all from read_xlsx('50_2.xlsx'));
┌─────────────────────────────────┐
│ sum(CAST(l_orderkey AS BIGINT)) │
│             int128              │
├─────────────────────────────────┤
│          549457354624           │
│        (549.46 billion)         │
└─────────────────────────────────┘
Run Time (s): real 4.234 user 6.906250 sys 0.421875
D select sum(l_orderkey::int64) from read_xlsx('exli2.xlsx' );
┌─────────────────────────────────┐
│ sum(CAST(l_orderkey AS BIGINT)) │
│             int128              │
├─────────────────────────────────┤
│          549457354624           │
│        (549.46 billion)         │
└─────────────────────────────────┘
Run Time (s): real 7.900 user 7.515625 sys 0.375000

D select sum(l_orderkey::int64),sum(l_partkey::int64),sum(l_suppkey::int64) from(from read_xlsx('50_1.xlsx') union all from read_xlsx('50_2.xlsx'));
┌─────────────────────────────────┬────────────────────────────────┬────────────────────────────────┐
│ sum(CAST(l_orderkey AS BIGINT))sum(CAST(l_partkey AS BIGINT))sum(CAST(l_suppkey AS BIGINT)) │
│             int128              │             int128             │             int128             │
├─────────────────────────────────┼────────────────────────────────┼────────────────────────────────┤
│          5494573546241048673436545246691219           │
│        (549.46 billion)(104.87 billion)(5.25 billion)         │
└─────────────────────────────────┴────────────────────────────────┴────────────────────────────────┘
Run Time (s): real 4.183 user 6.906250 sys 0.406250
D select sum(l_orderkey),sum(l_partkey),sum(l_suppkey) from(from read_xlsx('50_1.xlsx') union all from read_xlsx('50_2.xlsx'));
┌──────────────────┬──────────────────┬────────────────┐
│ sum(l_orderkey)sum(l_partkey)sum(l_suppkey) │
│      doubledoubledouble     │
├──────────────────┼──────────────────┼────────────────┤
│  549457354624.0104867343654.05246691219.0  │
│ (549.46 billion)(104.87 billion)(5.25 billion) │
└──────────────────┴──────────────────┴────────────────┘
Run Time (s): real 7.217 user 6.750000 sys 0.421875
D select sum(l_orderkey),sum(l_partkey),sum(l_suppkey) from(from st_read('50_1.xlsx') union all from st_read('50_2.xlsx'));
┌──────────────────┬──────────────────┬────────────────┐
│ sum(l_orderkey)sum(l_partkey)sum(l_suppkey) │
│      int128      │      int128      │     int128     │
├──────────────────┼──────────────────┼────────────────┤
│   5494573546241048673436545246691219   │
│ (549.46 billion)(104.87 billion)(5.25 billion) │
└──────────────────┴──────────────────┴────────────────┘
Run Time (s): real 13.719 user 17.859375 sys 0.234375
D select sum(l_orderkey),sum(l_partkey),sum(l_suppkey) from(from read_sheet('50_1.xlsx') union all from read_sheet('50_2.xlsx'));
┌──────────────────┬──────────────────┬────────────────┐
│ sum(l_orderkey)sum(l_partkey)sum(l_suppkey) │
│      int128      │      int128      │     int128     │
├──────────────────┼──────────────────┼────────────────┤
│   5494568549411048672040405246689091   │
│ (549.46 billion)(104.87 billion)(5.25 billion) │
└──────────────────┴──────────────────┴────────────────┘
Run Time (s): real 13.495 user 14.000000 sys 0.812500

想从执行计划看不同插件并行度差异的原因,完全没有这方面的信息

D explain analyze select sum(l_orderkey::int64),sum(l_partkey::int64),sum(l_suppkey::int64) from(from read_xlsx('50_1.xlsx') union all from read_xlsx('50_2.xlsx'));
┌─────────────────────────────────────┐
│┌───────────────────────────────────┐│
││    Query Profiling Information    ││
│└───────────────────────────────────┘│
└─────────────────────────────────────┘
explain analyze select sum(l_orderkey::int64),sum(l_partkey::int64),sum(l_suppkey::int64) from(from read_xlsx('50_1.xlsx') union all from read_xlsx('50_2.xlsx'));
┌────────────────────────────────────────────────┐
│┌──────────────────────────────────────────────┐│
││               Total Time: 4.21s              ││
│└──────────────────────────────────────────────┘│
└────────────────────────────────────────────────┘
┌───────────────────────────┐
│           QUERY           │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│      EXPLAIN_ANALYZE      │
│    ────────────────────   │
│           0 Rows          │
│          (0.00s)          │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│    UNGROUPED_AGGREGATE    │
│    ────────────────────   │
│        Aggregates:        │
│          sum(#0)          │sum(#1)          │sum(#2)          │
│                           │
│           1 Rows          │
│          (0.00s)          │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│         PROJECTION        │
│    ────────────────────   │
│ CAST(l_orderkey AS BIGINT)│
│ CAST(l_partkey AS BIGINT) │
│ CAST(l_suppkey AS BIGINT) │
│                           │
│        1048575 Rows       │
│          (0.01s)          │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│           UNION           │
│    ────────────────────   │
│           0 Rows          ├──────────────┐
│          (0.00s)          │              │
└─────────────┬─────────────┘              │
┌─────────────┴─────────────┐┌─────────────┴─────────────┐
│         PROJECTION        ││         PROJECTION        │
│    ────────────────────   ││    ────────────────────   │
│         l_orderkey        ││             A1            │
│         l_partkey         ││             B1            │
│         l_suppkey         ││             C1            │
│                           ││                           │
│        500000 Rows        ││        548575 Rows        │
│          (0.00s)          ││          (0.00s)          │
└─────────────┬─────────────┘└─────────────┬─────────────┘
┌─────────────┴─────────────┐┌─────────────┴─────────────┐
│         TABLE_SCAN        ││         TABLE_SCAN        │
│    ────────────────────   ││    ────────────────────   │
│    Function: READ_XLSX    ││    Function: READ_XLSX    │
│                           ││                           │
│        500000 Rows        ││        548575 Rows        │
│          (3.15s)          ││          (3.49s)          │
└───────────────────────────┘└───────────────────────────┘
Run Time (s): real 4.214 user 7.515625 sys 0.500000
D explain analyze select sum(l_orderkey),sum(l_partkey),sum(l_suppkey) from(from read_sheet('50_1.xlsx') union all fromread_sheet('50_2.xlsx'));
┌─────────────────────────────────────┐
│┌───────────────────────────────────┐│
││    Query Profiling Information    ││
│└───────────────────────────────────┘│
└─────────────────────────────────────┘
explain analyze select sum(l_orderkey),sum(l_partkey),sum(l_suppkey) from(from read_sheet('50_1.xlsx') union all from read_sheet('50_2.xlsx'));
┌────────────────────────────────────────────────┐
│┌──────────────────────────────────────────────┐│
││              Total Time: 13.04s              ││
│└──────────────────────────────────────────────┘│
└────────────────────────────────────────────────┘
┌───────────────────────────┐
│           QUERY           │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│      EXPLAIN_ANALYZE      │
│    ────────────────────   │
│           0 Rows          │
│          (0.00s)          │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│    UNGROUPED_AGGREGATE    │
│    ────────────────────   │
│        Aggregates:        │
│          sum(#0)          │sum(#1)          │sum(#2)          │
│                           │
│           1 Rows          │
│          (0.00s)          │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│         PROJECTION        │
│    ────────────────────   │
│         l_orderkey        │
│         l_partkey         │
│         l_suppkey         │
│                           │
│        1048574 Rows       │
│          (0.00s)          │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│           UNION           │
│    ────────────────────   │
│           0 Rows          ├──────────────┐
│          (0.00s)          │              │
└─────────────┬─────────────┘              │
┌─────────────┴─────────────┐┌─────────────┴─────────────┐
│         PROJECTION        ││         PROJECTION        │
│    ────────────────────   ││    ────────────────────   │
│         l_orderkey        ││           499683          │
│         l_partkey         ││           139614          │
│         l_suppkey         ││            2128           │
│                           ││                           │
│        500000 Rows        ││        548574 Rows        │
│          (0.00s)          ││          (0.00s)          │
└─────────────┬─────────────┘└─────────────┬─────────────┘
┌─────────────┴─────────────┐┌─────────────┴─────────────┐
│         TABLE_SCAN        ││         TABLE_SCAN        │
│    ────────────────────   ││    ────────────────────   │
│    Function: READ_SHEET   ││    Function: READ_SHEET   │
│                           ││                           │
│        500000 Rows        ││        548574 Rows        │
│          (1.17s)          ││          (1.26s)          │
└───────────────────────────┘└───────────────────────────┘
Run Time (s): real 13.212 user 13.968750 sys 0.500000
评论
成就一亿技术人!
拼手气红包6.0元
还能输入1000个字符
 
红包 添加红包
表情包 插入表情
 条评论被折叠 查看
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值