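Abstract:
DuckDB's architecture for executing physical plans is very well designed: every operator in a physical plan can be processed by multiple threads in parallel.
At the core are Pipeline and TaskScheduler. They can be understood as the producer and consumer pattern from multithreaded programming, and once you see how that pattern is combined with physical plan execution, the simplicity and elegance of the design become clear.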
References:
https://www.youtube.com/watch?v=MA0OsvYFGrc
https://dsdsd.da.cwi.nl/slides/dsdsd-duckdb-push-based-execution.pdf
[DuckDB] Push-Based Execution Model - 知乎 (Zhihu)
Operator execution in DuckDB
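In DuckDB, operators are the components that actually carry out a query plan. When a query is compiled into a plan, that plan is turned into a directed acyclic graph (DAG) of operators. Each operator represents one specific operation, such as scanning a table, applying a filter, or sorting, and the operators are connected by the data that flows between them.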
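To make the "graph of operators connected by data flow" concrete, here is a minimal sketch of a plan built from nodes. `PlanNode`, `MakeNode`, `DataFlowOrder` and `BuildExamplePlan` are hypothetical names chosen for illustration and are not DuckDB classes; the plan is a simple chain (scan, then filter, then sort), which is the easiest case of a DAG.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical plan node for illustration only; this is not DuckDB's PhysicalOperator.
struct PlanNode {
    std::string name;
    std::vector<std::unique_ptr<PlanNode>> children; // operators that feed data into this one
};

std::unique_ptr<PlanNode> MakeNode(std::string name) {
    auto node = std::make_unique<PlanNode>();
    node->name = std::move(name);
    return node;
}

// Post-order walk: every child (data source) is listed before the operator that consumes it,
// so the output follows the direction in which data flows through the plan.
void DataFlowOrder(const PlanNode &node, std::vector<std::string> &out) {
    for (const auto &child : node.children) {
        assert(child != nullptr);
        DataFlowOrder(*child, out);
    }
    out.push_back(node.name);
}

// Builds the chain SEQ_SCAN -> FILTER -> ORDER_BY, with ORDER_BY as the root of the plan.
std::unique_ptr<PlanNode> BuildExamplePlan() {
    auto scan = MakeNode("SEQ_SCAN");
    auto filter = MakeNode("FILTER");
    filter->children.push_back(std::move(scan));
    auto sort = MakeNode("ORDER_BY");
    sort->children.push_back(std::move(filter));
    return sort;
}
```

Calling `DataFlowOrder(*BuildExamplePlan(), order)` fills `order` starting at the scan and ending at the root of the plan, which is the direction in which data moves.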
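Operators are executed through the TaskScheduler. The TaskScheduler is DuckDB's concurrent execution component: it manages a set of tasks (Task) that can run in parallel and spreads them across the CPU cores of the machine. When a query is submitted, the physical plan is first split into pipelines (the backtrace in the Pipeline section shows this happening under Executor::Initialize), and the dependencies between those pipelines determine the order in which their tasks may run. The tasks are then handed to idle worker threads, and the query is complete once all of them have finished.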
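The producer and consumer idea behind such a scheduler can be sketched in a few lines of standard C++. This is a simplified illustration under stated assumptions, not DuckDB's TaskScheduler: `SimpleTaskScheduler`, `Schedule` and `ParallelSum` are hypothetical names, tasks are plain `std::function` objects, and there is no notion of pipelines or dependencies between tasks.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical scheduler: producers push tasks into a queue, worker threads consume them.
class SimpleTaskScheduler {
public:
    explicit SimpleTaskScheduler(std::size_t thread_count) {
        assert(thread_count > 0);
        for (std::size_t i = 0; i < thread_count; i++) {
            workers_.emplace_back([this] { WorkerLoop(); });
        }
    }

    // Lets the workers drain whatever is still queued, then joins them.
    ~SimpleTaskScheduler() {
        {
            std::lock_guard<std::mutex> guard(lock_);
            stopping_ = true;
        }
        wakeup_.notify_all();
        for (auto &worker : workers_) {
            worker.join();
        }
    }

    // Producer side: enqueue a task and wake one sleeping worker.
    void Schedule(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> guard(lock_);
            tasks_.push(std::move(task));
        }
        wakeup_.notify_one();
    }

private:
    // Consumer side: each worker repeatedly takes one task and runs it outside the lock.
    void WorkerLoop() {
        while (true) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> guard(lock_);
                wakeup_.wait(guard, [this] { return stopping_ || !tasks_.empty(); });
                if (tasks_.empty()) {
                    return; // only reached when stopping and nothing is left to run
                }
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();
        }
    }

    std::mutex lock_;
    std::condition_variable wakeup_;
    std::queue<std::function<void()>> tasks_;
    bool stopping_ = false;
    std::vector<std::thread> workers_; // declared last so the other members exist before threads start
};

// Splits the input into at most task_count ranges and sums each range in its own task.
long long ParallelSum(const std::vector<int> &values, std::size_t thread_count, std::size_t task_count) {
    assert(task_count > 0);
    std::atomic<long long> total{0};
    {
        SimpleTaskScheduler scheduler(thread_count);
        const std::size_t chunk = (values.size() + task_count - 1) / task_count;
        for (std::size_t begin = 0; begin < values.size(); begin += chunk) {
            const std::size_t end = std::min(values.size(), begin + chunk);
            scheduler.Schedule([&values, &total, begin, end] {
                long long partial = 0;
                for (std::size_t i = begin; i < end; i++) {
                    partial += values[i];
                }
                total += partial;
            });
        }
    } // leaving the scope destroys the scheduler, which waits for all tasks
    return total.load();
}
```

In this sketch each task touches only its own slice of the input and merges a partial result at the end, which keeps the shared state down to a single atomic counter. On most toolchains the code needs to be linked with thread support (for example `-pthread`).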
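In DuckDB's code, operators and the task scheduler interact mainly through virtual functions. The operator base class (PhysicalOperator, as it appears in the backtrace below) declares a set of virtual functions, such as Execute, which concrete operator subclasses override to carry out their work and manage their state. The scheduling side provides its own classes, such as Task and TaskScheduler, for managing task scheduling and concurrent execution. During execution, tasks are created for the work that has to be done and configured with how much parallelism they may use; a worker thread that runs a task eventually calls the Execute function of the corresponding operator and hands control to the operator's own logic. Where operators read or write shared resources, concurrent access has to be synchronized, for example with locks.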
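The virtual function pattern described above can be illustrated with a small sketch. The names here (`Operator`, `FilterPositive`, `DoubleValues`, `RunPipeline`, `BuildExamplePipeline`) and the `Execute` signature are simplified assumptions for illustration; DuckDB's real operator interface takes additional context and state arguments and works on its own chunk type.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

using Chunk = std::vector<int>; // stand-in for a batch of rows

// Hypothetical operator interface: one virtual Execute that maps an input chunk to an output chunk.
class Operator {
public:
    virtual ~Operator() = default;
    virtual Chunk Execute(const Chunk &input) const = 0;
};

// Keeps only the values greater than zero.
class FilterPositive : public Operator {
public:
    Chunk Execute(const Chunk &input) const override {
        Chunk output;
        for (int value : input) {
            if (value > 0) {
                output.push_back(value);
            }
        }
        return output;
    }
};

// Multiplies every value by two.
class DoubleValues : public Operator {
public:
    Chunk Execute(const Chunk &input) const override {
        Chunk output;
        output.reserve(input.size());
        for (int value : input) {
            output.push_back(value * 2);
        }
        return output;
    }
};

// The caller only sees the base class: it passes the chunk from one operator to the next
// and lets virtual dispatch select the concrete logic.
Chunk RunPipeline(const std::vector<std::unique_ptr<Operator>> &operators, Chunk chunk) {
    for (const auto &op : operators) {
        assert(op != nullptr);
        chunk = op->Execute(chunk);
    }
    return chunk;
}

// Builds the two-step pipeline: filter first, then double.
std::vector<std::unique_ptr<Operator>> BuildExamplePipeline() {
    std::vector<std::unique_ptr<Operator>> operators;
    operators.push_back(std::make_unique<FilterPositive>());
    operators.push_back(std::make_unique<DoubleValues>());
    return operators;
}
```

A task in the earlier sense would wrap a call such as `RunPipeline(BuildExamplePipeline(), chunk)` for one chunk of input. Because `Execute` is `const` and keeps no shared mutable state in this sketch, several threads could run the same operators on different chunks without extra locking.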
Pipeline
Core processing
PipelineBuildState::AddPipelineOperator
#0 duckdb::PipelineBuildState::AddPipelineOperator (this=0x7ffc42d9a060, pipeline=..., op=0x61800001d880) at /root/work/duckdb-dev/trunk/duckdb-0.7.1/src/parallel/pipeline.cpp:251
#1 0x0000000005dd78c5 in duckdb::PhysicalJoin::BuildJoinPipelines (current=..., meta_pipeline=..., op=...)
at /root/work/duckdb-dev/trunk/duckdb-0.7.1/src/execution/operator/join/physical_join.cpp:35
#2 0x0000000005dd88f2 in duckdb::PhysicalJoin::BuildPipelines (this=0x61800001d880, current=..., meta_pipeline=...)
at /root/work/duckdb-dev/trunk/duckdb-0.7.1/src/execution/operator/join/physical_join.cpp:84
#3 0x00000000037ba2dd in duckdb::MetaPipeline::Build (this=0x61100017a100, op=0x61800001d880) at /root/work/duckdb-dev/trunk/duckdb-0.7.1/src/parallel/meta_pipeline.cpp:74
#4 0x0000000005d54984 in duckdb::PhysicalResultCollector::BuildPipelines (this=0x611000179e80, current=..., meta_pipeline=...)
at /root/work/duckdb-dev/trunk/duckdb-0.7.1/src/execution/operator/helper/physical_result_collector.cpp:56
#5 0x00000000037ba2dd in duckdb::MetaPipeline::Build (this=0x611000179fd0, op=0x611000179e80) at /root/work/duckdb-dev/trunk/duckdb-0.7.1/src/parallel/meta_pipeline.cpp:74
#6 0x00000000037c7852 in duckdb::Executor::InitializeInternal (this=0x612000099940, plan=0x611000179e80) at /root/work/duckdb-dev/trunk/duckdb-0.7.1/src/parallel/executor.cpp:306
#7 0x00000000037c6f7a in duckdb::Executor::Initialize (this=0x612000099940, physical_plan=std::unique_ptr<duckdb::PhysicalOperator> = {...})
at /root/work/duckdb-dev/trunk/duckdb-0.7.1/src/parallel/executor.cpp:284
#8 0x00000000033c3c48 in duckdb::ClientContext::PendingPreparedStatement (this=0x615000011d90, lock=...,
statement_p=std::shared_ptr<duckdb::PreparedStatementData> (use count 2, weak count 0) = {...}, parameters=...)
at /root/work/duckdb-dev/trunk/duckdb-0.7.1/src/main/client_context.cpp:419
#9 0x00000000033ceab4 in duckdb::ClientContext::PendingStatementOrPreparedStatement (this=0x615000011d90, lock=...,
query="SELECT * FROM c WHERE EXISTS ( SELECT 1 FROM d WHERE c.c1 = d.d1 ) ;",
statement=std::unique_ptr<duckdb::SQLStatement> = {...}, prepared=std::shared_ptr<duckdb::PreparedStatementData> (use count 2, weak count 0) = {...}, parameters=...)
at /root/work/duckdb-dev/trunk/duckdb-0.7.1/src/main/client_context.cpp:738
#10 0x00000000033cd45c in duckdb::ClientContext::PendingStatementOrPreparedStatementInternal (this=0x615000011d90, lock=...,
query="SELECT * FROM c WHERE EXISTS ( SELECT 1 FROM d WHERE c.c1 = d.d1 ) ;",
statement=std::unique_ptr<duckdb::SQLStatement> = {...}, prepared=std::shared_ptr<duckdb::PreparedStatementData> (use count 2, weak count 0) = {...}, parameters=...)
at /root/work/duckdb-dev/trunk/duckdb-0.7.1/src/main/client_context.cpp:695
#11 0x00000000033ca403 in duckdb::ClientContext::PendingQueryPreparedInternal (this=0x615000011d90, lock=...,
query="SELECT * FROM c WHERE EXISTS ( SELECT 1 FROM d WHERE c.c1 = d.d1 ) ;",
prepared=std::shared_ptr<duckdb::PreparedStatementData> (use count 2, weak count 0) = {...}, parameters=...) at /root/work/duckdb-dev/trunk/duckdb-0.7.1/src/main/client_context.cpp:577
#12 0x00000000033ca9ae in duckdb::ClientContext::PendingQuery (this=0x615000011d90,
query="SELECT * FROM c WHERE EXISTS ( SELECT 1 FROM d WHERE c.c1 = d.d1 ) ;",
prepared=std::shared_ptr<duckdb::PreparedStatementData> (use count 2, weak count 0) = {...}, parameters=...) at /root/work/duckdb-dev/trunk/duckdb-0.7.1/src/main/client_context.cpp:584
#13 0x000000000340e3d4 in duckdb::PreparedStatement::PendingQuery (this=0x611000179c00, values=std::vector of length 0, capacity 0, allow_stream_result=false)
at /root/work/duckdb-dev/trunk/duc