The goal of this project is to add statement execution on top of the earlier projects: the executor fetches query plan nodes and executes them, and implements sequential scans, inserts, hash joins, and aggregations. The execution engine uses the volcano model: every executor implements a Next function, and each call to Next returns either one tuple or nothing.
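The volcano model described above can be sketched with a minimal, self-contained example. The names here (Executor, VecScan, Drain) are illustrative stand-ins, not BusTub's actual classes, and the "tuples" are plain ints for brevity:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal sketch of the volcano (iterator) model: each executor
// exposes Init/Next, and a driver pulls tuples one at a time until Next
// reports exhaustion.
struct Executor {
  virtual ~Executor() = default;
  virtual void Init() = 0;
  virtual bool Next(int *out) = 0;  // real executors emit a Tuple instead
};

// A scan over an in-memory vector, standing in for a table scan.
struct VecScan : Executor {
  explicit VecScan(std::vector<int> data) : data_(std::move(data)) {}
  void Init() override { pos_ = 0; }
  bool Next(int *out) override {
    if (pos_ >= data_.size()) return false;
    *out = data_[pos_++];
    return true;
  }
  std::vector<int> data_;
  std::size_t pos_ = 0;
};

// The driver loop every parent (or the query engine itself) runs.
std::vector<int> Drain(Executor &exec) {
  exec.Init();
  std::vector<int> result;
  int v;
  while (exec.Next(&v)) result.push_back(v);
  return result;
}
```

Every executor in this project (scan, insert, join, aggregation) follows this same Init/Next contract.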
TASK #1 - CREATING A CATALOG TABLE
A database maintains an internal catalog to track its metadata, for example which tables exist and where they are stored.
Task 1 only requires implementing three functions in src/include/catalog/simple_catalog.h: CreateTable, GetTable(const std::string &table_name), and GetTable(table_oid_t table_oid). It is enough to study the constructors of the SimpleCatalog and TableMetadata classes and pass in the right arguments. SimpleCatalog records the mapping from table name to table_oid in names_, and the mapping from table_oid to TableMetadata in tables_. In GetTable, if the table cannot be found, throw std::out_of_range.
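The two-map design above can be sketched in a simplified, self-contained form. TableMeta and SimpleCatalogSketch below are stand-ins for BusTub's TableMetadata and SimpleCatalog (which also carry a schema and the table heap itself); only the mapping and error-handling logic is shown:

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <stdexcept>
#include <string>
#include <unordered_map>

using table_oid_t = uint32_t;

// Stand-in for TableMetadata; the real class also holds a Schema and TableHeap.
struct TableMeta {
  std::string name;
  table_oid_t oid;
};

class SimpleCatalogSketch {
 public:
  // Allocate an oid and register the table in both maps.
  TableMeta *CreateTable(const std::string &table_name) {
    table_oid_t oid = next_oid_++;
    names_[table_name] = oid;
    tables_[oid] = std::make_unique<TableMeta>(TableMeta{table_name, oid});
    return tables_[oid].get();
  }
  // Lookup by name delegates to lookup by oid; unordered_map::at throws
  // std::out_of_range on a miss, matching the behavior the task asks for.
  TableMeta *GetTable(const std::string &table_name) {
    return GetTable(names_.at(table_name));
  }
  TableMeta *GetTable(table_oid_t oid) { return tables_.at(oid).get(); }

 private:
  std::unordered_map<std::string, table_oid_t> names_;                    // name -> oid
  std::unordered_map<table_oid_t, std::unique_ptr<TableMeta>> tables_;    // oid -> metadata
  table_oid_t next_oid_ = 0;
};
```

Using at() rather than operator[] is what gives the required std::out_of_range behavior for free.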
TASK #2 - EXECUTORS
Task 2 implements executors for sequential scans, inserts, hash joins, and aggregations. Each operator must implement an Init function and a Next function: Init sets up the operator's state (for example, looking up the target table), and each call to Next emits one tuple.
The operators are implemented in the following files:
src/include/execution/executors/seq_scan_executor.h
src/include/execution/executors/insert_executor.h
src/include/execution/executors/hash_join_executor.h
src/include/execution/executors/aggregation_executor.h
To see how executors are created at runtime, refer to the ExecutorFactory class in src/include/execution/executor_factory.h. Every executor holds an ExecutorContext member (src/include/execution/executor_context.h), which carries the state the query needs.
ExecutorFactory exposes one member function, CreateExecutor, declared as:
std::unique_ptr<AbstractExecutor> CreateExecutor(ExecutorContext *exec_ctx, const AbstractPlanNode *plan);
CreateExecutor converts a plan node into an executor node: it dispatches on the plan node's type (SeqScan, Insert, HashJoin, Aggregation) and returns a smart pointer to the corresponding executor.
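The dispatch inside CreateExecutor can be sketched as follows. PlanType, Plan, and the executor classes here are simplified stand-ins (the real factory also threads through the ExecutorContext and recursively builds child executors):

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>

enum class PlanType { SeqScan, Insert, HashJoin, Aggregation };

struct Plan {
  PlanType type;
};

// Simplified executor hierarchy; Type() exists only so the sketch is testable.
struct Exec { virtual ~Exec() = default; virtual PlanType Type() const = 0; };
struct SeqScanExec  : Exec { PlanType Type() const override { return PlanType::SeqScan; } };
struct InsertExec   : Exec { PlanType Type() const override { return PlanType::Insert; } };
struct HashJoinExec : Exec { PlanType Type() const override { return PlanType::HashJoin; } };
struct AggExec      : Exec { PlanType Type() const override { return PlanType::Aggregation; } };

// Switch on the plan node's type and build the matching executor.
std::unique_ptr<Exec> CreateExec(const Plan *plan) {
  switch (plan->type) {
    case PlanType::SeqScan:     return std::make_unique<SeqScanExec>();
    case PlanType::Insert:      return std::make_unique<InsertExec>();
    case PlanType::HashJoin:    return std::make_unique<HashJoinExec>();
    case PlanType::Aggregation: return std::make_unique<AggExec>();
  }
  throw std::logic_error("unsupported plan type");
}
```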
ExecutorContext holds the transaction, the catalog, and the buffer pool manager:
class ExecutorContext {
  Transaction *transaction_;
  SimpleCatalog *catalog_;
  BufferPoolManager *bpm_;
};
Sequential Scans
A sequential scan iterates over the whole table and returns its tuples one at a time. The scan is specified by a SeqScanPlanNode, which names the table to iterate over. The plan node may also carry a predicate; tuples that fail the predicate are skipped.
void Init() override {
  // Position the table iterator at the start of the target table.
  iter_ = exec_ctx_->GetCatalog()->GetTable(plan_->GetTableOid())->table_->Begin(exec_ctx_->GetTransaction());
  end_iter_ = exec_ctx_->GetCatalog()->GetTable(plan_->GetTableOid())->table_->End();
}

bool Next(Tuple *tuple) override {
  while (iter_ != end_iter_) {
    *tuple = *iter_++;
    // Emit the tuple if there is no predicate, or if the predicate passes;
    // otherwise keep scanning.
    if (plan_->GetPredicate() == nullptr ||
        plan_->GetPredicate()->Evaluate(tuple, GetOutputSchema()).GetAs<bool>()) {
      return true;
    }
  }
  return false;
}
Insert
RawInsert corresponds to an InsertPlanNode with no child: the tuples to insert are stored in the plan node's raw_values_. Otherwise, the executor pulls tuples from the InsertPlanNode's child and inserts them into the table.
void Init() override {
  table_meta_ = exec_ctx_->GetCatalog()->GetTable(plan_->TableOid());
  table_ = table_meta_->table_.get();
}

// Note that Insert does not make use of the tuple pointer being passed in.
// We return false if any insert failed, and true if all inserts succeeded.
bool Next([[maybe_unused]] Tuple *tuple) override {
  RID rid;
  if (plan_->IsRawInsert()) {
    // Build tuples from the raw values embedded in the plan node.
    for (const auto &values : plan_->RawValues()) {
      if (!table_->InsertTuple(Tuple(values, &table_meta_->schema_), &rid, exec_ctx_->GetTransaction())) {
        return false;
      }
    }
  } else {
    // Pull tuples from the child executor and insert each one.
    Tuple child_tuple;
    auto child_executor = ExecutorFactory::CreateExecutor(exec_ctx_, plan_->GetChildPlan());
    child_executor->Init();
    while (child_executor->Next(&child_tuple)) {
      if (!table_->InsertTuple(child_tuple, &rid, exec_ctx_->GetTransaction())) {
        return false;
      }
    }
  }
  return true;
}
Hash Join
The implementation here is actually a nested-loop join: Init() materializes all tuples from the left and right child executors, and Next() uses EvaluateJoin to decide whether a pair of tuples joins. Since Next() emits only one tuple per call, an index (idx) records where the next probe should resume.
void Init() override {
  left_executor_->Init();
  right_executor_->Init();

  // Materialize both inputs.
  Tuple cur_tuple;
  while (left_executor_->Next(&cur_tuple)) {
    left_tuples.push_back(cur_tuple);
  }
  while (right_executor_->Next(&cur_tuple)) {
    right_tuples.push_back(cur_tuple);
  }

  left_schema = left_executor_->GetOutputSchema();
  right_schema = right_executor_->GetOutputSchema();
  output_schema = plan_->OutputSchema();

  // For each output column, record which side it comes from (0 = left,
  // 1 = right) and its column index within that side's schema.
  std::vector<Column> output_cols = output_schema->GetColumns();
  std::vector<Column> left_cols = left_schema->GetColumns();
  std::vector<Column> right_cols = right_schema->GetColumns();
  for (unsigned int i = 0; i < output_cols.size(); i++) {
    for (unsigned int j = 0; j < left_cols.size(); j++) {
      if (output_cols[i].GetName() == left_cols[j].GetName()) {
        output_order.push_back({0, j});
      }
    }
    for (unsigned int j = 0; j < right_cols.size(); j++) {
      if (output_cols[i].GetName() == right_cols[j].GetName()) {
        output_order.push_back({1, j});
      }
    }
  }

  left_tuples_size = left_tuples.size();
  right_tuples_size = right_tuples.size();
  total_size = left_tuples_size * right_tuples_size;
  idx = 0;
}
bool Next(Tuple *tuple) override {
  // Walk the cross product left_tuples x right_tuples, resuming at idx.
  for (; idx < total_size; idx++) {
    int left_idx = idx / right_tuples_size;
    int right_idx = idx % right_tuples_size;
    if (plan_->Predicate()->EvaluateJoin(&left_tuples[left_idx], left_schema,
                                         &right_tuples[right_idx], right_schema).GetAs<bool>()) {
      // Assemble the output tuple in output-schema column order.
      // output_order recorded 0 for left-side columns and 1 for right-side
      // columns, so pull from the matching input.
      std::vector<Value> values;
      for (const auto &order : output_order) {
        if (order.first == 0) {
          values.push_back(left_tuples[left_idx].GetValue(left_schema, order.second));
        } else {
          values.push_back(right_tuples[right_idx].GetValue(right_schema, order.second));
        }
      }
      *tuple = Tuple(values, output_schema);
      idx++;
      return true;
    }
  }
  return false;
}
Aggregation
The provided SimpleAggregationHashTable already does most of the aggregation work. Init() feeds the child's output into the SimpleAggregationHashTable. Next() walks the hash table with its iterator and uses plan_->GetHaving()->EvaluateAggregate to check whether a group satisfies the HAVING clause. It then fetches each output Value in output-schema order via output_schema->GetColumn(i).GetExpr()->EvaluateAggregate(aht_iterator_.Key().group_bys_, aht_iterator_.Val().aggregates_), and finally builds a Tuple from those Values and returns it.
void Init() override {
  child_->Init();
  // Feed every child tuple into the aggregation hash table. The HAVING
  // clause is applied later in Next(), over the aggregated groups.
  Tuple cur_tuple;
  while (child_->Next(&cur_tuple)) {
    aht_.InsertCombine(MakeKey(&cur_tuple), MakeVal(&cur_tuple));
  }
  aht_iterator_ = aht_.Begin();
}
bool Next(Tuple *tuple) override {
  // Advance to the next group that passes the HAVING clause
  // (or to the next group unconditionally when there is no HAVING clause).
  bool found = false;
  while (aht_iterator_ != aht_.End()) {
    if (plan_->GetHaving() == nullptr ||
        plan_->GetHaving()->EvaluateAggregate(aht_iterator_.Key().group_bys_,
                                              aht_iterator_.Val().aggregates_).GetAs<bool>()) {
      found = true;
      break;
    }
    ++aht_iterator_;
  }
  if (found) {
    // Evaluate each output column against the group's keys and aggregates.
    std::vector<Value> values;
    const Schema *output_schema = GetOutputSchema();
    for (unsigned int i = 0; i < output_schema->GetColumnCount(); i++) {
      values.push_back(output_schema->GetColumn(i).GetExpr()->EvaluateAggregate(
          aht_iterator_.Key().group_bys_, aht_iterator_.Val().aggregates_));
    }
    *tuple = Tuple(values, output_schema);
    ++aht_iterator_;
  }
  return found;
}
This post has walked through the database executor implementation in the CMU 15-445 project: creating the catalog table, sequential scans, inserts, hash joins, and aggregations. The project's goal is to execute query plans, which comes down to the Init and Next functions of each executor plus the catalog's metadata management. ExecutorFactory creates the matching executor for each plan node type, and ExecutorContext carries the per-query state.