CMU 15-445 Project #3 - Query Execution

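This post walks through the executor implementation for the CMU 15-445 database project: creating the catalog table, sequential scans, inserts, hash join, and aggregation. The goal of the project is to execute query plans, which involves each executor's Init and Next functions as well as the catalog's metadata management. ExecutorFactory creates the executor that matches each plan node type, and ExecutorContext holds the information associated with the query.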

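The goal of this project is to add statement execution on top of the earlier projects: executors take query plan nodes and run them, covering sequential scans, inserts, hash joins, and aggregations. The executors follow the Volcano model: every executor implements a Next function, and each call to Next either returns one tuple or reports that there are no more tuples.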
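To make the pull-based control flow concrete, here is a minimal, self-contained sketch of the Volcano model. It is not BusTub code: Row, Operator, VectorScan, Filter, and Drain are hypothetical names invented for this illustration, and a row is reduced to a single int.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Toy stand-in for a tuple: a single integer column (assumption for illustration).
using Row = int;

// Every operator exposes Init() and Next(); Next() yields one row per call
// and returns false once the operator is exhausted.
class Operator {
 public:
  virtual ~Operator() = default;
  virtual void Init() = 0;
  virtual bool Next(Row *out) = 0;
};

// Leaf operator: emits the rows of an in-memory vector one at a time.
class VectorScan : public Operator {
 public:
  explicit VectorScan(std::vector<Row> rows) : rows_(std::move(rows)) {}
  void Init() override { pos_ = 0; }
  bool Next(Row *out) override {
    if (pos_ >= rows_.size()) return false;
    *out = rows_[pos_++];
    return true;
  }

 private:
  std::vector<Row> rows_;
  std::size_t pos_ = 0;
};

// Inner operator: pulls from its child and forwards only rows that pass the predicate.
class Filter : public Operator {
 public:
  Filter(std::unique_ptr<Operator> child, std::function<bool(Row)> pred)
      : child_(std::move(child)), pred_(std::move(pred)) {}
  void Init() override { child_->Init(); }
  bool Next(Row *out) override {
    Row row;
    while (child_->Next(&row)) {
      if (pred_(row)) {
        *out = row;
        return true;
      }
    }
    return false;
  }

 private:
  std::unique_ptr<Operator> child_;
  std::function<bool(Row)> pred_;
};

// Driver: initialise the root, then pull rows until it reports exhaustion.
std::vector<Row> Drain(Operator *root) {
  assert(root != nullptr);
  std::vector<Row> result;
  root->Init();
  Row row;
  while (root->Next(&row)) result.push_back(row);
  return result;
}
```

Draining a Filter that wraps a VectorScan over 1 to 6 with an even-number predicate collects 2, 4, and 6. The driver never needs to know which operators sit below the root, which is the main appeal of the model.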

TASK #1 - CREATING A CATALOG TABLE

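The database maintains an internal catalog that records its metadata, for example which tables exist and where they are located.
Task 1 only requires implementing three functions in src/include/catalog/simple_catalog.h: CreateTable, GetTable(const std::string &table_name), and GetTable(table_oid_t table_oid). It is enough to read the constructors of the SimpleCatalog and TableMetadata classes and pass in the right arguments. SimpleCatalog uses names_ to map table names to table_oids and tables_ to map table_oids to TableMetadata. If GetTable cannot find the requested table, it throws std::out_of_range.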
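The two maps can be pictured with a small stand-alone sketch. ToyCatalog and ToyTableMetadata are hypothetical simplifications (the real TableMetadata also carries a schema and a table heap, and the real CreateTable takes more arguments). The sketch only shows the name-to-oid and oid-to-metadata lookups, and how std::unordered_map::at produces std::out_of_range for a missing table.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <stdexcept>
#include <string>
#include <unordered_map>

using table_oid_t = std::uint32_t;

// Simplified stand-in for TableMetadata (assumption: name and oid only).
struct ToyTableMetadata {
  std::string name_;
  table_oid_t oid_;
};

class ToyCatalog {
 public:
  // Registers a new table and returns its metadata; the catalog keeps ownership.
  ToyTableMetadata *CreateTable(const std::string &table_name) {
    // Assumption of this sketch: creating the same name twice is a caller error.
    assert(names_.count(table_name) == 0);
    const table_oid_t oid = next_table_oid_++;
    names_[table_name] = oid;
    tables_[oid] = std::make_unique<ToyTableMetadata>(ToyTableMetadata{table_name, oid});
    return tables_[oid].get();
  }

  // Both lookups rely on unordered_map::at, which throws std::out_of_range
  // when the key is absent.
  ToyTableMetadata *GetTable(const std::string &table_name) { return GetTable(names_.at(table_name)); }
  ToyTableMetadata *GetTable(table_oid_t table_oid) { return tables_.at(table_oid).get(); }

 private:
  std::unordered_map<std::string, table_oid_t> names_;
  std::unordered_map<table_oid_t, std::unique_ptr<ToyTableMetadata>> tables_;
  table_oid_t next_table_oid_ = 0;
};
```

Using at() instead of operator[] for lookups matters here: operator[] would silently insert an empty entry for an unknown table instead of raising an error.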

TASK #2 - EXECUTORS

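Task 2 implements the executors for sequential scans, inserts, hash joins, and aggregations. Each operator implements an Init function and a Next function: Init sets up the operator (for example, by looking up the table it works on), and each call to Next returns one tuple.
The operators are implemented in the following files: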

src/include/execution/executors/seq_scan_executor.h
src/include/execution/executors/insert_executor.h
src/include/execution/executors/hash_join_executor.h
src/include/execution/executors/aggregation_executor.h

To see how executors are created at run time, refer to the ExecutorFactory class in src/include/execution/executor_factory.h.
Every executor also holds an ExecutorContext (src/include/execution/executor_context.h), which keeps the information associated with the query.

ExecutorFactory provides the member function CreateExecutor, declared as follows:

std::unique_ptr<AbstractExecutor> CreateExecutor(ExecutorContext *exec_ctx, const AbstractPlanNode *plan);

CreateExecutor turns a plan node into an executor node: it checks the type of the plan node (SeqScan, Insert, HashJoin, or Aggregation) and returns a smart pointer to the matching executor.
ExecutorContext holds the transaction, the catalog, and the buffer pool manager:

class ExecutorContext {
  Transaction *transaction_;  // transaction the query runs in
  SimpleCatalog *catalog_;    // catalog used to look up tables
  BufferPoolManager *bpm_;    // buffer pool manager
};
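
The dispatch that CreateExecutor performs can be sketched in isolation. Everything below is hypothetical and simplified: PlanType, ToyPlanNode, the Toy...Executor classes, and CreateToyExecutor are invented for this illustration, and the sketch ignores the executor context and child executors that the real factory also has to wire up.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <string>

enum class PlanType { SeqScan, Insert, HashJoin, Aggregation };

// Hypothetical plan node carrying only its type tag.
struct ToyPlanNode {
  PlanType type_;
};

// Hypothetical executor base class; Name() exists only to make the result observable.
struct ToyExecutor {
  virtual ~ToyExecutor() = default;
  virtual std::string Name() const = 0;
};

struct ToySeqScanExecutor : ToyExecutor {
  std::string Name() const override { return "SeqScan"; }
};
struct ToyInsertExecutor : ToyExecutor {
  std::string Name() const override { return "Insert"; }
};
struct ToyHashJoinExecutor : ToyExecutor {
  std::string Name() const override { return "HashJoin"; }
};
struct ToyAggregationExecutor : ToyExecutor {
  std::string Name() const override { return "Aggregation"; }
};

// Maps the plan node's type tag to a freshly allocated executor of the matching kind.
std::unique_ptr<ToyExecutor> CreateToyExecutor(const ToyPlanNode *plan) {
  assert(plan != nullptr);
  switch (plan->type_) {
    case PlanType::SeqScan:
      return std::make_unique<ToySeqScanExecutor>();
    case PlanType::Insert:
      return std::make_unique<ToyInsertExecutor>();
    case PlanType::HashJoin:
      return std::make_unique<ToyHashJoinExecutor>();
    case PlanType::Aggregation:
      return std::make_unique<ToyAggregationExecutor>();
  }
  // Only reachable if the tag holds a value outside the enum.
  throw std::logic_error("unsupported plan type");
}
```

Returning std::unique_ptr to the base class lets the caller drive any executor through the same interface while making ownership explicit.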
Sequential Scans

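A sequential scan iterates over an entire table and returns its tuples one at a time. The scan is specified by a SeqScanPlanNode, which names the table to iterate over. The plan node may also carry a predicate; a tuple that does not satisfy the predicate is skipped.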

void Init() override {
    // Look up the table once, then position the iterators at its first tuple and at its end.
    auto *table = exec_ctx_->GetCatalog()->GetTable(plan_->GetTableOid())->table_.get();
    iter_ = table->Begin(exec_ctx_->GetTransaction());
    end_iter_ = table->End();
  }

  bool Next(Tuple *tuple) override {
    auto predicate = plan_->GetPredicate();
    while (iter_ != end_iter_) {
      *tuple = *iter_;
      ++iter_;
      // Emit the tuple if there is no predicate or if the predicate accepts it.
      if (predicate == nullptr || predicate->Evaluate(tuple, GetOutputSchema()).GetAs<bool>()) {
        return true;
      }
    }
    return false;
  }

Insert

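A raw insert is the case where the InsertPlanNode has no child node: the tuples to insert are stored in the plan node's raw_values_. In the non-raw case, the results produced by the InsertPlanNode's child are inserted into the table.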

void Init() override {
    // Cache the table metadata and the table heap so the catalog is only consulted once.
    table_meta_ = exec_ctx_->GetCatalog()->GetTable(plan_->TableOid());
    table_ = table_meta_->table_.get();
  }

  // Note that Insert does not make use of the tuple pointer being passed in.
  // We return false if the insert failed for any reason, and return true if all inserts succeeded.
  bool Next([[maybe_unused]] Tuple *tuple) override {
    RID rid;
    if (plan_->IsRawInsert()) {
      // Raw insert: the values to insert are embedded in the plan node.
      for (const auto &values : plan_->RawValues()) {
        if (!table_->InsertTuple(Tuple(values, &table_meta_->schema_), &rid, exec_ctx_->GetTransaction())) {
          return false;
        }
      }
    } else {
      // Otherwise insert every tuple produced by the child plan.
      Tuple child_tuple;
      auto child_executor = ExecutorFactory::CreateExecutor(exec_ctx_, plan_->GetChildPlan());
      child_executor->Init();
      while (child_executor->Next(&child_tuple)) {
        if (!table_->InsertTuple(child_tuple, &rid, exec_ctx_->GetTransaction())) {
          return false;
        }
      }
    }
    return true;
  }
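
The two insert paths differ only in where the rows come from, which the following stand-alone sketch tries to isolate. ToyTable, ChildNext, InsertRaw, and InsertFromChild are hypothetical names, a row is just a vector of ints, and the capacity limit is an invented way to make InsertTuple fail so that the early return can be observed.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

using Row = std::vector<int>;

// Hypothetical stand-in for a table heap; InsertTuple reports failure by returning false.
struct ToyTable {
  std::size_t capacity_;
  std::vector<Row> rows_;

  bool InsertTuple(const Row &row) {
    if (rows_.size() >= capacity_) return false;
    rows_.push_back(row);
    return true;
  }
};

// A child is modelled as a callable with the same contract as Next():
// it fills *out and returns true, or returns false when exhausted.
using ChildNext = std::function<bool(Row *)>;

// Raw insert: the rows are given up front, as with raw_values_ in the plan node.
bool InsertRaw(ToyTable *table, const std::vector<Row> &raw_values) {
  assert(table != nullptr);
  for (const Row &values : raw_values) {
    if (!table->InsertTuple(values)) return false;
  }
  return true;
}

// Non-raw insert: rows are pulled from the child one at a time and inserted as they arrive.
bool InsertFromChild(ToyTable *table, const ChildNext &child_next) {
  assert(table != nullptr);
  Row row;
  while (child_next(&row)) {
    if (!table->InsertTuple(row)) return false;
  }
  return true;
}
```

In both functions the first failed insert stops the operation, and rows inserted before the failure remain in the table; undoing them would be the job of the transaction machinery, which this sketch leaves out.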
Hash Join

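This implementation is actually a nested loop join rather than a hash join. Init() collects all tuples from the left and right executors, and Next() uses EvaluateJoin to decide whether a pair of tuples can be joined. Because Next() returns only one tuple per call, an index (idx) records which pair to examine on the next call.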

void Init() override {
    left_executor_->Init();
    right_executor_->Init();
    Tuple cur_tuple;
    left_idx = 0;
    right_idx = 0;

    // Materialize both inputs so that Next() can walk every (left, right) pair.
    while (left_executor_->Next(&cur_tuple)) {
      left_tuples.push_back(cur_tuple);
    }

    while (right_executor_->Next(&cur_tuple)) {
      right_tuples.push_back(cur_tuple);
    }

    left_schema = left_executor_->GetOutputSchema();
    right_schema = right_executor_->GetOutputSchema();
    output_schema = plan_->OutputSchema();
    const std::vector<Column> &output_cols = output_schema->GetColumns();
    const std::vector<Column> &left_cols = left_schema->GetColumns();
    const std::vector<Column> &right_cols = right_schema->GetColumns();

    // For each output column, record which side it comes from (0 = left, 1 = right)
    // and its column index on that side. Columns are matched by name.
    for (unsigned int i = 0; i < output_cols.size(); i++) {
      for (unsigned int j = 0; j < left_cols.size(); j++) {
        if (output_cols[i].GetName() == left_cols[j].GetName()) {
          output_order.push_back({0, j});
        }
      }

      for (unsigned int j = 0; j < right_cols.size(); j++) {
        if (output_cols[i].GetName() == right_cols[j].GetName()) {
          output_order.push_back({1, j});
        }
      }
    }

    left_tuples_size = left_tuples.size();
    right_tuples_size = right_tuples.size();
    total_size = left_tuples_size * right_tuples_size;
    idx = 0;
  }

  bool Next(Tuple *tuple) override {
    // idx enumerates the cross product: idx = left position * right_tuples_size + right position.
    for (; idx < total_size; idx++) {
      const auto l = idx / right_tuples_size;
      const auto r = idx % right_tuples_size;
      if (plan_->Predicate()->EvaluateJoin(&left_tuples[l], left_schema, &right_tuples[r], right_schema).GetAs<bool>()) {
        std::vector<Value> values;

        for (const auto &order : output_order) {
          // Side 0 is the left input, matching the tags assigned in Init().
          // (The original check tested for 1 here, which swapped the two sides.)
          if (order.first == 0) {
            values.push_back(left_tuples[l].GetValue(left_schema, order.second));
          } else {
            values.push_back(right_tuples[r].GetValue(right_schema, order.second));
          }
        }
        *tuple = Tuple(values, output_schema);
        idx++;  // resume after this pair on the next call

        return true;
      }
    }
    return false;
  }
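
Since the code above falls back to a nested loop join, the following sketch shows, for comparison, what the build and probe phases of an in-memory hash join look like. It is a toy under stated assumptions, not the project's HashJoinExecutor: LeftRow, RightRow, Joined, and HashJoin are hypothetical, the join condition is equality on a single int key, and std::unordered_multimap stands in for whatever hash table the executor would use.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

struct LeftRow {
  int key;
  std::string name;
};
struct RightRow {
  int key;
  int score;
};
struct Joined {
  std::string name;
  int score;
};

// Equi-join on key: build a hash table over the left input, then probe it with the right input.
std::vector<Joined> HashJoin(const std::vector<LeftRow> &left, const std::vector<RightRow> &right) {
  // Build phase: index every left row by its join key (duplicate keys are allowed).
  std::unordered_multimap<int, const LeftRow *> ht;
  ht.reserve(left.size());
  for (const LeftRow &l : left) {
    ht.emplace(l.key, &l);
  }

  // Probe phase: for each right row, visit only the left rows that share its key.
  std::vector<Joined> out;
  for (const RightRow &r : right) {
    auto range = ht.equal_range(r.key);
    for (auto it = range.first; it != range.second; ++it) {
      assert(it->second->key == r.key);  // equal_range only returns matching keys
      out.push_back({it->second->name, r.score});
    }
  }
  return out;
}
```

The nested loop version examines every one of the left-size times right-size pairs, whereas here each right row only touches the left rows in its own hash bucket, at the cost of the memory needed for the table built over the left input.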
Aggregation

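The SimpleAggregationHashTable provided by the project already does most of the aggregation work. Init() inserts the child's results into the SimpleAggregationHashTable. Next() then walks the table with its iterator and uses plan_->GetHaving()->EvaluateAggregate to check whether the current entry satisfies the HAVING condition. For an entry that does, it evaluates output_schema->GetColumn(i).GetExpr()->EvaluateAggregate(aht_iterator_.Key().group_bys_, aht_iterator_.Val().aggregates_) for each column, in output_schema order, to obtain the Values, and finally builds the returned Tuple from those Values.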

void Init() override {
    child_->Init();

    // Fold every child tuple into the aggregation hash table.
    Tuple cur_tuple;
    while (child_->Next(&cur_tuple)) {
      aht_.InsertCombine(MakeKey(&cur_tuple), MakeVal(&cur_tuple));
    }

    aht_iterator_ = aht_.Begin();
  }

  bool Next(Tuple *tuple) override {
    const Schema *output_schema = GetOutputSchema();
    auto having = plan_->GetHaving();

    while (aht_iterator_ != aht_.End()) {
      const auto &group_bys = aht_iterator_.Key().group_bys_;
      const auto &aggregates = aht_iterator_.Val().aggregates_;

      // Emit the group if there is no HAVING clause or if the clause accepts it.
      if (having == nullptr || having->EvaluateAggregate(group_bys, aggregates).GetAs<bool>()) {
        std::vector<Value> values;
        for (unsigned int i = 0; i < output_schema->GetColumnCount(); i++) {
          values.push_back(output_schema->GetColumn(i).GetExpr()->EvaluateAggregate(group_bys, aggregates));
        }
        *tuple = Tuple(values, output_schema);
        ++aht_iterator_;
        return true;
      }

      ++aht_iterator_;
    }

    return false;
  }
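
The insert-and-combine step that the hash table performs can be illustrated with a stand-alone sketch. ToyAggTable, Agg, and SumPerGroup are hypothetical names and the sketch is not the project's SimpleAggregationHashTable: it fixes the aggregates to COUNT and SUM over int values, uses a single int group-by key, and uses std::map so that the output order is deterministic.

```cpp
#include <cassert>
#include <map>
#include <utility>
#include <vector>

// Running aggregates kept per group (assumption: COUNT and SUM only).
struct Agg {
  int count = 0;
  int sum = 0;
};

class ToyAggTable {
 public:
  // Folds one input row into the running aggregates of its group,
  // creating a zero-initialised entry the first time a group is seen.
  void InsertCombine(int group_key, int value) {
    Agg &agg = groups_[group_key];
    agg.count += 1;
    agg.sum += value;
  }

  const std::map<int, Agg> &Groups() const { return groups_; }

 private:
  std::map<int, Agg> groups_;
};

// Roughly: SELECT key, SUM(value) FROM rows GROUP BY key HAVING COUNT(*) >= min_count
std::vector<std::pair<int, int>> SumPerGroup(const std::vector<std::pair<int, int>> &rows, int min_count) {
  assert(min_count >= 0);

  // Phase 1 (what Init() does): consume the whole input and build the table.
  ToyAggTable aht;
  for (const auto &row : rows) {
    aht.InsertCombine(row.first, row.second);
  }

  // Phase 2 (what repeated Next() calls do): walk the groups and apply the HAVING filter.
  std::vector<std::pair<int, int>> out;
  for (const auto &entry : aht.Groups()) {
    if (entry.second.count >= min_count) {
      out.emplace_back(entry.first, entry.second.sum);
    }
  }
  return out;
}
```

The HAVING filter has to run in the second phase because it refers to aggregate results, which are only final once every input row has been folded in.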