MIT 6.830数据库系统 -- lab two


项目拉取

原项目使用ant进行项目构建,我已经更改为Maven构建,大家直接拉取我改好后的项目即可:

然后就是正常的maven项目配置,启动即可。各个lab的实现,会放在lab/分支下。


Lab Two

lab2必须在lab1提交的代码基础上进行开发,否则无法完成相应的练习。此外,实验还提供了源码中不存在的额外测试文件。

实现提示

开始编写代码之前,强烈建议通读整篇文档,以对SimpleDB的设计有个整体的认识,对于我们编写代码非常有帮助

建议跟着文档的练习来实现对应的代码,每个练习都标明了要实现哪个类以及通过哪些单元测试,跟着练习走即可。

下面是本实验的大致流程:

  • 实现Filter和Join操作并且通过相关的单元测试验证你的实现,阅读类的Javadoc将会帮助我们实现。项目中已经提供了Project和OrderBy操作的实现,阅读其代码能够帮助我们理解其他操作是如何实现的
  • 实现IntegerAggregator和StringAggregator,你将会编写对元组的某一特定列分组进行聚合操作;其中integer支持求和、求最大最小值、求数量、求平均值,string只支持count聚合操作
  • 实现Aggregate操作;同其他操作一样,聚合操作也实现类OpIterator接口。注意每次调用next()的Aggregate操作的输出是整个分组的聚合值,Aggregate构造函数将会设置聚合和分组操作对应的列
  • 实现BufferPool类中的插入、删除和页面丢弃策略,暂时不需要关心事务
  • 实现Insert和Delete操作;与所有的操作相似,Insert和Delete实现OpIterator接口,接收用于插入或者删除的元组并输出该操作影响的元组个数;这些操作将会调用BufferPool中合适的方法用于修改磁盘上的页

注意SimpleDB没有实现一致性和完整性检查,所以它可能会插入重复的记录,并且没有方法保证主键或外键的一致性。

在本节实现的基础上,我们需要使用项目提供的SQL解析器去运行SQL语句查询。

最后,你可能会发现本实验的操作扩展Operator类而不是实现OpIterator接口。因为next/hasNext的实现总是重复的、烦人的,Operator实现了通用的逻辑操作,并且仅需要实现readNext方法。可以随意使用这种风格,或者使用OpIterator。如果要实现OpIterator接口,请移除extends Operator,并替换为implements OpIterator。


练习一 – Filter and Join

Filter and Join:

本节将会实现比扫描整张表更有趣的操作:

  • Filter:该操作仅返回满足(构造时指定的)Predicate操作的元组;因此,它会过滤那些不符合操作的元组
  • Join:该操作将会通过(构造时指定的)JoinPredicate联合两个表的元组,Join操作仅需实现一个简单的嵌套循环连接

实现如下类中的方法:

  • src/java/simpledb/execution/Predicate.java
  • src/java/simpledb/execution/JoinPredicate.java
  • src/java/simpledb/execution/Filter.java
  • src/java/simpledb/execution/Join.java

Predict和JoinPredict分别负责普通的断言和Join断言的操作:

在这里插入图片描述
Predict类核心源码如下:

/**
 * Predicate compares tuples to a specified Field value.
 * 比较元组某个特定的字段--> select * from t where t.a=1;
 */
public class Predicate implements Serializable {
    private static final long serialVersionUID = 1L;
    /**
     * 待比较字段
     */
    private final int field;
    /**
     * 操作码
     */
    private final Op op;
    /**
     * 操作数
     */
    private final Field operand;
<span class="token comment">/**
 * Constants used for return codes in Field.compare
 */</span>
<span class="token keyword">public</span> <span class="token keyword">enum</span> <span class="token class-name">Op</span> <span class="token keyword">implements</span> <span class="token class-name">Serializable</span> <span class="token punctuation">{<!-- --></span>
    <span class="token constant">EQUALS</span><span class="token punctuation">,</span> <span class="token constant">GREATER_THAN</span><span class="token punctuation">,</span> <span class="token constant">LESS_THAN</span><span class="token punctuation">,</span> <span class="token constant">LESS_THAN_OR_EQ</span><span class="token punctuation">,</span> <span class="token constant">GREATER_THAN_OR_EQ</span><span class="token punctuation">,</span> <span class="token constant">LIKE</span><span class="token punctuation">,</span> <span class="token constant">NOT_EQUALS</span><span class="token punctuation">;</span>

    <span class="token comment">/**
     * Interface to access operations by integer value for command-line
     * convenience.
     */</span>
    <span class="token keyword">public</span> <span class="token keyword">static</span> <span class="token class-name">Op</span> <span class="token function">getOp</span><span class="token punctuation">(</span><span class="token keyword">int</span> i<span class="token punctuation">)</span> <span class="token punctuation">{<!-- --></span>
        <span class="token keyword">return</span> <span class="token function">values</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">[</span>i<span class="token punctuation">]</span><span class="token punctuation">;</span>
    <span class="token punctuation">}</span>

    <span class="token keyword">public</span> <span class="token class-name">String</span> <span class="token function">toString</span><span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token punctuation">{<!-- --></span>
        <span class="token keyword">if</span> <span class="token punctuation">(</span><span class="token keyword">this</span> <span class="token operator">&#61;&#61;</span> <span class="token constant">EQUALS</span><span class="token punctuation">)</span>
            <span class="token keyword">return</span> <span class="token string">&#34;&#61;&#34;</span><span class="token punctuation">;</span>
        <span class="token keyword">if</span> <span class="token punctuation">(</span><span class="token keyword">this</span> <span class="token operator">&#61;&#61;</span> <span class="token constant">GREATER_THAN</span><span class="token punctuation">)</span>
            <span class="token keyword">return</span> <span class="token string">&#34;&gt;&#34;</span><span class="token punctuation">;</span>
        <span class="token keyword">if</span> <span class="token punctuation">(</span><span class="token keyword">this</span> <span class="token operator">&#61;&#61;</span> <span class="token constant">LESS_THAN</span><span class="token punctuation">)</span>
            <span class="token keyword">return</span> <span class="token string">&#34;&lt;&#34;</span><span class="token punctuation">;</span>
        <span class="token keyword">if</span> <span class="token punctuation">(</span><span class="token keyword">this</span> <span class="token operator">&#61;&#61;</span> <span class="token constant">LESS_THAN_OR_EQ</span><span class="token punctuation">)</span>
            <span class="token keyword">return</span> <span class="token string">&#34;&lt;&#61;&#34;</span><span class="token punctuation">;</span>
        <span class="token keyword">if</span> <span class="token punctuation">(</span><span class="token keyword">this</span> <span class="token operator">&#61;&#61;</span> <span class="token constant">GREATER_THAN_OR_EQ</span><span class="token punctuation">)</span>
            <span class="token keyword">return</span> <span class="token string">&#34;&gt;&#61;&#34;</span><span class="token punctuation">;</span>
        <span class="token keyword">if</span> <span class="token punctuation">(</span><span class="token keyword">this</span> <span class="token operator">&#61;&#61;</span> <span class="token constant">LIKE</span><span class="token punctuation">)</span>
            <span class="token keyword">return</span> <span class="token string">&#34;LIKE&#34;</span><span class="token punctuation">;</span>
        <span class="token keyword">if</span> <span class="token punctuation">(</span><span class="token keyword">this</span> <span class="token operator">&#61;&#61;</span> <span class="token constant">NOT_EQUALS</span><span class="token punctuation">)</span>
            <span class="token keyword">return</span> <span class="token string">&#34;&lt;&gt;&#34;</span><span class="token punctuation">;</span>
        <span class="token keyword">throw</span> <span class="token keyword">new</span> <span class="token class-name">IllegalStateException</span><span class="token punctuation">(</span><span class="token string">&#34;impossible to reach here&#34;</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
    <span class="token punctuation">}</span>
<span class="token punctuation">}</span>

<span class="token keyword">public</span> <span class="token class-name">Predicate</span><span class="token punctuation">(</span><span class="token keyword">int</span> field<span class="token punctuation">,</span> <span class="token class-name">Op</span> op<span class="token punctuation">,</span> <span class="token class-name">Field</span> operand<span class="token punctuation">)</span> <span class="token punctuation">{<!-- --></span>
    <span class="token keyword">this</span><span class="token punctuation">.</span>field <span class="token operator">&#61;</span> field<span class="token punctuation">;</span>
    <span class="token keyword">this</span><span class="token punctuation">.</span>op <span class="token operator">&#61;</span> op<span class="token punctuation">;</span>
    <span class="token keyword">this</span><span class="token punctuation">.</span>operand <span class="token operator">&#61;</span> operand<span class="token punctuation">;</span>
<span class="token punctuation">}</span>

<span class="token comment">/**
 * Compares the field number of t specified in the constructor to the
 * operand field specified in the constructor using the operator specific in
 * the constructor. The comparison can be made through Field&#39;s compare
 * method.
 */</span>
<span class="token keyword">public</span> <span class="token keyword">boolean</span> <span class="token function">filter</span><span class="token punctuation">(</span><span class="token class-name">Tuple</span> t<span class="token punctuation">)</span> <span class="token punctuation">{<!-- --></span>
    <span class="token keyword">if</span> <span class="token punctuation">(</span>t <span class="token operator">&#61;&#61;</span> <span class="token keyword">null</span><span class="token punctuation">)</span> <span class="token punctuation">{<!-- --></span>
        <span class="token keyword">return</span> <span class="token boolean">false</span><span class="token punctuation">;</span>
    <span class="token punctuation">}</span>
    <span class="token class-name">Field</span> f <span class="token operator">&#61;</span> t<span class="token punctuation">.</span><span class="token function">getField</span><span class="token punctuation">(</span><span class="token keyword">this</span><span class="token punctuation">.</span>field<span class="token punctuation">)</span><span class="token punctuation">;</span>
    <span class="token keyword">return</span> f<span class="token punctuation">.</span><span class="token function">compare</span><span class="token punctuation">(</span><span class="token keyword">this</span><span class="token punctuation">.</span>op<span class="token punctuation">,</span> <span class="token keyword">this</span><span class="token punctuation">.</span>operand<span class="token punctuation">)</span><span class="token punctuation">;</span>
<span class="token punctuation">}</span>
<span class="token punctuation">.</span> <span class="token punctuation">.</span> <span class="token punctuation">.</span>

}

JoinPredict类核心源码如下:

/**
 * JoinPredicate compares fields of two tuples using a predicate. JoinPredicate
 * is most likely used by the Join operator.
 * 用于JOIN连接断言的两个元组的某个字段 --> select * from t1 join t2 on t1.id=t2.id;
 */
public class JoinPredicate implements Serializable {
    private static final long serialVersionUID = 1L;
    /**
     * 字段1
     */
    private final int field1;
    /**
     * 操作码
     */
    private final Predicate.Op op;
    /**
     * 字段2
     */
    private final int field2;
<span class="token keyword">public</span> <span class="token class-name">JoinPredicate</span><span class="token punctuation">(</span><span class="token keyword">int</span> field1<span class="token punctuation">,</span> <span class="token class-name">Predicate<span class="token punctuation">.</span>Op</span> op<span class="token punctuation">,</span> <span class="token keyword">int</span> field2<span class="token punctuation">)</span> <span class="token punctuation">{<!-- --></span>
    <span class="token keyword">this</span><span class="token punctuation">.</span>field1 <span class="token operator">&#61;</span> field1<span class="token punctuation">;</span>
    <span class="token keyword">this</span><span class="token punctuation">.</span>op <span class="token operator">&#61;</span> op<span class="token punctuation">;</span>
    <span class="token keyword">this</span><span class="token punctuation">.</span>field2 <span class="token operator">&#61;</span> field2<span class="token punctuation">;</span>
<span class="token punctuation">}</span>

<span class="token comment">/**
 * Apply the predicate to the two specified tuples. The comparison can be
 * made through Field&#39;s compare method.
 */</span>
<span class="token keyword">public</span> <span class="token keyword">boolean</span> <span class="token function">filter</span><span class="token punctuation">(</span><span class="token class-name">Tuple</span> t1<span class="token punctuation">,</span> <span class="token class-name">Tuple</span> t2<span class="token punctuation">)</span> <span class="token punctuation">{<!-- --></span>
    <span class="token keyword">if</span> <span class="token punctuation">(</span>t1 <span class="token operator">&#61;&#61;</span> <span class="token keyword">null</span><span class="token punctuation">)</span> <span class="token punctuation">{<!-- --></span>
        <span class="token keyword">return</span> <span class="token boolean">false</span><span class="token punctuation">;</span>
    <span class="token punctuation">}</span>
    <span class="token keyword">if</span> <span class="token punctuation">(</span>t2 <span class="token operator">&#61;&#61;</span> <span class="token keyword">null</span><span class="token punctuation">)</span> <span class="token punctuation">{<!-- --></span>
        <span class="token keyword">return</span> <span class="token boolean">false</span><span class="token punctuation">;</span>
    <span class="token punctuation">}</span>
    <span class="token class-name">Field</span> first <span class="token operator">&#61;</span> t1<span class="token punctuation">.</span><span class="token function">getField</span><span class="token punctuation">(</span>field1<span class="token punctuation">)</span><span class="token punctuation">;</span>
    <span class="token class-name">Field</span> second <span class="token operator">&#61;</span> t2<span class="token punctuation">.</span><span class="token function">getField</span><span class="token punctuation">(</span>field2<span class="token punctuation">)</span><span class="token punctuation">;</span>
    <span class="token keyword">return</span> first<span class="token punctuation">.</span><span class="token function">compare</span><span class="token punctuation">(</span><span class="token keyword">this</span><span class="token punctuation">.</span>op<span class="token punctuation">,</span> second<span class="token punctuation">)</span><span class="token punctuation">;</span>
<span class="token punctuation">}</span>
<span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span>

}


OpIterator意为可操作迭代器,在SimpleDB中的含义为: 迭代器遍历元素的时候可以同时进行一些操作,具体遍历时执行什么操作由子类决定。
在这里插入图片描述
操作迭代器意味着迭代器自身在遍历数据时,会根据自身实现搞点事情,Operator接口模板化了部分流程,各个需要在迭代器遍历时进行操作的子类,只需要去实现readNext这个核心方法,并且每次获取下一个元组的时候,搞点事情即可。

这里不是说子类只需要去实现readNext方法,而是说readNext是子类需要实现的核心方法,其他均为辅助方法。

Operator类的核心源码如下:

/**
 * Abstract class for implementing operators. It handles close, next and hasNext. Subclasses only need to implement open and readNext.
 */
public abstract class Operator implements OpIterator {
<span class="token keyword">public</span> <span class="token keyword">boolean</span> <span class="token function">hasNext</span><span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token keyword">throws</span> <span class="token class-name">DbException</span><span class="token punctuation">,</span> <span class="token class-name">TransactionAbortedException</span> <span class="token punctuation">{<!-- --></span>
    <span class="token keyword">if</span> <span class="token punctuation">(</span><span class="token operator">!</span><span class="token keyword">this</span><span class="token punctuation">.</span><span class="token keyword">open</span><span class="token punctuation">)</span>
        <span class="token keyword">throw</span> <span class="token keyword">new</span> <span class="token class-name">IllegalStateException</span><span class="token punctuation">(</span><span class="token string">&#34;Operator not yet open&#34;</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
    <span class="token keyword">if</span> <span class="token punctuation">(</span>next <span class="token operator">&#61;&#61;</span> <span class="token keyword">null</span><span class="token punctuation">)</span>
        next <span class="token operator">&#61;</span> <span class="token function">fetchNext</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
    <span class="token keyword">return</span> next <span class="token operator">!&#61;</span> <span class="token keyword">null</span><span class="token punctuation">;</span>
<span class="token punctuation">}</span>

<span class="token keyword">public</span> <span class="token class-name">Tuple</span> <span class="token function">next</span><span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token keyword">throws</span> <span class="token class-name">DbException</span><span class="token punctuation">,</span> <span class="token class-name">TransactionAbortedException</span><span class="token punctuation">,</span>
        <span class="token class-name">NoSuchElementException</span> <span class="token punctuation">{<!-- --></span>
    <span class="token keyword">if</span> <span class="token punctuation">(</span>next <span class="token operator">&#61;&#61;</span> <span class="token keyword">null</span><span class="token punctuation">)</span> <span class="token punctuation">{<!-- --></span>
        next <span class="token operator">&#61;</span> <span class="token function">fetchNext</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
        <span class="token keyword">if</span> <span class="token punctuation">(</span>next <span class="token operator">&#61;&#61;</span> <span class="token keyword">null</span><span class="token punctuation">)</span>
            <span class="token keyword">throw</span> <span class="token keyword">new</span> <span class="token class-name">NoSuchElementException</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
    <span class="token punctuation">}</span>
    <span class="token class-name">Tuple</span> result <span class="token operator">&#61;</span> next<span class="token punctuation">;</span>
    next <span class="token operator">&#61;</span> <span class="token keyword">null</span><span class="token punctuation">;</span>
    <span class="token keyword">return</span> result<span class="token punctuation">;</span>
<span class="token punctuation">}</span>

<span class="token keyword">protected</span> <span class="token keyword">abstract</span> <span class="token class-name">Tuple</span> <span class="token function">fetchNext</span><span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token keyword">throws</span> <span class="token class-name">DbException</span><span class="token punctuation">,</span>
        <span class="token class-name">TransactionAbortedException</span><span class="token punctuation">;</span>

<span class="token comment">/**
 * Closes this iterator. If overridden by a subclass, they should call
 * super.close() in order for Operator&#39;s internal state to be consistent.
 */</span>
<span class="token keyword">public</span> <span class="token keyword">void</span> <span class="token function">close</span><span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token punctuation">{<!-- --></span>
    <span class="token comment">// Ensures that a future call to next() will fail</span>
    next <span class="token operator">&#61;</span> <span class="token keyword">null</span><span class="token punctuation">;</span>
    <span class="token keyword">this</span><span class="token punctuation">.</span><span class="token keyword">open</span> <span class="token operator">&#61;</span> <span class="token boolean">false</span><span class="token punctuation">;</span>
<span class="token punctuation">}</span>

<span class="token keyword">private</span> <span class="token class-name">Tuple</span> next <span class="token operator">&#61;</span> <span class="token keyword">null</span><span class="token punctuation">;</span>
<span class="token keyword">private</span> <span class="token keyword">boolean</span> <span class="token keyword">open</span> <span class="token operator">&#61;</span> <span class="token boolean">false</span><span class="token punctuation">;</span>

<span class="token keyword">public</span> <span class="token keyword">void</span> <span class="token keyword">open</span><span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token keyword">throws</span> <span class="token class-name">DbException</span><span class="token punctuation">,</span> <span class="token class-name">TransactionAbortedException</span> <span class="token punctuation">{<!-- --></span>
    <span class="token keyword">this</span><span class="token punctuation">.</span><span class="token keyword">open</span> <span class="token operator">&#61;</span> <span class="token boolean">true</span><span class="token punctuation">;</span>
<span class="token punctuation">}</span>

<span class="token comment">/**
 * &#64;return return the children DbIterators of this operator. If there is
 *         only one child, return an array of only one element. For join
 *         operators, the order of the children is not important. But they
 *         should be consistent among multiple calls.
 * */</span>
<span class="token keyword">public</span> <span class="token keyword">abstract</span> <span class="token class-name">OpIterator</span><span class="token punctuation">[</span><span class="token punctuation">]</span> <span class="token function">getChildren</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span>

<span class="token comment">/**
 * Set the children(child) of this operator. If the operator has only one
 * child, children[0] should be used. If the operator is a join, children[0]
 * and children[1] should be used.
 * */</span>
<span class="token keyword">public</span> <span class="token keyword">abstract</span> <span class="token keyword">void</span> <span class="token function">setChildren</span><span class="token punctuation">(</span><span class="token class-name">OpIterator</span><span class="token punctuation">[</span><span class="token punctuation">]</span> children<span class="token punctuation">)</span><span class="token punctuation">;</span>

<span class="token comment">/**
 * &#64;return return the TupleDesc of the output tuples of this operator
 * */</span>
<span class="token keyword">public</span> <span class="token keyword">abstract</span> <span class="token class-name">TupleDesc</span> <span class="token function">getTupleDesc</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
<span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span>

}

  • 迭代器调用约定: 先调用hasNext判断是否还有下一个元素,如果有调用next获取下一个元素,并且调用hashNext前需要先调用Open。
  • OpIterator操作迭代器分为两部分,一部分是原始职能: 提供数据进行迭代遍历; 另一部分是附加职能: 在原始迭代器遍历过程中进行操作。
  • Operator采用装饰器模式封装原始迭代器遍历行为,并在其基础上增加了遍历时进行操作的行为。
  • 装饰器模式需要有被装饰的对象,这里通过setChildren进行设置,但是这里与普通的装饰器模式不同,因为不同的操作会涉及到不同的个数的被装饰对象。
    • select * from t where t.age=18 --> 此处实际使用了单值比较过滤操作,所以只涉及单表的迭代器
    • select * from t1 join t2 on t1.age=t2.age --> 此时实际使用了两表JOIN操作,所以涉及两个表的迭代器

SimpleDB整个迭代器的设计思路采用了装饰器模式实现,具体如下图所示:
在这里插入图片描述

  • Operator的实现类都是装饰器,而SeqScan是迭代器的实现,也就是被装饰的对象

迭代器模式的用法可以参考Java IO整体架构实现。

Filter用于单值比较操作,具体流程如下:
在这里插入图片描述

图中只展示的一层装饰,如果存在多层装饰,那么child仍然是个装饰器,可以利用多层装饰实现如: select * from t where t.age=18 and t.name="dhy"的匹配过滤。

Filter核心源码如下:

/**
 * Filter is an operator that implements a relational select.
 */
public class Filter extends Operator {
    private final Predicate predicate;
    private OpIterator child;
<span class="token comment">/**
 * Constructor accepts a predicate to apply and a child operator to read
 * tuples to filter from.
 */</span>
<span class="token keyword">public</span> <span class="token class-name">Filter</span><span class="token punctuation">(</span><span class="token class-name">Predicate</span> p<span class="token punctuation">,</span> <span class="token class-name">OpIterator</span> child<span class="token punctuation">)</span> <span class="token punctuation">{<!-- --></span>
    <span class="token keyword">this</span><span class="token punctuation">.</span>predicate <span class="token operator">&#61;</span> p<span class="token punctuation">;</span>
    <span class="token keyword">this</span><span class="token punctuation">.</span>child <span class="token operator">&#61;</span> child<span class="token punctuation">;</span>
<span class="token punctuation">}</span>

<span class="token keyword">public</span> <span class="token keyword">void</span> <span class="token keyword">open</span><span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token keyword">throws</span> <span class="token class-name">DbException</span><span class="token punctuation">,</span> <span class="token class-name">NoSuchElementException</span><span class="token punctuation">,</span>
        <span class="token class-name">TransactionAbortedException</span> <span class="token punctuation">{<!-- --></span>
    <span class="token keyword">super</span><span class="token punctuation">.</span><span class="token keyword">open</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
    <span class="token keyword">this</span><span class="token punctuation">.</span>child<span class="token punctuation">.</span><span class="token keyword">open</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
<span class="token punctuation">}</span>

<span class="token keyword">public</span> <span class="token keyword">void</span> <span class="token function">close</span><span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token punctuation">{<!-- --></span>
    <span class="token keyword">this</span><span class="token punctuation">.</span>child<span class="token punctuation">.</span><span class="token function">close</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
    <span class="token keyword">super</span><span class="token punctuation">.</span><span class="token function">close</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
<span class="token punctuation">}</span>

<span class="token keyword">public</span> <span class="token keyword">void</span> <span class="token function">rewind</span><span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token keyword">throws</span> <span class="token class-name">DbException</span><span class="token punctuation">,</span> <span class="token class-name">TransactionAbortedException</span> <span class="token punctuation">{<!-- --></span>
    <span class="token keyword">this</span><span class="token punctuation">.</span>child<span class="token punctuation">.</span><span class="token function">rewind</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
<span class="token punctuation">}</span>

<span class="token comment">/**
 * AbstractDbIterator.readNext implementation. Iterates over tuples from the
 * child operator, applying the predicate to them and returning those that
 * pass the predicate (i.e. for which the Predicate.filter() returns true.)
 * 
 */</span>
<span class="token keyword">protected</span> <span class="token class-name">Tuple</span> <span class="token function">fetchNext</span><span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token keyword">throws</span> <span class="token class-name">NoSuchElementException</span><span class="token punctuation">,</span>
        <span class="token class-name">TransactionAbortedException</span><span class="token punctuation">,</span> <span class="token class-name">DbException</span> <span class="token punctuation">{<!-- --></span>
    <span class="token keyword">while</span> <span class="token punctuation">(</span><span class="token keyword">this</span><span class="token punctuation">.</span>child<span class="token punctuation">.</span><span class="token function">hasNext</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">)</span> <span class="token punctuation">{<!-- --></span>
        <span class="token class-name">Tuple</span> tuple <span class="token operator">&#61;</span> <span class="token keyword">this</span><span class="token punctuation">.</span>child<span class="token punctuation">.</span><span class="token function">next</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
        <span class="token keyword">if</span> <span class="token punctuation">(</span><span class="token keyword">this</span><span class="token punctuation">.</span>predicate<span class="token punctuation">.</span><span class="token function">filter</span><span class="token punctuation">(</span>tuple<span class="token punctuation">)</span><span class="token punctuation">)</span> <span class="token punctuation">{<!-- --></span>
            <span class="token keyword">return</span> tuple<span class="token punctuation">;</span>
        <span class="token punctuation">}</span>
    <span class="token punctuation">}</span>
    <span class="token keyword">return</span> <span class="token keyword">null</span><span class="token punctuation">;</span>
<span class="token punctuation">}</span>

<span class="token annotation punctuation">&#64;Override</span>
<span class="token keyword">public</span> <span class="token class-name">OpIterator</span><span class="token punctuation">[</span><span class="token punctuation">]</span> <span class="token function">getChildren</span><span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token punctuation">{<!-- --></span>
    <span class="token keyword">return</span> <span class="token keyword">new</span> <span class="token class-name">OpIterator</span><span class="token punctuation">[</span><span class="token punctuation">]</span> <span class="token punctuation">{<!-- --></span><span class="token keyword">this</span><span class="token punctuation">.</span>child<span class="token punctuation">}</span><span class="token punctuation">;</span>
<span class="token punctuation">}</span>

<span class="token annotation punctuation">&#64;Override</span>
<span class="token keyword">public</span> <span class="token keyword">void</span> <span class="token function">setChildren</span><span class="token punctuation">(</span><span class="token class-name">OpIterator</span><span class="token punctuation">[</span><span class="token punctuation">]</span> children<span class="token punctuation">)</span> <span class="token punctuation">{<!-- --></span>
    <span class="token keyword">this</span><span class="token punctuation">.</span>child <span class="token operator">&#61;</span> children<span class="token punctuation">[</span><span class="token number">0</span><span class="token punctuation">]</span><span class="token punctuation">;</span>
<span class="token punctuation">}</span>
<span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span>

}


Join用于连接条件判断,流程如下:
在这里插入图片描述
Join的核心源码如下:

/**
 * The Join operator implements the relational join operation.
 */
public class Join extends Operator {
    private static final long serialVersionUID = 1L;
    /**
     * 连接条件
     */
    private final JoinPredicate predicate;
    /**
     * 参与连接的表
     */
    private OpIterator[] children;
<span class="token keyword">private</span> <span class="token class-name">Tuple</span> tuple1<span class="token punctuation">;</span>

<span class="token comment">/**
 * Constructor. Accepts two children to join and the predicate to join them
 * on
 * &#64;param p      The predicate to use to join the children
 * &#64;param child1 Iterator for the left(outer) relation to join
 * &#64;param child2 Iterator for the right(inner) relation to join
 */</span>
<span class="token keyword">public</span> <span class="token class-name">Join</span><span class="token punctuation">(</span><span class="token class-name">JoinPredicate</span> p<span class="token punctuation">,</span> <span class="token class-name">OpIterator</span> child1<span class="token punctuation">,</span> <span class="token class-name">OpIterator</span> child2<span class="token punctuation">)</span> <span class="token punctuation">{<!-- --></span>
    <span class="token keyword">this</span><span class="token punctuation">.</span>predicate <span class="token operator">&#61;</span> p<span class="token punctuation">;</span>
    <span class="token keyword">this</span><span class="token punctuation">.</span>children <span class="token operator">&#61;</span> <span class="token keyword">new</span> <span class="token class-name">OpIterator</span><span class="token punctuation">[</span><span class="token punctuation">]</span><span class="token punctuation">{<!-- --></span>child1<span class="token punctuation">,</span> child2<span class="token punctuation">}</span><span class="token punctuation">;</span>
    <span class="token keyword">this</span><span class="token punctuation">.</span>tuple1 <span class="token operator">&#61;</span> <span class="token keyword">null</span><span class="token punctuation">;</span>
<span class="token punctuation">}</span>

<span class="token comment">/**
  * 返回的是两表连接后得到结果的行schema
 */</span>
<span class="token keyword">public</span> <span class="token class-name">TupleDesc</span> <span class="token function">getTupleDesc</span><span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token punctuation">{<!-- --></span>
    <span class="token keyword">return</span> <span class="token class-name">TupleDesc</span><span class="token punctuation">.</span><span class="token function">merge</span><span class="token punctuation">(</span><span class="token keyword">this</span><span class="token punctuation">.</span>children<span class="token punctuation">[</span><span class="token number">0</span><span class="token punctuation">]</span><span class="token punctuation">.</span><span class="token function">getTupleDesc</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">,</span> <span class="token keyword">this</span><span class="token punctuation">.</span>children<span class="token punctuation">[</span><span class="token number">1</span><span class="token punctuation">]</span><span class="token punctuation">.</span><span class="token function">getTupleDesc</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
<span class="token punctuation">}</span>

<span class="token keyword">public</span> <span class="token keyword">void</span> <span class="token keyword">open</span><span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token keyword">throws</span> <span class="token class-name">DbException</span><span class="token punctuation">,</span> <span class="token class-name">NoSuchElementException</span><span class="token punctuation">,</span>
        <span class="token class-name">TransactionAbortedException</span> <span class="token punctuation">{<!-- --></span>
    <span class="token keyword">for</span> <span class="token punctuation">(</span><span class="token class-name">OpIterator</span> child <span class="token operator">:</span> <span class="token keyword">this</span><span class="token punctuation">.</span>children<span class="token punctuation">)</span> <span class="token punctuation">{<!-- --></span>
        child<span class="token punctuation">.</span><span class="token keyword">open</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
    <span class="token punctuation">}</span>
    <span class="token keyword">super</span><span class="token punctuation">.</span><span class="token keyword">open</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
<span class="token punctuation">}</span>

<span class="token keyword">public</span> <span class="token keyword">void</span> <span class="token function">close</span><span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token punctuation">{<!-- --></span>
    <span class="token keyword">for</span> <span class="token punctuation">(</span><span class="token class-name">OpIterator</span> child <span class="token operator">:</span> <span class="token keyword">this</span><span class="token punctuation">.</span>children<span class="token punctuation">)</span> <span class="token punctuation">{<!-- --></span>
        child<span class="token punctuation">.</span><span class="token function">close</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
    <span class="token punctuation">}</span>
    <span class="token keyword">super</span><span class="token punctuation">.</span><span class="token function">close</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
<span class="token punctuation">}</span>

<span class="token keyword">public</span> <span class="token keyword">void</span> <span class="token function">rewind</span><span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token keyword">throws</span> <span class="token class-name">DbException</span><span class="token punctuation">,</span> <span class="token class-name">TransactionAbortedException</span> <span class="token punctuation">{<!-- --></span>
    <span class="token keyword">for</span> <span class="token punctuation">(</span><span class="token class-name">OpIterator</span> child <span class="token operator">:</span> <span class="token keyword">this</span><span class="token punctuation">.</span>children<span class="token punctuation">)</span> <span class="token punctuation">{<!-- --></span>
        child<span class="token punctuation">.</span><span class="token function">rewind</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
    <span class="token punctuation">}</span>
<span class="token punctuation">}</span>

<span class="token comment">/**
 * Returns the next tuple generated by the join, or null if there are no
 * more tuples. Logically, this is the next tuple in r1 cross r2 that
 * satisfies the join predicate. There are many possible implementations;
 * the simplest is a nested loops join.
 * &lt;p&gt;
 * Note that the tuples returned from this particular implementation of Join
 * are simply the concatenation of joining tuples from the left and right
 * relation. Therefore, if an equality predicate is used there will be two
 * copies of the join attribute in the results. (Removing such duplicate
 * columns can be done with an additional projection operator if needed.)
 * &lt;p&gt;
 * For example, if one tuple is {1,2,3} and the other tuple is {1,5,6},
 * joined on equality of the first column, then this returns {1,2,3,1,5,6}.
 *
 * &#64;return The next matching tuple.
 * &#64;see JoinPredicate#filter
 */</span>
<span class="token keyword">protected</span> <span class="token class-name">Tuple</span> <span class="token function">fetchNext</span><span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token keyword">throws</span> <span class="token class-name">TransactionAbortedException</span><span class="token punctuation">,</span> <span class="token class-name">DbException</span> <span class="token punctuation">{<!-- --></span>
    <span class="token comment">// 双重循环,将children[0]作为驱动表,children[1]作为被驱动表</span>
    <span class="token keyword">while</span> <span class="token punctuation">(</span><span class="token keyword">this</span><span class="token punctuation">.</span>children<span class="token punctuation">[</span><span class="token number">0</span><span class="token punctuation">]</span><span class="token punctuation">.</span><span class="token function">hasNext</span><span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token operator">||</span> tuple1 <span class="token operator">!&#61;</span> <span class="token keyword">null</span><span class="token punctuation">)</span> <span class="token punctuation">{<!-- --></span>
        <span class="token comment">// 获取驱动表的一行记录</span>
        <span class="token keyword">if</span> <span class="token punctuation">(</span><span class="token keyword">this</span><span class="token punctuation">.</span>children<span class="token punctuation">[</span><span class="token number">0</span><span class="token punctuation">]</span><span class="token punctuation">.</span><span class="token function">hasNext</span><span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token operator">&amp;&amp;</span> tuple1 <span class="token operator">&#61;&#61;</span> <span class="token keyword">null</span><span class="token punctuation">)</span> <span class="token punctuation">{<!-- --></span>
            tuple1 <span class="token operator">&#61;</span> <span class="token keyword">this</span><span class="token punctuation">.</span>children<span class="token punctuation">[</span><span class="token number">0</span><span class="token punctuation">]</span><span class="token punctuation">.</span><span class="token function">next</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
        <span class="token punctuation">}</span>
        <span class="token comment">// 获取被驱动表的一行记录</span>
        <span class="token keyword">while</span> <span class="token punctuation">(</span><span class="token keyword">this</span><span class="token punctuation">.</span>children<span class="token punctuation">[</span><span class="token number">1</span><span class="token punctuation">]</span><span class="token punctuation">.</span><span class="token function">hasNext</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">)</span> <span class="token punctuation">{<!-- --></span>
            <span class="token class-name">Tuple</span> tuple2 <span class="token operator">&#61;</span> <span class="token keyword">this</span><span class="token punctuation">.</span>children<span class="token punctuation">[</span><span class="token number">1</span><span class="token punctuation">]</span><span class="token punctuation">.</span><span class="token function">next</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
            <span class="token comment">// JoinPredicate判断join条件是否成立</span>
            <span class="token keyword">if</span> <span class="token punctuation">(</span><span class="token keyword">this</span><span class="token punctuation">.</span>predicate<span class="token punctuation">.</span><span class="token function">filter</span><span class="token punctuation">(</span>tuple1<span class="token punctuation">,</span> tuple2<span class="token punctuation">)</span><span class="token punctuation">)</span> <span class="token punctuation">{<!-- --></span>
                <span class="token comment">// 获取驱动表schema和被驱动表schema合并后的schema</span>
                <span class="token class-name">TupleDesc</span> tupleDesc <span class="token operator">&#61;</span> <span class="token function">getTupleDesc</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
                <span class="token comment">// 用于承载合并后的行记录</span>
                <span class="token class-name">Tuple</span> res <span class="token operator">&#61;</span> <span class="token keyword">new</span> <span class="token class-name">Tuple</span><span class="token punctuation">(</span>tupleDesc<span class="token punctuation">)</span><span class="token punctuation">;</span>
                <span class="token keyword">int</span> i <span class="token operator">&#61;</span> <span class="token number">0</span><span class="token punctuation">;</span>
                <span class="token comment">// 拿到驱动表当前行的所有字段,然后设置到res</span>
                <span class="token class-name">Iterator</span><span class="token generics"><span class="token punctuation">&lt;</span><span class="token class-name">Field</span><span class="token punctuation">&gt;</span></span> fields1 <span class="token operator">&#61;</span> tuple1<span class="token punctuation">.</span><span class="token function">fields</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
                <span class="token keyword">while</span> <span class="token punctuation">(</span>fields1<span class="token punctuation">.</span><span class="token function">hasNext</span><span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token operator">&amp;&amp;</span> i <span class="token operator">&lt;</span> tupleDesc<span class="token punctuation">.</span><span class="token function">numFields</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">)</span> <span class="token punctuation">{<!-- --></span>
                    res<span class="token punctuation">.</span><span class="token function">setField</span><span class="token punctuation">(</span>i<span class="token operator">&#43;&#43;</span><span class="token punctuation">,</span> fields1<span class="token punctuation">.</span><span class="token function">next</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
                <span class="token punctuation">}</span>
                <span class="token comment">// 拿到被驱动表当前行的所有字段,然后设置到res</span>
                <span class="token class-name">Iterator</span><span class="token generics"><span class="token punctuation">&lt;</span><span class="token class-name">Field</span><span class="token punctuation">&gt;</span></span> fields2 <span class="token operator">&#61;</span> tuple2<span class="token punctuation">.</span><span class="token function">fields</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
                <span class="token keyword">while</span> <span class="token punctuation">(</span>fields2<span class="token punctuation">.</span><span class="token function">hasNext</span><span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token operator">&amp;&amp;</span> i <span class="token operator">&lt;</span> tupleDesc<span class="token punctuation">.</span><span class="token function">numFields</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">)</span> <span class="token punctuation">{<!-- --></span>
                    res<span class="token punctuation">.</span><span class="token function">setField</span><span class="token punctuation">(</span>i<span class="token operator">&#43;&#43;</span><span class="token punctuation">,</span> fields2<span class="token punctuation">.</span><span class="token function">next</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
                <span class="token punctuation">}</span>
                <span class="token comment">// 被驱动表遍历完了,重置指针,同时将tuple1也重置</span>
                <span class="token keyword">if</span> <span class="token punctuation">(</span><span class="token operator">!</span><span class="token keyword">this</span><span class="token punctuation">.</span>children<span class="token punctuation">[</span><span class="token number">1</span><span class="token punctuation">]</span><span class="token punctuation">.</span><span class="token function">hasNext</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">)</span> <span class="token punctuation">{<!-- --></span>
                    <span class="token keyword">this</span><span class="token punctuation">.</span>children<span class="token punctuation">[</span><span class="token number">1</span><span class="token punctuation">]</span><span class="token punctuation">.</span><span class="token function">rewind</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
                    tuple1 <span class="token operator">&#61;</span> <span class="token keyword">null</span><span class="token punctuation">;</span>
                <span class="token punctuation">}</span>
                <span class="token comment">// 返回捞到的记录</span>
                <span class="token keyword">return</span> res<span class="token punctuation">;</span>
            <span class="token punctuation">}</span>
        <span class="token punctuation">}</span>
        <span class="token comment">// 驱动表当前行在被驱动表中没有匹配行,那么将被驱动表迭代指针复原</span>
        <span class="token keyword">this</span><span class="token punctuation">.</span>children<span class="token punctuation">[</span><span class="token number">1</span><span class="token punctuation">]</span><span class="token punctuation">.</span><span class="token function">rewind</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
        tuple1 <span class="token operator">&#61;</span> <span class="token keyword">null</span><span class="token punctuation">;</span>
    <span class="token punctuation">}</span>
    <span class="token comment">// 没有匹配记录</span>
    <span class="token keyword">return</span> <span class="token keyword">null</span><span class="token punctuation">;</span>
<span class="token punctuation">}</span>

<span class="token annotation punctuation">&#64;Override</span>
<span class="token keyword">public</span> <span class="token class-name">OpIterator</span><span class="token punctuation">[</span><span class="token punctuation">]</span> <span class="token function">getChildren</span><span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token punctuation">{<!-- --></span>
    <span class="token keyword">return</span> <span class="token keyword">this</span><span class="token punctuation">.</span>children<span class="token punctuation">;</span>
<span class="token punctuation">}</span>

<span class="token annotation punctuation">&#64;Override</span>
<span class="token keyword">public</span> <span class="token keyword">void</span> <span class="token function">setChildren</span><span class="token punctuation">(</span><span class="token class-name">OpIterator</span><span class="token punctuation">[</span><span class="token punctuation">]</span> children<span class="token punctuation">)</span> <span class="token punctuation">{<!-- --></span>
    <span class="token keyword">this</span><span class="token punctuation">.</span>children<span class="token operator">&#61;</span>children<span class="token punctuation">;</span>
<span class="token punctuation">}</span>

}

关于tuple1属性作用说明:

  • 我们从驱动表中获取一条记录后,需要遍历被驱动表,在被驱动表中找出所有符合连接条件的行,然后拼接两表字段,然后返回结果
  • fetchNext方法每调用一次,都会返回符合条件的一行记录,因此我们需要保留驱动表当前正在匹配的行,等到某一次fetchNext方法调用时,发现当前行与被驱动表每一行都进行了一次匹配后,才会从驱动表中取出下一行进行匹配。
    在这里插入图片描述

练习二 – Aggregates

Aggregates:

本节我们应该实现如下五种聚合操作:count、sum、avg、min、max,并且支持分组聚合操作。仅支持对一个域进行聚合,对一个域进行分组即可。

为了实现聚合操作,我们使用Aggregator接口将新的元组合并到现有的聚合操作结果中。实际进行哪种聚合操作会在构造Aggregate时指明。所以,客户端代码需要为子操作的每个元组调用Aggregator.mergeTupleIntoGroup()方法,当所有的元组都被合并完成以后,客户端将会获得聚合操作的结果。

  • 如果指定分组的话,那么返回结果格式为: (groupValue, aggregateValue);
  • 没有指定分组的话,返回格式为:(aggregateValue)

本节实验中,我们不需要担心分组的数量超过可用内存的限制。

实现如下类中的方法:

  • src/java/simpledb/execution/IntegerAggregator.java
  • src/java/simpledb/execution/StringAggregator.java
  • src/java/simpledb/execution/Aggregate.java

Aggregator聚合器干的事情就是接收传入的Tuple,然后内部进行计算,当我们传入n个tuple后,我们可以调用聚合器的迭代器,获取当前聚合的结果:
在这里插入图片描述

上面给出的是不涉及分组的聚合操作,如果涉及分组的话,聚合过程如下图所示:
在这里插入图片描述
Aggregator聚合器接口定义如下:

/**
 * The common interface for any class that can compute an aggregate over a
 * list of Tuples.
 */
public interface Aggregator extends Serializable {
    int NO_GROUPING = -1;
<span class="token comment">/**
 * SUM_COUNT and SC_AVG will
 * only be used in lab7, you are not required
 * to implement them until then.
 * */</span>
<span class="token keyword">enum</span> <span class="token class-name">Op</span> <span class="token keyword">implements</span> <span class="token class-name">Serializable</span> <span class="token punctuation">{<!-- --></span>
    <span class="token constant">MIN</span><span class="token punctuation">,</span> <span class="token constant">MAX</span><span class="token punctuation">,</span> <span class="token constant">SUM</span><span class="token punctuation">,</span> <span class="token constant">AVG</span><span class="token punctuation">,</span> <span class="token constant">COUNT</span><span class="token punctuation">,</span>
    <span class="token comment">/**
     * SUM_COUNT: compute sum and count simultaneously, will be
     * needed to compute distributed avg in lab7.
     * */</span>
    <span class="token constant">SUM_COUNT</span><span class="token punctuation">,</span>
    <span class="token comment">/**
     * SC_AVG: compute the avg of a set of SUM_COUNT tuples,
     * will be used to compute distributed avg in lab7.
     * */</span>
    <span class="token constant">SC_AVG</span><span class="token punctuation">;</span>
    <span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span>
<span class="token punctuation">}</span>

<span class="token comment">/**
 * Merge a new tuple into the aggregate for a distinct group value;
 * creates a new group aggregate result if the group value has not yet
 * been encountered.
 *
 * &#64;param tup the Tuple containing an aggregate field and a group-by field
 */</span>
<span class="token keyword">void</span> <span class="token function">mergeTupleIntoGroup</span><span class="token punctuation">(</span><span class="token class-name">Tuple</span> tup<span class="token punctuation">)</span><span class="token punctuation">;</span>

<span class="token comment">/**
 * Create a OpIterator over group aggregate results.
 * &#64;see TupleIterator for a possible helper
 */</span>
<span class="token class-name">OpIterator</span> <span class="token function">iterator</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span>

}

对于不同类型字段的聚合有对应限制,比如: 字符串只支持COUNT统计个数聚合,不支持例如SUM,AVG等聚合操作。因此针对不兼容的类型,我们需要给出不同的聚合器实现:

  • 首先来看比较简单的StringAggregator字符串聚合器,其只支持对COUNT聚合的操作
/**
 * Knows how to compute some aggregate over a set of StringFields.
 */
public class StringAggregator implements Aggregator {
    private static final IntField NO_GROUP = new IntField(-1);
    /**
     * 用于分组
     */
    private int gbfield;
    private Type gbfieldtype;
    /**
     * 用于聚合
     */
    private int afield;
    private Op what;
    /**
     * 存放结果-- 分组聚合返回的是多组键值对,分别代表分组字段不同值对应的聚合结果
     * 非分组聚合只会返回一个聚合结果,这里为了统一化处理,采用NO_GROUP做标记,进行区分
     */
    private Map<Field, Tuple> tupleMap;
    private TupleDesc desc;
<span class="token comment">/**
 * Aggregate constructor
 *
 * &#64;param gbfield     the 0-based index of the group-by field in the tuple, or NO_GROUPING if there is no grouping
 * &#64;param gbfieldtype the type of the group by field (e.g., Type.INT_TYPE), or null if there is no grouping
 * &#64;param afield      the 0-based index of the aggregate field in the tuple
 * &#64;param what        aggregation operator to use -- only supports COUNT
 * &#64;throws IllegalArgumentException if what !&#61; COUNT
 */</span>

<span class="token keyword">public</span> <span class="token class-name">StringAggregator</span><span class="token punctuation">(</span><span class="token keyword">int</span> gbfield<span class="token punctuation">,</span> <span class="token class-name">Type</span> gbfieldtype<span class="token punctuation">,</span> <span class="token keyword">int</span> afield<span class="token punctuation">,</span> <span class="token class-name">Op</span> what<span class="token punctuation">)</span> <span class="token punctuation">{<!-- --></span>
    <span class="token comment">//字符串只支持COUNT聚合操作</span>
    <span class="token keyword">if</span> <span class="token punctuation">(</span><span class="token operator">!</span>what<span class="token punctuation">.</span><span class="token function">equals</span><span class="token punctuation">(</span><span class="token class-name">Op</span><span class="token punctuation">.</span><span class="token constant">COUNT</span><span class="token punctuation">)</span><span class="token punctuation">)</span> <span class="token punctuation">{<!-- --></span>
        <span class="token keyword">throw</span> <span class="token keyword">new</span> <span class="token class-name">IllegalArgumentException</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
    <span class="token punctuation">}</span>
    <span class="token keyword">this</span><span class="token punctuation">.</span>gbfield <span class="token operator">&#61;</span> gbfield<span class="token punctuation">;</span>
    <span class="token keyword">this</span><span class="token punctuation">.</span>gbfieldtype <span class="token operator">&#61;</span> gbfieldtype<span class="token punctuation">;</span>
    <span class="token keyword">this</span><span class="token punctuation">.</span>afield <span class="token operator">&#61;</span> afield<span class="token punctuation">;</span>
    <span class="token keyword">this</span><span class="token punctuation">.</span>what <span class="token operator">&#61;</span> what<span class="token punctuation">;</span>
    <span class="token keyword">this</span><span class="token punctuation">.</span>tupleMap <span class="token operator">&#61;</span> <span class="token keyword">new</span> <span class="token class-name">ConcurrentHashMap</span><span class="token generics"><span class="token punctuation">&lt;</span><span class="token punctuation">&gt;</span></span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
    <span class="token comment">//非分组聚合返回的结果采用占位符进行统一适配</span>
    <span class="token keyword">if</span> <span class="token punctuation">(</span>gbfield <span class="token operator">&#61;&#61;</span> <span class="token constant">NO_GROUPING</span><span class="token punctuation">)</span> <span class="token punctuation">{<!-- --></span>
        <span class="token keyword">this</span><span class="token punctuation">.</span>desc <span class="token operator">&#61;</span> <span class="token keyword">new</span> <span class="token class-name">TupleDesc</span><span class="token punctuation">(</span><span class="token keyword">new</span> <span class="token class-name">Type</span><span class="token punctuation">[</span><span class="token punctuation">]</span><span class="token punctuation">{<!-- --></span><span class="token class-name">Type</span><span class="token punctuation">.</span><span class="token constant">INT_TYPE</span><span class="token punctuation">}</span><span class="token punctuation">,</span> <span class="token keyword">new</span> <span class="token class-name">String</span><span class="token punctuation">[</span><span class="token punctuation">]</span><span class="token punctuation">{<!-- --></span><span class="token string">&#34;aggregateValue&#34;</span><span class="token punctuation">}</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
        <span class="token class-name">Tuple</span> tuple <span class="token operator">&#61;</span> <span class="token keyword">new</span> <span class="token class-name">Tuple</span><span class="token punctuation">(</span>desc<span class="token punctuation">)</span><span class="token punctuation">;</span>
        tuple<span class="token punctuation">.</span><span class="token function">setField</span><span class="token punctuation">(</span><span class="token number">0</span><span class="token punctuation">,</span> <span class="token keyword">new</span> <span class="token class-name">IntField</span><span class="token punctuation">(</span><span class="token number">0</span><span class="token punctuation">)</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
        <span class="token keyword">this</span><span class="token punctuation">.</span>tupleMap<span class="token punctuation">.</span><span class="token function">put</span><span class="token punctuation">(</span><span class="token constant">NO_GROUP</span><span class="token punctuation">,</span> tuple<span class="token punctuation">)</span><span class="token punctuation">;</span>
    <span class="token punctuation">}</span> <span class="token keyword">else</span> <span class="token punctuation">{<!-- --></span>
    <span class="token comment">//分组聚合返回结果Schema由两个字段组成: 分组字段和聚合结果</span>
        <span class="token keyword">this</span><span class="token punctuation">.</span>desc <span class="token operator">&#61;</span> <span class="token keyword">new</span> <span class="token class-name">TupleDesc</span><span class="token punctuation">(</span><span class="token keyword">new</span> <span class="token class-name">Type</span><span class="token punctuation">[</span><span class="token punctuation">]</span><span class="token punctuation">{<!-- --></span>gbfieldtype<span class="token punctuation">,</span> <span class="token class-name">Type</span><span class="token punctuation">.</span><span class="token constant">INT_TYPE</span><span class="token punctuation">}</span><span class="token punctuation">,</span> <span class="token keyword">new</span> <span class="token class-name">String</span><span class="token punctuation">[</span><span class="token punctuation">]</span><span class="token punctuation">{<!-- --></span><span class="token string">&#34;groupValue&#34;</span><span class="token punctuation">,</span> <span class="token string">&#34;aggregateValue&#34;</span><span class="token punctuation">}</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
    <span class="token punctuation">}</span>
<span class="token punctuation">}</span>

<span class="token comment">/**
 * Merge a new tuple into the aggregate, grouping as indicated in the constructor
 *
 * &#64;param tup the Tuple containing an aggregate field and a group-by field
 */</span>
<span class="token keyword">public</span> <span class="token keyword">void</span> <span class="token function">mergeTupleIntoGroup</span><span class="token punctuation">(</span><span class="token class-name">Tuple</span> tup<span class="token punctuation">)</span> <span class="token punctuation">{<!-- --></span>
    <span class="token comment">//只支持COUNT聚合</span>
    <span class="token keyword">if</span> <span class="token punctuation">(</span><span class="token keyword">this</span><span class="token punctuation">.</span>gbfield <span class="token operator">&#61;&#61;</span> <span class="token constant">NO_GROUPING</span><span class="token punctuation">)</span> <span class="token punctuation">{<!-- --></span>
        <span class="token class-name">Tuple</span> tuple <span class="token operator">&#61;</span> tupleMap<span class="token punctuation">.</span><span class="token function">get</span><span class="token punctuation">(</span><span class="token constant">NO_GROUP</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
        <span class="token class-name">IntField</span> field <span class="token operator">&#61;</span> <span class="token punctuation">(</span><span class="token class-name">IntField</span><span class="token punctuation">)</span> tuple<span class="token punctuation">.</span><span class="token function">getField</span><span class="token punctuation">(</span><span class="token number">0</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
        tuple<span class="token punctuation">.</span><span class="token function">setField</span><span class="token punctuation">(</span><span class="token number">0</span><span class="token punctuation">,</span> <span class="token keyword">new</span> <span class="token class-name">IntField</span><span class="token punctuation">(</span>field<span class="token punctuation">.</span><span class="token function">getValue</span><span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token operator">&#43;</span> <span class="token number">1</span><span class="token punctuation">)</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
        tupleMap<span class="token punctuation">.</span><span class="token function">put</span><span class="token punctuation">(</span><span class="token constant">NO_GROUP</span><span class="token punctuation">,</span> tuple<span class="token punctuation">)</span><span class="token punctuation">;</span>
    <span class="token punctuation">}</span> <span class="token keyword">else</span> <span class="token punctuation">{<!-- --></span>
        <span class="token class-name">Field</span> field <span class="token operator">&#61;</span> tup<span class="token punctuation">.</span><span class="token function">getField</span><span class="token punctuation">(</span>gbfield<span class="token punctuation">)</span><span class="token punctuation">;</span>
        <span class="token keyword">if</span> <span class="token punctuation">(</span><span class="token operator">!</span>tupleMap<span class="token punctuation">.</span><span class="token function">containsKey</span><span class="token punctuation">(</span>field<span class="token punctuation">)</span><span class="token punctuation">)</span> <span class="token punctuation">{<!-- --></span>
            <span class="token class-name">Tuple</span> tuple <span class="token operator">&#61;</span> <span class="token keyword">new</span> <span class="token class-name">Tuple</span><span class="token punctuation">(</span><span class="token keyword">this</span><span class="token punctuation">.</span>desc<span class="token punctuation">)</span><span class="token punctuation">;</span>
            tuple<span class="token punctuation">.</span><span class="token function">setField</span><span class="token punctuation">(</span><span class="token number">0</span><span class="token punctuation">,</span> field<span class="token punctuation">)</span><span class="token punctuation">;</span>
            tuple<span class="token punctuation">.</span><span class="token function">setField</span><span class="token punctuation">(</span><span class="token number">1</span><span class="token punctuation">,</span> <span class="token keyword">new</span> <span class="token class-name">IntField</span><span class="token punctuation">(</span><span class="token number">1</span><span class="token punctuation">)</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
            tupleMap<span class="token punctuation">.</span><span class="token function">put</span><span class="token punctuation">(</span>field<span class="token punctuation">,</span> tuple<span class="token punctuation">)</span><span class="token punctuation">;</span>
        <span class="token punctuation">}</span> <span class="token keyword">else</span> <span class="token punctuation">{<!-- --></span>
            <span class="token class-name">Tuple</span> tuple <span class="token operator">&#61;</span> tupleMap<span class="token punctuation">.</span><span class="token function">get</span><span class="token punctuation">(</span>field<span class="token punctuation">)</span><span class="token punctuation">;</span>
            <span class="token class-name">IntField</span> intField <span class="token operator">&#61;</span> <span class="token punctuation">(</span><span class="token class-name">IntField</span><span class="token punctuation">)</span> tuple<span class="token punctuation">.</span><span class="token function">getField</span><span class="token punctuation">(</span><span class="token number">1</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
            tuple<span class="token punctuation">.</span><span class="token function">setField</span><span class="token punctuation">(</span><span class="token number">1</span><span class="token punctuation">,</span> <span class="token keyword">new</span> <span class="token class-name">IntField</span><span class="token punctuation">(</span>intField<span class="token punctuation">.</span><span class="token function">getValue</span><span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token operator">&#43;</span> <span class="token number">1</span><span class="token punctuation">)</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
            tupleMap<span class="token punctuation">.</span><span class="token function">put</span><span class="token punctuation">(</span>field<span class="token punctuation">,</span> tuple<span class="token punctuation">)</span><span class="token punctuation">;</span>
        <span class="token punctuation">}</span>
    <span class="token punctuation">}</span>
<span class="token punctuation">}</span>

<span class="token comment">/**
 * Create a OpIterator over group aggregate results.
 */</span>
<span class="token keyword">public</span> <span class="token class-name">OpIterator</span> <span class="token function">iterator</span><span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token punctuation">{<!-- --></span>
    <span class="token keyword">return</span> <span class="token keyword">new</span> <span class="token class-name">StringIterator</span><span class="token punctuation">(</span><span class="token keyword">this</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
<span class="token punctuation">}</span>

<span class="token keyword">public</span> <span class="token keyword">class</span> <span class="token class-name">StringIterator</span> <span class="token keyword">implements</span> <span class="token class-name">OpIterator</span> <span class="token punctuation">{<!-- --></span>
    <span class="token keyword">private</span> <span class="token class-name">StringAggregator</span> aggregator<span class="token punctuation">;</span>
    <span class="token keyword">private</span> <span class="token class-name">Iterator</span><span class="token generics"><span class="token punctuation">&lt;</span><span class="token class-name">Tuple</span><span class="token punctuation">&gt;</span></span> iterator<span class="token punctuation">;</span>

    <span class="token keyword">public</span> <span class="token class-name">StringIterator</span><span class="token punctuation">(</span><span class="token class-name">StringAggregator</span> aggregator<span class="token punctuation">)</span> <span class="token punctuation">{<!-- --></span>
        <span class="token keyword">this</span><span class="token punctuation">.</span>aggregator <span class="token operator">&#61;</span> aggregator<span class="token punctuation">;</span>
        <span class="token keyword">this</span><span class="token punctuation">.</span>iterator <span class="token operator">&#61;</span> <span class="token keyword">null</span><span class="token punctuation">;</span>
    <span class="token punctuation">}</span>

    <span class="token annotation punctuation">&#64;Override</span>
    <span class="token keyword">public</span> <span class="token keyword">void</span> <span class="token keyword">open</span><span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token keyword">throws</span> <span class="token class-name">DbException</span><span class="token punctuation">,</span> <span class="token class-name">TransactionAbortedException</span> <span class="token punctuation">{<!-- --></span>
        <span class="token keyword">this</span><span class="token punctuation">.</span>iterator <span class="token operator">&#61;</span> aggregator<span class="token punctuation">.</span>tupleMap<span class="token punctuation">.</span><span class="token function">values</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">.</span><span class="token function">iterator</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
    <span class="token punctuation">}</span>

    <span class="token annotation punctuation">&#64;Override</span>
    <span class="token keyword">public</span> <span class="token keyword">boolean</span> <span class="token function">hasNext</span><span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token keyword">throws</span> <span class="token class-name">DbException</span><span class="token punctuation">,</span> <span class="token class-name">TransactionAbortedException</span> <span class="token punctuation">{<!-- --></span>
        <span class="token keyword">return</span> iterator<span class="token punctuation">.</span><span class="token function">hasNext</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
    <span class="token punctuation">}</span>

    <span class="token annotation punctuation">&#64;Override</span>
    <span class="token keyword">public</span> <span class="token class-name">Tuple</span> <span class="token function">next</span><span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token keyword">throws</span> <span class="token class-name">DbException</span><span class="token punctuation">,</span> <span class="token class-name">TransactionAbortedException</span><span class="token punctuation">,</span> <span class="token class-name">NoSuchElementException</span> <span class="token punctuation">{<!-- --></span>
        <span class="token keyword">return</span> iterator<span class="token punctuation">.</span><span class="token function">next</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
    <span class="token punctuation">}</span>

    <span class="token annotation punctuation">&#64;Override</span>
    <span class="token keyword">public</span> <span class="token keyword">void</span> <span class="token function">rewind</span><span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token keyword">throws</span> <span class="token class-name">DbException</span><span class="token punctuation">,</span> <span class="token class-name">TransactionAbortedException</span> <span class="token punctuation">{<!-- --></span>
        iterator <span class="token operator">&#61;</span> aggregator<span class="token punctuation">.</span>tupleMap<span class="token punctuation">.</span><span class="token function">values</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">.</span><span class="token function">iterator</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
    <span class="token punctuation">}</span>

    <span class="token annotation punctuation">&#64;Override</span>
    <span class="token keyword">public</span> <span class="token class-name">TupleDesc</span> <span class="token function">getTupleDesc</span><span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token punctuation">{<!-- --></span>
        <span class="token keyword">return</span> aggregator<span class="token punctuation">.</span>desc<span class="token punctuation">;</span>
    <span class="token punctuation">}</span>

    <span class="token annotation punctuation">&#64;Override</span>
    <span class="token keyword">public</span> <span class="token keyword">void</span> <span class="token function">close</span><span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token punctuation">{<!-- --></span>
        iterator <span class="token operator">&#61;</span> <span class="token keyword">null</span><span class="token punctuation">;</span>
    <span class="token punctuation">}</span>
<span class="token punctuation">}</span>
<span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span>

}


  • 其次来看稍微比较复杂的IntegerAggregator整数聚合器,其支持Op枚举中所有聚合操作
/**
 * Knows how to compute some aggregate over a set of IntFields.
 * <p/>
 * 针对int字段进行聚合操作,聚合得到的结果需要是个整数
 */
public class IntegerAggregator implements Aggregator {
    private static final long serialVersionUID = 1L;
    private static final Field NO_GROUP = new IntField(-1);
    /**
     * 用于分组
     */
    private int gbfield;
    private Type gbfieldType;
    /**
     * 用于聚合
     */
    private int afield;
    private Op what;
    /**
     * 存放结果
     */
    private TupleDesc tupleDesc;
    private Map<Field, Tuple> aggregate;
    /**
     * 用于非分组情况下的聚合操作
     */
    private int counts;
    private int summary;
    /**
     * 用于分组情况下的聚合操作
     */
    private Map<Field, Integer> countsMap;
    private Map<Field, Integer> sumMap;
<span class="token comment">/**
 * Aggregate constructor
 *
 * &#64;param gbfield     the 0-based index of the group-by field in the tuple, or
 *                    NO_GROUPING if there is no grouping
 * &#64;param gbfieldtype the type of the group by field (e.g., Type.INT_TYPE), or null
 *                    if there is no grouping
 * &#64;param afield      the 0-based index of the aggregate field in the tuple
 * &#64;param what        the aggregation operator
 */</span>

<span class="token keyword">public</span> <span class="token class-name">IntegerAggregator</span><span class="token punctuation">(</span><span class="token keyword">int</span> gbfield<span class="token punctuation">,</span> <span class="token class-name">Type</span> gbfieldtype<span class="token punctuation">,</span> <span class="token keyword">int</span> afield<span class="token punctuation">,</span> <span class="token class-name">Op</span> what<span class="token punctuation">)</span> <span class="token punctuation">{<!-- --></span>
    <span class="token comment">//分组字段</span>
    <span class="token keyword">this</span><span class="token punctuation">.</span>gbfield <span class="token operator">&#61;</span> gbfield<span class="token punctuation">;</span>
    <span class="token comment">//分组字段类型</span>
    <span class="token keyword">this</span><span class="token punctuation">.</span>gbfieldType <span class="token operator">&#61;</span> gbfieldtype<span class="token punctuation">;</span>
    <span class="token comment">//聚合得到的结果,在聚合返回结果行中的字段下标</span>
    <span class="token keyword">this</span><span class="token punctuation">.</span>afield <span class="token operator">&#61;</span> afield<span class="token punctuation">;</span>
    <span class="token comment">//进行什么样的聚合操作</span>
    <span class="token keyword">this</span><span class="token punctuation">.</span>what <span class="token operator">&#61;</span> what<span class="token punctuation">;</span>
    <span class="token comment">//存放聚合结果</span>
    <span class="token keyword">this</span><span class="token punctuation">.</span>aggregate <span class="token operator">&#61;</span> <span class="token keyword">new</span> <span class="token class-name">ConcurrentHashMap</span><span class="token generics"><span class="token punctuation">&lt;</span><span class="token punctuation">&gt;</span></span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
    <span class="token comment">// 非分组聚合</span>
    <span class="token keyword">if</span> <span class="token punctuation">(</span>gbfield <span class="token operator">&#61;&#61;</span> <span class="token constant">NO_GROUPING</span><span class="token punctuation">)</span> <span class="token punctuation">{<!-- --></span>
        <span class="token keyword">this</span><span class="token punctuation">.</span>tupleDesc <span class="token operator">&#61;</span> <span class="token keyword">new</span> <span class="token class-name">TupleDesc</span><span class="token punctuation">(</span><span class="token keyword">new</span> <span class="token class-name">Type</span><span class="token punctuation">[</span><span class="token punctuation">]</span><span class="token punctuation">{<!-- --></span><span class="token class-name">Type</span><span class="token punctuation">.</span><span class="token constant">INT_TYPE</span><span class="token punctuation">}</span><span class="token punctuation">,</span> <span class="token keyword">new</span> <span class="token class-name">String</span><span class="token punctuation">[</span><span class="token punctuation">]</span><span class="token punctuation">{<!-- --></span><span class="token string">&#34;aggregateValue&#34;</span><span class="token punctuation">}</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
        <span class="token class-name">Tuple</span> tuple <span class="token operator">&#61;</span> <span class="token keyword">new</span> <span class="token class-name">Tuple</span><span class="token punctuation">(</span>tupleDesc<span class="token punctuation">)</span><span class="token punctuation">;</span>
        <span class="token comment">//占位符</span>
        <span class="token keyword">this</span><span class="token punctuation">.</span>aggregate<span class="token punctuation">.</span><span class="token function">put</span><span class="token punctuation">(</span><span class="token constant">NO_GROUP</span><span class="token punctuation">,</span> tuple<span class="token punctuation">)</span><span class="token punctuation">;</span>
    <span class="token punctuation">}</span> <span class="token keyword">else</span> <span class="token punctuation">{<!-- --></span>
        <span class="token comment">// 分组聚合,那么返回的聚合结果行由分组字段和该分组字段的聚合结果值组成</span>
        <span class="token keyword">this</span><span class="token punctuation">.</span>tupleDesc <span class="token operator">&#61;</span> <span class="token keyword">new</span> <span class="token class-name">TupleDesc</span><span class="token punctuation">(</span><span class="token keyword">new</span> <span class="token class-name">Type</span><span class="token punctuation">[</span><span class="token punctuation">]</span><span class="token punctuation">{<!-- --></span>gbfieldtype<span class="token punctuation">,</span> <span class="token class-name">Type</span><span class="token punctuation">.</span><span class="token constant">INT_TYPE</span><span class="token punctuation">}</span><span class="token punctuation">,</span> <span class="token keyword">new</span> <span class="token class-name">String</span><span class="token punctuation">[</span><span class="token punctuation">]</span><span class="token punctuation">{<!-- --></span><span class="token string">&#34;groupValue&#34;</span><span class="token punctuation">,</span> <span class="token string">&#34;aggregateValue&#34;</span><span class="token punctuation">}</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
    <span class="token punctuation">}</span>
    <span class="token comment">// 如果聚合操作是AVG,那么需要初始化count和summary变量,用于存放AVG聚合中间计算状态</span>
    <span class="token keyword">if</span> <span class="token punctuation">(</span>gbfield <span class="token operator">&#61;&#61;</span> <span class="token constant">NO_GROUPING</span> <span class="token operator">&amp;&amp;</span> what<span class="token punctuation">.</span><span class="token function">equals</span><span class="token punctuation">(</span><span class="token class-name">Op</span><span class="token punctuation">.</span><span class="token constant">AVG</span><span class="token punctuation">)</span><span class="token punctuation">)</span> <span class="token punctuation">{<!-- --></span>
        <span class="token keyword">this</span><span class="token punctuation">.</span>counts <span class="token operator">&#61;</span> <span class="token number">0</span><span class="token punctuation">;</span>
        <span class="token keyword">this</span><span class="token punctuation">.</span>summary <span class="token operator">&#61;</span> <span class="token number">0</span><span class="token punctuation">;</span>
    <span class="token punctuation">}</span> <span class="token keyword">else</span> <span class="token keyword">if</span> <span class="token punctuation">(</span>gbfield <span class="token operator">!&#61;</span> <span class="token constant">NO_GROUPING</span> <span class="token operator">&amp;&amp;</span> what<span class="token punctuation">.</span><span class="token function">equals</span><span class="token punctuation">(</span><span class="token class-name">Op</span><span class="token punctuation">.</span><span class="token constant">AVG</span><span class="token punctuation">)</span><span class="token punctuation">)</span> <span class="token punctuation">{<!-- --></span>
        <span class="token keyword">this</span><span class="token punctuation">.</span>countsMap <span class="token operator">&#61;</span> <span class="token keyword">new</span> <span class="token class-name">ConcurrentHashMap</span><span class="token generics"><span class="token punctuation">&lt;</span><span class="token punctuation">&gt;</span></span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
        <span class="token keyword">this</span><span class="token punctuation">.</span>sumMap <span class="token operator">&#61;</span> <span class="token keyword">new</span> <span class="token class-name">ConcurrentHashMap</span><span class="token generics"><span class="token punctuation">&lt;</span><span class="token punctuation">&gt;</span></span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
    <span class="token punctuation">}</span>
<span class="token punctuation">}</span>

<span class="token comment">/**
 * Merge a new tuple into the aggregate, grouping as indicated in the
 * constructor
 * &lt;p&gt;
 * 向整数聚合器中添加一行记录,进行分组计算
 * &#64;param tup the Tuple containing an aggregate field and a group-by field
 */</span>
<span class="token keyword">public</span> <span class="token keyword">void</span> <span class="token function">mergeTupleIntoGroup</span><span class="token punctuation">(</span><span class="token class-name">Tuple</span> tup<span class="token punctuation">)</span> <span class="token punctuation">{<!-- --></span>
    <span class="token comment">// 从传递给聚合器的行记录中取出聚合字段的值</span>
    <span class="token class-name">IntField</span> operationField <span class="token operator">&#61;</span> <span class="token punctuation">(</span><span class="token class-name">IntField</span><span class="token punctuation">)</span> tup<span class="token punctuation">.</span><span class="token function">getField</span><span class="token punctuation">(</span>afield<span class="token punctuation">)</span><span class="token punctuation">;</span>
    <span class="token keyword">if</span> <span class="token punctuation">(</span>operationField <span class="token operator">&#61;&#61;</span> <span class="token keyword">null</span><span class="token punctuation">)</span> <span class="token punctuation">{<!-- --></span>
        <span class="token keyword">return</span><span class="token punctuation">;</span>
    <span class="token punctuation">}</span>
    <span class="token comment">// 非分组聚合:</span>
    <span class="token keyword">if</span> <span class="token punctuation">(</span>gbfield <span class="token operator">&#61;&#61;</span> <span class="token constant">NO_GROUPING</span><span class="token punctuation">)</span> <span class="token punctuation">{<!-- --></span>
        <span class="token comment">// 拿到承载聚合结果的元组对象</span>
        <span class="token class-name">Tuple</span> tuple <span class="token operator">&#61;</span> aggregate<span class="token punctuation">.</span><span class="token function">get</span><span class="token punctuation">(</span><span class="token constant">NO_GROUP</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
        <span class="token class-name">IntField</span> field <span class="token operator">&#61;</span> <span class="token punctuation">(</span><span class="token class-name">IntField</span><span class="token punctuation">)</span> tuple<span class="token punctuation">.</span><span class="token function">getField</span><span class="token punctuation">(</span><span class="token number">0</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
        <span class="token comment">// 说明是进行聚合的第一行记录</span>
        <span class="token keyword">if</span> <span class="token punctuation">(</span>field <span class="token operator">&#61;&#61;</span> <span class="token keyword">null</span><span class="token punctuation">)</span> <span class="token punctuation">{<!-- --></span>
            <span class="token comment">// 如果聚合是统计个数操作</span>
            <span class="token keyword">if</span> <span class="token punctuation">(</span>what<span class="token punctuation">.</span><span class="token function">equals</span><span class="token punctuation">(</span><span class="token class-name">Op</span><span class="token punctuation">.</span><span class="token constant">COUNT</span><span class="token punctuation">)</span><span class="token punctuation">)</span> <span class="token punctuation">{<!-- --></span>
                <span class="token comment">// 初值为1</span>
                tuple<span class="token punctuation">.</span><span class="token function">setField</span><span class="token punctuation">(</span><span class="token number">0</span><span class="token punctuation">,</span> <span class="token keyword">new</span> <span class="token class-name">IntField</span><span class="token punctuation">(</span><span class="token number">1</span><span class="token punctuation">)</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
            <span class="token punctuation">}</span> <span class="token keyword">else</span> <span class="token keyword">if</span> <span class="token punctuation">(</span>what<span class="token punctuation">.</span><span class="token function">equals</span><span class="token punctuation">(</span><span class="token class-name">Op</span><span class="token punctuation">.</span><span class="token constant">AVG</span><span class="token punctuation">)</span><span class="token punctuation">)</span> <span class="token punctuation">{<!-- --></span>
                <span class="token comment">// 如果聚合是求平均值操作</span>
                <span class="token comment">// 统计参与聚合的记录个数</span>
                counts<span class="token operator">&#43;&#43;</span><span class="token punctuation">;</span>
                <span class="token comment">// 累加每个值</span>
                summary <span class="token operator">&#61;</span> operationField<span class="token punctuation">.</span><span class="token function">getValue</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
                <span class="token comment">// 如果参与聚合的行只存在一个,那么平均值就是当前行的值</span>
                tuple<span class="token punctuation">.</span><span class="token function">setField</span><span class="token punctuation">(</span><span class="token number">0</span><span class="token punctuation">,</span> operationField<span class="token punctuation">)</span><span class="token punctuation">;</span>
            <span class="token punctuation">}</span> <span class="token keyword">else</span> <span class="token punctuation">{<!-- --></span>
                <span class="token comment">// 其他的情况: MIN,MAX,SUM在参与聚合的行只存在一个时,聚合结果就是当前行的值</span>
                <span class="token comment">// 所以这里可以统一处理</span>
                tuple<span class="token punctuation">.</span><span class="token function">setField</span><span class="token punctuation">(</span><span class="token number">0</span><span class="token punctuation">,</span> operationField<span class="token punctuation">)</span><span class="token punctuation">;</span>
            <span class="token punctuation">}</span>
            <span class="token keyword">return</span><span class="token punctuation">;</span>
        <span class="token punctuation">}</span>
        <span class="token comment">// 判断是哪种类型的聚合</span>
        <span class="token comment">// 非第一行记录</span>
        <span class="token keyword">switch</span> <span class="token punctuation">(</span>what<span class="token punctuation">)</span> <span class="token punctuation">{<!-- --></span>
            <span class="token comment">//select MIN(age) from t;</span>
            <span class="token keyword">case</span> <span class="token constant">MIN</span><span class="token operator">:</span>
                <span class="token comment">// 聚合字段的值和当前阶段已经保存的聚合结果进行比较,看谁更小</span>
                <span class="token keyword">if</span> <span class="token punctuation">(</span>operationField<span class="token punctuation">.</span><span class="token function">compare</span><span class="token punctuation">(</span><span class="token class-name">Predicate<span class="token punctuation">.</span>Op</span><span class="token punctuation">.</span><span class="token constant">LESS_THAN</span><span class="token punctuation">,</span> field<span class="token punctuation">)</span><span class="token punctuation">)</span> <span class="token punctuation">{<!-- --></span>
                    tuple<span class="token punctuation">.</span><span class="token function">setField</span><span class="token punctuation">(</span><span class="token number">0</span><span class="token punctuation">,</span> operationField<span class="token punctuation">)</span><span class="token punctuation">;</span>
                    aggregate<span class="token punctuation">.</span><span class="token function">put</span><span class="token punctuation">(</span><span class="token constant">NO_GROUP</span><span class="token punctuation">,</span> tuple<span class="token punctuation">)</span><span class="token punctuation">;</span>
                <span class="token punctuation">}</span>
                <span class="token keyword">return</span><span class="token punctuation">;</span>
            <span class="token comment">//select MAX(age) from t;</span>
            <span class="token keyword">case</span> <span class="token constant">MAX</span><span class="token operator">:</span>
                <span class="token comment">// 聚合字段的值和当前阶段已经保存的聚合结果进行比较,看谁更大</span>
                <span class="token keyword">if</span> <span class="token punctuation">(</span>operationField<span class="token punctuation">.</span><span class="token function">compare</span><span class="token punctuation">(</span><span class="token class-name">Predicate<span class="token punctuation">.</span>Op</span><span class="token punctuation">.</span><span class="token constant">GREATER_THAN</span><span class="token punctuation">,</span> field<span class="token punctuation">)</span><span class="token punctuation">)</span> <span class="token punctuation">{<!-- --></span>
                    tuple<span class="token punctuation">.</span><span class="token function">setField</span><span class="token punctuation">(</span><span class="token number">0</span><span class="token punctuation">,</span> operationField<span class="token punctuation">)</span><span class="token punctuation">;</span>
                    aggregate<span class="token punctuation">.</span><span class="token function">put</span><span class="token punctuation">(</span><span class="token constant">NO_GROUP</span><span class="token punctuation">,</span> tuple<span class="token punctuation">)</span><span class="token punctuation">;</span>
                <span class="token punctuation">}</span>
                <span class="token keyword">return</span><span class="token punctuation">;</span>
            <span class="token comment">//select COUNT(age) from t;</span>
            <span class="token keyword">case</span> <span class="token constant">COUNT</span><span class="token operator">:</span>
                <span class="token comment">// 计数&#43;1</span>
                <span class="token class-name">IntField</span> count <span class="token operator">&#61;</span> <span class="token keyword">new</span> <span class="token class-name">IntField</span><span class="token punctuation">(</span>field<span class="token punctuation">.</span><span class="token function">getValue</span><span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token operator">&#43;</span> <span class="token number">1</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
                tuple<span class="token punctuation">.</span><span class="token function">setField</span><span class="token punctuation">(</span><span class="token number">0</span><span class="token punctuation">,</span> count<span class="token punctuation">)</span><span class="token punctuation">;</span>
                aggregate<span class="token punctuation">.</span><span class="token function">put</span><span class="token punctuation">(</span><span class="token constant">NO_GROUP</span><span class="token punctuation">,</span> tuple<span class="token punctuation">)</span><span class="token punctuation">;</span>
                <span class="token keyword">return</span><span class="token punctuation">;</span>
            <span class="token comment">//select SUM(age) from t;</span>
            <span class="token keyword">case</span> <span class="token constant">SUM</span><span class="token operator">:</span>
                <span class="token comment">// 求和</span>
                <span class="token class-name">IntField</span> sum <span class="token operator">&#61;</span> <span class="token keyword">new</span> <span class="token class-name">IntField</span><span class="token punctuation">(</span>field<span class="token punctuation">.</span><span class="token function">getValue</span><span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token operator">&#43;</span> operationField<span class="token punctuation">.</span><span class="token function">getValue</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
                tuple<span class="token punctuation">.</span><span class="token function">setField</span><span class="token punctuation">(</span><span class="token number">0</span><span class="token punctuation">,</span> sum<span class="token punctuation">)</span><span class="token punctuation">;</span>
                aggregate<span class="token punctuation">.</span><span class="token function">put</span><span class="token punctuation">(</span><span class="token constant">NO_GROUP</span><span class="token punctuation">,</span> tuple<span class="token punctuation">)</span><span class="token punctuation">;</span>
                <span class="token keyword">return</span><span class="token punctuation">;</span>
            <span class="token comment">//select AVG(age) from t;</span>
            <span class="token keyword">case</span> <span class="token constant">AVG</span><span class="token operator">:</span>
                <span class="token comment">// 求平均值,每次往整数聚合器塞入一条记录时,都会将记录数和总和累加</span>
                counts<span class="token operator">&#43;&#43;</span><span class="token punctuation">;</span>
                summary <span class="token operator">&#43;&#61;</span> operationField<span class="token punctuation">.</span><span class="token function">getValue</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
                <span class="token class-name">IntField</span> avg <span class="token operator">&#61;</span> <span class="token keyword">new</span> <span class="token class-name">IntField</span><span class="token punctuation">(</span>summary <span class="token operator">/</span> counts<span class="token punctuation">)</span><span class="token punctuation">;</span>
                tuple<span class="token punctuation">.</span><span class="token function">setField</span><span class="token punctuation">(</span><span class="token number">0</span><span class="token punctuation">,</span> avg<span class="token punctuation">)</span><span class="token punctuation">;</span>
                aggregate<span class="token punctuation">.</span><span class="token function">put</span><span class="token punctuation">(</span><span class="token constant">NO_GROUP</span><span class="token punctuation">,</span> tuple<span class="token punctuation">)</span><span class="token punctuation">;</span>
                <span class="token keyword">return</span><span class="token punctuation">;</span>
            <span class="token keyword">default</span><span class="token operator">:</span>
                <span class="token keyword">return</span><span class="token punctuation">;</span>
        <span class="token punctuation">}</span>
    <span class="token punctuation">}</span> <span class="token keyword">else</span> <span class="token punctuation">{<!-- --></span>
        <span class="token comment">// 分组聚合操作:</span>
        <span class="token comment">// 获取分组字段 --&gt; group by age</span>
        <span class="token class-name">Field</span> groupField <span class="token operator">&#61;</span> tup<span class="token punctuation">.</span><span class="token function">getField</span><span class="token punctuation">(</span>gbfield<span class="token punctuation">)</span><span class="token punctuation">;</span>
        <span class="token comment">// 如果聚合结果中还不包括当前字段值,说明当前字段是第一次出现</span>
        <span class="token comment">// 例如: group by age --&gt; &lt;age&#61;18,count&#61;20&gt; ,如果此次获取的age&#61;20,那么就是第一次出现的分组值</span>
        <span class="token keyword">if</span> <span class="token punctuation">(</span><span class="token operator">!</span>aggregate<span class="token punctuation">.</span><span class="token function">containsKey</span><span class="token punctuation">(</span>groupField<span class="token punctuation">)</span><span class="token punctuation">)</span> <span class="token punctuation">{<!-- --></span>
            <span class="token class-name">Tuple</span> value <span class="token operator">&#61;</span> <span class="token keyword">new</span> <span class="token class-name">Tuple</span><span class="token punctuation">(</span><span class="token keyword">this</span><span class="token punctuation">.</span>tupleDesc<span class="token punctuation">)</span><span class="token punctuation">;</span>
            value<span class="token punctuation">.</span><span class="token function">setField</span><span class="token punctuation">(</span><span class="token number">0</span><span class="token punctuation">,</span> groupField<span class="token punctuation">)</span><span class="token punctuation">;</span>
            <span class="token keyword">if</span> <span class="token punctuation">(</span>what<span class="token punctuation">.</span><span class="token function">equals</span><span class="token punctuation">(</span><span class="token class-name">Op</span><span class="token punctuation">.</span><span class="token constant">COUNT</span><span class="token punctuation">)</span><span class="token punctuation">)</span> <span class="token punctuation">{<!-- --></span>
                value<span class="token punctuation">.</span><span class="token function">setField</span><span class="token punctuation">(</span><span class="token number">1</span><span class="token punctuation">,</span> <span class="token keyword">new</span> <span class="token class-name">IntField</span><span class="token punctuation">(</span><span class="token number">1</span><span class="token punctuation">)</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
            <span class="token punctuation">}</span> <span class="token keyword">else</span> <span class="token keyword">if</span> <span class="token punctuation">(</span>what<span class="token punctuation">.</span><span class="token function">equals</span><span class="token punctuation">(</span><span class="token class-name">Op</span><span class="token punctuation">.</span><span class="token constant">AVG</span><span class="token punctuation">)</span><span class="token punctuation">)</span> <span class="token punctuation">{<!-- --></span>
                countsMap<span class="token punctuation">.</span><span class="token function">put</span><span class="token punctuation">(</span>groupField<span class="token punctuation">,</span> countsMap<span class="token punctuation">.</span><span class="token function">getOrDefault</span><span class="token punctuation">(</span>groupField<span class="token punctuation">,</span> <span class="token number">0</span><span class="token punctuation">)</span> <span class="token operator">&#43;</span> <span class="token number">1</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
                sumMap<span class="token punctuation">.</span><span class="token function">put</span><span class="token punctuation">(</span>groupField<span class="token punctuation">,</span> sumMap<span class="token punctuation">.</span><span class="token function">getOrDefault</span><span class="token punctuation">(</span>groupField<span class="token punctuation">,</span> <span class="token number">0</span><span class="token punctuation">)</span> <span class="token operator">&#43;</span> operationField<span class="token punctuation">.</span><span class="token function">getValue</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
                value<span class="token punctuation">.</span><span class="token function">setField</span><span class="token punctuation">(</span><span class="token number">1</span><span class="token punctuation">,</span> operationField<span class="token punctuation">)</span><span class="token punctuation">;</span>
            <span class="token punctuation">}</span> <span class="token keyword">else</span> <span class="token punctuation">{<!-- --></span>
                <span class="token comment">// 其他的情况: MIN,MAX,SUM在参与聚合的行只存在一个时,结果假设当前行的值</span>
                <span class="token comment">// 所以这里可以统一处理</span>
                value<span class="token punctuation">.</span><span class="token function">setField</span><span class="token punctuation">(</span><span class="token number">1</span><span class="token punctuation">,</span> operationField<span class="token punctuation">)</span><span class="token punctuation">;</span>
            <span class="token punctuation">}</span>
            aggregate<span class="token punctuation">.</span><span class="token function">put</span><span class="token punctuation">(</span>groupField<span class="token punctuation">,</span> value<span class="token punctuation">)</span><span class="token punctuation">;</span>
            <span class="token keyword">return</span><span class="token punctuation">;</span>
        <span class="token punctuation">}</span>
        <span class="token comment">// 当前字段不是第一次出现的分组值</span>
        <span class="token class-name">Tuple</span> tuple <span class="token operator">&#61;</span> aggregate<span class="token punctuation">.</span><span class="token function">get</span><span class="token punctuation">(</span>groupField<span class="token punctuation">)</span><span class="token punctuation">;</span>
        <span class="token comment">// 获取本阶段的聚合结果</span>
        <span class="token class-name">IntField</span> field <span class="token operator">&#61;</span> <span class="token punctuation">(</span><span class="token class-name">IntField</span><span class="token punctuation">)</span> tuple<span class="token punctuation">.</span><span class="token function">getField</span><span class="token punctuation">(</span><span class="token number">1</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
        <span class="token keyword">switch</span> <span class="token punctuation">(</span>what<span class="token punctuation">)</span> <span class="token punctuation">{<!-- --></span>
            <span class="token keyword">case</span> <span class="token constant">MIN</span><span class="token operator">:</span>
                <span class="token keyword">if</span> <span class="token punctuation">(</span>operationField<span class="token punctuation">.</span><span class="token function">compare</span><span class="token punctuation">(</span><span class="token class-name">Predicate<span class="token punctuation">.</span>Op</span><span class="token punctuation">.</span><span class="token constant">LESS_THAN</span><span class="token punctuation">,</span> field<span class="token punctuation">)</span><span class="token punctuation">)</span> <span class="token punctuation">{<!-- --></span>
                    tuple<span class="token punctuation">.</span><span class="token function">setField</span><span class="token punctuation">(</span><span class="token number">1</span><span class="token punctuation">,</span> operationField<span class="token punctuation">)</span><span class="token punctuation">;</span>
                    aggregate<span class="token punctuation">.</span><span class="token function">put</span><span class="token punctuation">(</span>groupField<span class="token punctuation">,</span> tuple<span class="token punctuation">)</span><span class="token punctuation">;</span>
                <span class="token punctuation">}</span>
                <span class="token keyword">return</span><span class="token punctuation">;</span>
            <span class="token keyword">case</span> <span class="token constant">MAX</span><span class="token operator">:</span>
                <span class="token keyword">if</span> <span class="token punctuation">(</span>operationField<span class="token punctuation">.</span><span class="token function">compare</span><span class="token punctuation">(</span><span class="token class-name">Predicate<span class="token punctuation">.</span>Op</span><span class="token punctuation">.</span><span class="token constant">GREATER_THAN</span><span class="token punctuation">,</span> field<span class="token punctuation">)</span><span class="token punctuation">)</span> <span class="token punctuation">{<!-- --></span>
                    tuple<span class="token punctuation">.</span><span class="token function">setField</span><span class="token punctuation">(</span><span class="token number">1</span><span class="token punctuation">,</span> operationField<span class="token punctuation">)</span><span class="token punctuation">;</span>
                    aggregate<span class="token punctuation">.</span><span class="token function">put</span><span class="token punctuation">(</span>groupField<span class="token punctuation">,</span> tuple<span class="token punctuation">)</span><span class="token punctuation">;</span>
                <span class="token punctuation">}</span>
                <span class="token keyword">return</span><span class="token punctuation">;</span>
            <span class="token keyword">case</span> <span class="token constant">COUNT</span><span class="token operator">:</span>
                <span class="token class-name">IntField</span> count <span class="token operator">&#61;</span> <span class="token keyword">new</span> <span class="token class-name">IntField</span><span class="token punctuation">(</span>field<span class="token punctuation">.</span><span class="token function">getValue</span><span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token operator">&#43;</span> <span class="token number">1</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
                tuple<span class="token punctuation">.</span><span class="token function">setField</span><span class="token punctuation">(</span><span class="token number">1</span><span class="token punctuation">,</span> count<span class="token punctuation">)</span><span class="token punctuation">;</span>
                aggregate<span class="token punctuation">.</span><span class="token function">put</span><span class="token punctuation">(</span>groupField<span class="token punctuation">,</span> tuple<span class="token punctuation">)</span><span class="token punctuation">;</span>
                <span class="token keyword">return</span><span class="token punctuation">;</span>
            <span class="token keyword">case</span> <span class="token constant">SUM</span><span class="token operator">:</span>
                <span class="token class-name">IntField</span> sum <span class="token operator">&#61;</span> <span class="token keyword">new</span> <span class="token class-name">IntField</span><span class="token punctuation">(</span>field<span class="token punctuation">.</span><span class="token function">getValue</span><span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token operator">&#43;</span> operationField<span class="token punctuation">.</span><span class="token function">getValue</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
                tuple<span class="token punctuation">.</span><span class="token function">setField</span><span class="token punctuation">(</span><span class="token number">1</span><span class="token punctuation">,</span> sum<span class="token punctuation">)</span><span class="token punctuation">;</span>
                aggregate<span class="token punctuation">.</span><span class="token function">put</span><span class="token punctuation">(</span>groupField<span class="token punctuation">,</span> tuple<span class="token punctuation">)</span><span class="token punctuation">;</span>
                <span class="token keyword">return</span><span class="token punctuation">;</span>
            <span class="token keyword">case</span> <span class="token constant">AVG</span><span class="token operator">:</span>
                countsMap<span class="token punctuation">.</span><span class="token function">put</span><span class="token punctuation">(</span>groupField<span class="token punctuation">,</span> countsMap<span class="token punctuation">.</span><span class="token function">getOrDefault</span><span class="token punctuation">(</span>groupField<span class="token punctuation">,</span> <span class="token number">0</span><span class="token punctuation">)</span> <span class="token operator">&#43;</span> <span class="token number">1</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
                sumMap<span class="token punctuation">.</span><span class="token function">put</span><span class="token punctuation">(</span>groupField<span class="token punctuation">,</span> sumMap<span class="token punctuation">.</span><span class="token function">getOrDefault</span><span class="token punctuation">(</span>groupField<span class="token punctuation">,</span> <span class="token number">0</span><span class="token punctuation">)</span> <span class="token operator">&#43;</span> operationField<span class="token punctuation">.</span><span class="token function">getValue</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
                <span class="token class-name">IntField</span> avg <span class="token operator">&#61;</span> <span class="token keyword">new</span> <span class="token class-name">IntField</span><span class="token punctuation">(</span>sumMap<span class="token punctuation">.</span><span class="token function">get</span><span class="token punctuation">(</span>groupField<span class="token punctuation">)</span> <span class="token operator">/</span> countsMap<span class="token punctuation">.</span><span class="token function">get</span><span class="token punctuation">(</span>groupField<span class="token punctuation">)</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
                tuple<span class="token punctuation">.</span><span class="token function">setField</span><span class="token punctuation">(</span><span class="token number">1</span><span class="token punctuation">,</span> avg<span class="token punctuation">)</span><span class="token punctuation">;</span>
                aggregate<span class="token punctuation">.</span><span class="token function">put</span><span class="token punctuation">(</span>groupField<span class="token punctuation">,</span> tuple<span class="token punctuation">)</span><span class="token punctuation">;</span>
                <span class="token keyword">return</span><span class="token punctuation">;</span>
            <span class="token keyword">default</span><span class="token operator">:</span>
                <span class="token keyword">return</span><span class="token punctuation">;</span>
        <span class="token punctuation">}</span>
    <span class="token punctuation">}</span>
<span class="token punctuation">}</span>

<span class="token keyword">public</span> <span class="token class-name">TupleDesc</span> <span class="token function">getTupleDesc</span><span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token punctuation">{<!-- --></span>
    <span class="token keyword">return</span> tupleDesc<span class="token punctuation">;</span>
<span class="token punctuation">}</span>

<span class="token comment">/**
 * Create a OpIterator over group aggregate results.
 *
 * &#64;return a OpIterator whose tuples are the pair (groupVal, aggregateVal)
 * if using group, or a single (aggregateVal) if no grouping. The
 * aggregateVal is determined by the type of aggregate specified in
 * the constructor.
 */</span>
<span class="token keyword">public</span> <span class="token class-name">OpIterator</span> <span class="token function">iterator</span><span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token punctuation">{<!-- --></span>
    <span class="token keyword">return</span> <span class="token keyword">new</span> <span class="token class-name">IntOpIterator</span><span class="token punctuation">(</span><span class="token keyword">this</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
<span class="token punctuation">}</span>

<span class="token keyword">public</span> <span class="token keyword">class</span> <span class="token class-name">IntOpIterator</span> <span class="token keyword">implements</span> <span class="token class-name">OpIterator</span> <span class="token punctuation">{<!-- --></span>
    <span class="token keyword">private</span> <span class="token class-name">Iterator</span><span class="token generics"><span class="token punctuation">&lt;</span><span class="token class-name">Tuple</span><span class="token punctuation">&gt;</span></span> iterator<span class="token punctuation">;</span>
    <span class="token keyword">private</span> <span class="token class-name">IntegerAggregator</span> aggregator<span class="token punctuation">;</span>

    <span class="token keyword">public</span> <span class="token class-name">IntOpIterator</span><span class="token punctuation">(</span><span class="token class-name">IntegerAggregator</span> aggregator<span class="token punctuation">)</span> <span class="token punctuation">{<!-- --></span>
        <span class="token keyword">this</span><span class="token punctuation">.</span>aggregator <span class="token operator">&#61;</span> aggregator<span class="token punctuation">;</span>
        <span class="token keyword">this</span><span class="token punctuation">.</span>iterator <span class="token operator">&#61;</span> <span class="token keyword">null</span><span class="token punctuation">;</span>
    <span class="token punctuation">}</span>

    <span class="token annotation punctuation">&#64;Override</span>
    <span class="token keyword">public</span> <span class="token keyword">void</span> <span class="token keyword">open</span><span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token keyword">throws</span> <span class="token class-name">DbException</span><span class="token punctuation">,</span> <span class="token class-name">TransactionAbortedException</span> <span class="token punctuation">{<!-- --></span>
        <span class="token keyword">this</span><span class="token punctuation">.</span>iterator <span class="token operator">&#61;</span> aggregator<span class="token punctuation">.</span>aggregate<span class="token punctuation">.</span><span class="token function">values</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">.</span><span class="token function">iterator</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
    <span class="token punctuation">}</span>

    <span class="token annotation punctuation">&#64;Override</span>
    <span class="token keyword">public</span> <span class="token keyword">boolean</span> <span class="token function">hasNext</span><span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token keyword">throws</span> <span class="token class-name">DbException</span><span class="token punctuation">,</span> <span class="token class-name">TransactionAbortedException</span> <span class="token punctuation">{<!-- --></span>
        <span class="token keyword">return</span> iterator<span class="token punctuation">.</span><span class="token function">hasNext</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
    <span class="token punctuation">}</span>

    <span class="token annotation punctuation">&#64;Override</span>
    <span class="token keyword">public</span> <span class="token class-name">Tuple</span> <span class="token function">next</span><span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token keyword">throws</span> <span class="token class-name">DbException</span><span class="token punctuation">,</span> <span class="token class-name">TransactionAbortedException</span><span class="token punctuation">,</span> <span class="token class-name">NoSuchElementException</span> <span class="token punctuation">{<!-- --></span>
        <span class="token keyword">return</span> iterator<span class="token punctuation">.</span><span class="token function">next</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
    <span class="token punctuation">}</span>

    <span class="token annotation punctuation">&#64;Override</span>
    <span class="token keyword">public</span> <span class="token keyword">void</span> <span class="token function">rewind</span><span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token keyword">throws</span> <span class="token class-name">DbException</span><span class="token punctuation">,</span> <span class="token class-name">TransactionAbortedException</span> <span class="token punctuation">{<!-- --></span>
        iterator <span class="token operator">&#61;</span> aggregator<span class="token punctuation">.</span>aggregate<span class="token punctuation">.</span><span class="token function">values</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">.</span><span class="token function">iterator</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
    <span class="token punctuation">}</span>

    <span class="token annotation punctuation">&#64;Override</span>
    <span class="token keyword">public</span> <span class="token class-name">TupleDesc</span> <span class="token function">getTupleDesc</span><span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token punctuation">{<!-- --></span>
        <span class="token keyword">return</span> aggregator<span class="token punctuation">.</span>tupleDesc<span class="token punctuation">;</span>
    <span class="token punctuation">}</span>

    <span class="token annotation punctuation">&#64;Override</span>
    <span class="token keyword">public</span> <span class="token keyword">void</span> <span class="token function">close</span><span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token punctuation">{<!-- --></span>
        iterator <span class="token operator">&#61;</span> <span class="token keyword">null</span><span class="token punctuation">;</span>
    <span class="token punctuation">}</span>
<span class="token punctuation">}</span>

}

完成本节练习之后,需要通过PredicateTest, JoinPredicateTest, FilterTest, JoinTest单元测试;并通过FilterTest和JoinTest系统测试。


练习三 – HeapFile Mutability

本节我们将实现修改数据库表文件的方法,我们从单独的页面和文件开始,主要实现两种操作:增加元组和移除元组

  • 移除元组:为了移除一个元组,我们需要实现deleteTuple方法,元组包含RecordIDs可以帮助我们找到它们存储在哪一页,所以定位到元组对应的page并且正确修改page的headers信息就很简单了
  • 增加元组:HeapFile中的insertTuple方法主要用于向数据库文件添加一个元组。为了向HeapFile中添加一个新的元组,我们需要找到带有空槽的页,如果不存在这样的页,我们需要创造一个新页并且将其添加到磁盘的文件上。我们需要确保元组的RecordID被正确更新

实现如下类中的方法:

  • src/java/simpledb/storage/HeapPage.java
  • src/java/simpledb/storage/HeapFile.java (Note that you do not necessarily need to implement writePage at this point).

为了实现HeapPage,在insertTuple和deleteTuple方法中你需要修改表示header的bitmap;这里将会使用到我们在实验一中实现的getNumEmptySlots()和isSlotUsed方法,markSlotUsed方法是抽象方法,并且用于填充或者清除page header的的状态信息。

注意,insertTuple和deleteTuple方法需要通过BufferPool.getPage方法访问页,否则下一个实验中关于事务的实现将无法正常工作


HeapPage作为数据读写的最小单位,主要负责维护Page数据组织格式和数据读写操作,其内部属性如下所示:

public class HeapPage implements Page {
    final HeapPageId pid;
    final TupleDesc td;
    final byte[] header;
    final Tuple[] tuples;
    final int numSlots;
    byte[] oldData;
    private final Byte oldDataLock = (byte) 0;
    // 本lab新增的两个属性
    private boolean dirty;
    private TransactionId tid;
    ...

本节我们需要在HeapPage中实现的方法主要包括元组的插入,删除以及脏页标记和判脏:

    /**
     * Adds the specified tuple to the page;  the tuple should be updated to reflect
     * that it is now stored on this page.
     *
     * @param t The tuple to add.
     * @throws DbException if the page is full (no empty slots) or tupledesc
     *                     is mismatch.
     */
    public void insertTuple(Tuple t) throws DbException {
        TupleDesc tupleDesc = t.getTupleDesc();
        if (getNumEmptySlots() == 0 || !tupleDesc.equals(this.td)) {
            throw new DbException("this page is full or tupledesc is mismatch");
        }
        for (int i = 0; i < numSlots; i++) {
            if (!isSlotUsed(i)) {
                markSlotUsed(i, true);
                t.setRecordId(new RecordId(this.pid, i));
                tuples[i] = t;
                break;
            }
        }
    }
<span class="token comment">/**
 * Delete the specified tuple from the page; the corresponding header bit should be updated to reflect
 * that it is no longer stored on any page.
 *
 * &#64;param t The tuple to delete
 * &#64;throws DbException if this tuple is not on this page, or tuple slot is
 *                     already empty.
 */</span>
<span class="token keyword">public</span> <span class="token keyword">void</span> <span class="token function">deleteTuple</span><span class="token punctuation">(</span><span class="token class-name">Tuple</span> t<span class="token punctuation">)</span> <span class="token keyword">throws</span> <span class="token class-name">DbException</span> <span class="token punctuation">{<!-- --></span>
    <span class="token class-name">RecordId</span> recordId <span class="token operator">&#61;</span> t<span class="token punctuation">.</span><span class="token function">getRecordId</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
    <span class="token keyword">int</span> slotId <span class="token operator">&#61;</span> recordId<span class="token punctuation">.</span><span class="token function">getTupleNumber</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
    <span class="token keyword">if</span> <span class="token punctuation">(</span>recordId<span class="token punctuation">.</span><span class="token function">getPageId</span><span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token operator">!&#61;</span> <span class="token keyword">this</span><span class="token punctuation">.</span>pid <span class="token operator">||</span> <span class="token operator">!</span><span class="token function">isSlotUsed</span><span class="token punctuation">(</span>slotId<span class="token punctuation">)</span><span class="token punctuation">)</span> <span class="token punctuation">{<!-- --></span>
        <span class="token keyword">throw</span> <span class="token keyword">new</span> <span class="token class-name">DbException</span><span class="token punctuation">(</span><span class="token string">&#34;tuple is not in this page&#34;</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
    <span class="token punctuation">}</span>
    <span class="token comment">// 将tuple对应的slot置为0</span>
    <span class="token function">markSlotUsed</span><span class="token punctuation">(</span>slotId<span class="token punctuation">,</span> <span class="token boolean">false</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
    <span class="token comment">// 将slot对应的tuple置为null</span>
    tuples<span class="token punctuation">[</span>slotId<span class="token punctuation">]</span> <span class="token operator">&#61;</span> <span class="token keyword">null</span><span class="token punctuation">;</span>
<span class="token punctuation">}</span>

<span class="token comment">/**
 * Marks this page as dirty/not dirty and record that transaction
 * that did the dirtying
 */</span>
<span class="token keyword">public</span> <span class="token keyword">void</span> <span class="token function">markDirty</span><span class="token punctuation">(</span><span class="token keyword">boolean</span> dirty<span class="token punctuation">,</span> <span class="token class-name">TransactionId</span> tid<span class="token punctuation">)</span> <span class="token punctuation">{<!-- --></span>
    <span class="token keyword">this</span><span class="token punctuation">.</span>dirty <span class="token operator">&#61;</span> dirty<span class="token punctuation">;</span>
    <span class="token keyword">this</span><span class="token punctuation">.</span>tid <span class="token operator">&#61;</span> tid<span class="token punctuation">;</span>
<span class="token punctuation">}</span>

<span class="token comment">/**
 * Returns the tid of the transaction that last dirtied this page, or null if the page is not dirty
 */</span>
<span class="token keyword">public</span> <span class="token class-name">TransactionId</span> <span class="token function">isDirty</span><span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token punctuation">{<!-- --></span>
    <span class="token keyword">return</span> dirty <span class="token operator">?</span> tid <span class="token operator">:</span> <span class="token keyword">null</span><span class="token punctuation">;</span>
<span class="token punctuation">}</span>

其他辅助的工具方法大家自行查看源码


HeapFile可以看做是表的实体对象,表由一堆HeadPage组成,这一堆HeadPage存放于当前表的DBFile中,这里我们主要实现元组的插入和删除方法:

    // see DbFile.java for javadocs
    public List<Page> insertTuple(TransactionId tid, Tuple t)
            throws DbException, IOException, TransactionAbortedException {
        List<Page> modified = new ArrayList<>();
        for (int i = 0; i < numPages(); i++) {
            HeapPage page = (HeapPage) Database.getBufferPool().getPage(tid, new HeapPageId(this.getId(), i), Permissions.READ_WRITE);
            if (page.getNumEmptySlots() == 0) {
                continue;
            }
            page.insertTuple(t);
            modified.add(page);
            return modified;
        }
        // 当所有的页都满时,我们需要创建新的页并写入文件中
        BufferedOutputStream outputStream = new BufferedOutputStream(new FileOutputStream(file, true));
        byte[] emptyPageData = HeapPage.createEmptyPageData();
        // 向文件末尾添加数据
        outputStream.write(emptyPageData);
        outputStream.close();
        // 加载到缓存中,使用numPages() - 1是因为此时numPages()已经变为插入后的大小了
        HeapPage page = (HeapPage) Database.getBufferPool().getPage(tid, new HeapPageId(getId(), numPages() - 1), Permissions.READ_WRITE);
        page.insertTuple(t);
        modified.add(page);
        return modified;
    }
<span class="token comment">// see DbFile.java for javadocs</span>
<span class="token keyword">public</span> <span class="token class-name">ArrayList</span><span class="token generics"><span class="token punctuation">&lt;</span><span class="token class-name">Page</span><span class="token punctuation">&gt;</span></span> <span class="token function">deleteTuple</span><span class="token punctuation">(</span><span class="token class-name">TransactionId</span> tid<span class="token punctuation">,</span> <span class="token class-name">Tuple</span> t<span class="token punctuation">)</span> <span class="token keyword">throws</span> <span class="token class-name">DbException</span><span class="token punctuation">,</span>
        <span class="token class-name">TransactionAbortedException</span> <span class="token punctuation">{<!-- --></span>
    <span class="token class-name">HeapPage</span> page <span class="token operator">&#61;</span> <span class="token punctuation">(</span><span class="token class-name">HeapPage</span><span class="token punctuation">)</span> bufferPool<span class="token punctuation">.</span><span class="token function">getPage</span><span class="token punctuation">(</span>tid<span class="token punctuation">,</span> t<span class="token punctuation">.</span><span class="token function">getRecordId</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">.</span><span class="token function">getPageId</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">,</span> <span class="token class-name">Permissions</span><span class="token punctuation">.</span><span class="token constant">READ_WRITE</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
    page<span class="token punctuation">.</span><span class="token function">deleteTuple</span><span class="token punctuation">(</span>t<span class="token punctuation">)</span><span class="token punctuation">;</span>
    <span class="token class-name">ArrayList</span><span class="token generics"><span class="token punctuation">&lt;</span><span class="token class-name">Page</span><span class="token punctuation">&gt;</span></span> modified <span class="token operator">&#61;</span> <span class="token keyword">new</span> <span class="token class-name">ArrayList</span><span class="token generics"><span class="token punctuation">&lt;</span><span class="token punctuation">&gt;</span></span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
    modified<span class="token punctuation">.</span><span class="token function">add</span><span class="token punctuation">(</span>page<span class="token punctuation">)</span><span class="token punctuation">;</span>
    <span class="token keyword">return</span> modified<span class="token punctuation">;</span>
<span class="token punctuation">}</span>

实现BufferPool类中的如下方法:

  • insertTuple()
  • deleteTuple()

这些方法需要调用需要被修改的表的HeapFile中的合适的方法来实现

    public void insertTuple(TransactionId tid, int tableId, Tuple t)
            throws DbException, IOException, TransactionAbortedException {
        DbFile dbFile = Database.getCatalog().getDatabaseFile(tableId);
        updateBufferPool(dbFile.insertTuple(tid, t), tid);
    }
<span class="token keyword">public</span> <span class="token keyword">void</span> <span class="token function">deleteTuple</span><span class="token punctuation">(</span><span class="token class-name">TransactionId</span> tid<span class="token punctuation">,</span> <span class="token class-name">Tuple</span> t<span class="token punctuation">)</span>
        <span class="token keyword">throws</span> <span class="token class-name">DbException</span><span class="token punctuation">,</span> <span class="token class-name">IOException</span><span class="token punctuation">,</span> <span class="token class-name">TransactionAbortedException</span> <span class="token punctuation">{<!-- --></span>
    <span class="token class-name">DbFile</span> dbFile <span class="token operator">&#61;</span> <span class="token class-name">Database</span><span class="token punctuation">.</span><span class="token function">getCatalog</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">.</span><span class="token function">getDatabaseFile</span><span class="token punctuation">(</span>t<span class="token punctuation">.</span><span class="token function">getRecordId</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">.</span><span class="token function">getPageId</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">.</span><span class="token function">getTableId</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
    <span class="token function">updateBufferPool</span><span class="token punctuation">(</span>dbFile<span class="token punctuation">.</span><span class="token function">deleteTuple</span><span class="token punctuation">(</span>tid<span class="token punctuation">,</span> t<span class="token punctuation">)</span><span class="token punctuation">,</span> tid<span class="token punctuation">)</span><span class="token punctuation">;</span>
<span class="token punctuation">}</span>

<span class="token keyword">private</span> <span class="token keyword">void</span> <span class="token function">updateBufferPool</span><span class="token punctuation">(</span><span class="token class-name">List</span><span class="token generics"><span class="token punctuation">&lt;</span><span class="token class-name">Page</span><span class="token punctuation">&gt;</span></span> pages<span class="token punctuation">,</span> <span class="token class-name">TransactionId</span> tid<span class="token punctuation">)</span> <span class="token keyword">throws</span> <span class="token class-name">DbException</span> <span class="token punctuation">{<!-- --></span>
    <span class="token keyword">for</span> <span class="token punctuation">(</span><span class="token class-name">Page</span> page <span class="token operator">:</span> pages<span class="token punctuation">)</span> <span class="token punctuation">{<!-- --></span>
        page<span class="token punctuation">.</span><span class="token function">markDirty</span><span class="token punctuation">(</span><span class="token boolean">true</span><span class="token punctuation">,</span> tid<span class="token punctuation">)</span><span class="token punctuation">;</span>
    <span class="token punctuation">}</span>
<span class="token punctuation">}</span>

完成练习后,我们的代码需要通过HeapPageWriteTest、HeapFileWriteTest和BufferPoolWriteTest单元测试


练习四 – Insertion & deletion

现在我们已经实现了向HeapFile添加和删除元组的机制,接下来就需要实现Insert和Delete操作

为了实现insert和delete查询,我们需要使用Insert和Delete来修改磁盘上的页,这些操作会返回被影响的元组数量

  • Insert:该操作从他的子操作中读取元组加入到构造函数指定的tableid对应的表中,需要调用BufferPool.insertTuple()方法实现
  • Delete:该操作从构造函数的tableid找到对应的table,并删除子操作中的元组,需要调用BufferPool.deleteTuple方法实现

实现如下类中的方法:

  • src/java/simpledb/execution/Insert.java
  • src/java/simpledb/execution/Delete.java

Insert和Delete采用的也是装饰器模式,所以这里不再多讲:

  • Insert操作
/**
 * Inserts tuples read from the child operator into the tableId specified in the
 * constructor
 */
public class Insert extends Operator {
    private static final long serialVersionUID = 1L;
    private final TransactionId tid;
    private OpIterator child;
    private final int tableId;
    private final TupleDesc tupleDesc;
    private Tuple insertTuple;
<span class="token comment">/**
 * Constructor.
 *
 * &#64;param t       The transaction running the insert.
 * &#64;param child   The child operator from which to read tuples to be inserted.
 * &#64;param tableId The table in which to insert tuples.
 * &#64;throws DbException if TupleDesc of child differs from table into which we are to
 *                     insert.
 */</span>
<span class="token keyword">public</span> <span class="token class-name">Insert</span><span class="token punctuation">(</span><span class="token class-name">TransactionId</span> t<span class="token punctuation">,</span> <span class="token class-name">OpIterator</span> child<span class="token punctuation">,</span> <span class="token keyword">int</span> tableId<span class="token punctuation">)</span>
        <span class="token keyword">throws</span> <span class="token class-name">DbException</span> <span class="token punctuation">{<!-- --></span>
    <span class="token keyword">this</span><span class="token punctuation">.</span>tid <span class="token operator">&#61;</span> t<span class="token punctuation">;</span>
    <span class="token keyword">this</span><span class="token punctuation">.</span>child <span class="token operator">&#61;</span> child<span class="token punctuation">;</span>
    <span class="token keyword">this</span><span class="token punctuation">.</span>tableId <span class="token operator">&#61;</span> tableId<span class="token punctuation">;</span>
    <span class="token keyword">this</span><span class="token punctuation">.</span>tupleDesc <span class="token operator">&#61;</span> <span class="token keyword">new</span> <span class="token class-name">TupleDesc</span><span class="token punctuation">(</span><span class="token keyword">new</span> <span class="token class-name">Type</span><span class="token punctuation">[</span><span class="token punctuation">]</span><span class="token punctuation">{<!-- --></span><span class="token class-name">Type</span><span class="token punctuation">.</span><span class="token constant">INT_TYPE</span><span class="token punctuation">}</span><span class="token punctuation">,</span> <span class="token keyword">new</span> <span class="token class-name">String</span><span class="token punctuation">[</span><span class="token punctuation">]</span><span class="token punctuation">{<!-- --></span><span class="token string">&#34;insertNums&#34;</span><span class="token punctuation">}</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
    <span class="token keyword">this</span><span class="token punctuation">.</span>insertTuple <span class="token operator">&#61;</span> <span class="token keyword">null</span><span class="token punctuation">;</span>
<span class="token punctuation">}</span>

<span class="token keyword">public</span> <span class="token class-name">TupleDesc</span> <span class="token function">getTupleDesc</span><span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token punctuation">{<!-- --></span>
    <span class="token keyword">return</span> <span class="token keyword">this</span><span class="token punctuation">.</span>tupleDesc<span class="token punctuation">;</span>
<span class="token punctuation">}</span>

<span class="token keyword">public</span> <span class="token keyword">void</span> <span class="token keyword">open</span><span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token keyword">throws</span> <span class="token class-name">DbException</span><span class="token punctuation">,</span> <span class="token class-name">TransactionAbortedException</span> <span class="token punctuation">{<!-- --></span>
    <span class="token keyword">super</span><span class="token punctuation">.</span><span class="token keyword">open</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
    child<span class="token punctuation">.</span><span class="token keyword">open</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
<span class="token punctuation">}</span>

<span class="token keyword">public</span> <span class="token keyword">void</span> <span class="token function">close</span><span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token punctuation">{<!-- --></span>
    <span class="token keyword">super</span><span class="token punctuation">.</span><span class="token function">close</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
    child<span class="token punctuation">.</span><span class="token function">close</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
<span class="token punctuation">}</span>

<span class="token keyword">public</span> <span class="token keyword">void</span> <span class="token function">rewind</span><span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token keyword">throws</span> <span class="token class-name">DbException</span><span class="token punctuation">,</span> <span class="token class-name">TransactionAbortedException</span> <span class="token punctuation">{<!-- --></span>
    child<span class="token punctuation">.</span><span class="token function">rewind</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
<span class="token punctuation">}</span>

<span class="token comment">/**
 * Inserts tuples read from child into the tableId specified by the
 * constructor. It returns a one field tuple containing the number of
 * inserted records. Inserts should be passed through BufferPool. An
 * instances of BufferPool is available via Database.getBufferPool(). Note
 * that insert DOES NOT need check to see if a particular tuple is a
 * duplicate before inserting it.
 *
 * &#64;return A 1-field tuple containing the number of inserted records, or
 * null if called more than once.
 * &#64;see Database#getBufferPool
 * &#64;see BufferPool#insertTuple
 */</span>
<span class="token keyword">protected</span> <span class="token class-name">Tuple</span> <span class="token function">fetchNext</span><span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token keyword">throws</span> <span class="token class-name">TransactionAbortedException</span><span class="token punctuation">,</span> <span class="token class-name">DbException</span> <span class="token punctuation">{<!-- --></span>
    <span class="token keyword">if</span> <span class="token punctuation">(</span>insertTuple <span class="token operator">!&#61;</span> <span class="token keyword">null</span><span class="token punctuation">)</span> <span class="token punctuation">{<!-- --></span>
        <span class="token keyword">return</span> <span class="token keyword">null</span><span class="token punctuation">;</span>
    <span class="token punctuation">}</span>
    <span class="token class-name">BufferPool</span> bufferPool <span class="token operator">&#61;</span> <span class="token class-name">Database</span><span class="token punctuation">.</span><span class="token function">getBufferPool</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
    <span class="token keyword">int</span> insertTuples <span class="token operator">&#61;</span> <span class="token number">0</span><span class="token punctuation">;</span>
    <span class="token keyword">while</span> <span class="token punctuation">(</span>child<span class="token punctuation">.</span><span class="token function">hasNext</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">)</span> <span class="token punctuation">{<!-- --></span>
        <span class="token keyword">try</span> <span class="token punctuation">{<!-- --></span>
            bufferPool<span class="token punctuation">.</span><span class="token function">insertTuple</span><span class="token punctuation">(</span>tid<span class="token punctuation">,</span> tableId<span class="token punctuation">,</span> child<span class="token punctuation">.</span><span class="token function">next</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
            insertTuples<span class="token operator">&#43;&#43;</span><span class="token punctuation">;</span>
        <span class="token punctuation">}</span> <span class="token keyword">catch</span> <span class="token punctuation">(</span><span class="token class-name">IOException</span> e<span class="token punctuation">)</span> <span class="token punctuation">{<!-- --></span>
            e<span class="token punctuation">.</span><span class="token function">printStackTrace</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
        <span class="token punctuation">}</span>
    <span class="token punctuation">}</span>
    <span class="token comment">//返回的是插入的元组数量</span>
    insertTuple <span class="token operator">&#61;</span> <span class="token keyword">new</span> <span class="token class-name">Tuple</span><span class="token punctuation">(</span><span class="token keyword">this</span><span class="token punctuation">.</span>tupleDesc<span class="token punctuation">)</span><span class="token punctuation">;</span>
    insertTuple<span class="token punctuation">.</span><span class="token function">setField</span><span class="token punctuation">(</span><span class="token number">0</span><span class="token punctuation">,</span> <span class="token keyword">new</span> <span class="token class-name">IntField</span><span class="token punctuation">(</span>insertTuples<span class="token punctuation">)</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
    <span class="token keyword">return</span> insertTuple<span class="token punctuation">;</span>
<span class="token punctuation">}</span>

<span class="token annotation punctuation">&#64;Override</span>
<span class="token keyword">public</span> <span class="token class-name">OpIterator</span><span class="token punctuation">[</span><span class="token punctuation">]</span> <span class="token function">getChildren</span><span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token punctuation">{<!-- --></span>
    <span class="token keyword">return</span> <span class="token keyword">new</span> <span class="token class-name">OpIterator</span><span class="token punctuation">[</span><span class="token punctuation">]</span><span class="token punctuation">{<!-- --></span>child<span class="token punctuation">}</span><span class="token punctuation">;</span>
<span class="token punctuation">}</span>

<span class="token annotation punctuation">&#64;Override</span>
<span class="token keyword">public</span> <span class="token keyword">void</span> <span class="token function">setChildren</span><span class="token punctuation">(</span><span class="token class-name">OpIterator</span><span class="token punctuation">[</span><span class="token punctuation">]</span> children<span class="token punctuation">)</span> <span class="token punctuation">{<!-- --></span>
    <span class="token keyword">this</span><span class="token punctuation">.</span>child <span class="token operator">&#61;</span> children<span class="token punctuation">[</span><span class="token number">0</span><span class="token punctuation">]</span><span class="token punctuation">;</span>
<span class="token punctuation">}</span>

}

装饰器模式要点有两个:

  • 装饰器对象继承被装饰对象的抽象父类或者父类接口,这样我们才可以在使用时能够用基类指针接收被装饰后的对象实现
  • 装饰器对象内部需要调用被装饰对象的方法获取原数据,然后再此基础上进行计算然后返回一个结果,或者在原有数据基础上增加附加信息,或者啥也不干,只进行相关信息记录。
  • fetchNext方法这里就是Insert装饰器对象需要实现的方法,其内部调用被装饰器对象的next方法获取所有数据,然后执行insert操作,同时计算插入数据条数,最终返回的是插入的数据条数。

  • delete操作
/**
 * The delete operator. Delete reads tuples from its child operator and removes
 * them from the table they belong to.
 */
public class Delete extends Operator {
<span class="token keyword">private</span> <span class="token keyword">static</span> <span class="token keyword">final</span> <span class="token keyword">long</span> serialVersionUID <span class="token operator">&#61;</span> <span class="token number">1L</span><span class="token punctuation">;</span>
<span class="token keyword">private</span> <span class="token keyword">final</span> <span class="token class-name">TransactionId</span> tid<span class="token punctuation">;</span>
<span class="token keyword">private</span> <span class="token class-name">OpIterator</span> child<span class="token punctuation">;</span>
<span class="token keyword">private</span> <span class="token keyword">final</span> <span class="token class-name">TupleDesc</span> tupleDesc<span class="token punctuation">;</span>
<span class="token keyword">private</span> <span class="token class-name">Tuple</span> deleteTuple<span class="token punctuation">;</span>

<span class="token comment">/**
 * Constructor specifying the transaction that this delete belongs to as
 * well as the child to read from.
 *
 * &#64;param t     The transaction this delete runs in
 * &#64;param child The child operator from which to read tuples for deletion
 */</span>
<span class="token keyword">public</span> <span class="token class-name">Delete</span><span class="token punctuation">(</span><span class="token class-name">TransactionId</span> t<span class="token punctuation">,</span> <span class="token class-name">OpIterator</span> child<span class="token punctuation">)</span> <span class="token punctuation">{<!-- --></span>
    <span class="token keyword">this</span><span class="token punctuation">.</span>tid <span class="token operator">&#61;</span> t<span class="token punctuation">;</span>
    <span class="token keyword">this</span><span class="token punctuation">.</span>child <span class="token operator">&#61;</span> child<span class="token punctuation">;</span>
    <span class="token keyword">this</span><span class="token punctuation">.</span>tupleDesc <span class="token operator">&#61;</span> <span class="token keyword">new</span> <span class="token class-name">TupleDesc</span><span class="token punctuation">(</span><span class="token keyword">new</span> <span class="token class-name">Type</span><span class="token punctuation">[</span><span class="token punctuation">]</span><span class="token punctuation">{<!-- --></span><span class="token class-name">Type</span><span class="token punctuation">.</span><span class="token constant">INT_TYPE</span><span class="token punctuation">}</span><span class="token punctuation">,</span> <span class="token keyword">new</span> <span class="token class-name">String</span><span class="token punctuation">[</span><span class="token punctuation">]</span><span class="token punctuation">{<!-- --></span><span class="token string">&#34;deleteNums&#34;</span><span class="token punctuation">}</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
    <span class="token keyword">this</span><span class="token punctuation">.</span>deleteTuple <span class="token operator">&#61;</span> <span class="token keyword">null</span><span class="token punctuation">;</span>
<span class="token punctuation">}</span>

<span class="token keyword">public</span> <span class="token class-name">TupleDesc</span> <span class="token function">getTupleDesc</span><span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token punctuation">{<!-- --></span>
    <span class="token keyword">return</span> <span class="token keyword">this</span><span class="token punctuation">.</span>tupleDesc<span class="token punctuation">;</span>
<span class="token punctuation">}</span>

<span class="token keyword">public</span> <span class="token keyword">void</span> <span class="token keyword">open</span><span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token keyword">throws</span> <span class="token class-name">DbException</span><span class="token punctuation">,</span> <span class="token class-name">TransactionAbortedException</span> <span class="token punctuation">{<!-- --></span>
    <span class="token keyword">super</span><span class="token punctuation">.</span><span class="token keyword">open</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
    child<span class="token punctuation">.</span><span class="token keyword">open</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
<span class="token punctuation">}</span>

<span class="token keyword">public</span> <span class="token keyword">void</span> <span class="token function">close</span><span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token punctuation">{<!-- --></span>
    <span class="token keyword">super</span><span class="token punctuation">.</span><span class="token function">close</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
    child<span class="token punctuation">.</span><span class="token function">close</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
<span class="token punctuation">}</span>

<span class="token keyword">public</span> <span class="token keyword">void</span> <span class="token function">rewind</span><span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token keyword">throws</span> <span class="token class-name">DbException</span><span class="token punctuation">,</span> <span class="token class-name">TransactionAbortedException</span> <span class="token punctuation">{<!-- --></span>
    child<span class="token punctuation">.</span><span class="token function">rewind</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
<span class="token punctuation">}</span>

<span class="token comment">/**
 * Deletes tuples as they are read from the child operator. Deletes are
 * processed via the buffer pool (which can be accessed via the
 * Database.getBufferPool() method.
 *
 * &#64;return A 1-field tuple containing the number of deleted records.
 * &#64;see Database#getBufferPool
 * &#64;see BufferPool#deleteTuple
 */</span>
<span class="token keyword">protected</span> <span class="token class-name">Tuple</span> <span class="token function">fetchNext</span><span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token keyword">throws</span> <span class="token class-name">TransactionAbortedException</span><span class="token punctuation">,</span> <span class="token class-name">DbException</span> <span class="token punctuation">{<!-- --></span>
    <span class="token keyword">if</span> <span class="token punctuation">(</span>deleteTuple <span class="token operator">!&#61;</span> <span class="token keyword">null</span><span class="token punctuation">)</span> <span class="token punctuation">{<!-- --></span>
        <span class="token keyword">return</span> <span class="token keyword">null</span><span class="token punctuation">;</span>
    <span class="token punctuation">}</span>
    <span class="token class-name">BufferPool</span> bufferPool <span class="token operator">&#61;</span> <span class="token class-name">Database</span><span class="token punctuation">.</span><span class="token function">getBufferPool</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
    <span class="token keyword">int</span> deleteNums <span class="token operator">&#61;</span> <span class="token number">0</span><span class="token punctuation">;</span>
    <span class="token keyword">while</span> <span class="token punctuation">(</span>child<span class="token punctuation">.</span><span class="token function">hasNext</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">)</span> <span class="token punctuation">{<!-- --></span>
        <span class="token keyword">try</span> <span class="token punctuation">{<!-- --></span>
            bufferPool<span class="token punctuation">.</span><span class="token function">deleteTuple</span><span class="token punctuation">(</span>tid<span class="token punctuation">,</span> child<span class="token punctuation">.</span><span class="token function">next</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
            deleteNums<span class="token operator">&#43;&#43;</span><span class="token punctuation">;</span>
        <span class="token punctuation">}</span> <span class="token keyword">catch</span> <span class="token punctuation">(</span><span class="token class-name">IOException</span> e<span class="token punctuation">)</span> <span class="token punctuation">{<!-- --></span>
            e<span class="token punctuation">.</span><span class="token function">printStackTrace</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
        <span class="token punctuation">}</span>
    <span class="token punctuation">}</span>
    deleteTuple <span class="token operator">&#61;</span> <span class="token keyword">new</span> <span class="token class-name">Tuple</span><span class="token punctuation">(</span>tupleDesc<span class="token punctuation">)</span><span class="token punctuation">;</span>
    deleteTuple<span class="token punctuation">.</span><span class="token function">setField</span><span class="token punctuation">(</span><span class="token number">0</span><span class="token punctuation">,</span> <span class="token keyword">new</span> <span class="token class-name">IntField</span><span class="token punctuation">(</span>deleteNums<span class="token punctuation">)</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
    <span class="token keyword">return</span> deleteTuple<span class="token punctuation">;</span>
<span class="token punctuation">}</span>

<span class="token annotation punctuation">&#64;Override</span>
<span class="token keyword">public</span> <span class="token class-name">OpIterator</span><span class="token punctuation">[</span><span class="token punctuation">]</span> <span class="token function">getChildren</span><span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token punctuation">{<!-- --></span>
    <span class="token keyword">return</span> <span class="token keyword">new</span> <span class="token class-name">OpIterator</span><span class="token punctuation">[</span><span class="token punctuation">]</span><span class="token punctuation">{<!-- --></span>child<span class="token punctuation">}</span><span class="token punctuation">;</span>
<span class="token punctuation">}</span>

<span class="token annotation punctuation">&#64;Override</span>
<span class="token keyword">public</span> <span class="token keyword">void</span> <span class="token function">setChildren</span><span class="token punctuation">(</span><span class="token class-name">OpIterator</span><span class="token punctuation">[</span><span class="token punctuation">]</span> children<span class="token punctuation">)</span> <span class="token punctuation">{<!-- --></span>
    <span class="token keyword">this</span><span class="token punctuation">.</span>child <span class="token operator">&#61;</span> children<span class="token punctuation">[</span><span class="token number">0</span><span class="token punctuation">]</span><span class="token punctuation">;</span>
<span class="token punctuation">}</span>

}

完成实验后需要通过InsertTest单元测试,并且通过InsertTest和DeleteTest系统测试


练习五 – Page eviction

在实验一中,我们没有正确的根据BufferPool构造函数中定义的numPages对BufferPool中缓存的最大页面数量进行限制,本节我们将实现拒绝策略。

当缓冲池中存在超过numPages数量的页时,我们需要在加载下一个页时选择淘汰缓冲池中现存的一个页;具体的拒绝策略我们自己选择即可。

BufferPool中包含一个flushAllPages方法,该方法不会被实际用到,只是用来进行实际的测试,我们在实际代码中不会调用此方法。

flushAllPages方法需要调用flushPage方法,并且flushPage方法需要在page离开BufferPool时将脏页写入磁盘,并且将其置为非脏。

从缓冲池中移除页面的唯一方法是evictPage,当任何脏页被丢弃时,我们需要调用flushPage方法来将其刷新到磁盘。

如果学过操作系统,那么应该了解过缓存页面丢弃策略,主要有先进先出(FIFO)、最近最少使用(LRU)和最不常用(LFU)这几种方法,我们可以选择不同的策略实现。我这里给定了一个抽象的接口,定义好方法,最后实现了FIFO和LRU页面丢弃策略,详情请看代码。

实现BufferPool的页面丢弃策略:

  • src/java/simpledb/storage/BufferPool.java

我们需要实现discardPage方法去移除缓冲池中没有被刷新到磁盘上的页,本次实验不会使用该方法,但是它是未来的实验所必须的。


页面淘汰采用策略模式进行实现,这里只展示FIFO策略的实现,LRU可以采用哈希链表实现,具体可以参考Lab2源代码中的LRUEvict类:

public interface EvictStrategy {
<span class="token comment">/**
 * 修改对应的数据结构以满足丢弃策略
 * &#64;param pageId
 */</span>
<span class="token keyword">void</span> <span class="token function">addPage</span><span class="token punctuation">(</span><span class="token class-name">PageId</span> pageId<span class="token punctuation">)</span><span class="token punctuation">;</span>

<span class="token comment">/**
 * 获取要丢弃的Page的PageId信息,用于丢弃
 * &#64;return  PageId
 */</span>
<span class="token class-name">PageId</span> <span class="token function">getEvictPageId</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span>

}

public class FIFOEvict implements EvictStrategy {
/**
* 存储数据的队列
*/

private final Queue<PageId> queue;

<span class="token keyword">public</span> <span class="token class-name">FIFOEvict</span><span class="token punctuation">(</span><span class="token keyword">int</span> numPages<span class="token punctuation">)</span> <span class="token punctuation">{<!-- --></span>
    <span class="token keyword">this</span><span class="token punctuation">.</span>queue <span class="token operator">&#61;</span> <span class="token keyword">new</span> <span class="token class-name">ArrayDeque</span><span class="token generics"><span class="token punctuation">&lt;</span><span class="token punctuation">&gt;</span></span><span class="token punctuation">(</span>numPages<span class="token punctuation">)</span><span class="token punctuation">;</span>
<span class="token punctuation">}</span>

<span class="token annotation punctuation">&#64;Override</span>
<span class="token keyword">public</span> <span class="token keyword">void</span> <span class="token function">addPage</span><span class="token punctuation">(</span><span class="token class-name">PageId</span> pageId<span class="token punctuation">)</span> <span class="token punctuation">{<!-- --></span>
    <span class="token comment">// 向尾部插入元素</span>
    <span class="token keyword">boolean</span> offer <span class="token operator">&#61;</span> queue<span class="token punctuation">.</span><span class="token function">offer</span><span class="token punctuation">(</span>pageId<span class="token punctuation">)</span><span class="token punctuation">;</span>
    <span class="token keyword">if</span> <span class="token punctuation">(</span>offer<span class="token punctuation">)</span> <span class="token punctuation">{<!-- --></span>
        <span class="token class-name">System</span><span class="token punctuation">.</span>out<span class="token punctuation">.</span><span class="token function">println</span><span class="token punctuation">(</span><span class="token string">&#34;PageId: &#34;</span> <span class="token operator">&#43;</span> pageId <span class="token operator">&#43;</span> <span class="token string">&#34; 插入队列成功&#34;</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
    <span class="token punctuation">}</span> <span class="token keyword">else</span> <span class="token punctuation">{<!-- --></span>
        <span class="token class-name">System</span><span class="token punctuation">.</span>out<span class="token punctuation">.</span><span class="token function">println</span><span class="token punctuation">(</span><span class="token string">&#34;PageId: &#34;</span> <span class="token operator">&#43;</span> pageId <span class="token operator">&#43;</span> <span class="token string">&#34; 插入队列失败&#34;</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
    <span class="token punctuation">}</span>
<span class="token punctuation">}</span>

<span class="token annotation punctuation">&#64;Override</span>
<span class="token keyword">public</span> <span class="token class-name">PageId</span> <span class="token function">getEvictPageId</span><span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token punctuation">{<!-- --></span>
    <span class="token comment">// 从队列头部获取元素</span>
    <span class="token keyword">return</span> queue<span class="token punctuation">.</span><span class="token function">poll</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
<span class="token punctuation">}</span>

}

借助淘汰策略接口和实现类,完成BufferPool中关于flushPage和evitPage相关方法:

    private final EvictStrategy evict;
<span class="token keyword">public</span> <span class="token class-name">BufferPool</span><span class="token punctuation">(</span><span class="token keyword">int</span> numPages<span class="token punctuation">)</span> <span class="token punctuation">{<!-- --></span>
    <span class="token keyword">this</span><span class="token punctuation">.</span>numPages <span class="token operator">&#61;</span> numPages<span class="token punctuation">;</span>
    <span class="token keyword">this</span><span class="token punctuation">.</span>pageCache <span class="token operator">&#61;</span> <span class="token keyword">new</span> <span class="token class-name">ConcurrentHashMap</span><span class="token generics"><span class="token punctuation">&lt;</span><span class="token punctuation">&gt;</span></span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
    <span class="token keyword">this</span><span class="token punctuation">.</span>evict <span class="token operator">&#61;</span> <span class="token keyword">new</span> <span class="token class-name">FIFOEvict</span><span class="token punctuation">(</span>numPages<span class="token punctuation">)</span><span class="token punctuation">;</span>
<span class="token punctuation">}</span>
 
    <span class="token comment">/**
 * Flush all dirty pages to disk.
 * NB: Be careful using this routine -- it writes dirty data to disk so will
 * break simpledb if running in NO STEAL mode.
 */</span>
<span class="token keyword">public</span> <span class="token keyword">synchronized</span> <span class="token keyword">void</span> <span class="token function">flushAllPages</span><span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token keyword">throws</span> <span class="token class-name">IOException</span> <span class="token punctuation">{<!-- --></span>
    pageCache<span class="token punctuation">.</span><span class="token function">forEach</span><span class="token punctuation">(</span><span class="token punctuation">(</span>pageId<span class="token punctuation">,</span> page<span class="token punctuation">)</span> <span class="token operator">-&gt;</span> <span class="token punctuation">{<!-- --></span>
        <span class="token keyword">try</span> <span class="token punctuation">{<!-- --></span>
            <span class="token function">flushPage</span><span class="token punctuation">(</span>pageId<span class="token punctuation">)</span><span class="token punctuation">;</span>
        <span class="token punctuation">}</span> <span class="token keyword">catch</span> <span class="token punctuation">(</span><span class="token class-name">IOException</span> e<span class="token punctuation">)</span> <span class="token punctuation">{<!-- --></span>
            e<span class="token punctuation">.</span><span class="token function">printStackTrace</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
        <span class="token punctuation">}</span>
    <span class="token punctuation">}</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
<span class="token punctuation">}</span> 

<span class="token comment">/**
 * Remove the specific page id from the buffer pool.
 * Needed by the recovery manager to ensure that the
 * buffer pool doesn&#39;t keep a rolled back page in its
 * cache.
 * &lt;p&gt;
 * Also used by B&#43; tree files to ensure that deleted pages
 * are removed from the cache so they can be reused safely
 */</span>
 <span class="token keyword">public</span> <span class="token keyword">synchronized</span> <span class="token keyword">void</span> <span class="token function">discardPage</span><span class="token punctuation">(</span><span class="token class-name">PageId</span> pid<span class="token punctuation">)</span> <span class="token punctuation">{<!-- --></span>
    pageCache<span class="token punctuation">.</span><span class="token function">remove</span><span class="token punctuation">(</span>pid<span class="token punctuation">)</span><span class="token punctuation">;</span>
<span class="token punctuation">}</span>

<span class="token comment">/**
 * Flushes a certain page to disk
 *
 * &#64;param pid an ID indicating the page to flush
 */</span>
<span class="token keyword">private</span> <span class="token keyword">synchronized</span> <span class="token keyword">void</span> <span class="token function">flushPage</span><span class="token punctuation">(</span><span class="token class-name">PageId</span> pid<span class="token punctuation">)</span> <span class="token keyword">throws</span> <span class="token class-name">IOException</span> <span class="token punctuation">{<!-- --></span>
    <span class="token class-name">Page</span> flush <span class="token operator">&#61;</span> pageCache<span class="token punctuation">.</span><span class="token function">get</span><span class="token punctuation">(</span>pid<span class="token punctuation">)</span><span class="token punctuation">;</span>
    <span class="token comment">// 通过tableId找到对应的DbFile,并将page写入到对应的DbFile中</span>
    <span class="token keyword">int</span> tableId <span class="token operator">&#61;</span> pid<span class="token punctuation">.</span><span class="token function">getTableId</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
    <span class="token class-name">DbFile</span> dbFile <span class="token operator">&#61;</span> <span class="token class-name">Database</span><span class="token punctuation">.</span><span class="token function">getCatalog</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">.</span><span class="token function">getDatabaseFile</span><span class="token punctuation">(</span>tableId<span class="token punctuation">)</span><span class="token punctuation">;</span>
    <span class="token comment">// 将page刷新到磁盘</span>
    dbFile<span class="token punctuation">.</span><span class="token function">writePage</span><span class="token punctuation">(</span>flush<span class="token punctuation">)</span><span class="token punctuation">;</span>
<span class="token punctuation">}</span>

<span class="token comment">/**
 * Discards a page from the buffer pool.
 * Flushes the page to disk to ensure dirty pages are updated on disk.
 */</span>
<span class="token keyword">private</span> <span class="token keyword">synchronized</span> <span class="token keyword">void</span> <span class="token function">evictPage</span><span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token keyword">throws</span> <span class="token class-name">DbException</span> <span class="token punctuation">{<!-- --></span>
    <span class="token class-name">PageId</span> evictPageId <span class="token operator">&#61;</span> evict<span class="token punctuation">.</span><span class="token function">getEvictPageId</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
    <span class="token keyword">try</span> <span class="token punctuation">{<!-- --></span>
        <span class="token function">flushPage</span><span class="token punctuation">(</span>evictPageId<span class="token punctuation">)</span><span class="token punctuation">;</span>
    <span class="token punctuation">}</span> <span class="token keyword">catch</span> <span class="token punctuation">(</span><span class="token class-name">IOException</span> e<span class="token punctuation">)</span> <span class="token punctuation">{<!-- --></span>
        e<span class="token punctuation">.</span><span class="token function">printStackTrace</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
    <span class="token punctuation">}</span>
    pageCache<span class="token punctuation">.</span><span class="token function">remove</span><span class="token punctuation">(</span>evictPageId<span class="token punctuation">)</span><span class="token punctuation">;</span>
<span class="token punctuation">}</span>
 
<span class="token keyword">public</span> <span class="token class-name">Page</span> <span class="token function">getPage</span><span class="token punctuation">(</span><span class="token class-name">TransactionId</span> tid<span class="token punctuation">,</span> <span class="token class-name">PageId</span> pid<span class="token punctuation">,</span> <span class="token class-name">Permissions</span> perm<span class="token punctuation">)</span>
        <span class="token keyword">throws</span> <span class="token class-name">TransactionAbortedException</span><span class="token punctuation">,</span> <span class="token class-name">DbException</span> <span class="token punctuation">{<!-- --></span>
    <span class="token keyword">if</span> <span class="token punctuation">(</span><span class="token operator">!</span>pageCache<span class="token punctuation">.</span><span class="token function">containsKey</span><span class="token punctuation">(</span>pid<span class="token punctuation">)</span><span class="token punctuation">)</span> <span class="token punctuation">{<!-- --></span>
        <span class="token keyword">if</span> <span class="token punctuation">(</span>pageCache<span class="token punctuation">.</span><span class="token function">size</span><span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token operator">&gt;</span> numPages<span class="token punctuation">)</span> <span class="token punctuation">{<!-- --></span>
            <span class="token function">evictPage</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
        <span class="token punctuation">}</span>
        <span class="token class-name">DbFile</span> dbFile <span class="token operator">&#61;</span> <span class="token class-name">Database</span><span class="token punctuation">.</span><span class="token function">getCatalog</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">.</span><span class="token function">getDatabaseFile</span><span class="token punctuation">(</span>pid<span class="token punctuation">.</span><span class="token function">getTableId</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
        <span class="token class-name">Page</span> page <span class="token operator">&#61;</span> dbFile<span class="token punctuation">.</span><span class="token function">readPage</span><span class="token punctuation">(</span>pid<span class="token punctuation">)</span><span class="token punctuation">;</span>
        pageCache<span class="token punctuation">.</span><span class="token function">put</span><span class="token punctuation">(</span>pid<span class="token punctuation">,</span> page<span class="token punctuation">)</span><span class="token punctuation">;</span>
        evict<span class="token punctuation">.</span><span class="token function">addPage</span><span class="token punctuation">(</span>pid<span class="token punctuation">)</span><span class="token punctuation">;</span>
    <span class="token punctuation">}</span>
    <span class="token keyword">return</span> pageCache<span class="token punctuation">.</span><span class="token function">get</span><span class="token punctuation">(</span>pid<span class="token punctuation">)</span><span class="token punctuation">;</span>
<span class="token punctuation">}</span>

完成练习之后,代码需要通过EvictionTest单元测试。

至此我们就完成本次实验了,接下来还有对实验内容的其他测试。


练习六 – Query walkthrough

通过我们实现的各种查询策略,来执行类似于下面SQL语句的联合查询:

SELECT *
FROM some_data_file1,
     some_data_file2
WHERE some_data_file1.field1 = some_data_file2.field1
  AND some_data_file1.id > 1

我们需要根据实验一中的方法创建两个数据库文件some_data_file1.dat和some_data_file2.dat,然后使用如下代码进行测试:
在这里插入图片描述
运行下面这个测试,可以得到2,2,3,3,2,4和3,3,4,5,3,7两条结果:

public class JoinTest {
<span class="token comment">/**
 *  select * from t1,t2 where t1.f0 &gt; 1 and t1.f1 &#61; t2.f1 ;
 */</span>
<span class="token keyword">public</span> <span class="token keyword">static</span> <span class="token keyword">void</span> main <span class="token punctuation">(</span><span class="token class-name">String</span><span class="token punctuation">[</span><span class="token punctuation">]</span> args<span class="token punctuation">)</span> <span class="token punctuation">{<!-- --></span>
    <span class="token comment">// construct a 3-column table schema</span>
    <span class="token class-name">Type</span><span class="token punctuation">[</span><span class="token punctuation">]</span> types <span class="token operator">&#61;</span> <span class="token keyword">new</span> <span class="token class-name">Type</span><span class="token punctuation">[</span><span class="token punctuation">]</span><span class="token punctuation">{<!-- --></span><span class="token class-name">Type</span><span class="token punctuation">.</span><span class="token constant">INT_TYPE</span><span class="token punctuation">,</span> <span class="token class-name">Type</span><span class="token punctuation">.</span><span class="token constant">INT_TYPE</span><span class="token punctuation">,</span> <span class="token class-name">Type</span><span class="token punctuation">.</span><span class="token constant">INT_TYPE</span><span class="token punctuation">}</span><span class="token punctuation">;</span>
    <span class="token class-name">String</span><span class="token punctuation">[</span><span class="token punctuation">]</span> names <span class="token operator">&#61;</span> <span class="token keyword">new</span> <span class="token class-name">String</span><span class="token punctuation">[</span><span class="token punctuation">]</span><span class="token punctuation">{<!-- --></span><span class="token string">&#34;f0&#34;</span><span class="token punctuation">,</span> <span class="token string">&#34;f1&#34;</span><span class="token punctuation">,</span> <span class="token string">&#34;f2&#34;</span><span class="token punctuation">}</span><span class="token punctuation">;</span>

    <span class="token class-name">TupleDesc</span> td <span class="token operator">&#61;</span> <span class="token keyword">new</span> <span class="token class-name">TupleDesc</span><span class="token punctuation">(</span>types<span class="token punctuation">,</span> names<span class="token punctuation">)</span><span class="token punctuation">;</span>

    <span class="token comment">// create the tables, associate them with the data files</span>
    <span class="token comment">// and tell the catalog about the schema  the tables.</span>
    <span class="token class-name">HeapFile</span> table1 <span class="token operator">&#61;</span> <span class="token keyword">new</span> <span class="token class-name">HeapFile</span><span class="token punctuation">(</span><span class="token keyword">new</span> <span class="token class-name">File</span><span class="token punctuation">(</span><span class="token string">&#34;some_data_file1.dat&#34;</span><span class="token punctuation">)</span><span class="token punctuation">,</span> td<span class="token punctuation">)</span><span class="token punctuation">;</span>
    <span class="token class-name">Database</span><span class="token punctuation">.</span><span class="token function">getCatalog</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">.</span><span class="token function">addTable</span><span class="token punctuation">(</span>table1<span class="token punctuation">,</span> <span class="token string">&#34;t1&#34;</span><span class="token punctuation">)</span><span class="token punctuation">;</span>

    <span class="token class-name">HeapFile</span> table2 <span class="token operator">&#61;</span> <span class="token keyword">new</span> <span class="token class-name">HeapFile</span><span class="token punctuation">(</span><span class="token keyword">new</span> <span class="token class-name">File</span><span class="token punctuation">(</span><span class="token string">&#34;some_data_file2.dat&#34;</span><span class="token punctuation">)</span><span class="token punctuation">,</span> td<span class="token punctuation">)</span><span class="token punctuation">;</span>
    <span class="token class-name">Database</span><span class="token punctuation">.</span><span class="token function">getCatalog</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">.</span><span class="token function">addTable</span><span class="token punctuation">(</span>table2<span class="token punctuation">,</span> <span class="token string">&#34;t2&#34;</span><span class="token punctuation">)</span><span class="token punctuation">;</span>

    <span class="token comment">// construct the query: we use two SeqScans, which spoonfeed</span>
    <span class="token comment">// tuples via iterators into join</span>
    <span class="token class-name">TransactionId</span> tid <span class="token operator">&#61;</span> <span class="token keyword">new</span> <span class="token class-name">TransactionId</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span>

    <span class="token class-name">SeqScan</span> ss1 <span class="token operator">&#61;</span> <span class="token keyword">new</span> <span class="token class-name">SeqScan</span><span class="token punctuation">(</span>tid<span class="token punctuation">,</span> table1<span class="token punctuation">.</span><span class="token function">getId</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">,</span> <span class="token string">&#34;t1&#34;</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
    <span class="token class-name">SeqScan</span> ss2 <span class="token operator">&#61;</span> <span class="token keyword">new</span> <span class="token class-name">SeqScan</span><span class="token punctuation">(</span>tid<span class="token punctuation">,</span> table2<span class="token punctuation">.</span><span class="token function">getId</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">,</span> <span class="token string">&#34;t2&#34;</span><span class="token punctuation">)</span><span class="token punctuation">;</span>

    <span class="token comment">// create a filter for the where condition</span>
    <span class="token class-name">Filter</span> sf1 <span class="token operator">&#61;</span> <span class="token keyword">new</span> <span class="token class-name">Filter</span><span class="token punctuation">(</span>
            <span class="token keyword">new</span> <span class="token class-name">Predicate</span><span class="token punctuation">(</span><span class="token number">0</span><span class="token punctuation">,</span>
                    <span class="token class-name">Predicate<span class="token punctuation">.</span>Op</span><span class="token punctuation">.</span><span class="token constant">GREATER_THAN</span><span class="token punctuation">,</span> <span class="token keyword">new</span> <span class="token class-name">IntField</span><span class="token punctuation">(</span><span class="token number">1</span><span class="token punctuation">)</span><span class="token punctuation">)</span><span class="token punctuation">,</span> ss1<span class="token punctuation">)</span><span class="token punctuation">;</span>

    <span class="token class-name">JoinPredicate</span> p <span class="token operator">&#61;</span> <span class="token keyword">new</span> <span class="token class-name">JoinPredicate</span><span class="token punctuation">(</span><span class="token number">1</span><span class="token punctuation">,</span> <span class="token class-name">Predicate<span class="token punctuation">.</span>Op</span><span class="token punctuation">.</span><span class="token constant">EQUALS</span><span class="token punctuation">,</span> <span class="token number">1</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
    <span class="token class-name">Join</span> j <span class="token operator">&#61;</span> <span class="token keyword">new</span> <span class="token class-name">Join</span><span class="token punctuation">(</span>p<span class="token punctuation">,</span> sf1<span class="token punctuation">,</span> ss2<span class="token punctuation">)</span><span class="token punctuation">;</span>

    <span class="token comment">// and run it</span>
    <span class="token keyword">try</span> <span class="token punctuation">{<!-- --></span>
        j<span class="token punctuation">.</span><span class="token keyword">open</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
        <span class="token keyword">while</span> <span class="token punctuation">(</span>j<span class="token punctuation">.</span><span class="token function">hasNext</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">)</span> <span class="token punctuation">{<!-- --></span>
            <span class="token class-name">Tuple</span> tup <span class="token operator">&#61;</span> j<span class="token punctuation">.</span><span class="token function">next</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
            <span class="token class-name">System</span><span class="token punctuation">.</span>out<span class="token punctuation">.</span><span class="token function">println</span><span class="token punctuation">(</span>tup<span class="token punctuation">)</span><span class="token punctuation">;</span>
        <span class="token punctuation">}</span>
        j<span class="token punctuation">.</span><span class="token function">close</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
        <span class="token class-name">Database</span><span class="token punctuation">.</span><span class="token function">getBufferPool</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">.</span><span class="token function">transactionComplete</span><span class="token punctuation">(</span>tid<span class="token punctuation">)</span><span class="token punctuation">;</span>

    <span class="token punctuation">}</span> <span class="token keyword">catch</span> <span class="token punctuation">(</span><span class="token class-name">Exception</span> e<span class="token punctuation">)</span> <span class="token punctuation">{<!-- --></span>
        e<span class="token punctuation">.</span><span class="token function">printStackTrace</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
    <span class="token punctuation">}</span>

<span class="token punctuation">}</span>

}

在这里插入图片描述


练习七 - 查询解析

本节我们将会使用SimpleDB中已经编写好的SQL解析器来实现基于SQL语句的查询

首先我们需要创建数据库表和数据库目录,其中数据库表data.txt的内容如下:

1,10
2,20
3,30
4,40
5,50
5,50

通过如下命令将其转换为二进制文件:

java -jar dist/simpledb.jar convert data.txt 2 "int,int"

接下来创建数据库目录文件catalog.txt:

data (f1 int, f2 int)

该文件会告诉SimpleDB数据库中包含一个表:data,其结构为两个int类型的列

最后,我们运行如下命令:

java -jar dist/simpledb.jar parser catalog.txt

可以看到如下输出:

Added table : data with schema INT_TYPE(f1), INT_TYPE(f2)
Computing table stats.
Done.
SimpleDB>

接着输入SQL语句即可进行查询:

SimpleDB> select d.f1, d.f2 from data d;
Started a new transaction tid = 0
Added scan of table d
Added select list field d.f1
Added select list field d.f2
The query plan is:
  π(d.f1,d.f2),card:0
  |
scan(data d)

d.f1 d.f2

1 10

2 20

3 30

4 40

5 50

5 50

6 rows.
Transaction
0 committed.

0.10 seconds

如果没有报错的话,证明你的相关实现都是正确的

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值