Presto Code Walkthrough, Part 1

bigdatar

License: CC 4.0 BY-SA

Category: presto. Tags: presto

Article link: https://blog.youkuaiyun.com/sinat_27545249/article/details/52450689

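This article walks through Presto's TaskExecutor, including the flow of the Runner's run method, and then covers the working principles and key methods of several important operators: ScanFilterAndProjectOperator (covered in a separate article), HashAggregationOperator, FilterAndProjectOperator, TaskOutputOperator, ExchangeOperator, OrderByOperator, and TopNOperator. Together these give a deeper understanding of how Presto executes tasks and processes data.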

TaskExecutor
- Key classes
- Flow of the run method
Operators

1. TaskExecutor

A TaskExecutor carries out the actual work for multiple splits. It first constructs a thread pool and pre-creates a number of threads in it.

    public synchronized void start()
    {
        checkState(!closed, "TaskExecutor is closed");
        for (int i = 0; i < runnerThreads; i++) {
            addRunnerThread();
        }
    }

Runner is the actual thread class. It implements the Runnable interface, and its run method carries out fairly complex work.
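To make the idea of a pool with pre-started runner threads concrete, here is a minimal sketch. It is not Presto's code: `MiniTaskExecutor`, its `work` queue, `submit`, and `demo` are hypothetical names chosen for illustration, and the plain `LinkedBlockingQueue` is an assumption that ignores priorities entirely.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical miniature of a pool with pre-started runner threads (not Presto's class).
class MiniTaskExecutor {
    private final int runnerThreads;
    private final ExecutorService executor;
    private final BlockingQueue<Runnable> work = new LinkedBlockingQueue<>();
    private volatile boolean closed;

    MiniTaskExecutor(int runnerThreads) {
        this.runnerThreads = runnerThreads;
        this.executor = Executors.newFixedThreadPool(runnerThreads);
    }

    synchronized void start() {
        if (closed) {
            throw new IllegalStateException("MiniTaskExecutor is closed");
        }
        for (int i = 0; i < runnerThreads; i++) {
            executor.execute(new Runner()); // one long-lived Runner per pool thread
        }
    }

    void submit(Runnable task) {
        work.add(task);
    }

    synchronized void close() {
        closed = true;
        executor.shutdownNow(); // interrupts runners that are blocked in take()
    }

    private final class Runner implements Runnable {
        @Override
        public void run() {
            while (!closed && !Thread.currentThread().isInterrupted()) {
                try {
                    work.take().run(); // blocks until some work is available
                }
                catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        }
    }

    // Runs `tasks` trivial tasks on `threads` runners and returns how many completed.
    static int demo(int threads, int tasks) {
        MiniTaskExecutor taskExecutor = new MiniTaskExecutor(threads);
        AtomicInteger done = new AtomicInteger();
        CountDownLatch latch = new CountDownLatch(tasks);
        taskExecutor.start();
        for (int i = 0; i < tasks; i++) {
            taskExecutor.submit(() -> {
                done.incrementAndGet();
                latch.countDown();
            });
        }
        try {
            latch.await(5, TimeUnit.SECONDS);
        }
        catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        finally {
            taskExecutor.close();
        }
        return done.get();
    }

    public static void main(String[] args) {
        System.out.println(demo(3, 10));
    }
}
```

The point of the design is that threads are created once in start() and then live for the lifetime of the executor, rather than being created per unit of work.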

1.1 Key classes

PrioritizedSplitRunner

1.2 Flow of the run method

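Once a runner thread has been constructed, it tries to take a split from pendingSplits. This is a PriorityBlockingQueue, so the thread blocks if the queue has no elements. The split it obtains is one that was pending and had not yet been given a thread. The runner adds this split to runningSplits and then invokes the split's …

… the split's data. Processing works by a "parallel traversal" of the driver's operators collection, which contains every operator needed to process this split's data.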
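The take-then-track part of this flow can be sketched as follows. Only the names pendingSplits and runningSplits come from the text above; the `Split` class, its `priority` field (smaller value served first), `runOnce`, and the `processed` log are assumptions made for illustration, and the real per-split work is replaced by a one-line stand-in.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.PriorityBlockingQueue;

// Simplified stand-in for the runner loop; not Presto's implementation.
class RunnerLoopSketch {
    static final class Split implements Comparable<Split> {
        final String name;
        final long priority; // assumption: smaller value is served first

        Split(String name, long priority) {
            this.name = name;
            this.priority = priority;
        }

        @Override
        public int compareTo(Split other) {
            return Long.compare(priority, other.priority);
        }
    }

    final PriorityBlockingQueue<Split> pendingSplits = new PriorityBlockingQueue<>();
    final Set<Split> runningSplits = ConcurrentHashMap.newKeySet();
    final List<String> processed = Collections.synchronizedList(new ArrayList<>());

    // One pass of the loop; returns false if the thread was interrupted while waiting.
    boolean runOnce() {
        Split split;
        try {
            split = pendingSplits.take(); // blocks while the queue is empty
        }
        catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        runningSplits.add(split);
        try {
            processed.add(split.name); // stand-in for the real per-split work
        }
        finally {
            runningSplits.remove(split);
        }
        return true;
    }

    // Queues three splits out of order and returns the order in which they were processed.
    static List<String> demo() {
        RunnerLoopSketch runner = new RunnerLoopSketch();
        runner.pendingSplits.put(new Split("low", 5));
        runner.pendingSplits.put(new Split("high", 1));
        runner.pendingSplits.put(new Split("mid", 3));
        for (int i = 0; i < 3; i++) {
            runner.runOnce();
        }
        return new ArrayList<>(runner.processed);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Because take() always returns the least element under the queue's ordering, the order in which splits are queued does not decide the order in which they get a thread.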

2. Operators
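Before looking at individual operators, it may help to picture how a driver could move data through its operators collection. The sketch below is one reading of that traversal, assuming it means walking adjacent operator pairs and handing each output page to the next operator. The method names follow Presto's Operator interface as I recall it; the `String` page type and the `Source`, `UpperCase`, and `Sink` operators are hypothetical simplifications.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Simplified sketch of a driver loop; a page is reduced to a String for illustration.
class DriverSketch {
    interface Operator {
        boolean needsInput();
        void addInput(String page);
        String getOutput(); // may return null when nothing is ready
        void finish();
        boolean isFinished();
    }

    static final class Source implements Operator {
        private final Deque<String> pages;
        Source(List<String> pages) { this.pages = new ArrayDeque<>(pages); }
        public boolean needsInput() { return false; }
        public void addInput(String page) { throw new UnsupportedOperationException(); }
        public String getOutput() { return pages.poll(); }
        public void finish() { pages.clear(); }
        public boolean isFinished() { return pages.isEmpty(); }
    }

    static final class UpperCase implements Operator {
        private String pending;
        private boolean finishing;
        public boolean needsInput() { return !finishing && pending == null; }
        public void addInput(String page) { pending = page.toUpperCase(); }
        public String getOutput() { String out = pending; pending = null; return out; }
        public void finish() { finishing = true; }
        public boolean isFinished() { return finishing && pending == null; }
    }

    static final class Sink implements Operator {
        final List<String> received = new ArrayList<>();
        private boolean finishing;
        public boolean needsInput() { return !finishing; }
        public void addInput(String page) { received.add(page); }
        public String getOutput() { return null; }
        public void finish() { finishing = true; }
        public boolean isFinished() { return finishing; }
    }

    // Walks adjacent pairs (current, next) until the last operator reports it is finished.
    static void run(List<Operator> operators) {
        Operator last = operators.get(operators.size() - 1);
        while (!last.isFinished()) {
            for (int i = 0; i < operators.size() - 1; i++) {
                Operator current = operators.get(i);
                Operator next = operators.get(i + 1);
                if (!current.isFinished() && next.needsInput()) {
                    String page = current.getOutput();
                    if (page != null) {
                        next.addInput(page);
                    }
                }
                if (current.isFinished()) {
                    next.finish(); // propagate completion downstream
                }
            }
        }
    }

    static List<String> demo() {
        Sink sink = new Sink();
        run(Arrays.asList(new Source(Arrays.asList("a", "b")), new UpperCase(), sink));
        return sink.received;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In this shape each operator only needs to know how to accept a page and how to produce one; the loop owns the wiring between them.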

2.1 ScanFilterAndProjectOperator (moved to a dedicated article)

This operator scans the raw data. As its name suggests, it actually does three jobs: scan + filter + project.
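As a rough illustration of why fusing the three jobs is attractive, the sketch below reads rows from a cursor, filters them, and projects the survivors into an output page in a single pass. It is only a sketch under stated assumptions: `nextPage`, its parameters, and the use of `Iterator`, `Predicate`, and `Function` are hypothetical, whereas the real operator relies on generated code rather than lambdas.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical single-pass scan + filter + project; not Presto's implementation.
class ScanFilterProjectSketch {
    // Scans up to maxRows rows from the cursor, keeps the rows that pass the
    // filter, and projects each kept row into the output "page".
    static <R, O> List<O> nextPage(Iterator<R> cursor, Predicate<R> filter, Function<R, O> projection, int maxRows) {
        List<O> page = new ArrayList<>();
        int scanned = 0;
        while (scanned < maxRows && cursor.hasNext()) {
            R row = cursor.next();
            scanned++;
            if (filter.test(row)) {
                page.add(projection.apply(row));
            }
        }
        return page;
    }

    // Scans six rows in batches of four, keeping even values and projecting them to strings.
    static List<List<String>> demo() {
        Iterator<Integer> cursor = Arrays.asList(1, 2, 3, 4, 5, 6).iterator();
        List<List<String>> pages = new ArrayList<>();
        while (cursor.hasNext()) {
            pages.add(nextPage(cursor, x -> x % 2 == 0, x -> "v" + x, 4));
        }
        return pages;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Rows that fail the filter are never materialized in an intermediate page, which is the main saving compared with running three separate operators.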

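1 Important variables:

       1)) CursorProcessor cursorProcessor
        Generated code, used to scan the data source quickly
       2)) PageProcessor pageProcessor
        Generated code, used to process the data within a page quickly
       3)) boolean finishing
         Marks whether all data in the split has been scanned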

2 Important methods:

       1)) Page getOutput()

public Page getOutput()
    {
        if (!finishing) {
            createSourceIfNecessary(); // create the data source according to the connector in use

            if (cursor != null) {
                // scan the data source with the generated code; rows are read iteratively
                int rowsProcessed = cursorProcessor.process(operatorContext.getSession().toConnectorSession(), cursor, ROWS_PER_PAGE, pageBuilder);

                pageSourceMemoryContext.setBytes(cursor.getSystemMemoryUsage());

                long bytesProcessed = cursor.getCompletedBytes() - completedBytes;
                long elapsedNanos = cursor.getReadTimeNanos() - readTimeNanos;
                operatorContext.recordGeneratedInput(bytesProcessed, rowsProcessed, elapsedNanos);
                completedBytes = cursor.getCompletedBytes();
                readTimeNanos = cursor.getReadTimeNanos();

                if (rowsProcessed == 0) {
                    finishing = true;
                }
            }
            else {
                if (currentPage == null) {
                    currentPage = pageSource.getNextPage();

                    if (currentPage != null) {
                        // update operator stats
                        long endCompletedBytes = pageSource.getCompletedBytes();
                        long endReadTimeNanos = pageSource.getReadTimeNanos();
                        operatorContext.recordGeneratedInput(endCompletedBytes - completedBytes, currentPage.getPositionCount(), endReadTimeNanos - readTimeNanos);
                        completedBytes = endCompletedBytes;
                        readTimeNanos = endReadTimeNanos;
                    }

                    currentPosition =