Spark SQL Source Code Analysis, Part 7: Execute (executedPlan -> RDD[Row])

This post digs into how a SparkPlan executes and becomes an RDD[Row]. Tracing the execute function and the doExecute methods, with select SUM(id) from test group by dev_chnid as the running example, it follows the physical plan from Exchange down to PhysicalRDD. In PhysicalRDD, doExecute simply returns the rdd that was built earlier by buildScan; for Parquet files in Spark 1.4.0 the relation behind it is ParquetRelation2. Execution involves a local (per-partition) and a global aggregation, which splits job 0 into two stages: stage 0 and stage 1.


How does a SparkPlan actually execute, and how does it turn into an RDD[Row]? Start with a piece of user code:

SQLContext sqlContext = new SQLContext(jsc);
DataFrame dataFrame = sqlContext.parquetFile(parquetPath);
dataFrame.registerTempTable("test"); // register under the name queried below
String sql = "select SUM(id) from test group by dev_chnid";
DataFrame result = sqlContext.sql(sql);
log.info("Result:" + Arrays.toString(result.collect())); // collect() triggers the action

// DataFrame.collect
override def collect(): Array[Row] = {
  val ret = queryExecution.executedPlan.executeCollect() // run the executedPlan's executeCollect
  ret
}

// SparkPlan.executeCollect
def executeCollect(): Array[Row] = {
  execute().mapPartitions { iter =>
    val converter = CatalystTypeConverters.createToScalaConverter(schema)
    iter.map(converter(_).asInstanceOf[Row])
  }.collect() // ultimately calls the executedPlan's execute, i.e. SparkPlan.execute
}

// RDD.collect
def collect(): Array[T] = withScope {
  val results = sc.runJob(this, (iter: Iterator[T]) => iter.toArray)
  Array.concat(results: _*)
}
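
So the full chain is DataFrame.collect -> SparkPlan.executeCollect -> SparkPlan.execute -> RDD.collect -> SparkContext.runJob. Note the mapPartitions idiom in executeCollect: the Catalyst-to-Scala converter is built once per partition and then reused for every row. A minimal sketch of that per-partition pattern on an ordinary RDD (the converter here is a stand-in for illustration, not CatalystTypeConverters):

import org.apache.spark.{SparkConf, SparkContext}

object PerPartitionSetup {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("sketch").setMaster("local[2]"))
    val rdd = sc.parallelize(1 to 10, 2)
    val converted = rdd.mapPartitions { iter =>
      // Build the (possibly expensive) converter once per partition,
      // then reuse it for every element of that partition.
      val converter: Int => String = i => s"row-$i"
      iter.map(converter)
    }
    println(converted.collect().mkString(", "))
    sc.stop()
  }
}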

Now look at SparkPlan's execute function:

abstract class SparkPlan extends QueryPlan[SparkPlan] with Logging with Serializable {
  ...
  final def execute(): RDD[Row] = {
    RDDOperationScope.withScope(sparkContext, nodeName, false, true) {
      doExecute() // dispatch to the concrete SparkPlan's doExecute
    }
  }
  ...
}
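
Notice that execute is final: subclasses never override it, they only supply doExecute, while the base class wraps every call in an RDDOperationScope (which groups the RDDs produced by one operator together in the web UI's DAG visualization). A tiny sketch of this template-method shape, with illustrative names:

abstract class Plan {
  // Fixed entry point: bookkeeping around the variable part.
  final def execute(): Seq[String] = {
    println(s"entering scope of $name") // stands in for RDDOperationScope.withScope
    doExecute()
  }
  def name: String
  protected def doExecute(): Seq[String] // each concrete plan supplies this
}

class HelloPlan extends Plan {
  def name = "hello"
  protected def doExecute(): Seq[String] = Seq("hello", "world")
}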

As you can see, every concrete SparkPlan implements its own doExecute function, whose output is an RDD[Row]. Take the statement select SUM(id) from test group by dev_chnid: its executedPlan is:

Aggregate false, [dev_chnid#0], [CombineSum(PartialSum#45L) AS c0#43L]
 Exchange (HashPartitioning 200)
  Aggregate true, [dev_chnid#0], [dev_chnid#0,SUM(id#17L) AS PartialSum#45L]
   PhysicalRDD [dev_chnid#0,id#17L], MapPartitionsRDD
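
Reading bottom-up: PhysicalRDD scans [dev_chnid#0, id#17L], Aggregate true computes partial sums within each partition, Exchange hash-repartitions on dev_chnid into 200 partitions, and Aggregate false combines the partial sums into the final result. A quick way to reproduce this printout yourself (Spark 1.4.0 API, assuming the same temp table as above):

val result = sqlContext.sql("select SUM(id) from test group by dev_chnid")
println(result.queryExecution.executedPlan) // the physical plan tree shown above
result.explain(true)                        // parsed, analyzed, optimized and physical plans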

First, look at the doExecute function of the final aggregate, Aggregate false, [dev_chnid#0], [CombineSum(PartialSum#45L) AS c0#43L]:

protected override def doExecute(): RDD[Row] = attachTree(this, "execute") {
  if (groupingExpressions.isEmpty) { // no GROUP BY: a single global aggregate
    child.execute().mapPartitions { iter => // run the child's execute first
      val buffer = newAggregateBuffer()
      var currentRow: Row = null
      while (iter.hasNext) {
        currentRow = iter.next()
        var i = 0
        while (i < buffer.length) { // accumulate the global aggregate values
          buffer(i).update(currentRow)
          i += 1
        }
      }
      val resultProjection = new InterpretedProjection(resultExpressions, computedSchema)
      val aggregateResults = new GenericMutableRow(computedAggregates.length)

      var i = 0
      while (i < buffer.length) {
        aggregateResults(i) = buffer(i).eval(EmptyRow)
        i += 1
      }

      Iterator(resultProjection(aggregateResults))
    }
  } else {
    child.execute().mapPartitions { iter => // run the child's execute first
      val hashTable = new HashMap[Row, Array[AggregateFunction]]
      // groupingExpressions = [dev_chnid#0]
      // child.output = [dev_chnid#0,id#17L]
      val groupingProjection = new InterpretedMutableProjection(groupingExpressions, child.output)

      var currentRow: Row = null
      while (iter.hasNext) {
        currentRow = iter.next()
        val currentGroup = groupingProjection(currentRow)
        var currentBuffer = hashTable.get(currentGroup)
        if (currentBuffer == null) { // first row of a new group: allocate its buffer
          currentBuffer = newAggregateBuffer()
          hashTable.put(currentGroup.copy(), currentBuffer)
        }

        var i = 0
        while (i < currentBuffer.length) { // accumulate this group's aggregate values
          currentBuffer(i).update(currentRow)
          i += 1
        }
      }

      // The method then returns an Iterator[Row] over hashTable.entrySet(),
      // projecting one result row per group.
      ...
    }
  }
}
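
Stripped of Catalyst specifics, the grouped branch is a classic hash aggregation: one pass builds a map from group key to aggregate buffer, a second pass emits one row per key. A self-contained sketch of the same technique (plain Scala with illustrative names, summing id per dev_chnid; not Spark code):

object HashAggSketch {
  def main(args: Array[String]): Unit = {
    // (dev_chnid, id) pairs standing in for the child plan's output rows
    val rows = Seq(("ch1", 1L), ("ch2", 2L), ("ch1", 3L), ("ch2", 4L))

    val hashTable = new java.util.HashMap[String, Array[Long]]()
    for ((group, id) <- rows) {
      var buffer = hashTable.get(group)
      if (buffer == null) { // first row of this group: allocate a buffer
        buffer = Array(0L)  // one slot per aggregate function (here: SUM)
        hashTable.put(group, buffer)
      }
      buffer(0) += id       // plays the role of AggregateFunction.update(currentRow)
    }

    // Second pass: one output row per group, like the Iterator[Row] over hashTable
    val it = hashTable.entrySet().iterator()
    while (it.hasNext) {
      val e = it.next()
      println(s"${e.getKey} -> SUM(id) = ${e.getValue()(0)}")
    }
  }
}

In the real plan this happens twice: Aggregate true builds partial sums per partition before the Exchange, and Aggregate false combines them afterwards, which is why job 0 splits into stage 0 and stage 1.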