SparkSQL05_prepareForExecution_01_EnsureRequirements

This post analyzes how and why Spark SQL inserts the Exchange physical plan, including the process that ensures each child node's data distribution meets its parent's requirements.


1. prepareForExecution

SQLContext's prepareForExecution field is an anonymous subclass of RuleExecutor[SparkPlan] used to transform the physical plan. The code is:


  /**
   * Prepares a planned SparkPlan for execution by inserting shuffle operations and internal
   * row format conversions as needed.
   */
  @transient
  protected[sql] val prepareForExecution = new RuleExecutor[SparkPlan] {
    val batches = Seq(
      Batch("Add exchange", Once, EnsureRequirements(self)),
      Batch("Add row converters", Once, EnsureRowFormats)
    )
  }
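
For context, this rule executor is invoked from SQLContext's QueryExecution when the executed plan is materialized. A minimal sketch of the call site, quoted from memory of the Spark 1.5-era source (surrounding members omitted):

  // In SQLContext#QueryExecution: the planned physical plan is run through
  // prepareForExecution before execution
  lazy val executedPlan: SparkPlan = prepareForExecution.execute(sparkPlan)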


EnsureRequirements inserts an Exchange physical plan between a parent physical plan and its child; the Exchange node is what introduces the shuffle operation. So why insert an Exchange physical plan, i.e., what is the purpose of inserting a shuffle?


2. Example

Given the following SQL statement, the physical plan will use SortMergeJoin to perform the join:


    val df = sqlContext.sql("select * from TBL_STUDENT a  join TBL_CLASS  b where a.classId  = b.classId")
    df.show
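
The before/after plan dumps below come from instrumented output. With the stock Spark 1.5-era API, a similar comparison can be reproduced from the DataFrame itself (the treeString formatting will differ from the instrumented output shown below):

    // physical plan before prepareForExecution runs ("Not Prepared")
    println(df.queryExecution.sparkPlan.treeString)
    // physical plan after EnsureRequirements has inserted Exchange/Sort nodes ("Prepared")
    println(df.queryExecution.executedPlan.treeString)
    // alternatively, df.explain(true) prints the parsed/analyzed/optimized/physical plans in one call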



The SQL statement produces the execution plans below. The final Prepared Physical Plan shows that TungstenSort and TungstenExchange physical plans have been inserted between SortMergeJoin and each PhysicalRDD, with TungstenExchange as the child of TungstenSort:

== Parsed Logical Plan ==
'nodeName: <Project>, argString:< [unresolvedalias(*)]>
  'nodeName: <Filter>, argString:< (UnresolvedAttribute: 'a.classId = UnresolvedAttribute: 'b.classId)>
    'nodeName: <Join>, argString:< Inner, None>
      'nodeName: <UnresolvedRelation>, argString:< [TBL_STUDENT], Some(a)>
      'nodeName: <UnresolvedRelation>, argString:< [TBL_CLASS], Some(b)>

== Analyzed Logical Plan ==
id: string, name: string, classId: string, age: int, classId: string, className: string
nodeName: <Project>, argString:< [AttributeReference:id#0,AttributeReference:name#1,AttributeReference:classId#2,AttributeReference:age#3,AttributeReference:classId#4,AttributeReference:className#5]>
  nodeName: <Filter>, argString:< (AttributeReference:classId#2 = AttributeReference:classId#4)>
    nodeName: <Join>, argString:< Inner, None>
      nodeName: <Subquery>, argString:< a>
        nodeName: <Subquery>, argString:< TBL_STUDENT>
          nodeName: <LogicalRDD>, argString:< [AttributeReference:id#0,AttributeReference:name#1,AttributeReference:classId#2,AttributeReference:age#3], MapPartitionsRDD[4] at main at NativeMethodAccessorImpl.java:-2>
      nodeName: <Subquery>, argString:< b>
        nodeName: <Subquery>, argString:< TBL_CLASS>
          nodeName: <LogicalRDD>, argString:< [AttributeReference:classId#4,AttributeReference:className#5], MapPartitionsRDD[9] at main at NativeMethodAccessorImpl.java:-2>

== Optimized Logical Plan ==
nodeName: <Project>, argString:< [AttributeReference:id#0,AttributeReference:name#1,AttributeReference:classId#2,AttributeReference:age#3,AttributeReference:classId#4,AttributeReference:className#5]>
  nodeName: <Join>, argString:< Inner, Some((AttributeReference:classId#2 = AttributeReference:classId#4))>
    nodeName: <LogicalRDD>, argString:< [AttributeReference:id#0,AttributeReference:name#1,AttributeReference:classId#2,AttributeReference:age#3], MapPartitionsRDD[4] at main at NativeMethodAccessorImpl.java:-2>
    nodeName: <LogicalRDD>, argString:< [AttributeReference:classId#4,AttributeReference:className#5], MapPartitionsRDD[9] at main at NativeMethodAccessorImpl.java:-2>

== Not Prepared Physical Plan ==
nodeName: <TungstenProject>, argString:< [AttributeReference:id#0,AttributeReference:name#1,AttributeReference:classId#2,AttributeReference:age#3,AttributeReference:classId#4,AttributeReference:className#5]>
  nodeName: <SortMergeJoin>, argString:< [AttributeReference:classId#2], [AttributeReference:classId#4]>
    Scan PhysicalRDD[AttributeReference:id#0,AttributeReference:name#1,AttributeReference:classId#2,AttributeReference:age#3]
    Scan PhysicalRDD[AttributeReference:classId#4,AttributeReference:className#5]

== Prepared Physical Plan ==
nodeName: <TungstenProject>, argString:< [AttributeReference:id#0,AttributeReference:name#1,AttributeReference:classId#2,AttributeReference:age#3,AttributeReference:classId#4,AttributeReference:className#5]>
  nodeName: <SortMergeJoin>, argString:< [AttributeReference:classId#2], [AttributeReference:classId#4]>
    nodeName: <TungstenSort>, argString:< [AttributeReference:classId#2 ASC], false, 0>
      nodeName: <TungstenExchange>, argString:< hashpartitioning(AttributeReference:classId#2)>
        nodeName: <ConvertToUnsafe>, argString:< >
          Scan PhysicalRDD[AttributeReference:id#0,AttributeReference:name#1,AttributeReference:classId#2,AttributeReference:age#3]
    nodeName: <TungstenSort>, argString:< [AttributeReference:classId#4 ASC], false, 0>
      nodeName: <TungstenExchange>, argString:< hashpartitioning(AttributeReference:classId#4)>
        nodeName: <ConvertToUnsafe>, argString:< >
          Scan PhysicalRDD[AttributeReference:classId#4,AttributeReference:className#5]


3. Motivation for inserting the Exchange physical plan

First, look at the class comment of the EnsureRequirements case class, which reveals why Exchange is inserted:


/**
 * Ensures that the [[org.apache.spark.sql.catalyst.plans.physical.Partitioning Partitioning]]
 * of input data meets the
 * [[org.apache.spark.sql.catalyst.plans.physical.Distribution Distribution]] requirements for
 * each operator by inserting [[Exchange]] Operators where required.  Also ensure that the
 * input partition ordering requirements are met.
 */
private[sql] case class EnsureRequirements(sqlContext: SQLContext) extends Rule[SparkPlan] {

The class comment of EnsureRequirements explains two things:

a. Why Exchange is inserted; in Tungsten mode this means inserting TungstenExchange.

b. Why Sort is inserted; depending on the situation, different sort operators are used, such as TungstenSort, ExternalSort, and Sort.


Why insert the Exchange physical plan?

Suppose B and C are the children of a physical plan A (think of A as a SortMergeJoin and B and C as two PhysicalRDDs). A expects the data distribution (Distribution) of its children B and C to meet certain requirements, which are defined by SparkPlan's requiredChildDistribution method. For SortMergeJoin, both children are required to have a ClusteredDistribution, as the code shows:

  override def requiredChildDistribution: Seq[Distribution] =
    ClusteredDistribution(leftKeys) :: ClusteredDistribution(rightKeys) :: Nil

Each child also has a data partitioning strategy, defined by SparkPlan's outputPartitioning method. For the two PhysicalRDDs B and C, the definition is:


  // TODO: Move to `DistributedPlan`
  /** Specifies how data is partitioned across different nodes in the cluster. */
  def outputPartitioning: Partitioning = UnknownPartitioning(0) // TODO: WRONG WIDTH!


The outputPartitioning definition above actually lives in SparkPlan; PhysicalRDD inherits SparkPlan's default implementation, so PhysicalRDD's outputPartitioning is UnknownPartitioning.


The outputPartitioning of B and C (the children) must satisfy the requiredChildDistribution of A (their parent). The check is implemented in ensureDistributionAndOrdering; a small sketch of the satisfies relation follows the excerpt:


    // Get the operator's (e.g. SortMergeJoin's) requiredChildDistribution collection
    val requiredChildDistributions: Seq[Distribution] = operator.requiredChildDistribution

    // Get the operator's (e.g. SortMergeJoin's) requiredChildOrdering collection
    val requiredChildOrderings: Seq[Seq[SortOrder]] = operator.requiredChildOrdering

    var children: Seq[SparkPlan] = operator.children

    // Ensure that the operator's children satisfy their output distribution requirements:
    // check each child physical plan of SortMergeJoin (its outputPartitioning against the
    // required distribution)
    children = children.zip(requiredChildDistributions).map { case (child, distribution) =>
      val o = child.outputPartitioning
      val s = o.satisfies(distribution)
      if (s) {
        child
      } else {
        val p = canonicalPartitioning(distribution)
        Exchange(p, child) // insert Exchange when the requirement is not satisfied
      }
    }
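
To make the satisfies relation concrete, here is a small REPL-style sketch against Spark 1.5's catalyst internals (the attribute and partition count are made up for illustration):

    import org.apache.spark.sql.catalyst.expressions.AttributeReference
    import org.apache.spark.sql.catalyst.plans.physical._
    import org.apache.spark.sql.types.StringType

    val classId = AttributeReference("classId", StringType)()  // hypothetical join key
    val dist = ClusteredDistribution(classId :: Nil)           // what SortMergeJoin requires

    UnknownPartitioning(0).satisfies(dist)                 // false -> Exchange must be inserted
    HashPartitioning(classId :: Nil, 200).satisfies(dist)  // true  -> child is used as-is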

 

4. Source code flow of EnsureRequirements

/**
 * Ensures that the [[org.apache.spark.sql.catalyst.plans.physical.Partitioning Partitioning]]
 * of input data meets the
 * [[org.apache.spark.sql.catalyst.plans.physical.Distribution Distribution]] requirements for
 * each operator by inserting [[Exchange]] Operators where required.  Also ensure that the
 * input partition ordering requirements are met.
 */

private[sql] case class EnsureRequirements(sqlContext: SQLContext) extends Rule[SparkPlan] {
  // TODO: Determine the number of partitions.
  private def numPartitions: Int = sqlContext.conf.numShufflePartitions

  /**
   * Given a required distribution, returns a partitioning that satisfies that distribution.
   */
  private def canonicalPartitioning(requiredDistribution: Distribution): Partitioning = {
    requiredDistribution match {
      case AllTuples => SinglePartition // a single partition
      case ClusteredDistribution(clustering) => HashPartitioning(clustering, numPartitions)
      case OrderedDistribution(ordering) => RangePartitioning(ordering, numPartitions)
      case dist => sys.error(s"Do not know how to satisfy distribution $dist")
    }
  }

  /**
   * Transforms a physical plan so that each operator's children meet its
   * required data distribution and ordering.
   */
  private def ensureDistributionAndOrdering(operator: SparkPlan): SparkPlan = {
    // For debugging only: a convenient place to set a breakpoint when the
    // operator is a SortMergeJoin
    if (operator.isInstanceOf[SortMergeJoin]) {
      println
    }

    // The data distribution the current operator (e.g. SortMergeJoin) requires of its child plans
    val requiredChildDistributions: Seq[Distribution] = operator.requiredChildDistribution

    // The sort ordering the current operator (e.g. SortMergeJoin) requires of its child plans;
    // for SortMergeJoin this is ascending order on the left and right join keys
    val requiredChildOrderings: Seq[Seq[SortOrder]] = operator.requiredChildOrdering

    // The current operator's child physical plans; for SortMergeJoin, the two PhysicalRDDs
    var children: Seq[SparkPlan] = operator.children

    // Ensure that the operator's children satisfy their output distribution requirements:
    // zip the operator's children with its requiredChildDistributions so that
    // each child is checked against the distribution required of it
    children = children.zip(requiredChildDistributions).map {
      case (child, distribution) =>
        // the child's output partitioning
        val o = child.outputPartitioning
        // Does the child's partitioning satisfy the distribution the parent requires?
        // If the child is a PhysicalRDD, its outputPartitioning is UnknownPartitioning
        // while distribution is ClusteredDistribution, so s is false here
        val s = o.satisfies(distribution)
        if (s) {
          child
        } else {
          // Derive the appropriate partitioning from the distribution;
          // for ClusteredDistribution this returns HashPartitioning
          val p = canonicalPartitioning(distribution)
          Exchange(p, child)
        }
    }

    // If the operator has multiple children and specifies child output distributions (e.g. join),
    // then the children's output partitionings must be compatible:

    val a = children.length
    val b = requiredChildDistributions.toSet != Set(UnspecifiedDistribution)
    val ops = children.map(_.outputPartitioning)
    val c =  !Partitioning.allCompatible(ops)
    if (a > 1 && b && c) {
      children = children.zip(requiredChildDistributions).map {
          case (child, distribution) =>
              val targetPartitioning = canonicalPartitioning(distribution)
              val op = child.outputPartitioning
              val d = op.guarantees(targetPartitioning)
              if (d) {
                child
              } else {
                Exchange(targetPartitioning, child)
              }
      }
    }

    // Now that we've performed any necessary shuffles, add sorts to guarantee output orderings:
    // insert sort physical plans where required
    children = children.zip(requiredChildOrderings).map {
        case (child, requiredOrdering) =>
          if (requiredOrdering.nonEmpty) { // for SortMergeJoin, requiredOrdering is non-empty
            // If child.outputOrdering is [a, b] and requiredOrdering is [a], we do not need to sort.
            val minSize = Seq(requiredOrdering.size, child.outputOrdering.size).min // for PhysicalRDD, outputOrdering is an empty collection, so minSize is 0
            // sort is needed if either ordering is empty or their shared prefix differs
            if (minSize == 0 || requiredOrdering.take(minSize) != child.outputOrdering.take(minSize)) {
              // obtains the sort operator (TungstenSort here); if child is a TungstenExchange,
              // that TungstenExchange becomes the child of the inserted sort
              val sortPlan = sqlContext.planner.BasicOperators.getSortOperator(requiredOrdering, global = false, child)
              sortPlan
            } else {
              child
            }
          } else {
            child
          }
    }

    //Returns a copy of this node with the children replaced.

    val v = operator.withNewChildren(children)
    v
  }


  /**
   * EnsureRequirements is a Rule[SparkPlan], so its apply method performs the
   * plan transformation. apply traverses the plan bottom-up (transformUp) and
   * calls ensureDistributionAndOrdering on every operator.
   */
  def apply(plan: SparkPlan): SparkPlan = plan.transformUp {
    case operator: SparkPlan => ensureDistributionAndOrdering(operator)
  }
}
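
The prefix comparison in the sort-insertion step above can be illustrated in isolation. A minimal sketch in plain Scala, with strings standing in for SortOrder expressions (the names are made up for illustration):

    // sort is skipped only when the child's output ordering starts with the required ordering
    def needsSort(requiredOrdering: Seq[String], childOrdering: Seq[String]): Boolean = {
      val minSize = math.min(requiredOrdering.size, childOrdering.size)
      minSize == 0 || requiredOrdering.take(minSize) != childOrdering.take(minSize)
    }

    needsSort(Seq("classId ASC"), Seq())            // true: PhysicalRDD has no output ordering, so a sort is inserted
    needsSort(Seq("a ASC"), Seq("a ASC", "b ASC"))  // false: [a] is a prefix of [a, b]
    needsSort(Seq("b ASC"), Seq("a ASC"))           // true: the prefixes differ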


















