Spark源码分析之五:Task调度(一)

        在前四篇博文中,我们分析了Job提交运行总流程的第一阶段Stage划分与提交,它又被细化为三个分阶段:

        1、Job的调度模型与运行反馈;

        2、Stage划分;

        3、Stage提交:对应TaskSet的生成。

        Stage划分与提交阶段主要是由DAGScheduler完成的,而DAGScheduler负责Job的逻辑调度,主要职责也即DAG图的分解,按照RDD间是否为shuffle dependency,将整个Job划分为一个个stage,并将每个stage转化为tasks的集合--TaskSet。

        接下来我们要讲的第二阶段Task调度与执行,则是Spark中Job的物理调度,它实际上分为两个主要阶段:

        1、Task调度;

        2、Task运行。

        下面,我们分析下Task的调度。我们知道,在第一阶段的末尾,stage被提交后,每个stage被转化为一组task的集合--TaskSet,而紧接着,则调用taskScheduler.submitTasks()提交这些tasks,而TaskScheduler的主要职责,则是负责Job物理调度阶段--Task调度。TaskScheduler为scala中的一个trait,你可以简单的把它理解为Java中的接口,目前它仅仅有一个实现类TaskSchedulerImpl。

        TaskScheduler负责低层次任务的调度,每个TaskScheduler为一个特定的SparkContext调度tasks。这些调度器获取到由DAGScheduler为每个stage提交至他们的一组Tasks,并负责将这些tasks发送到集群,以执行它们,在它们失败时重试,并减轻掉队情况(类似MapReduce的推测执行原理吧,在这里留个疑问)。这些调度器返回一些事件events给DAGScheduler。其源码如下:

/**
 * Low-level task scheduler interface, currently implemented exclusively by
 * [[org.apache.spark.scheduler.TaskSchedulerImpl]].
 * This interface allows plugging in different task schedulers. Each TaskScheduler schedules tasks
 * for a single SparkContext. These schedulers get sets of tasks submitted to them from the
 * DAGScheduler for each stage, and are responsible for sending the tasks to the cluster, running
 * them, retrying if there are failures, and mitigating stragglers. They return events to the
 * DAGScheduler.
 * 
 */
private[spark] trait TaskScheduler {

  private val appId = "spark-application-" + System.currentTimeMillis

  def rootPool: Pool

  def schedulingMode: SchedulingMode

  def start(): Unit

  // Invoked after system has successfully initialized (typically in spark context).
  // Yarn uses this to bootstrap allocation of resources based on preferred locations,
  // wait for slave registrations, etc.
  def postStartHook() { }

  // Disconnect from the cluster.
  def stop(): Unit

  // Submit a sequence of tasks to run.
  def submitTasks(taskSet: TaskSet): Unit

  // Cancel a stage.
  def cancelTasks(stageId: Int, interruptThread: Boolean)

  // Set the DAG scheduler for upcalls. This is guaranteed to be set before submitTasks is called.
  def setDAGScheduler(dagScheduler: DAGScheduler): Unit

  // Get the default level of parallelism to use in the cluster, as a hint for sizing jobs.
  def defaultParallelism(): Int

  /**
   * Update metrics for in-progress tasks and let the master know that the BlockManager is still
   * alive. Return true if the driver knows about the given block manager. Otherwise, return false,
   * indicating that the block manager should re-register.
   */
  def executorHeartbeatReceived(execId: String, taskMetrics: Array[(Long, TaskMetrics)],
    blockManagerId: BlockManagerId): Boolean

  /**
   * Get an application ID associated with the job.
   *
   * @return An application ID
   */
  def applicationId(): String = appId

  /**
   * Process a lost executor
   */
  def executorLost(executorId: String, reason: ExecutorLossReason): Unit

  /**
   * Get an application's attempt ID associated with the job.
   *
   * @return An application's Attempt ID
   */
  def applicationAttemptId(): Option[String]

}
        通过源码我们可以知道,TaskScheduler提供了实例化与销毁时必要的start()和stop()方法,并提供了提交Tasks与取消Tasks的submitTasks()和cancelTasks()方法,并且通过executorHeartbeatReceived()周期性的接收executor的心跳,更新运行中tasks的元信息,并让master知晓BlockManager仍然存活。

        好了,结合源码,我们一步步来看吧。

        首先,在DAGScheduler的submitMissingTasks()方法的最后,每个stage生成一组tasks后,即调用用TaskScheduler的submitTasks()方法提交task,代码如下:

// 利用taskScheduler.submitTasks()提交task
      taskScheduler.submitTasks(new TaskSet(
        tasks.toArray, stage.id, stage.latestInfo.attemptId, jobId, properties))
      // 记录提交时间
      stage.latestInfo.submissionTime = Some(clock.getTimeMillis())
        那么我们先来看下TaskScheduler的submitTasks()方法,在其实现类TaskSchedulerImpl中,代码如下:

override def submitTasks(taskSet: TaskSet) {
    
    // 获取TaskSet中的tasks
    val tasks = taskSet.tasks
    logInfo("Adding task set " + taskSet.id + " with " + tasks.length + " tasks")
    
    // 使用synchronized进行同步
    this.synchronized {
      
      // 创建TaskSetManager
      val manager = createTaskSetManager(taskSet, maxTaskFailures)
      
      // 获取taskSet对应的stageId
      val stage = taskSet.stageId
      
      // taskSetsByStageIdAndAttempt存储的是stageId->[taskSet.stageAttemptId->TaskSetManager]
      // 更新taskSetsByStageIdAndAttempt,将上述对应关系存入
      val stageTaskSets =
        taskSetsByStageIdAndAttempt.getOrElseUpdate(stage, new HashMap[Int, TaskSetManager])
      stageTaskSets(taskSet.stageAttemptId) = manager
      
      // 查看是否存在冲突的taskSet,如果存在,抛出IllegalStateException异常
      val conflictingTaskSet = stageTaskSets.exists { case (_, ts) =>
        ts.taskSet != taskSet && !ts.isZombie
      }
      if (conflictingTaskSet) {
        throw new IllegalStateException(s"more than one active taskSet for stage $stage:" +
          s" ${stageTaskSets.toSeq.map{_._2.taskSet.id}.mkString(",")}")
      }
      
      // 将TaskSetManager添加到schedulableBuilder中
      schedulableBuilder.addTaskSetManager(manager, manager.taskSet.
评论 5
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值