A Beginner's Road to Learning the Spark Source Code - 3: TaskScheduler - Part 2

This article digs into how the TaskScheduler works, covering resource allocation, the task launch flow, and priority handling. Walking through the resourceOffers() method shows how tasks are balanced across the cluster, and the resourceOfferSingleTaskSet() method shows how task launching is actually implemented.


The previous article covered the TaskScheduler's main initialization steps and task submission: https://blog.youkuaiyun.com/u012543819/article/details/81484416

This time we go deeper into the TaskScheduler source to see what else it does and how it launches tasks.

1. The resourceOffers() method

Source code:

/**
  * Called by cluster manager to offer resources on slaves. We respond by asking our active task
  * sets for tasks in order of priority. We fill each node with tasks in a round-robin manner so
  * that tasks are balanced across the cluster.
  */
The method's parameter has the following data structure:

case class WorkerOffer(executorId: String, host: String, cores: Int)

It records the executor ID, the host, and the number of available cores.
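For intuition, here is a tiny hand-built batch of offers using the WorkerOffer case class above; the executor IDs, hosts, and core counts are made-up values:

// Two executors on two hosts (hypothetical IDs and core counts)
val offers = IndexedSeq(
  WorkerOffer("exec-1", "host-a", 8),
  WorkerOffer("exec-2", "host-b", 4)
)
// With the default spark.task.cpus = 1 (i.e. CPUS_PER_TASK = 1),
// resourceOffers can place up to 8 concurrent tasks on exec-1 and 4 on exec-2.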

def resourceOffers(offers: IndexedSeq[WorkerOffer]): Seq[Seq[TaskDescription]] = synchronized {
  // Mark each slave as alive and remember its hostname
  // Also track if new executor is added
  var newExecAvail = false
  // Iterate over all offers; if the TaskScheduler does not yet know a host or executor, record it
  for (o <- offers) {
    if (!hostToExecutors.contains(o.host)) {
      hostToExecutors(o.host) = new HashSet[String]()
    }
    if (!executorIdToRunningTaskIds.contains(o.executorId)) {
      hostToExecutors(o.host) += o.executorId
      executorAdded(o.executorId, o.host)
      executorIdToHost(o.executorId) = o.host
      executorIdToRunningTaskIds(o.executorId) = HashSet[Long]()
      newExecAvail = true
    }
    // Record any rack information that has not been recorded yet
    for (rack <- getRackForHost(o.host)) {
      hostsByRack.getOrElseUpdate(rack, new HashSet[String]()) += o.host
    }
  }

  // Before making any offers, remove any nodes from the blacklist whose blacklist has expired. Do
  // this here to avoid a separate thread and added synchronization overhead, and also because
  // updating the blacklist is only relevant when task offers are being made.
  blacklistTrackerOpt.foreach(_.applyBlacklistTimeout())

  val filteredOffers = blacklistTrackerOpt.map { blacklistTracker =>
    offers.filter { offer =>
      !blacklistTracker.isNodeBlacklisted(offer.host) &&
        !blacklistTracker.isExecutorBlacklisted(offer.executorId)
    }
  }.getOrElse(offers)
  // Shuffle the offers so that tasks are not always assigned to the same few nodes
  val shuffledOffers = shuffleOffers(filteredOffers)
  // Build a list of tasks to assign to each worker.
  val tasks = shuffledOffers.map(o => new ArrayBuffer[TaskDescription](o.cores / CPUS_PER_TASK))
  val availableCpus = shuffledOffers.map(o => o.cores).toArray
  // Sort the task sets by priority, according to the pool's scheduling mode (FIFO or FAIR)
  val sortedTaskSets = rootPool.getSortedTaskSetQueue
  for (taskSet <- sortedTaskSets) {
    logDebug("parentName: %s, name: %s, runningTasks: %s".format(
      taskSet.parent.name, taskSet.name, taskSet.runningTasks))
    if (newExecAvail) {
      taskSet.executorAdded()
    }
  }

  // Take each TaskSet in our scheduling order, and then offer it each node in increasing order
  // of locality levels so that it gets a chance to launch local tasks on all of them.
  // NOTE: the preferredLocality order: PROCESS_LOCAL, NODE_LOCAL, NO_PREF, RACK_LOCAL, ANY
  // Iterate over the task sets, trying to launch tasks at each locality level from most local to least local
  for (taskSet <- sortedTaskSets) {
    var launchedAnyTask = false
    var launchedTaskAtCurrentMaxLocality = false
    for (currentMaxLocality <- taskSet.myLocalityLevels) {
      do {
        // resourceOfferSingleTaskSet is where tasks actually get launched
        launchedTaskAtCurrentMaxLocality = resourceOfferSingleTaskSet(
          taskSet, currentMaxLocality, shuffledOffers, availableCpus, tasks)
        launchedAnyTask |= launchedTaskAtCurrentMaxLocality
      } while (launchedTaskAtCurrentMaxLocality)
    }
    // If no task could be launched at any locality level, abort the task set in case it is
    // completely blacklisted; otherwise it would hang forever
    if (!launchedAnyTask) {
      taskSet.abortIfCompletelyBlacklisted(hostToExecutors)
    }
  }

  if (tasks.size > 0) {
    hasLaunchedTask = true
  }
  return tasks
}
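One helper worth a quick look is shuffleOffers. In the Spark 2.x source it is just a random permutation, kept overridable so that tests can make it deterministic; it looks roughly like this:

import scala.util.Random

// Shuffle offers around to avoid always placing tasks on the same workers.
// Exposed so tests can override it and stay deterministic.
protected def shuffleOffers(offers: IndexedSeq[WorkerOffer]): IndexedSeq[WorkerOffer] = {
  Random.shuffle(offers)
}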

Call tree:

CoarseGrainedSchedulerBackend is the caller of this method (from its makeOffers methods; the single-executor variant is shown in section 2 below). Its job here is to package the free executor resources on each worker into WorkerOffers and hand them to the TaskScheduler.
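The no-argument overload of makeOffers, which offers the resources of every alive executor at once, looks roughly like this in the Spark 2.x source (lightly simplified here):

// CoarseGrainedSchedulerBackend
// Make fake resource offers on all executors
private def makeOffers() {
  // Make sure no executor is killed while some task is launching on it
  val taskDescs = CoarseGrainedSchedulerBackend.this.synchronized {
    // Filter out executors under killing
    val activeExecutors = executorDataMap.filterKeys(executorIsAlive)
    val workOffers = activeExecutors.map { case (id, executorData) =>
      new WorkerOffer(id, executorData.executorHost, executorData.freeCores)
    }.toIndexedSeq
    scheduler.resourceOffers(workOffers)
  }
  if (!taskDescs.isEmpty) {
    launchTasks(taskDescs)
  }
}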

2. resourceOfferSingleTaskSet

As we saw in the previous part, resourceOffers() calls resourceOfferSingleTaskSet() to launch tasks from a single task set. Let's look at how that method is implemented:

private def resourceOfferSingleTaskSet(
                                        taskSet: TaskSetManager,
                                        maxLocality: TaskLocality,
                                        shuffledOffers: Seq[WorkerOffer],
                                        availableCpus: Array[Int],
                                        tasks: IndexedSeq[ArrayBuffer[TaskDescription]]): Boolean = {
  var launchedTask = false
  // nodes and executors that are blacklisted for the entire application have already been
  // filtered out by this point
  for (i <- 0 until shuffledOffers.size) {
    val execId = shuffledOffers(i).executorId
    val host = shuffledOffers(i).host
    if (availableCpus(i) >= CPUS_PER_TASK) {
      try {
        // Bookkeeping for each launched task: record its task set and executor and claim its
        // CPUs; the actual work is done by TaskSetManager.resourceOffer
        for (task <- taskSet.resourceOffer(execId, host, maxLocality)) {
          tasks(i) += task
          val tid = task.taskId
          taskIdToTaskSetManager(tid) = taskSet
          taskIdToExecutorId(tid) = execId
          executorIdToRunningTaskIds(execId).add(tid)
          availableCpus(i) -= CPUS_PER_TASK
          assert(availableCpus(i) >= 0)
          launchedTask = true
        }
      } catch {
        case e: TaskNotSerializableException =>
          logError(s"Resource offer failed, task set ${taskSet.name} was not serializable")
          // Do not offer resources for this task, but don't throw an error to allow other
          // task sets to be submitted.
          return launchedTask
      }
    }
  }
  return launchedTask
}
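A quick sanity check on the CPU bookkeeping above: CPUS_PER_TASK comes from spark.task.cpus (default 1), so an offer with c free cores can accept at most c / CPUS_PER_TASK tasks in one scheduling round. A minimal self-contained illustration (the numbers are made up):

val CPUS_PER_TASK = 1                        // spark.task.cpus, default 1
val availableCpus = Array(8, 4)              // free cores per shuffled offer
val maxTasksPerOffer = availableCpus.map(_ / CPUS_PER_TASK)
// maxTasksPerOffer: Array(8, 4). Each successful offer then does
// availableCpus(i) -= CPUS_PER_TASK, and the assert keeps it non-negative.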

Let's now look at the concrete implementation of TaskSetManager.resourceOffer:

/**
 * Respond to an offer of a single executor from the scheduler by finding a task
 *
 * NOTE: this function is either called with a maxLocality which
 * would be adjusted by delay scheduling algorithm or it will be with a special
 * NO_PREF locality which will be not modified
 *
 * @param execId the executor Id of the offered resource
 * @param host  the host Id of the offered resource
 * @param maxLocality the maximum locality we want to schedule the tasks at
 */
@throws[TaskNotSerializableException]
def resourceOffer(
    execId: String,
    host: String,
    maxLocality: TaskLocality.TaskLocality)
  : Option[TaskDescription] =
{
  // Check whether this host or executor is blacklisted for this task set
  val offerBlacklisted = taskSetBlacklistHelperOpt.exists { blacklist =>
    blacklist.isNodeBlacklistedForTaskSet(host) ||
      blacklist.isExecutorBlacklistedForTaskSet(execId)
  }
  if (!isZombie && !offerBlacklisted) {
    val curTime = clock.getTimeMillis()

    var allowedLocality = maxLocality

    if (maxLocality != TaskLocality.NO_PREF) {
      allowedLocality = getAllowedLocalityLevel(curTime)
      if (allowedLocality > maxLocality) {
        // We're not allowed to search for farther-away tasks
        allowedLocality = maxLocality
      }
    }
    // Try to dequeue a task; on success, do the bookkeeping and return a task description
    dequeueTask(execId, host, allowedLocality).map { case ((index, taskLocality, speculative)) =>
      // Found a task; do some bookkeeping and return a task description
      val task = tasks(index)
      val taskId = sched.newTaskId()
      // Do various bookkeeping
      copiesRunning(index) += 1
      val attemptNum = taskAttempts(index).size
      val info = new TaskInfo(taskId, index, attemptNum, curTime,
        execId, host, taskLocality, speculative)
      taskInfos(taskId) = info
      taskAttempts(index) = info :: taskAttempts(index)
      // Update our locality level for delay scheduling
      // NO_PREF will not affect the variables related to delay scheduling
      if (maxLocality != TaskLocality.NO_PREF) {
        currentLocalityIndex = getLocalityIndex(taskLocality)
        lastLaunchTime = curTime
      }
      // Serialize and return the task
      val serializedTask: ByteBuffer = try {
        ser.serialize(task)
      } catch {
        // If the task cannot be serialized, then there's no point to re-attempt the task,
        // as it will always fail. So just abort the whole task-set.
        case NonFatal(e) =>
          val msg = s"Failed to serialize task $taskId, not attempting to retry it."
          logError(msg, e)
          abort(s"$msg Exception during serialization: $e")
          throw new TaskNotSerializableException(e)
      }
      if (serializedTask.limit() > TaskSetManager.TASK_SIZE_TO_WARN_KB * 1024 &&
        !emittedTaskSizeWarning) {
        emittedTaskSizeWarning = true
        logWarning(s"Stage ${task.stageId} contains a task of very large size " +
          s"(${serializedTask.limit() / 1024} KB). The maximum recommended task size is " +
          s"${TaskSetManager.TASK_SIZE_TO_WARN_KB} KB.")
      }
      // Record the task as running in this TaskSetManager
      addRunningTask(taskId)

      // We used to log the time it takes to serialize the task, but task size is already
      // a good proxy to task serialization time.
      // val timeTaken = clock.getTime() - startTime
      val taskName = s"task ${info.id} in stage ${taskSet.id}"
      logInfo(s"Starting $taskName (TID $taskId, $host, executor ${info.executorId}, " +
        s"partition ${task.partitionId}, $taskLocality, ${serializedTask.limit()} bytes)")
      // Notify the DAGScheduler that the task has started
      sched.dagScheduler.taskStarted(task, info)
      // Return the task description
      new TaskDescription(
        taskId,
        attemptNum,
        execId,
        taskName,
        index,
        addedFiles,
        addedJars,
        task.localProperties,
        serializedTask)
    }
  } else {
    None
  }
}
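The locality clamp near the top of resourceOffer deserves a closer look. TaskLocality is an Enumeration ordered PROCESS_LOCAL < NODE_LOCAL < NO_PREF < RACK_LOCAL < ANY, so a larger value means a "farther" (worse) locality. Here is a minimal self-contained sketch of that clamp, assuming this ordering; clampLocality is a made-up name for illustration:

// Ordered from best to worst locality, as in Spark
object TaskLocality extends Enumeration {
  val PROCESS_LOCAL, NODE_LOCAL, NO_PREF, RACK_LOCAL, ANY = Value
}
import TaskLocality._

// allowedByDelay is what delay scheduling currently permits; maxLocality is
// the cap passed in by resourceOffers. The result never exceeds the cap.
def clampLocality(maxLocality: TaskLocality.Value,
                  allowedByDelay: TaskLocality.Value): TaskLocality.Value =
  if (allowedByDelay > maxLocality) maxLocality else allowedByDelay

clampLocality(NODE_LOCAL, RACK_LOCAL)  // NODE_LOCAL: not allowed to search farther away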

Inside the TaskScheduler, the taskIdToTaskSetManager map records which TaskSetManager owns each task; we saw it being filled in resourceOfferSingleTaskSet above. Now let's return to CoarseGrainedSchedulerBackend to see how the returned TaskDescriptions are actually launched. The single-executor variant of makeOffers looks like this:

// CoarseGrainedSchedulerBackend
// Make fake resource offers on just one executor
private def makeOffers(executorId: String) {
  // Make sure no executor is killed while some task is launching on it
  val taskDescs = CoarseGrainedSchedulerBackend.this.synchronized {
    // Filter out executors under killing
    if (executorIsAlive(executorId)) {
      val executorData = executorDataMap(executorId)
      val workOffers = IndexedSeq(
        new WorkerOffer(executorId, executorData.executorHost, executorData.freeCores))
      scheduler.resourceOffers(workOffers)
    } else {
      Seq.empty
    }
  }
  // The actual task launch happens back here in CoarseGrainedSchedulerBackend
  if (!taskDescs.isEmpty) {
    launchTasks(taskDescs)
  }
}

 

// Launch tasks returned by a set of resource offers
private def launchTasks(tasks: Seq[Seq[TaskDescription]]) {
  for (task <- tasks.flatten) {
    val serializedTask = TaskDescription.encode(task)
    if (serializedTask.limit() >= maxRpcMessageSize) {
      scheduler.taskIdToTaskSetManager.get(task.taskId).foreach { taskSetMgr =>
        try {
          var msg = "Serialized task %s:%d was %d bytes, which exceeds max allowed: " +
            "spark.rpc.message.maxSize (%d bytes). Consider increasing " +
            "spark.rpc.message.maxSize or using broadcast variables for large values."
          msg = msg.format(task.taskId, task.index, serializedTask.limit(), maxRpcMessageSize)
          taskSetMgr.abort(msg)
        } catch {
          case e: Exception => logError("Exception in error callback", e)
        }
      }
    }
    else {
      val executorData = executorDataMap(task.executorId)
      executorData.freeCores -= scheduler.CPUS_PER_TASK

      logDebug(s"Launching task ${task.taskId} on executor id: ${task.executorId} hostname: " +
        s"${executorData.executorHost}.")
      // The task is actually executed on the executor: send it a LaunchTask message
      executorData.executorEndpoint.send(LaunchTask(new SerializableBuffer(serializedTask)))
    }
  }
}
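The size guard at the top of launchTasks exists because each TaskDescription travels to its executor in a single RPC message. A self-contained sketch of the check, assuming the default spark.rpc.message.maxSize of 128 MB (the real limit is read from the configuration):

val maxRpcMessageSize = 128 * 1024 * 1024    // spark.rpc.message.maxSize, default 128 MB
def canSendDirectly(serializedTaskBytes: Int): Boolean =
  serializedTaskBytes < maxRpcMessageSize
// An oversized task aborts only its own TaskSetManager (with the hint to use
// broadcast variables) instead of crashing the whole scheduling loop.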

Finally, let's recap the whole task launch flow:

When the cluster manager offers resources, CoarseGrainedSchedulerBackend.makeOffers calls the TaskScheduler's resourceOffers() method. That method calls the private resourceOfferSingleTaskSet(), which in turn calls TaskSetManager.resourceOffer() to build the TaskDescriptions. Control then returns to CoarseGrainedSchedulerBackend.makeOffers, where the actual launch is performed by launchTasks(), which finally sends a LaunchTask message to the executor that runs the task:

makeOffers -> resourceOffers -> resourceOfferSingleTaskSet -> TaskSetManager.resourceOffer -> launchTasks -> executor
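To make the round-robin shape of this loop concrete, here is a toy, self-contained model (not Spark code) of offers being filled one CPUS_PER_TASK slice at a time:

case class Offer(executorId: String, var freeCores: Int)

val CPUS_PER_TASK = 1
val offers = Array(Offer("exec-1", 2), Offer("exec-2", 1))
val pendingTasks = scala.collection.mutable.Queue("task-0", "task-1", "task-2")

var launched = true
while (launched && pendingTasks.nonEmpty) {
  launched = false
  for (o <- offers if o.freeCores >= CPUS_PER_TASK && pendingTasks.nonEmpty) {
    val t = pendingTasks.dequeue()
    o.freeCores -= CPUS_PER_TASK             // claim the cores
    println(s"launch $t on ${o.executorId}")
    launched = true
  }
}
// prints: task-0 on exec-1, task-1 on exec-2, task-2 on exec-1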
