参考:图解 Spark:核心技术与案例实战
在 DAGScheduler 的 handleMapStageSubmitted 方法中,划分完调度阶段后,进行调度阶段的提交。
调度阶段的提交是在 submitStage 方法中进行的:
/** Submits stage, but first recursively submits any missing parents. */
private def submitStage(stage: Stage) {
val jobId = activeJobForStage(stage)
if (jobId.isDefined) {
logDebug("submitStage(" + stage + ")")
if (!waitingStages(stage) && !runningStages(stage) && !failedStages(stage)) {
val missing = getMissingParentStages(stage).sortBy(_.id)
logDebug("missing: " + missing)
if (missing.isEmpty) {
logInfo("Submitting " + stage + " (" + stage.rdd + "), which has no missing parents")
submitMissingTasks(stage, jobId.get)
} else {
for (parent <- missing) {
submitStage(parent)
}
waitingStages += stage
}
}
} else {
abortStage(stage, "No active job for stage " + stage.id, None)
}
}
该方法中,先通过 getMissingParentStages 获取 finalStage 的父调度阶段:
private def getMissingParentStages(stage: Stage): List[Stage] = {
val missing = new HashSet[Stage]
val visited = new HashSet[RDD[_]]
// We are manually maintaining a stack here to prevent StackOverflowError
// caused by recursively visiting
val waitingForVisit = new Stack[RDD[_]]
def visit(rdd: RDD[_]) {
if (!visited(rdd)) {
visited += rdd
val rddHasUncachedPartitions = getCacheLocs(rdd).contains(Nil)
if (rddHasUncachedPartitions) {
for (dep <- rdd.dependencies) {
dep match {
case shufDep: ShuffleDependency[_, _, _] =>
val mapStage = getOrCreateShuffleMapStage(shufDep, stage.firstJobId)
if (!mapStage.isAvailable) {
missing += mapStage
}
case narrowDep: NarrowDependency[_] =>
waitingForVisit.push(narrowDep.rdd)
}
}
}
}
}
waitingForVisit.push(stage.rdd)
while (waitingForVisit.nonEmpty) {
visit(waitingForVisit.pop())
}
missing.toList
}
在获取父调度阶段时,是通过该 rdd 的依赖关系向前遍历,判断是否存在 Shuffle 操作,如果存在 shuffle 操作,则会创建 ShuffleMapStage,但因为划分调度阶段的时候 ShuffleMapStage 已经创建并缓存了,所以这里会直接获取。
如果存在父调度阶段,将其加入 waitingStages 等待集合,并对递归调用 submitStage 操作,直到找到开始的调度阶段。
如果不存在父调度阶段,则作为作业运行的入口,进入 submitMissingTasks 操作,当入口调度阶段完成后,相继提交后续调度阶段,在提交前先判断依赖的父调度阶段结果是否可用,如果可用,则提交调度阶段,如果不可应,则提交结果不可用的父调度阶段。
这期间会经过很多消息传递,在 Executor.run 任务执行完成后会发送消息:
override def statusUpdate(taskId: Long, state: TaskState, data: ByteBuffer) {
val msg = StatusUpdate(executorId, taskId, state, data)
driver match {
case Some(driverRef) => driverRef.send(msg)
case None => logWarning(s"Drop $msg because has not yet connected to driver")
}
}
DriverEndpoint 收到消息后,调用 TaskSchedulerImpl 的 statusUpdate 方法: