Studying the Spark source code is a good way to understand Spark more deeply. This series walks through the source in the following order, covering both Spark cluster startup and the execution flow of job submission:
- Spark RPC analysis
- start-all.sh
- Master startup analysis
- Worker startup analysis
- spark-submit.sh script analysis
- SparkSubmit analysis
- SparkContext initialization
5. spark-submit.sh script analysis
Jobs are submitted through spark-submit.sh. Looking at what the script actually does (the listing below is the Windows batch variant, spark-submit2.cmd; the Unix spark-submit.sh ends the same way, exec-ing spark-class with the same main class), the key lines are:
set CLASS=org.apache.spark.deploy.SparkSubmit
"%~dp0spark-class2.cmd" %CLASS% %*
6. SparkSubmit analysis
So the submission is handled by the SparkSubmit class. Find its main method, shown below:
override def main(args: Array[String]): Unit = {
  // Build an anonymous SparkSubmit subclass whose logging goes through
  // printMessage; "self" aliases this instance for the nested class below.
  val submit = new SparkSubmit() {
    self =>

    override protected def parseArguments(args: Array[String]): SparkSubmitArguments = {
      new SparkSubmitArguments(args) {
        // Route the argument parser's log output through the outer instance
        override protected def logInfo(msg: => String): Unit = self.logInfo(msg)
        override protected def logWarning(msg: => String): Unit = self.logWarning(msg)
      }
    }

    override protected def logInfo(msg: => String): Unit = printMessage(msg)
    override protected def logWarning(msg: => String): Unit = printMessage(s"Warning: $msg")

    /**
     * Wrap the parent's doSubmit so that a SparkUserAppException thrown by
     * the user application is turned into the matching process exit code.
     */
    override def doSubmit(args: Array[String]): Unit = {
      try {
        // Delegate to SparkSubmit.doSubmit, analyzed below
        super.doSubmit(args)
      } catch {
        case e: SparkUserAppException => exitFn(e.exitCode)
      }
    }
  }

  /**
   * Kick off the actual submission through SparkSubmit's doSubmit
   */
  submit.doSubmit(args)
}
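Before moving on, note the self => on the second line of the anonymous subclass: it gives the anonymous SparkSubmit instance the alias self, so the nested anonymous SparkSubmitArguments can reach the outer overrides (inside the nested class, this refers to the arguments object, not to submit). Below is a minimal runnable sketch of this pattern, with hypothetical Submitter/Parser classes standing in for the Spark ones:

object SelfAliasDemo {
  class Submitter {
    def logInfo(msg: String): Unit = println(msg)
    def parse(args: Array[String]): Parser = new Parser(args)
  }

  class Parser(args: Array[String]) {
    def logInfo(msg: String): Unit = println(msg)
  }

  def main(cliArgs: Array[String]): Unit = {
    val submitter = new Submitter { self => // "self" aliases this anonymous instance
      override def logInfo(msg: String): Unit = println(s"[submitter] $msg")

      override def parse(args: Array[String]): Parser = new Parser(args) {
        // Inside this anonymous Parser, "this" is the Parser instance;
        // "self" still refers to the customized Submitter above, so the
        // parser's log lines are rerouted through the submitter's channel.
        override def logInfo(msg: String): Unit = self.logInfo(msg)
      }
    }

    submitter.parse(Array("--verbose")).logInfo("arguments parsed")
    // prints: [submitter] arguments parsed
  }
}

Spark uses exactly this trick so that the log output of SparkSubmitArguments is also funneled through printMessage.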
So the real work happens in super.doSubmit, i.e. SparkSubmit.doSubmit:
def doSubmit(args: Array[String]): Unit = {
  // Initialize logging if it hasn't been done yet. Keep track of whether logging needs to
  // be reset before the application starts.
  val uninitLog = initializeLogIfNecessary(true, silent = true)

  val appArgs = parseArguments(args)
  if (appArgs.verbose) {
    logInfo(appArgs.toString)
  }

  /**
   * Which branch runs is decided by your spark-submit command line,
   * i.e. the arguments passed after $CLASS in the launch script.
   */
  appArgs.action match {
    case SparkSubmitAction.SUBMIT => submit(appArgs, uninitLog)
    case SparkSubmitAction.KILL => kill(appArgs)
    case SparkSubmitAction.REQUEST_STATUS => requestStatus(appArgs)
    case SparkSubmitAction.PRINT_VERSION => printVersion()
  }
}
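The match shows that spark-submit handles more than job submission. Roughly, the command lines below map to the four actions (the application class, master URL, and submission id are placeholders; --kill and --status apply to drivers launched in cluster deploy mode on a standalone cluster):

spark-submit --class com.example.WordCount --master spark://host:7077 app.jar   # SUBMIT
spark-submit --kill driver-20240101000000-0001 --master spark://host:7077       # KILL
spark-submit --status driver-20240101000000-0001 --master spark://host:7077     # REQUEST_STATUS
spark-submit --version                                                          # PRINT_VERSION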
