In production, Spark jobs are usually submitted to YARN for execution. The overall flow is shown in the figure below. (A minimal application sketch that exercises this flow is given right after the list.)
1. The client submits the application to the ResourceManager (RM).
2. The RM starts the ApplicationMaster (AM).
3. The AM starts the Driver thread and requests resources from the RM.
4. The RM returns a list of available resources.
5. The AM launches Containers through nmClient and starts a CoarseGrainedExecutorBackend process in each of them.
6. The Executors register themselves back with the Driver.
7. The Executors start running tasks.
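For context, here is a minimal sketch of the kind of application that gets driven through this flow. It assumes the Spark 2.x SparkSession API; the object name, input/output paths, and word-count logic are placeholders chosen purely for illustration. In yarn-cluster mode, its main method is what the AM runs as the Driver (step 3), and the stages it builds execute as tasks on the Executors (steps 6 and 7).

import org.apache.spark.sql.SparkSession

// Hypothetical minimal application, used only to illustrate the flow above.
object WordCountApp {
  def main(args: Array[String]): Unit = {
    // The master ("yarn") and deploy mode are supplied by spark-submit, not hard-coded here.
    val spark = SparkSession.builder()
      .appName("WordCountApp")
      .getOrCreate()

    spark.sparkContext
      .textFile(args(0))            // input path passed on the spark-submit command line
      .flatMap(_.split("\\s+"))
      .map((_, 1))
      .reduceByKey(_ + _)
      .saveAsTextFile(args(1))      // these stages run as tasks on the Executors

    spark.stop()
  }
}

Such an application would be packaged into a jar and launched with something like spark-submit --master yarn --deploy-mode cluster --class WordCountApp app.jar <input> <output>, which is exactly where the SparkSubmit class discussed next takes over.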
How it works in detail
spark-submit.sh internally just invokes the org.apache.spark.deploy.SparkSubmit class (I won't go through the script itself; if you're curious, open it with vim and take a look).
Find this class in IDEA and jump to its main method; you get the following code (abridged to the SUBMIT path).
override def main(args: Array[String]): Unit = {
  // uninitLog is computed a few lines earlier in main via initializeLogIfNecessary (elided here)
  val appArgs = new SparkSubmitArguments(args)  // parse the spark-submit command line
  appArgs.action match {
    case SparkSubmitAction.SUBMIT => submit(appArgs, uninitLog)
    // ... (the KILL / REQUEST_STATUS cases are elided)
  }
}
appArgs.action is given a value when the arguments are initialized:
// Action should be SUBMIT unless otherwise specified
action = Option(action).getOrElse(SUBMIT)
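Option(action) wraps a possibly-null value: if no action was specified, action is still null, Option(null) is None, and getOrElse falls back to SUBMIT. A tiny sketch of this null-safe default idiom, paste-able into a Scala REPL (the values are purely illustrative):

// Illustrative only: how Option(...).getOrElse supplies a default for null.
val unset: String = null
val explicit: String = "KILL"

Option(unset).getOrElse("SUBMIT")    // Option(null) is None         => "SUBMIT"
Option(explicit).getOrElse("SUBMIT") // Option("KILL") is Some(...)  => "KILL"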
Click through on submit(appArgs, uninitLog) to jump to the corresponding method.
private def submit(args: SparkSubmitArguments, uninitLog: Boolean): Unit = {
  val (childArgs, childClasspath, sparkConf, childMainClass) = prepareSubmitEnvironment(args)

  def doRunMain(): Unit = {
    if (args.proxyUser != null) {
      val proxyUser = UserGroupInformation.createProxyUser(args.proxyUser,
        UserGroupInformation.getCurrentUser())
      try {
        proxyUser.doAs(new PrivilegedExceptionAction[Unit]() {
          override def run(): Unit = {
            runMain(childArgs, childClasspath, sparkConf, childMainClass, args.verbose)
          }
        })
      } catch {
        case e: Exception =>
          // Hadoop's AuthorizationException suppresses the exception's stack trace, which
          // makes the message printed to the output by the JVM not very helpful. Instead,
          // detect exceptions with empty stack traces here, and treat them differently.
          if (e.getStackTrace().length == 0) {
            // scalastyle:off println
            printStream.println(s"ERROR: ${e.getClass().getName()}: ${e.getMessage()}")
            // scalastyle:on println
            exitFn(1)
          } else {
            throw e
          }
      }
    } else {
      runMain(childArgs, childClasspath, sparkConf, childMainClass, args.verbose)
    }
  }
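
  // (Annotation, not part of the Spark source) doRunMain() is only defined at this point;
  // submit() invokes it further down, after choosing between the REST gateway and the
  // legacy RPC gateway described in the comment below.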

  // Let the main class re-initialize the logging system once it starts.
  if (uninitLog) {
    Logging.uninitialize()
  }

  // In standalone cluster mode, there are two submission gateways:
  //   (1) The traditional RPC gateway using o.a.s.deploy.Client as a wrapper
  //   (2) The new REST-based gateway introduced in Spark 1.3
  // The latter is the default behavior as of Spark 1.3, but Spark submit will fail over
// to use the legacy gateway if the master e