Spark Task Submission Flow

This article walks through how a task submitted with spark-submit runs in YARN cluster mode: from the client submitting the application to the ResourceManager, through the launch of the ApplicationMaster and the Driver thread, to resource allocation, Executor startup, and task execution.


In production, Spark jobs are usually submitted to YARN. The overall flow is as follows (a sample submit command is shown after the steps):

1. The client submits the application to the ResourceManager (RM).

2. The RM launches the ApplicationMaster (AM).

3. The AM starts the Driver thread and requests resources from the RM.

4. The RM returns a list of available resources (containers).

5. The AM launches the containers through nmClient, each of which starts a CoarseGrainedExecutorBackend process.

6. Each Executor registers itself back with the Driver (reverse registration).

7. The Executors run the tasks the Driver dispatches to them.
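
To make the steps concrete, here is what a typical cluster-mode submission looks like. The main class, jar path, and resource sizes are placeholders, not values from any real job:

  spark-submit \
    --master yarn \
    --deploy-mode cluster \
    --class com.example.MyApp \
    --driver-memory 2g \
    --num-executors 4 \
    --executor-cores 2 \
    --executor-memory 4g \
    /path/to/my-app.jar arg1 arg2

With --deploy-mode cluster, the Driver runs inside the AM on the cluster (steps 2 and 3 above); with --deploy-mode client it would run in the submitting process instead.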

How it works in detail

The spark-submit script (bin/spark-submit) ultimately runs the class org.apache.spark.deploy.SparkSubmit. (Open the script with vim if you want to see for yourself.)
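
In recent 2.x releases the script's last line is essentially the following, handing the full argument list to spark-class with SparkSubmit as the entry class (quoted from memory, so minor details may differ between versions):

  exec "${SPARK_HOME}"/bin/spark-class org.apache.spark.deploy.SparkSubmit "$@"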

Open this class in IDEA and jump to its main method. Slightly abridged, it reads:

  override def main(args: Array[String]): Unit = {
    // Remember whether logging must be re-initialized by the application later.
    val uninitLog = initializeLogIfNecessary(true, silent = true)
    val appArgs = new SparkSubmitArguments(args) // parse the command line
    appArgs.action match {
      case SparkSubmitAction.SUBMIT => submit(appArgs, uninitLog)
      case SparkSubmitAction.KILL => kill(appArgs)
      case SparkSubmitAction.REQUEST_STATUS => requestStatus(appArgs)
    }
  }

appArgs.action is assigned while SparkSubmitArguments is initialized; unless --kill or --status was passed on the command line, it defaults to SUBMIT:

// Action should be SUBMIT unless otherwise specified
action = Option(action).getOrElse(SUBMIT)
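
SUBMIT comes from SparkSubmitAction, which in the 2.x source is a plain Scala enumeration (reproduced from memory, so treat the exact modifiers as approximate):

  private[deploy] object SparkSubmitAction extends Enumeration {
    type SparkSubmitAction = Value
    val SUBMIT, KILL, REQUEST_STATUS = Value
  }

KILL and REQUEST_STATUS back the --kill and --status flags (used with standalone and Mesos cluster deployments); a normal submission always takes the SUBMIT branch.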

Click through submit(appArgs, uninitLog) to jump to the corresponding method:

private def submit(args: SparkSubmitArguments, uninitLog: Boolean): Unit = {
    val (childArgs, childClasspath, sparkConf, childMainClass) = prepareSubmitEnvironment(args)

    def doRunMain(): Unit = {
      if (args.proxyUser != null) {
        val proxyUser = UserGroupInformation.createProxyUser(args.proxyUser,
          UserGroupInformation.getCurrentUser())
        try {
          proxyUser.doAs(new PrivilegedExceptionAction[Unit]() {
            override def run(): Unit = {
              runMain(childArgs, childClasspath, sparkConf, childMainClass, args.verbose)
            }
          })
        } catch {
          case e: Exception =>
            // Hadoop's AuthorizationException suppresses the exception's stack trace, which
            // makes the message printed to the output by the JVM not very helpful. Instead,
            // detect exceptions with empty stack traces here, and treat them differently.
            if (e.getStackTrace().length == 0) {
              // scalastyle:off println
              printStream.println(s"ERROR: ${e.getClass().getName()}: ${e.getMessage()}")
              // scalastyle:on println
              exitFn(1)
            } else {
              throw e
            }
        }
      } else {
        runMain(childArgs, childClasspath, sparkConf, childMainClass, args.verbose)
      }
    }

    // Let the main class re-initialize the logging system once it starts.
    if (uninitLog) {
      Logging.uninitialize()
    }

    // In standalone cluster mode, there are two submission gateways:
    //   (1) The traditional RPC gateway using o.a.s.deploy.Client as a wrapper
    //   (2) The new REST-based gateway introduced in Spark 1.3
    // The latter is the default behavior as of Spark 1.3, but Spark submit will fail over
    // to use the legacy gateway if the master endpoint turns out to be not a REST server.
    if (args.isStandaloneCluster && args.useRest) {
      // REST submission with automatic fallback to the legacy gateway
      // (elided here; this branch is never taken on YARN)
    } else {
      // In all other modes, including yarn-cluster, just run the main class as prepared
      doRunMain()
    }
  }
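
For YARN the else branch is taken: doRunMain() ends up in runMain, which loads childMainClass by reflection and invokes it. In yarn-cluster mode, prepareSubmitEnvironment sets childMainClass to the YARN client (org.apache.spark.deploy.yarn.YarnClusterApplication as of Spark 2.3), and it is that class which actually submits the application to the RM, i.e. step 1 of the flow above.

The proxy-user branch in doRunMain relies on Hadoop's UserGroupInformation. Here is a minimal standalone sketch of that pattern; the user name "alice" is made up, and the real user must be allowed to impersonate her in Hadoop's proxyuser configuration:

  import java.security.PrivilegedExceptionAction
  import org.apache.hadoop.security.UserGroupInformation

  object ProxyUserDemo {
    def main(args: Array[String]): Unit = {
      val realUser  = UserGroupInformation.getCurrentUser()
      val proxyUser = UserGroupInformation.createProxyUser("alice", realUser)
      proxyUser.doAs(new PrivilegedExceptionAction[Unit]() {
        override def run(): Unit = {
          // Everything in here runs with "alice" as the effective user, so e.g.
          // HDFS permission checks are made against alice, not the real user.
          println(s"Effective user: ${UserGroupInformation.getCurrentUser().getShortUserName}")
        }
      })
    }
  }

This is exactly why spark-submit offers a --proxy-user flag: the wrapping happens once, around runMain, and everything the application does afterwards inherits the impersonated identity.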