Spark Executor Startup Flow

This article walks through how the NodeManager starts an Executor: the shell launch script and how CoarseGrainedExecutorBackend registers its RPC endpoint. It then covers task deserialization and execution, including the RDD iterator computation and the invocation of the func closure, and finally traces the specific methods that Task.run steps into.


NodeManager Launches the Executor

Shell Launch Script

On the NodeManager, default_container_executor.sh runs one of two commands:

bash -c 'java ..CoarseGrainedExecutorBackend' --> launches the Executor, which receives task computations
bash -c 'java ..ExecutorLauncher' --> this path presumably goes straight to launching the ApplicationMaster

/yarn/nm/usercache/hadoop/appcache/application_1557744110775_5172/container_e06_1557744110775_5172_01_000003/launch_container.sh

exec /bin/bash -c "LD_LIBRARY_PATH="$HADOOP_COMMON_HOME/../../../CDH-5.12.0-1.cdh5.12.0.p0.29/lib/hadoop/lib/native:$LD_LIBRARY_PATH" $JAVA_HOME/bin/java -server -XX:OnOutOfMemoryError='kill %p' -Xms4096m -Xmx4096m '-Xdebug' '-Xnoagent' '-Djava.compiler=NONE' '-Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=2346' -Djava.io.tmpdir=$PWD/tmp '-Dspark.authenticate.enableSaslEncryption=false' '-Dspark.authenticate=false' '-Dspark.driver.port=39563' '-Dspark.shuffle.service.port=7337' -Dspark.yarn.app.container.log.dir=/yarn/container-logs/application_1557744110775_5172/container_e06_1557744110775_5172_01_000003 org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url spark://CoarseGrainedScheduler@10.59.34.203:39563 --executor-id 2 --hostname host-10-59-34-204 --cores 1 --app-id application_1557744110775_5172 --user-class-path file:$PWD/__app__.jar 1>/yarn/container-logs/application_1557744110775_5172/container_e06_1557744110775_5172_01_000003/stdout 2>/yarn/container-logs/application_1557744110775_5172/container_e06_1557744110775_5172_01_000003/stderr"

CoarseGrainedExecutorBackend Starts the RPC Endpoint

    // CoarseGrainedExecutorBackend
    
    // main()
    // run() registers the executor RPC endpoint
      env.rpcEnv.setupEndpoint("Executor", new CoarseGrainedExecutorBackend(
        env.rpcEnv, driverUrl, executorId, sparkHostPort, cores, userClassPath, env))

  // CoarseGrainedExecutorBackend receives task-management calls over RPC
  override def receive: PartialFunction[Any, Unit] = {
    case RegisteredExecutor(hostname) =>
      logInfo("Successfully registered with driver")
      try {
        executor = new Executor(executorId, hostname, env, userClassPath, isLocal = false)
      } catch {
        case NonFatal(e) =>
          exitExecutor(1, "Unable to create executor due to " + e.getMessage, e)
      }

    case RegisterExecutorFailed(message) =>
      exitExecutor(1, "Slave registration failed: " + message)

    case LaunchTask(data) =>
      if (executor == null) {
        exitExecutor(1, "Received LaunchTask command but executor was null")
      } else {
        val taskDesc = ser.deserialize[TaskDescription](data.value)
        logInfo("Got assigned task " + taskDesc.taskId)
        // Wrap the received parameters in a TaskRunner (a Runnable) and submit it to the thread pool for execution (see the sketch after this block)
        executor.launchTask(this, taskId = taskDesc.taskId, attemptNumber = taskDesc.attemptNumber,
          taskDesc.name, taskDesc.serializedTask)
      }

    case KillTask(taskId, _, interruptThread) =>
      if (executor == null) {
        exitExecutor(1, "Received KillTask command but executor was null")
      } else {
        executor.killTask(taskId, interruptThread)
      }

    case StopExecutor =>
      stopping.set(true)
      logInfo("Driver commanded a shutdown")
      // Cannot shutdown here because an ack may need to be sent back to the caller. So send
      // a message to self to actually do the shutdown.
      self.send(Shutdown)

    case Shutdown =>
      stopping.set(true)
      executor.stop()
      stop()
      rpcEnv.shutdown()
  }
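
As the comment in the LaunchTask case notes, executor.launchTask wraps the received task in a TaskRunner (a Runnable), records it, and submits it to a thread pool. Below is a minimal, self-contained sketch of that pattern (the names are illustrative stand-ins, not Spark's actual Executor internals):

  import java.util.concurrent.{ConcurrentHashMap, Executors}

  object LaunchTaskSketch {
    final class TaskRunner(val taskId: Long, body: () => Unit) extends Runnable {
      override def run(): Unit = body()
    }

    private val threadPool   = Executors.newCachedThreadPool()
    private val runningTasks = new ConcurrentHashMap[Long, TaskRunner]()

    def launchTask(taskId: Long, body: () => Unit): Unit = {
      val tr = new TaskRunner(taskId, body)
      runningTasks.put(taskId, tr)   // tracked so a later KillTask can find it
      threadPool.execute(tr)         // run on the executor's task thread pool
    }

    def main(args: Array[String]): Unit = {
      launchTask(42L, () => println("running task 42"))
      threadPool.shutdown()
    }
  }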

Inside the Executor, the call chain is (a sketch of this shape follows the list):

  • TaskRunner.run (Executor.scala:242)
  • Task.run()
  • Task.runTask(context), whose concrete implementation here is ResultTask
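
The shape of this chain is a template method: Task.run sets up the TaskContext and then delegates to the abstract runTask, which ResultTask (or ShuffleMapTask) overrides. A Spark-free sketch of that shape (all names below are illustrative, not Spark's actual fields):

  object TaskTemplateSketch {
    final case class MiniContext(stageId: Int, partitionId: Int)   // stands in for TaskContext

    abstract class MiniTask[T](stageId: Int, partitionId: Int) {
      // analogue of Task.run: build the context, then delegate to runTask
      final def run(): T = {
        val context = MiniContext(stageId, partitionId)
        runTask(context)
      }
      def runTask(context: MiniContext): T   // overridden by the concrete task types
    }

    final class MiniResultTask(data: Seq[Int]) extends MiniTask[Int](stageId = 0, partitionId = 0) {
      override def runTask(context: MiniContext): Int = data.sum
    }

    def main(args: Array[String]): Unit =
      println(new MiniResultTask(Seq(1, 2, 3)).run())   // prints 6
  }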

Task Deserialization and Execution

  override def runTask(context: TaskContext): U = {
    // Deserialize the RDD and the func using the broadcast variables.
    val deserializeStartTime = System.currentTimeMillis()
    val ser = SparkEnv.get.closureSerializer.newInstance()
    // Deserialize the (rdd, func) pair that the driver broadcast for this stage
    val (rdd, func) = ser.deserialize[(RDD[T], (TaskContext, Iterator[T]) => U)](
      ByteBuffer.wrap(taskBinary.value), Thread.currentThread.getContextClassLoader)
    _executorDeserializeTime = System.currentTimeMillis() - deserializeStartTime

    metrics = Some(context.taskMetrics)
    func(context, rdd.iterator(partition, context))
  }

First, the doc comment on taskBinary: "broadcasted version of the serialized RDD and the function to apply on each partition of the given RDD. Once deserialized, the type should be (RDD[T], (TaskContext, Iterator[T]) => U)."
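
As a self-contained illustration of that round trip, plain JDK serialization can stand in for Spark's closure serializer, and a Seq[Int] for one partition's data (none of this is Spark's actual code):

  import java.io._

  object TaskBinarySketch {
    // serialize a value to bytes and read it back, like broadcasting the task binary
    def roundTrip[T <: AnyRef](value: T): T = {
      val buf = new ByteArrayOutputStream()
      val out = new ObjectOutputStream(buf)
      out.writeObject(value)
      out.close()
      new ObjectInputStream(new ByteArrayInputStream(buf.toByteArray)).readObject().asInstanceOf[T]
    }

    def main(args: Array[String]): Unit = {
      val data = Seq(1, 2, 3, 4)                    // stands in for one RDD partition
      val func: Iterator[Int] => Int = _.sum        // stands in for the per-partition func
      val (shippedData, shippedFunc) = roundTrip((data, func))
      println(shippedFunc(shippedData.iterator))    // prints 10
    }
  }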

Looking at the (rdd, func) pair returned by ser.deserialize():

  • rdd : straightforward, the abstraction of the dataset to be computed
  • rdd.iterator(partition, context) : returns the iterator over the current partition of this rdd; if the parent RDD's data is not available, it cascades into compute() calls
  • func : presumably pulls values from the iterator and applies the function to each; the actual body of func comes from operators such as map or groupByKey (see the sketch below) // TODO
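
A toy, Spark-free sketch of how func(context, rdd.iterator(partition, context)) fits together: the per-partition function receives a lazy iterator and only pulls values as it consumes them (the names here are illustrative only):

  object FuncSketch {
    def main(args: Array[String]): Unit = {
      // stands in for rdd.iterator(partition, context): a lazy view over one partition
      def partitionIterator: Iterator[Int] = (0 until 5).iterator

      // stands in for func: e.g. what a map(_ * 2) pipeline boils down to per partition
      val func: Iterator[Int] => List[Int] = it => it.map(_ * 2).toList

      println(func(partitionIterator))   // List(0, 2, 4, 6, 8)
    }
  }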

Task Run

// run:89, Task
    (runTask(context), context.collectAccumulators())

// runTask:66, ResultTask
  override def runTask(context: TaskContext): U = {
    // Deserialize the RDD and the func using the broadcast variables.
    val deserializeStartTime = System.currentTimeMillis()
    val ser = SparkEnv.get.closureSerializer.newInstance()
    val (rdd, func) = ser.deserialize[(RDD[T], (TaskContext, Iterator[T]) => U)](
      ByteBuffer.wrap(taskBinary.value), Thread.currentThread.getContextClassLoader)
    _executorDeserializeTime = System.currentTimeMillis() - deserializeStartTime

    metrics = Some(context.taskMetrics)
    func(context, rdd.iterator(partition, context))
  }

// iterator:270, RDD  
  final def iterator(split: Partition, context: TaskContext): Iterator[T] = {
    if (storageLevel != StorageLevel.NONE) {
      SparkEnv.get.cacheManager.getOrCompute(this, split, context, storageLevel)
    } else {
      computeOrReadCheckpoint(split, context)
    }
  }
  
  private[spark] def computeOrReadCheckpoint(split: Partition, context: TaskContext): Iterator[T] =
  {
    if (isCheckpointedAndMaterialized) {
      firstParent[T].iterator(split, context)
    } else {
      compute(split, context)
    }
  }

// Similar to the DAG walk, execution is driven through the last RDD's compute method; here the last RDD is a MapPartitionsRDD. Its compute calls firstParent's iterator, and the recursion proceeds just like the iterator shown above (a minimal sketch follows the snippet).
// compute:38, MapPartitionsRDD
  override def compute(split: Partition, context: TaskContext): Iterator[U] =
    f(context, split.index, firstParent[T].iterator(split, context))
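
A minimal, Spark-free sketch of this recursive chain, where each wrapper RDD delegates to its parent's iterator and applies its own f (all class names below are illustrative stand-ins):

  object ComputeChainSketch {
    trait MiniRDD[T] { def iterator(split: Int): Iterator[T] }

    // leaf: produces the raw partition data (stands in for e.g. a HadoopRDD)
    final class SourceRDD(data: Array[Array[Int]]) extends MiniRDD[Int] {
      def iterator(split: Int): Iterator[Int] = data(split).iterator
    }

    // analogue of MapPartitionsRDD: compute == f(firstParent.iterator(split))
    final class MapPartitionsLike[T, U](parent: MiniRDD[T], f: Iterator[T] => Iterator[U])
        extends MiniRDD[U] {
      def iterator(split: Int): Iterator[U] = f(parent.iterator(split))
    }

    def main(args: Array[String]): Unit = {
      val source = new SourceRDD(Array(Array(1, 2, 3), Array(4, 5)))
      val mapped = new MapPartitionsLike[Int, Int](source, it => it.map(_ + 10))
      println(mapped.iterator(0).toList)   // List(11, 12, 13)
    }
  }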


// When compute is entered again, the function f runs; f is the closure installed by RDD's mapPartitions method, shown below (a usage example follows the snippet)
  /**
   * Return a new RDD by applying a function to each partition of this RDD.
   *
   * `preservesPartitioning` indicates whether the input function preserves the partitioner, which
   * should be `false` unless this is a pair RDD and the input function doesn't modify the keys.
   */
  def mapPartitions[U: ClassTag](
      f: Iterator[T] => Iterator[U],
      preservesPartitioning: Boolean = false): RDD[U] = withScope {
    val cleanedF = sc.clean(f)
    new MapPartitionsRDD(
      this,
      (context: TaskContext, index: Int, iter: Iterator[T]) => cleanedF(iter),
      preservesPartitioning)
  }
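
For reference, a usage example of the same API (assuming a live SparkContext named sc; this snippet is not from the original job): the closure passed to mapPartitions becomes the cleanedF that MapPartitionsRDD.compute later invokes on each partition's iterator.

  // assumes: val sc = new SparkContext(new SparkConf().setAppName("demo").setMaster("local[*]"))
  val doubled = sc.parallelize(1 to 10, numSlices = 2)
    .mapPartitions(iter => iter.map(_ * 2))   // iter => ... is the f handed to MapPartitionsRDD
  println(doubled.collect().mkString(","))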

The cleanedF function is executed over the iterator; here it takes us into the override def doExecute(): RDD[InternalRow] method of cleanedF's enclosing operator (= BroadcastNestedLoopJoin).

// TODO The call lands here probably because cleanedF is the closure defined inside BroadcastNestedLoopJoin's doExecute(), which was shipped to the executor node and executed there

streamed.execute().mapPartitions { streamedIter =>
}
cleanedF = {BroadcastNestedLoopJoin$$anonfun$2@7677} "<function1>"
 $outer = {BroadcastNestedLoopJoin@7680} "BroadcastNestedLoopJoin BuildRight, LeftOuter, Some((((open_time_hqb#19 <= day_id#22) || (open_time_dq#20 <= day_id#22)) || (open_time_lhqx#21 <= day_id#22)))\n:- HiveTableScan [customer_id#18,open_time_hqb#19,open_time_dq#20,open_time_lhqx#21], MetastoreRelation i8ji, tmp_dm_cust_daily_aum_open, Some(a)\n+- HiveTableScan [day_id#22], MetastoreRelation i8ji, tmp_dm_cust_daily_aum_pt, Some(t)\n"
 numStreamedRows = {LongSQLMetric@7683} "0"
 broadcastedRelation = {TorrentBroadcast@7679} "Broadcast(4)"
