Spark 2.x (59): How to shut down the client process when submitting a Spark job in yarn-cluster mode?

This post analyzes a problem where, when Spark applications are submitted in YARN-cluster mode, leftover YARN client processes on the submit node end up exhausting its resources. It walks through the two YARN submission modes, the difference between yarn-client and yarn-cluster, and shows how to set spark.yarn.submit.waitAppCompletion so that the client process does not keep occupying resources.


Problem:

A customer site recently reported that after submitting a Spark application in yarn-cluster mode, a YARN client process is left running on the submit node and never exits. Since these applications are all Spark Structured Streaming jobs (they run indefinitely), the submit node's resources eventually get exhausted, and other operations then fail with errors like the following:

[dx@my-linux-01 bin]$ yarn logs -applicationId application_15644802175503_0189
Java HotSpot(TM) 64-Bit Server VM warning: INFO: os::commit_memory(0x00000000c000000, 702021632, 0) failed; error='Cannot allocate memory' (errno=12)
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (mmap) failed to map 702021632 bytes to committing reserved memory.
# An error report file with more information is saved as:
# /home/dx/myProj/appApp/bin/hs_err_pid53561.log
[dx@my-linux-01 bin]$ 

Analysis of the submit node showed that the load came mainly from these accumulated YARN client processes:

[dx@my-linux-01 bin]$ top
PID     USER  PR  NI    VIRT     RES  SHR   S  %CPU   %MEM   TIME+    COMMAND
122236  dx    20  0  20.629g  1.347g  3520  S   0.3    2.1   7:02.42     java
122246  dx    20  0  20.629g  1.311g  3520  S   0.3    2.0   7:03.42     java
122236  dx    20  0  20.629g  1.288g  3520  S   0.3    2.2   7:05.83     java
122346  dx    20  0  20.629g  1.344g  3520  S   0.3    2.1   7:10.42     java
121246  dx    20  0  20.629g  1.343g  3520  S   0.3    2.3   7:01.42     java
122346  dx    20  0  20.629g  1.341g  3520  S   0.3    2.4   7:03.39     java
112246  dx    20  0  20.629g  1.344g  3520  S   0.3    2.0   7:02.42     java
............
112260  dx    20  0  20.629g  1.344g  3520  S   0.3    2.0   7:02.02     java
112260  dx    20  0  113116      200     0  S   0.0    0.0   0:00.00     sh
............

Analysis of submitting Spark jobs via YARN:

There are two ways to submit a Spark application to YARN:

1) yarn-client (spark-submit --master yarn --deploy-mode client ...):

In this mode, after the application is submitted, the driver runs on the submit node, inside the YARN client process itself. Killing the client process on the submit node therefore kills the driver, which in turn shuts down the application.

2) yarn-cluster (spark-submit --master yarn --deploy-mode cluster):

In this mode, after the application is submitted, the driver runs inside a container allocated by YARN: an ApplicationMaster (AM) process is started in that container and the SparkContext (driver) runs inside the AM. A YARN client process is still started on the submit node, and by default that client process keeps running until the application terminates (failed, finished, etc.). A quick way to check where the driver actually ends up in each mode is sketched below.
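One way to verify where the driver lands is to print spark.driver.host from inside the application: in yarn-client mode it resolves to the submit node, in yarn-cluster mode to the host of the AM container. This is only a minimal sketch for illustration (the object and app name below are made up, not part of the original post):

import org.apache.spark.sql.SparkSession

// Hypothetical example application, for illustration only.
object WhereIsMyDriver {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("where-is-my-driver").getOrCreate()
    // spark.driver.host is filled in by Spark at startup:
    //   yarn-client  -> address of the submit node (driver runs in the client JVM)
    //   yarn-cluster -> address of the AM container's host (driver runs inside the AM)
    println("driver host: " + spark.sparkContext.getConf.get("spark.driver.host"))
    spark.stop()
  }
}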

Solution:

The relevant configuration read by the YARN client:

spark.yarn.submit.waitAppCompletion

If this parameter is set to true, the client keeps running and reports the application's status until the application exits, for whatever reason;

if it is set to false, the client process exits as soon as the application has been submitted.

Add the parameter to the spark-submit command:

./bin/spark-submit \
--master yarn \
--deploy-mode cluster \
--conf spark.yarn.submit.waitAppCompletion=false
....
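The same setting can also be passed when submitting programmatically. Below is a minimal sketch using Spark's SparkLauncher API; the jar path and main class are placeholders, not taken from the original post. With spark.yarn.submit.waitAppCompletion=false the spawned spark-submit process exits right after YARN accepts the application:

import org.apache.spark.launcher.SparkLauncher

val process = new SparkLauncher()
  .setAppResource("/path/to/my-streaming-app.jar")   // placeholder jar path
  .setMainClass("com.example.MyStreamingApp")        // placeholder main class
  .setMaster("yarn")
  .setDeployMode("cluster")
  .setConf("spark.yarn.submit.waitAppCompletion", "false")
  .launch()

// Because the client no longer waits for the application to finish,
// waitFor() returns shortly after submission instead of blocking for
// the lifetime of the streaming job.
process.waitFor()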

The corresponding code in the YARN Client class (org.apache.spark.deploy.yarn.Client):

  /**
   * Submit an application to the ResourceManager.
   * If set spark.yarn.submit.waitAppCompletion to true, it will stay alive
   * reporting the application's status until the application has exited for any reason.
   * Otherwise, the client process will exit after submission.
   * If the application finishes with a failed, killed, or undefined status,
   * throw an appropriate SparkException.
   */
  def run(): Unit = {
    this.appId = submitApplication()
    if (!launcherBackend.isConnected() && fireAndForget) {
      val report = getApplicationReport(appId)
      val state = report.getYarnApplicationState
      logInfo(s"Application report for $appId (state: $state)")
      logInfo(formatReportDetails(report))
      if (state == YarnApplicationState.FAILED || state == YarnApplicationState.KILLED) {
        throw new SparkException(s"Application $appId finished with status: $state")
      }
    } else {
      val (yarnApplicationState, finalApplicationStatus) = monitorApplication(appId)
      if (yarnApplicationState == YarnApplicationState.FAILED ||
        finalApplicationStatus == FinalApplicationStatus.FAILED) {
        throw new SparkException(s"Application $appId finished with failed status")
      }
      if (yarnApplicationState == YarnApplicationState.KILLED ||
        finalApplicationStatus == FinalApplicationStatus.KILLED) {
        throw new SparkException(s"Application $appId is killed")
      }
      if (finalApplicationStatus == FinalApplicationStatus.UNDEFINED) {
        throw new SparkException(s"The final status of application $appId is undefined")
      }
    }
  }
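The fireAndForget flag tested in run() is what ties this code path to the configuration above. In the Spark 2.x Client class it is derived roughly as follows (paraphrased from the source; WAIT_FOR_APP_COMPLETION is the config entry backing spark.yarn.submit.waitAppCompletion):

  // Paraphrased sketch: the client only "fires and forgets" in cluster mode,
  // and only when spark.yarn.submit.waitAppCompletion is false.
  private val fireAndForget = isClusterMode && !sparkConf.get(WAIT_FOR_APP_COMPLETION)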

 

Reposted from: https://www.cnblogs.com/yy3b2007com/p/11302886.html
