Spark 2.x (59): How to shut down the client process when submitting a Spark job in yarn-cluster mode?

This post analyzes a problem where, when Spark applications are submitted in YARN-cluster mode, leftover YARN client processes on the submit node end up exhausting its resources. It walks through the two YARN submission modes, the difference between yarn-client and yarn-cluster, and shows how to set spark.yarn.submit.waitAppCompletion so that the client process does not keep occupying resources.


Problem:

A customer site recently reported that after submitting a Spark application in yarn-cluster mode, a YARN client process is left running on the submit node and never exits. Since these applications are all Spark Structured Streaming jobs (they run indefinitely), the submit node's resources eventually get exhausted, and other operations then fail with errors like the following:

[dx@my-linux-01 bin]$ yarn logs -applicationId application_15644802175503_0189
Java HotSpot(TM) 64-Bit Server VM warning: INFO: os::commit_memory(0x00000000c000000, 702021632, 0) failed; error='Cannot allocate memory' (errno=12)
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (mmap) failed to map 702021632 bytes to committing reserved memory.
# An error report file with more information is saved as:
# /home/dx/myProj/appApp/bin/hs_err_pid53561.log
[dx@my-linux-01 bin]$ 

Analysis of the submit node showed that the load came mainly from these accumulated YARN client processes:

[dx@my-linux-01 bin]$ top
PID     USER  PR  NI    VIRT     RES  SHR   S  %CPU   %MEM   TIME+    COMMAND
122236  dx    20  0  20.629g  1.347g  3520  S   0.3    2.1   7:02.42     java
122246  dx    20  0  20.629g  1.311g  3520  S   0.3    2.0   7:03.42     java
122236  dx    20  0  20.629g  1.288g  3520  S   0.3    2.2   7:05.83     java
122346  dx    20  0  20.629g  1.344g  3520  S   0.3    2.1   7:10.42     java
121246  dx    20  0  20.629g  1.343g  3520  S   0.3    2.3   7:01.42     java
122346  dx    20  0  20.629g  1.341g  3520  S   0.3    2.4   7:03.39     java
112246  dx    20  0  20.629g  1.344g  3520  S   0.3    2.0   7:02.42     java
............
112260  dx    20  0  20.629g  1.344g  3520  S   0.3    2.0   7:02.02     java
112260  dx    20  0  113116      200     0  S   0.0    0.0   0:00.00     sh
............

Analysis of submitting Spark jobs via YARN:

There are two ways to submit a Spark application to YARN:

1) yarn-client (spark-submit --master yarn --deploy-mode client ...):

In this mode, after the application is submitted, the driver runs on the submit node, inside the YARN client process itself. Killing the client process on the submit node therefore kills the driver, which in turn shuts down the application.

2) yarn-cluster (spark-submit --master yarn --deploy-mode cluster):

In this mode, after the application is submitted, the driver runs inside a container allocated by YARN: an ApplicationMaster (AM) process is started in that container and the SparkContext (driver) runs inside the AM. A YARN client process is still started on the submit node, and by default that client process keeps running until the application terminates (failed, finished, etc.). A quick way to check where the driver actually ends up in each mode is sketched below.
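One way to verify where the driver lands is to print spark.driver.host from inside the application: in yarn-client mode it resolves to the submit node, in yarn-cluster mode to the host of the AM container. This is only a minimal sketch for illustration (the object and app name below are made up, not part of the original post):

import org.apache.spark.sql.SparkSession

// Hypothetical example application, for illustration only.
object WhereIsMyDriver {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("where-is-my-driver").getOrCreate()
    // spark.driver.host is filled in by Spark at startup:
    //   yarn-client  -> address of the submit node (driver runs in the client JVM)
    //   yarn-cluster -> address of the AM container's host (driver runs inside the AM)
    println("driver host: " + spark.sparkContext.getConf.get("spark.driver.host"))
    spark.stop()
  }
}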

Solution:

The relevant configuration read by the YARN client:

spark.yarn.submit.waitAppCompletion

If this parameter is set to true, the client keeps running and reports the application's status until the application exits, for whatever reason;

if it is set to false, the client process exits as soon as the application has been submitted.

Add the parameter to the spark-submit command:

./bin/spark-submit \
--master yarn \
--deploy-mode cluster \
--conf spark.yarn.submit.waitAppCompletion=false
....
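The same setting can also be passed when submitting programmatically. Below is a minimal sketch using Spark's SparkLauncher API; the jar path and main class are placeholders, not taken from the original post. With spark.yarn.submit.waitAppCompletion=false the spawned spark-submit process exits right after YARN accepts the application:

import org.apache.spark.launcher.SparkLauncher

val process = new SparkLauncher()
  .setAppResource("/path/to/my-streaming-app.jar")   // placeholder jar path
  .setMainClass("com.example.MyStreamingApp")        // placeholder main class
  .setMaster("yarn")
  .setDeployMode("cluster")
  .setConf("spark.yarn.submit.waitAppCompletion", "false")
  .launch()

// Because the client no longer waits for the application to finish,
// waitFor() returns shortly after submission instead of blocking for
// the lifetime of the streaming job.
process.waitFor()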

The corresponding code in the YARN Client class (org.apache.spark.deploy.yarn.Client):

  /**
   * Submit an application to the ResourceManager.
   * If set spark.yarn.submit.waitAppCompletion to true, it will stay alive
   * reporting the application's status until the application has exited for any reason.
   * Otherwise, the client process will exit after submission.
   * If the application finishes with a failed, killed, or undefined status,
   * throw an appropriate SparkException.
   */
  def run(): Unit = {
    this.appId = submitApplication()
    if (!launcherBackend.isConnected() && fireAndForget) {
      val report = getApplicationReport(appId)
      val state = report.getYarnApplicationState
      logInfo(s"Application report for $appId (state: $state)")
      logInfo(formatReportDetails(report))
      if (state == YarnApplicationState.FAILED || state == YarnApplicationState.KILLED) {
        throw new SparkException(s"Application $appId finished with status: $state")
      }
    } else {
      val (yarnApplicationState, finalApplicationStatus) = monitorApplication(appId)
      if (yarnApplicationState == YarnApplicationState.FAILED ||
        finalApplicationStatus == FinalApplicationStatus.FAILED) {
        throw new SparkException(s"Application $appId finished with failed status")
      }
      if (yarnApplicationState == YarnApplicationState.KILLED ||
        finalApplicationStatus == FinalApplicationStatus.KILLED) {
        throw new SparkException(s"Application $appId is killed")
      }
      if (finalApplicationStatus == FinalApplicationStatus.UNDEFINED) {
        throw new SparkException(s"The final status of application $appId is undefined")
      }
    }
  }
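The fireAndForget flag tested in run() is what ties this code path to the configuration above. In the Spark 2.x Client class it is derived roughly as follows (paraphrased from the source; WAIT_FOR_APP_COMPLETION is the config entry backing spark.yarn.submit.waitAppCompletion):

  // Paraphrased sketch: the client only "fires and forgets" in cluster mode,
  // and only when spark.yarn.submit.waitAppCompletion is false.
  private val fireAndForget = isClusterMode && !sparkConf.get(WAIT_FOR_APP_COMPLETION)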

 

Reposted from: https://www.cnblogs.com/yy3b2007com/p/11302886.html
