Reference articles:
- Fixing executors growing without bound after a CDH SparkStreaming job starts, where the num-executors setting has no effect (CSDN blog)
- Dynamic Allocation and the num-executors problem in Spark (CSDN blog)
- An Analysis of Spark Dynamic Allocation (Jianshu)
Spark runtime environment: the default Spark version shipped with CDH 5.14
I recently hit a problem when submitting a job: the number of executors kept increasing on its own.
After some digging, the root cause turned out to be that this environment sets spark.dynamicAllocation.enabled to true by default, which makes the executor count change dynamically with the workload.
Note that disabling this parameter outright is not necessary: the problem can also be solved by setting lower and upper bounds on the executor count.
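If you would rather keep dynamic allocation enabled, here is a sketch of a submission that bounds the executor count instead. The bound values 2 and 6, the class name, and the jar name are illustrative placeholders, not values from this job:

```shell
# Sketch: keep dynamic allocation on, but cap how far it can scale.
# Dynamic allocation requires the external shuffle service to be enabled.
spark2-submit \
  --master yarn \
  --conf spark.shuffle.service.enabled=true \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.dynamicAllocation.minExecutors=2 \
  --conf spark.dynamicAllocation.maxExecutors=6 \
  --conf spark.dynamicAllocation.initialExecutors=2 \
  --executor-cores 2 \
  --executor-memory 2G \
  --class com.example.MyStreamingJob \
  my-streaming-job.jar
```

With these bounds, executors can still be released when idle and re-requested under backlog, but the count will never exceed maxExecutors.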
Configuration parameters related to dynamic allocation:
Configuration - Spark Documentation, Dynamic Allocation section (the table below is from the Spark 2.4.0 docs)
Dynamic Allocation
Property Name | Default | Meaning
---|---|---
`spark.dynamicAllocation.enabled` | false | Whether to use dynamic resource allocation, which scales the number of executors registered with this application up and down based on the workload. This requires `spark.shuffle.service.enabled` to be set. The following configurations are also relevant: `spark.dynamicAllocation.minExecutors`, `spark.dynamicAllocation.maxExecutors`, `spark.dynamicAllocation.initialExecutors`, and `spark.dynamicAllocation.executorAllocationRatio`.
`spark.dynamicAllocation.executorIdleTimeout` | 60s | If dynamic allocation is enabled and an executor has been idle for more than this duration, the executor will be removed.
`spark.dynamicAllocation.cachedExecutorIdleTimeout` | infinity | If dynamic allocation is enabled and an executor which has cached data blocks has been idle for more than this duration, the executor will be removed.
`spark.dynamicAllocation.initialExecutors` | `spark.dynamicAllocation.minExecutors` | Initial number of executors to run if dynamic allocation is enabled. If `--num-executors` (or `spark.executor.instances`) is set and larger than this value, it will be used as the initial number of executors.
`spark.dynamicAllocation.maxExecutors` | infinity | Upper bound for the number of executors if dynamic allocation is enabled.
`spark.dynamicAllocation.minExecutors` | 0 | Lower bound for the number of executors if dynamic allocation is enabled.
`spark.dynamicAllocation.executorAllocationRatio` | 1 | By default, dynamic allocation requests enough executors to maximize parallelism according to the number of tasks to process. While this minimizes job latency, with small tasks it can waste a lot of resources due to executor allocation overhead, as some executors might not do any work. This setting allows specifying a ratio used to reduce the number of executors relative to full parallelism. It defaults to 1.0 for maximum parallelism; 0.5 divides the target number of executors by 2. The target number of executors computed by dynamic allocation can still be overridden by the `spark.dynamicAllocation.minExecutors` and `spark.dynamicAllocation.maxExecutors` settings.
`spark.dynamicAllocation.schedulerBacklogTimeout` | 1s | If dynamic allocation is enabled and there have been pending tasks backlogged for more than this duration, new executors will be requested.
`spark.dynamicAllocation.sustainedSchedulerBacklogTimeout` | `schedulerBacklogTimeout` | Same as `spark.dynamicAllocation.schedulerBacklogTimeout`, but used only for subsequent executor requests.
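To make `executorAllocationRatio` concrete, here is the arithmetic it applies, sketched with awk. The numbers (100 pending tasks, 2 cores per executor, 1 CPU per task, ratio 0.5) are hypothetical; Spark performs this calculation internally in its allocation manager, this only illustrates the formula:

```shell
# target = ceil(pendingTasks / tasksPerExecutor * ratio),
# then clamped between minExecutors and maxExecutors by Spark.
awk -v tasks=100 -v cores=2 -v cpus_per_task=1 -v ratio=0.5 'BEGIN {
  per_exec = cores / cpus_per_task        # tasks one executor runs at once
  x = tasks / per_exec * ratio            # unclamped target
  t = (x == int(x)) ? x : int(x) + 1      # ceiling
  printf "%d\n", t
}'
# prints 25: half of the 50 executors full parallelism would request
```

So with ratio 0.5 the job trades some latency for roughly half the executor footprint.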
Solution:
Set spark.dynamicAllocation.enabled to false,
and explicitly specify properties such as driver-cores and executor-cores:
--driver-memory 2G \
--driver-cores 1 \
--num-executors 6 \
--executor-cores 2 \
--executor-memory 2G \
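With the flags above, the job's steady-state resource footprint is easy to compute (ignoring YARN memory overhead and the ApplicationMaster):

```shell
# 6 executors x 2 cores + 1 driver core; memory computed the same way.
awk 'BEGIN {
  num_executors = 6; executor_cores = 2; executor_mem_g = 2
  driver_cores = 1; driver_mem_g = 2
  cores = num_executors * executor_cores + driver_cores
  mem = num_executors * executor_mem_g + driver_mem_g
  printf "cores=%d mem_g=%d\n", cores, mem
}'
# prints: cores=13 mem_g=14
```

Since dynamic allocation is disabled, this footprint stays fixed for the lifetime of the streaming job, which is exactly the predictability we want.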
Full submit command:
nohup /usr/bin/spark2-submit \
--class ${class_name} \
--name ${JOB_NAME} \
--files ${config} \
--master yarn \
--conf spark.dynamicAllocation.enabled=false \
--driver-memory 2G \
--driver-cores 1 \
--num-executors 6 \
--executor-cores 2 \
--executor-memory 2G \
--jars ${classpath} \
${ROOT_PATH}/libs/${APP_NAME}-${libVersion}-SNAPSHOT.jar online ${config} \
> ${ROOT_PATH}/logs/start.log 2> ${ROOT_PATH}/logs/start.error &