Spark "Lost executor" Errors

This article looks at the "lost executor" problem when running Spark on YARN. Case one: executor-memory or executor-cores is set unreasonably high and exceeds the ceiling of resources YARN can schedule (for example, more CPU cores than the cap); lowering executor-cores resolves it. Case two: physical memory is insufficient; increasing the memory allocation resolves it.


1. Exit code 143: resource requests exceed what YARN can schedule

19/06/17 09:50:52 WARN cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: Requesting driver to remove executor 2 for reason Container marked as failed: container_1560518528256_0014_01_000003 on host: hadoop-master. Exit status: 143. Diagnostics: Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
Killed by external signal

19/06/17 09:50:52 ERROR cluster.YarnScheduler: Lost executor 2 on hadoop-master: Container marked as failed: container_1560518528256_0014_01_000003 on host: hadoop-master. Exit status: 143. Diagnostics: Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
Killed by external signal

19/06/17 09:50:52 WARN scheduler.TaskSetManager: Lost task 22.0 in stage 0.0 (TID 17, hadoop-master, executor 2): ExecutorLostFailure (executor 2 exited caused by one of the running tasks) Reason: Container marked as failed: container_1560518528256_0014_01_000003 on host: hadoop-master. Exit status: 143. Diagnostics: Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
Killed by external signal

19/06/17 09:50:52 WARN scheduler.TaskSetManager: Lost task 21.0 in stage 0.0 (TID 16, hadoop-master, executor 2): ExecutorLostFailure (executor 2 exited caused by one of the running tasks) Reason: Container marked as failed: container_1560518528256_0014_01_000003 on host: hadoop-master. Exit status: 143. Diagnostics: Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
Killed by external signal

When running Spark on YARN, the executor resource requests could not be satisfied by YARN, so YARN killed the executors, which surfaces as the lost-executor errors above.

Most of the time this is because executor-memory or executor-cores is set unreasonably and exceeds the maximum resources (memory or CPU cores) that YARN can schedule.

 

For example: 3 servers, each with 32 cores and 64 GB of memory

yarn.scheduler.maximum-allocation-mb = 68G
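
These ceilings come from the YARN configuration (yarn-site.xml). A sketch of the relevant properties with values matching this example; only the first line is quoted in the original, the other three are assumptions, and note that the *-mb properties take megabytes:

yarn.scheduler.maximum-allocation-mb = 69632 (68 GB, max memory per container request)
yarn.scheduler.maximum-allocation-vcores = 32 (assumed: max vcores per container request)
yarn.nodemanager.resource.memory-mb = 65536 (assumed: 64 GB schedulable memory per node)
yarn.nodemanager.resource.cpu-vcores = 32 (assumed: 32 schedulable cores per node)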

 

Settings:

num-executors = 30 (10 executors running per node)

executor-memory = 6G (6 GB of memory per executor)

executor-cores = 5 (5 cores per executor)

Memory taken by executors on each node: 10 × 6.5 GB (6 GB heap plus roughly 512 MB of overhead memory per executor) = 65 GB, within the limit

Cores taken by executors on each node: 10 × 5 = 50, which exceeds the 32-core limit, hence the error

 

Changing executor-cores = 3 solved the problem; a sketch of the corrected submission follows.
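
A minimal spark-submit sketch with the corrected sizing; the class name and jar path here are placeholders, not from the original job:

# Per-node check: 10 executors × 3 cores = 30 ≤ 32 cores,
# and 10 × (6 GB heap + 512 MB overhead) = 65 GB of memory.
spark-submit \
--master yarn \
--deploy-mode client \
--num-executors 30 \
--executor-memory 6G \
--executor-cores 3 \
--class com.example.MyApp \
/path/to/app.jar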

 

Memory misconfiguration is diagnosed and fixed the same way.

 

2. Container killed by YARN for exceeding memory limits

19/10/25 10:25:14 ERROR cluster.YarnScheduler: Lost executor 9 on cdh-master: Container killed by YARN for exceeding memory limits.  9.5 GB of 9 GB physical memory used. Consider boosting spark.yarn.executor.memoryOverhead or disabling yarn.nodemanager.vmem-check-enabled because of YARN-4714.
19/10/25 10:25:14 WARN scheduler.TaskSetManager: Lost task 0.1 in stage 7.0 (TID 690, cdh-master, executor 9): ExecutorLostFailure (executor 9 exited caused by one of the running tasks) Reason: Container killed by YARN for exceeding memory limits.  9.5 GB of 9 GB physical memory used. Consider boosting spark.yarn.executor.memoryOverhead or disabling yarn.nodemanager.vmem-check-enabled because of YARN-4714.

The problem is obvious: not enough physical memory.

sudo -uhdfs spark-submit \
--class com.sm.analysis.AnalysisRetained \
--master yarn \
--deploy-mode client \
--driver-memory 3G \
--driver-cores 3 \
--num-executors 3 \
--executor-memory 8g \
--executor-cores 5 \
--jars /usr/java/jdk1.8.0_211/lib/mysql-connector-java-5.1.47.jar \
--conf spark.default.parallelism=30 \
/data4/liujinhe/tmp/original-analysis-1.0-SNAPSHOT.jar

The executor was originally allocated 8 GB. Adding the off-heap overhead memory of 1 GB (9 × 1024 MB × 0.07 ≈ 645 MB, which with the 512 MB allocation increment rounds up to 1024 MB) gives a 9 GB container, which was not enough, so the memory was increased to 10 GB.
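
A sketch of the adjusted submission with executor memory raised to 10 GB; everything else is unchanged from the original command. Per the log message, boosting spark.yarn.executor.memoryOverhead instead would also work:

sudo -uhdfs spark-submit \
--class com.sm.analysis.AnalysisRetained \
--master yarn \
--deploy-mode client \
--driver-memory 3G \
--driver-cores 3 \
--num-executors 3 \
--executor-memory 10g \
--executor-cores 5 \
--jars /usr/java/jdk1.8.0_211/lib/mysql-connector-java-5.1.47.jar \
--conf spark.default.parallelism=30 \
/data4/liujinhe/tmp/original-analysis-1.0-SNAPSHOT.jar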
