Container killed by YARN for exceeding memory limits. 10.4 GB of 10.4 GB physical memory used


I ran into this problem while running a Spark job. I first set --executor-memory to 10G, then raised it to 20G and 30G, but the same error kept appearing.

1. One solution

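Most advice online says to increase spark.yarn.executor.memoryOverhead. I tried 2048 first, then 4096, and finally raised it all the way to 15G (while lowering executor-memory to 20G), after which the error no longer appeared.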
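For reference, the final settings described above could be passed to spark-submit roughly as follows. This is only a sketch: `build_submit_args` and `my_job.py` are hypothetical names, and it assumes the overhead is given in MB (so 15G becomes 15360).

```python
def build_submit_args(executor_memory_gb, overhead_mb, app="my_job.py"):
    """Build a spark-submit argument list; the app name is a placeholder."""
    return [
        "spark-submit",
        "--master", "yarn",
        "--executor-memory", f"{executor_memory_gb}g",
        # Overhead is requested from YARN on top of the executor heap.
        "--conf", f"spark.yarn.executor.memoryOverhead={overhead_mb}",
        app,
    ]

args = build_submit_args(executor_memory_gb=20, overhead_mb=15 * 1024)
print(" ".join(args))
```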

But it kept bothering me: why does this work?

First, one thing is certain: increasing spark.yarn.executor.memoryOverhead is effective.

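The spark.yarn.XXX.memoryOverhead properties determine how much extra memory, outside the JVM heap (off-heap), is requested from YARN for each executor, driver, or AM. The value is in MB; for executors the default is max(384, 0.07 * spark.executor.memory).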
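To make the default concrete, the sketch below computes the overhead and the total memory requested per executor container, using the formula quoted above. The function names are made up for illustration, and the 0.07 factor is the one cited in this post; newer Spark releases document a factor of 0.10, so treat the number as an assumption about the version in use.

```python
def default_overhead_mb(executor_memory_mb, factor=0.07, minimum_mb=384):
    """Default overhead in MB: max(minimum, factor * executor memory)."""
    return max(minimum_mb, int(executor_memory_mb * factor))

def container_request_mb(executor_memory_mb, overhead_mb=None):
    """Heap plus overhead, i.e. the size the YARN physical limit applies to."""
    if overhead_mb is None:
        overhead_mb = default_overhead_mb(executor_memory_mb)
    return executor_memory_mb + overhead_mb

for gb in (10, 20, 30):
    heap_mb = gb * 1024
    print(gb, default_overhead_mb(heap_mb), container_request_mb(heap_mb))
```

With the default, the overhead grows only in proportion to the heap, which may explain why raising --executor-memory alone did not make the error go away.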

2. Another solution

Another blog post describes the same problem: We figured out that the physical memory usage was quite low on the VMs but the virtual memory usage was extremely high. We set yarn.nodemanager.vmem-check-enabled in yarn-site.xml to false and our containers were no longer killed, and the application appeared to work as expected.

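In other words, the executors actually used very little physical memory while running, but their virtual memory usage was very high. Setting yarn.nodemanager.vmem-check-enabled to false in yarn-site.xml solved the problem.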

Since on Centos/RHEL 6 there are aggressive allocation of virtual memory due to OS behavior, you should disable virtual memory checker or increase yarn.nodemanager.vmem-pmem-ratio to a relatively larger value.

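So this is really an operating-system issue: the OS allocates virtual memory too aggressively, so you need to either disable the virtual memory check or increase yarn.nodemanager.vmem-pmem-ratio.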
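As a rough illustration of what raising the ratio does: the virtual memory limit is the container's physical memory allocation multiplied by yarn.nodemanager.vmem-pmem-ratio. The sketch below uses a hypothetical helper name and assumes a ratio of 2.1, the stock Hadoop default; the other ratios are arbitrary examples.

```python
def vmem_limit_mb(container_mb, vmem_pmem_ratio=2.1):
    """Virtual memory limit for a container, in MB."""
    return container_mb * vmem_pmem_ratio

# A 1 GB container under a few different ratios.
for ratio in (2.1, 4.0, 10.0):
    print(ratio, vmem_limit_mb(1024, ratio))
```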

3. Cause

Another blog post explains:

Virtual/physical memory checker
NodeManager can monitor the memory usage (virtual and physical) of the container. If its virtual memory exceeds “yarn.nodemanager.vmem-pmem-ratio” times the “mapreduce.reduce.memory.mb” or “mapreduce.map.memory.mb”, the container will be killed if “yarn.nodemanager.vmem-check-enabled” is true.

If its physical memory exceeds “mapreduce.reduce.memory.mb” or “mapreduce.map.memory.mb”, the container will be killed if “yarn.nodemanager.pmem-check-enabled” is true.

These parameters can be set in yarn-site.xml on each NM node to override the default behavior.

This is a sample error for a container killed by virtual memory checker:

Current usage: 347.3 MB of 1 GB physical memory used; 2.2 GB of 2.1 GB virtual memory used. Killing container.

And this is a sample error for physical memory checker:

Current usage: 2.1gb of 2.0gb physical memory used; 1.1gb of 3.15gb virtual memory used. Killing container.
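The two checks can be sketched as a small decision function. This is a simplified model for illustration, not the NodeManager's actual implementation; `check_container` is a hypothetical name, and the inputs are the usage and limit figures from the two sample messages, converted to MB.

```python
def check_container(pmem_used, pmem_limit, vmem_used, vmem_limit,
                    pmem_check_enabled=True, vmem_check_enabled=True):
    """Return which checker would kill the container, or None if it survives."""
    if vmem_check_enabled and vmem_used > vmem_limit:
        return "virtual"
    if pmem_check_enabled and pmem_used > pmem_limit:
        return "physical"
    return None

# First sample: low physical usage, virtual memory over its limit.
print(check_container(347.3, 1024, 2.2 * 1024, 2.1 * 1024))
# Second sample: physical memory over its limit.
print(check_container(2.1 * 1024, 2.0 * 1024, 1.1 * 1024, 3.15 * 1024))
# First sample again, with the virtual memory checker disabled.
print(check_container(347.3, 1024, 2.2 * 1024, 2.1 * 1024,
                      vmem_check_enabled=False))
```

Disabling the virtual memory checker only helps in the first case; a container that exceeds its physical limit is still killed.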

In Hadoop 2.5.1 of MapR 4.1.0, the virtual memory checker is disabled while the physical memory checker is enabled by default.

If the above errors occur, it is also possible that the MapReduce job has a memory leak or that the memory for each container is simply not enough. Check the application logic and also tune the container memory request (“mapreduce.reduce.memory.mb” or “mapreduce.map.memory.mb”).