Spark Memory Management in Detail

Article link: https://blog.youkuaiyun.com/UnionIBM/article/details/76166010

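Physically, Spark memory is divided into on-heap and off-heap memory; logically, it is divided into execution memory and storage memory.
Execution memory serves the operators that need memory while a task runs (shuffle, join, and so on); for example, the intermediate results produced on the map side of a shuffle are buffered in memory. Storage memory mainly holds persisted RDD data and broadcast variables (cache, broadcast).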

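StaticMemoryManager: the sizes of the storage and execution regions are fixed by configuration. The drawback of this approach is self-evident: because it cannot adjust the ratio between the regions to suit different data-processing workloads, it is limited in both memory utilization and performance.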
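To make the fixed split concrete, here is a minimal sketch of how the two static regions could be sized. The fraction values are the Spark 1.x defaults as I recall them and are assumptions, not figures given in this post; `static_regions` is a hypothetical helper, not a Spark API.

```python
# Assumed Spark 1.x (legacy mode) defaults; check them against your Spark version.
STORAGE_FRACTION = 0.6   # spark.storage.memoryFraction
STORAGE_SAFETY = 0.9     # spark.storage.safetyFraction
SHUFFLE_FRACTION = 0.2   # spark.shuffle.memoryFraction
SHUFFLE_SAFETY = 0.8     # spark.shuffle.safetyFraction


def static_regions(heap_mb):
    """Return (storage_mb, execution_mb) for a fixed, non-adjustable split."""
    storage = round(heap_mb * STORAGE_FRACTION * STORAGE_SAFETY)
    execution = round(heap_mb * SHUFFLE_FRACTION * SHUFFLE_SAFETY)
    return storage, execution


storage_mb, execution_mb = static_regions(1000)
print(storage_mb, execution_mb)
```

Under these assumptions a shuffle-heavy job with nothing cached still gets only the execution share, while the storage share sits idle, which is the limitation described above.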

UnifiedMemoryManager: execution memory and storage memory share one region, and each can use space across the boundary between them.
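As a rough illustration of where that boundary initially sits, the sketch below sizes the unified region and its storage share. The 300 MB reserve and the two fractions are defaults I recall for Spark 2.x and are assumptions here; `unified_regions` is a hypothetical helper.

```python
# Assumed defaults (Spark 2.x); none of these numbers come from this post.
RESERVED_MB = 300               # memory reserved for the system
MEMORY_FRACTION = 0.6           # spark.memory.fraction
UNIFIED_STORAGE_FRACTION = 0.5  # spark.memory.storageFraction


def unified_regions(heap_mb):
    """Return (unified_mb, storage_mb, execution_mb) at the initial boundary."""
    usable = heap_mb - RESERVED_MB
    unified = round(usable * MEMORY_FRACTION)
    storage = round(unified * UNIFIED_STORAGE_FRACTION)
    # Execution starts with whatever storage does not claim; the boundary is soft.
    return unified, storage, unified - storage


print(unified_regions(4396))
```

Unlike the static split, these numbers are only a starting point: either side may grow past the boundary while the other side has free space.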

Dynamic adjustment between storage memory and execution memory
Storage can borrow as much execution memory as is free until execution reclaims its space. When this happens, cached blocks will be evicted from memory until sufficient borrowed memory is released to satisfy the execution memory request.

Similarly, execution can borrow as much storage memory as is free. However, execution memory is never evicted by storage due to the complexities involved in implementing this. The implication is that attempts to cache blocks may fail if execution has already eaten up most of the storage space, in which case the new blocks will be evicted immediately according to their respective storage levels.

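When execution memory is free, storage can borrow it; when execution needs that memory, storage releases what it borrowed. This is safe because cached data that no longer fits can be spilled to local disk (or dropped and recomputed later, depending on its storage level).
When storage memory is free it can likewise be lent to execution, but execution cannot hand it back to storage until it has finished using it. Execution memory holds temporary results during computation, so releasing it early would cause the subsequent computation to fail.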
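The asymmetric borrowing rules can be sketched with a toy model. This is a deliberate simplification, not Spark's implementation: the class and method names are hypothetical, sizes are plain numbers, and real Spark may also evict older cached blocks to make room for a new one.

```python
class UnifiedPool:
    """Toy model of one region shared by storage and execution (sizes in MB)."""

    def __init__(self, total, storage_region):
        self.total = total
        self.storage_region = storage_region  # storage below this is never evicted
        self.storage_used = 0
        self.execution_used = 0

    def free(self):
        return self.total - self.storage_used - self.execution_used

    def acquire_storage(self, size):
        """Cache a block using any free space; execution is never evicted."""
        if size > self.free():
            return False
        self.storage_used += size
        return True

    def acquire_execution(self, size):
        """Grant execution memory, evicting only what storage has borrowed."""
        shortfall = size - self.free()
        if shortfall > 0:
            borrowed = max(0, self.storage_used - self.storage_region)
            self.storage_used -= min(shortfall, borrowed)
        granted = min(size, self.free())
        self.execution_used += granted
        return granted


pool = UnifiedPool(total=100, storage_region=50)
pool.acquire_storage(80)           # storage borrows 30 beyond its region
print(pool.acquire_execution(40))  # forces eviction of part of the borrowed space
print(pool.acquire_storage(10))    # fails: storage cannot evict execution
```

In this model a cache request simply fails once execution holds the space, which mirrors the statement above that execution memory is never evicted by storage.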

https://www.ibm.com/developerworks/cn/analytics/library/ba-cn-apache-spark-memory-management/index.html?ca=drs-&utm_source=tuicool&utm_medium=referral

Recommended reading
https://0x0fff.com/spark-memory-management/
