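1. Roles of the two kinds of memory
execution: memory used for computation in shuffles, joins, sorts and aggregations
storage: memory used for caching and for propagating internal data across the cluster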
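2. Contention between them
Execution and storage share a unified region (M).
(1) When execution is idle, storage can use all of the available memory, and vice versa.
(2) Execution may evict storage when necessary, but only until total storage memory usage falls to a certain threshold (R).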
(3) In other words, R describes a subregion within M where cached blocks are never evicted.
(4) Storage may not evict execution, due to complexities in implementation.
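The borrowing and eviction rules above can be illustrated with a toy model. This is only a sketch and not Spark's actual memory manager: the class `UnifiedMemory`, its methods, and the unit-less sizes are hypothetical, and partial grants are a simplification made for this example.

```python
class UnifiedMemory:
    """Toy model of the unified region M with a protected storage subregion R."""

    def __init__(self, m, r):
        self.m = m          # total size of the unified region M
        self.r = r          # threshold R: storage at or below R is never evicted
        self.execution = 0  # memory currently held by execution
        self.storage = 0    # memory currently held by cached blocks

    def free(self):
        return self.m - self.execution - self.storage

    def acquire_storage(self, n):
        """Storage may use any free memory but never evicts execution."""
        granted = min(n, self.free())
        self.storage += granted
        return granted

    def acquire_execution(self, n):
        """Execution uses free memory first, then evicts storage down to R."""
        shortfall = max(0, n - self.free())
        evictable = max(0, self.storage - self.r)
        self.storage -= min(shortfall, evictable)
        granted = min(n, self.free())
        self.execution += granted
        return granted


pool = UnifiedMemory(m=100, r=50)
s1 = pool.acquire_storage(100)   # execution is idle, so storage fills all of M
e1 = pool.acquire_execution(80)  # storage is evicted, but only down to R
s2 = pool.acquire_storage(10)    # nothing is free and execution is not evicted
print(s1, e1, s2, pool.storage, pool.execution)
```

In this run execution asks for 80 units but receives only 50, because the 50 units of storage inside R are protected; the later storage request receives nothing, because storage cannot reclaim memory from execution.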
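3. Advantages of this design
(1) Applications that do not cache or propagate data can use the entire space for execution, avoiding unnecessary disk spills.
(2) Applications that do cache can reserve a minimum storage space (R) in which their data blocks are immune to eviction.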
(3) Lastly, this approach provides reasonable out-of-the-box performance for a variety of workloads without requiring user expertise of how memory is divided internally.
In fact, in most cases the default parameters are good enough for production workloads.
Explanation of the two configuration parameters
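spark.memory.fraction (default 0.6)
That is, M = (JVM heap space - 300 MiB) * 0.6, the unified memory.
Remaining memory = (JVM heap space - 300 MiB) * 0.4: reserved for user data structures, internal metadata in Spark, and safeguarding against OOM errors in the case of sparse and unusually large records.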
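As a worked example of the two formulas, the sketch below splits a heap into the unified region and the remainder. The function name `split_heap` is hypothetical, and the heap size is taken as a plain byte count; the heap size a running JVM actually reports can be somewhat smaller than the configured maximum, so treat the numbers as estimates.

```python
RESERVED_BYTES = 300 * 1024 * 1024  # the fixed 300 MiB subtracted from the heap


def split_heap(heap_bytes, memory_fraction=0.6):
    """Return (unified, rest) in bytes for a given heap size and spark.memory.fraction."""
    usable = heap_bytes - RESERVED_BYTES
    if usable <= 0:
        raise ValueError("heap must be larger than the 300 MiB reserve")
    unified = int(usable * memory_fraction)  # M
    rest = usable - unified                  # user data structures, metadata, OOM safeguard
    return unified, rest


MIB = 1024 * 1024
heap = 4 * 1024 * MIB  # assume a 4 GiB heap for illustration
unified, rest = split_heap(heap)
print(f"M    = {unified / MIB:.1f} MiB")
print(f"rest = {rest / MIB:.1f} MiB")
```

For the assumed 4 GiB heap, 3796 MiB remains after the reserve, so M comes to about 2277.6 MiB and the remainder to about 1518.4 MiB.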
spark.memory.storageFraction expresses the size of R as a fraction of the unified memory M; the default is 0.5. R is the storage space within M where cached blocks are immune to being evicted by execution, so it guarantees that this much cached data is left untouched.
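A small sketch of how R follows from the two settings, and of how they could be passed on the command line. The helper names `storage_threshold` and `conf_flags` are hypothetical; `--conf key=value` is the usual spark-submit form for setting a property, and the 4 GiB heap is only an assumed example value.

```python
RESERVED_BYTES = 300 * 1024 * 1024  # the fixed 300 MiB subtracted from the heap


def storage_threshold(heap_bytes, memory_fraction=0.6, storage_fraction=0.5):
    """Return (M, R) in bytes: the unified region and its protected storage part."""
    m = int((heap_bytes - RESERVED_BYTES) * memory_fraction)
    r = int(m * storage_fraction)
    return m, r


def conf_flags(memory_fraction, storage_fraction):
    """Build spark-submit arguments that set both fractions explicitly."""
    return [
        "--conf", f"spark.memory.fraction={memory_fraction}",
        "--conf", f"spark.memory.storageFraction={storage_fraction}",
    ]


MIB = 1024 * 1024
m, r = storage_threshold(4 * 1024 * MIB)  # assumed 4 GiB heap, default fractions
print(f"M = {m / MIB:.1f} MiB, R = {r / MIB:.1f} MiB")
print(" ".join(conf_flags(0.6, 0.5)))
```

With the defaults, R is half of M, so up to that much cached data survives even when execution is under memory pressure; everything storage holds beyond R is only borrowed and can be evicted.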
The value of spark.memory.fraction should be set in order to fit this amount of heap space comfortably within the JVM’s old or “tenured” generation. See the discussion of advanced GC tuning below for details.