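Spark's memory is managed mainly by MemoryManager, which divides the memory it manages into two parts: StorageMemory and ExecutionMemory. ExecutionMemory is further split into on-heap and off-heap.
StorageMemory is used mainly by the BlockManager and is part of Spark's storage system. ExecutionMemory is used by running tasks, chiefly for writing shuffle output. In MemoryManager these correspond to three pool fields: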
@GuardedBy("this")
protected val storageMemoryPool = new StorageMemoryPool(this)
@GuardedBy("this")
protected val onHeapExecutionMemoryPool = new ExecutionMemoryPool(this, "on-heap execution")
@GuardedBy("this")
protected val offHeapExecutionMemoryPool = new ExecutionMemoryPool(this, "off-heap execution")
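First, the size of each region. The MemoryManager constructor sizes the storage pool and the on-heap execution pool from its storageMemory and onHeapExecutionMemory arguments. It sizes the off-heap execution pool from spark.memory.offHeap.size, which defaults to 0: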
storageMemoryPool.incrementPoolSize(storageMemory)
onHeapExecutionMemoryPool.incrementPoolSize(onHeapExecutionMemory)
offHeapExecutionMemoryPool.incrementPoolSize(conf.getSizeAsBytes("spark.memory.offHeap.size", 0))
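To make the pool bookkeeping concrete, here is a minimal Scala sketch of the same idea. It assumes a plain Map in place of SparkConf and a raw byte count for spark.memory.offHeap.size. SimplePool and SimpleMemoryManager are hypothetical names used only for illustration, not Spark classes. The real pools also track how much of their size is in use, which is omitted here.

```scala
// Hypothetical, simplified model of the three pools held by MemoryManager.
// SimplePool and SimpleMemoryManager are illustrative names, not Spark classes.
final class SimplePool(val name: String) {
  private var _poolSize: Long = 0L
  def poolSize: Long = _poolSize

  // Mirrors the idea of incrementPoolSize: grow the pool by a non-negative delta.
  def incrementPoolSize(delta: Long): Unit = {
    require(delta >= 0, s"invalid delta $delta")
    _poolSize += delta
  }
}

final class SimpleMemoryManager(
    conf: Map[String, String], // assumption: a plain Map stands in for SparkConf
    storageMemory: Long,
    onHeapExecutionMemory: Long) {
  val storagePool = new SimplePool("storage")
  val onHeapExecutionPool = new SimplePool("on-heap execution")
  val offHeapExecutionPool = new SimplePool("off-heap execution")

  storagePool.incrementPoolSize(storageMemory)
  onHeapExecutionPool.incrementPoolSize(onHeapExecutionMemory)
  // Assumption: the value is plain bytes; Spark's getSizeAsBytes also accepts suffixes like "1g".
  offHeapExecutionPool.incrementPoolSize(
    conf.get("spark.memory.offHeap.size").map(_.toLong).getOrElse(0L))
}

object PoolDemo {
  def main(args: Array[String]): Unit = {
    val mm = new SimpleMemoryManager(Map.empty, storageMemory = 100L, onHeapExecutionMemory = 200L)
    println(mm.storagePool.poolSize)          // 100
    println(mm.onHeapExecutionPool.poolSize)  // 200
    println(mm.offHeapExecutionPool.poolSize) // 0, because the off-heap size is unset
  }
}
```

With no off-heap size configured, the off-heap execution pool stays empty. This matches the default of 0 in the excerpt.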
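MemoryManager has two main subclasses: StaticMemoryManager and UnifiedMemoryManager.
UnifiedMemoryManager is the default. StaticMemoryManager is used only when spark.memory.useLegacyMode is set to true, so the analysis below uses UnifiedMemoryManager: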
val useLegacyMemoryManager = conf.getBoolean("spark.memory.useLegacyMode", false)
val memoryManager: MemoryManager =
  if (useLegacyMemoryManager) {
    new StaticMemoryManager(conf, numUsableCores)
  } else {
    UnifiedMemoryManager(conf, numUsableCores)
  }
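The choice depends on a single flag. The following minimal sketch assumes a plain Map stands in for SparkConf. It shows that leaving spark.memory.useLegacyMode unset selects the unified manager. ManagerKind, Static, Unified and ManagerChoice are hypothetical names.

```scala
// Hypothetical sketch of the manager selection; not Spark's actual types.
sealed trait ManagerKind
case object Static extends ManagerKind
case object Unified extends ManagerKind

object ManagerChoice {
  def choose(conf: Map[String, String]): ManagerKind = {
    // Same default as the excerpt: legacy mode is off unless explicitly enabled.
    val useLegacy = conf.get("spark.memory.useLegacyMode").exists(_.trim.toBoolean)
    if (useLegacy) Static else Unified
  }

  def main(args: Array[String]): Unit = {
    println(choose(Map.empty))                                   // Unified
    println(choose(Map("spark.memory.useLegacyMode" -> "true"))) // Static
  }
}
```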
With every parameter left at its default value, the size of each region is derived as follows:
object UnifiedMemoryManager {
  // Set aside a fixed amount of memory for non-storage, non-execution purposes.
  // This serves a function similar to `spark.memory.fraction`, but guarantees that we reserve
  // sufficient memory for the system even for small heaps. E.g. if we have a 1GB JVM, then
  // the memory used for execution and storage will be (1024 - 300) * 0.75 = 543MB by default.
  private val RESERVED_SYSTEM_MEMORY_BYTES = 300 * 1024 * 1024

  def apply(conf: SparkConf, numCores: Int): UnifiedMemoryManager = {
    val maxMemory = getMaxMemory(conf)
    new UnifiedMemoryManager(
      conf,
      maxMemory = maxMemory,
      storageRegionSize =
        (maxMemory * conf.getDouble("spark.memory.storageFraction", 0.5)).toLong,
      numCores = numCores)
  }

  /**
   * Return the total amount of memory shared between execution and storage, in bytes.
   */
  private def getMaxMemory(conf: SparkConf): Long = {
    val systemMemory = conf.getLong("spark.testing.memory", Runtime.getRuntime.maxMemory)
    val reservedMemory = conf.getLong("spark.testing.reservedMemory",
      if (conf.contains("spark.testing")) 0 else RESERVED_SYSTEM_MEMORY_BYTES)
    val minSystemMemory = reservedMemory * 1.5
    if (systemMemory < minSystemMemory) {
      throw new IllegalArgumentException(s"System memory $systemMemory must " +
        s"be at least $minSystemMemory. Please use a larger heap size.")
    }
    val usableMemory = systemMemory - reservedMemory
    val memoryFraction = conf.getDouble("spark.memory.fraction", 0.75)
    (usableMemory * memoryFraction).toLong
  }
}
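The arithmetic in getMaxMemory and apply can be re-derived for a concrete heap. The sketch below assumes the defaults shown in the excerpt: 300 MB reserved, spark.memory.fraction = 0.75 and spark.memory.storageFraction = 0.5. It models the configuration as a plain Map and ignores the spark.testing.* overrides. UnifiedSizing is a hypothetical name. Execution receives maxMemory minus the storage region, which is the value passed to MemoryManager as onHeapExecutionMemory.

```scala
// Re-derivation of the UnifiedMemoryManager sizing shown above, with default parameters.
// All sizes are in bytes; conf is a plain Map standing in for SparkConf (assumption).
object UnifiedSizing {
  val ReservedSystemMemoryBytes: Long = 300L * 1024 * 1024

  /** Mirrors getMaxMemory: the memory shared by execution and storage. */
  def maxMemory(systemMemory: Long, conf: Map[String, String] = Map.empty): Long = {
    val reserved = ReservedSystemMemoryBytes
    val minSystemMemory = reserved * 1.5
    require(systemMemory >= minSystemMemory,
      s"System memory $systemMemory must be at least $minSystemMemory")
    val usable = systemMemory - reserved
    val fraction = conf.get("spark.memory.fraction").map(_.toDouble).getOrElse(0.75)
    (usable * fraction).toLong
  }

  /** Mirrors apply(): storage gets a fraction of maxMemory, execution gets the rest. */
  def split(max: Long, conf: Map[String, String] = Map.empty): (Long, Long) = {
    val storageFraction =
      conf.get("spark.memory.storageFraction").map(_.toDouble).getOrElse(0.5)
    val storage = (max * storageFraction).toLong
    (storage, max - storage)
  }

  def main(args: Array[String]): Unit = {
    val mb = 1024L * 1024
    val max = maxMemory(1024 * mb) // a 1 GB heap
    val (storage, execution) = split(max)
    println(s"max=${max.toDouble / mb}MB storage=${storage.toDouble / mb}MB " +
      s"execution=${execution.toDouble / mb}MB")
  }
}
```

For a 1 GB heap this prints max=543.0MB storage=271.5MB execution=271.5MB, matching the 543MB in the source comment. The remaining (1024 - 300) * 0.25 = 181 MB of usable heap is not managed by MemoryManager. Spark 2.0 and later lowered the default spark.memory.fraction to 0.6, so the same heap yields different numbers there.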
private[spark] class UnifiedMemoryManager private[memory] (
    conf: SparkConf,
    val maxMemory: Long,
    storageRegionSize: Long,
    numCores: Int)
  extends MemoryManager(
    conf,
    numCores,
    storageRegionSize,
    maxMemory - storageRegionSize) {
  // ... (rest of the class body omitted)
}
From the code above, the on-heap memory managed by MemoryManager is divided as shown in the figure below (default parameters):