Spark Source Code Walkthrough (7): Memory Management

This article takes a close look at UnifiedMemoryManager in Spark's memory management, examining how memory is split between StorageMemory and ExecutionMemory, and in particular how ExecutionMemory is adjusted dynamically while tasks run. A worked example shows how, when a task needs more than the available ExecutionMemory, memory is borrowed from StorageMemory, and the task waiting this can cause. The article also contrasts UnifiedMemoryManager with StaticMemoryManager and points out the trade-off between memory utilization and task waiting.


Spark's memory is managed mainly by MemoryManager. The memory it manages is divided into two parts, StorageMemory and ExecutionMemory, and ExecutionMemory is further split into an on-heap pool and an off-heap pool.

StorageMemory is used mainly by the BlockManager and belongs to Spark's storage system, while ExecutionMemory is used mainly by running tasks, primarily for writing out shuffle results.

  @GuardedBy("this")
  protected val storageMemoryPool = new StorageMemoryPool(this)
  @GuardedBy("this")
  protected val onHeapExecutionMemoryPool = new ExecutionMemoryPool(this, "on-heap execution")
  @GuardedBy("this")
  protected val offHeapExecutionMemoryPool = new ExecutionMemoryPool(this, "off-heap execution")
First, let's look at how the size of each region is set:

  storageMemoryPool.incrementPoolSize(storageMemory)
  onHeapExecutionMemoryPool.incrementPoolSize(onHeapExecutionMemory)
  offHeapExecutionMemoryPool.incrementPoolSize(conf.getSizeAsBytes("spark.memory.offHeap.size", 0))
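Note that the off-heap execution pool is sized directly from spark.memory.offHeap.size, whose default is 0, so it holds no memory unless explicitly configured. As a hedged sketch (spark.memory.offHeap.enabled is the standard companion flag and is not shown in the snippet above; verify it against your Spark version), off-heap execution memory could be enabled like this:

import org.apache.spark.SparkConf

// Sketch only: give the off-heap execution pool 1 GB.
// spark.memory.offHeap.size is the config read in the snippet above;
// spark.memory.offHeap.enabled must also be set for Spark to use off-heap memory.
val conf = new SparkConf()
  .set("spark.memory.offHeap.enabled", "true")
  .set("spark.memory.offHeap.size", "1g")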
MemoryManager has two main subclasses: StaticMemoryManager and UnifiedMemoryManager.


Since UnifiedMemoryManager is the default, the analysis below uses UnifiedMemoryManager as the example:

    val useLegacyMemoryManager = conf.getBoolean("spark.memory.useLegacyMode", false)
    val memoryManager: MemoryManager =
      if (useLegacyMemoryManager) {
        new StaticMemoryManager(conf, numUsableCores)
      } else {
        UnifiedMemoryManager(conf, numUsableCores)
      }
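As the snippet shows, spark.memory.useLegacyMode (default false) decides which manager is built. If the legacy StaticMemoryManager behaviour is needed for comparison, it can be switched back on, for example (a minimal sketch; the application name is just a placeholder):

import org.apache.spark.SparkConf

// Minimal sketch: opt back into the legacy StaticMemoryManager.
// With the default of false, SparkEnv builds a UnifiedMemoryManager as shown above.
val conf = new SparkConf()
  .setAppName("memory-manager-demo")          // placeholder app name
  .set("spark.memory.useLegacyMode", "true")

With StaticMemoryManager the storage and execution boundaries stay fixed, which is exactly the utilization-versus-waiting trade-off mentioned in the summary at the top.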
Assuming all parameters are left at their defaults, let's work out the size of each region:

object UnifiedMemoryManager {

  // Set aside a fixed amount of memory for non-storage, non-execution purposes.
  // This serves a function similar to `spark.memory.fraction`, but guarantees that we reserve
  // sufficient memory for the system even for small heaps. E.g. if we have a 1GB JVM, then
  // the memory used for execution and storage will be (1024 - 300) * 0.75 = 543MB by default.
  private val RESERVED_SYSTEM_MEMORY_BYTES = 300 * 1024 * 1024

  def apply(conf: SparkConf, numCores: Int): UnifiedMemoryManager = {
    val maxMemory = getMaxMemory(conf)
    new UnifiedMemoryManager(
      conf,
      maxMemory = maxMemory,
      storageRegionSize =
        (maxMemory * conf.getDouble("spark.memory.storageFraction", 0.5)).toLong,
      numCores = numCores)
  }

  /**
   * Return the total amount of memory shared between execution and storage, in bytes.
   */
  private def getMaxMemory(conf: SparkConf): Long = {
    val systemMemory = conf.getLong("spark.testing.memory", Runtime.getRuntime.maxMemory)
    val reservedMemory = conf.getLong("spark.testing.reservedMemory",
      if (conf.contains("spark.testing")) 0 else RESERVED_SYSTEM_MEMORY_BYTES)
    val minSystemMemory = reservedMemory * 1.5
    if (systemMemory < minSystemMemory) {
      throw new IllegalArgumentException(s"System memory $systemMemory must " +
        s"be at least $minSystemMemory. Please use a larger heap size.")
    }
    val usableMemory = systemMemory - reservedMemory
    val memoryFraction = conf.getDouble("spark.memory.fraction", 0.75)
    (usableMemory * memoryFraction).toLong
  }
}

private[spark] class UnifiedMemoryManager private[memory] (
    conf: SparkConf,
    val maxMemory: Long,
    storageRegionSize: Long,
    numCores: Int)
  extends MemoryManager(
    conf,
    numCores,
    storageRegionSize,
    maxMemory - storageRegionSize) {
  
From the code above, with all parameters at their defaults, the on-heap memory managed by MemoryManager breaks down as follows: 300 MB is reserved for the system; 75% of the remaining heap (spark.memory.fraction) becomes the shared storage/execution area (maxMemory); and half of that (spark.memory.storageFraction = 0.5) forms the initial storage region, with the other half as the execution region.
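To make the numbers concrete, here is a minimal standalone sketch (not Spark code) that reproduces the arithmetic of getMaxMemory and apply for a hypothetical 1 GB executor heap with all parameters at their defaults:

object DefaultRegionSizes {
  def main(args: Array[String]): Unit = {
    val systemMemory    = 1024L * 1024 * 1024            // assumed Runtime.getRuntime.maxMemory: 1 GB
    val reservedMemory  = 300L * 1024 * 1024             // RESERVED_SYSTEM_MEMORY_BYTES
    val usableMemory    = systemMemory - reservedMemory  // 724 MB
    val maxMemory       = (usableMemory * 0.75).toLong   // spark.memory.fraction = 0.75 -> 543 MB
    val storageRegion   = (maxMemory * 0.5).toLong       // spark.memory.storageFraction = 0.5 -> ~271 MB
    val executionRegion = maxMemory - storageRegion      // the remainder goes to execution -> ~271 MB

    println(s"maxMemory       = ${maxMemory >> 20} MB")
    println(s"storageRegion   = ${storageRegion >> 20} MB")
    println(s"executionRegion = ${executionRegion >> 20} MB")
  }
}

The storage/execution boundary computed here is only an initial split: under UnifiedMemoryManager either side can borrow from the other at runtime, which is where the borrowing and task-waiting behaviour summarised at the top of the article comes from.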
