Spark Core Source Code Reading - Memory Management (7)
Storage Levels
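Each StorageLevel records whether to use memory or ExternalBlockStore,
whether to drop the RDD to disk if memory or ExternalBlockStore runs out,
whether to keep in-memory data in serialized form, and whether to replicate RDD partitions on multiple nodes.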
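class StorageLevel private(
    private var _useDisk: Boolean,      // store on disk
    private var _useMemory: Boolean,    // store in memory
    private var _useOffHeap: Boolean,   // store in off-heap memory
    private var _deserialized: Boolean, // true: keep deserialized objects; false: store serialized bytes
    private var _replication: Int = 1)  // number of replicas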
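The StorageLevel companion object defines the following levels (a _2 suffix means two replicas, _SER means the data is stored serialized):
NONE, DISK_ONLY, DISK_ONLY_2, MEMORY_ONLY, MEMORY_ONLY_2, MEMORY_ONLY_SER, MEMORY_ONLY_SER_2, MEMORY_AND_DISK,
MEMORY_AND_DISK_2, MEMORY_AND_DISK_SER, MEMORY_AND_DISK_SER_2,
OFF_HEAP (similar to MEMORY_ONLY_SER, but stores the data in off-heap memory)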
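To make the flag combinations concrete, the sketch below models the five constructor fields with a hypothetical case class `Level` (not Spark's `StorageLevel`, whose constructor is private) and lists the predefined levels. The flag values are assumed to match the Spark 1.x definitions; check `object StorageLevel` in your Spark version, since OFF_HEAP in particular was redefined in later releases.

```scala
// Hypothetical model of the five StorageLevel fields; not Spark's class.
final case class Level(
    useDisk: Boolean,
    useMemory: Boolean,
    useOffHeap: Boolean,
    deserialized: Boolean,
    replication: Int = 1)

object Levels {
  // Flag combinations assumed for the Spark 1.x levels listed above.
  val all: Map[String, Level] = Map(
    "NONE"                  -> Level(false, false, false, false),
    "DISK_ONLY"             -> Level(true,  false, false, false),
    "DISK_ONLY_2"           -> Level(true,  false, false, false, 2),
    "MEMORY_ONLY"           -> Level(false, true,  false, true),
    "MEMORY_ONLY_2"         -> Level(false, true,  false, true,  2),
    "MEMORY_ONLY_SER"       -> Level(false, true,  false, false),
    "MEMORY_ONLY_SER_2"     -> Level(false, true,  false, false, 2),
    "MEMORY_AND_DISK"       -> Level(true,  true,  false, true),
    "MEMORY_AND_DISK_2"     -> Level(true,  true,  false, true,  2),
    "MEMORY_AND_DISK_SER"   -> Level(true,  true,  false, false),
    "MEMORY_AND_DISK_SER_2" -> Level(true,  true,  false, false, 2),
    "OFF_HEAP"              -> Level(false, false, true,  false)
  )

  // Human-readable summary: where the data lives, in which form, how many copies.
  def describe(l: Level): String = {
    val where = Seq(
      if (l.useMemory) Some("memory") else None,
      if (l.useOffHeap) Some("off-heap") else None,
      if (l.useDisk) Some("disk") else None
    ).flatten
    val placement = if (where.isEmpty) "not stored" else where.mkString("+")
    val format = if (l.deserialized) "deserialized" else "serialized"
    s"$placement, $format, ${l.replication}x"
  }
}
```

For example, `Levels.describe(Levels.all("MEMORY_AND_DISK_SER_2"))` yields `memory+disk, serialized, 2x`.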
Memory Management - MemoryManager
Reserved Memory: memory set aside for the Spark runtime itself in the driver or executor JVM process
User (Other) Memory: holds the user's own data, e.g. data structures a user builds inside mapPartitions to aggregate records
Storage Memory: holds broadcast data and cached RDD data
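Execution Memory: used by Spark Core while running tasks; during a shuffle, map-side sort/aggregation has to buffer part of the data, and spills to disk when the buffer runs out of memory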
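The spill behaviour of execution memory can be illustrated with a minimal buffer, sort, spill and merge sketch. `SpillingSorter` is hypothetical and much simpler than Spark's ExternalSorter: the threshold counts records instead of bytes, and spilled runs are kept in a list that stands in for spill files on disk. Each spill produces one sorted run, and the final result is a k-way merge of all runs.

```scala
import scala.collection.mutable

// Hypothetical sketch: buffer records in memory, spill a sorted run when full, merge at the end.
final class SpillingSorter(maxBuffered: Int) {
  require(maxBuffered > 0, "maxBuffered must be positive")

  private val buffer = mutable.ArrayBuffer.empty[Int]
  private val spilledRuns = mutable.ArrayBuffer.empty[Vector[Int]] // stand-in for spill files

  def spillCount: Int = spilledRuns.size

  def insert(x: Int): Unit = {
    buffer += x
    if (buffer.size >= maxBuffered) spill()
  }

  // Sort the in-memory buffer and "write" it out as one sorted run.
  private def spill(): Unit = {
    spilledRuns += buffer.sorted.toVector
    buffer.clear()
  }

  // k-way merge of all spilled runs plus whatever is still buffered.
  def sortedResult(): Vector[Int] = {
    val runs = spilledRuns.toVector :+ buffer.sorted.toVector
    // Entries are (value, runIndex, position); reversed ordering turns the max-heap into a min-heap.
    val heap = mutable.PriorityQueue.empty[(Int, Int, Int)](
      Ordering.by[(Int, Int, Int), Int](_._1).reverse)
    for ((run, i) <- runs.zipWithIndex if run.nonEmpty) heap.enqueue((run(0), i, 0))
    val out = Vector.newBuilder[Int]
    while (heap.nonEmpty) {
      val (v, i, pos) = heap.dequeue()
      out += v
      if (pos + 1 < runs(i).size) heap.enqueue((runs(i)(pos + 1), i, pos + 1))
    }
    out.result()
  }
}
```

Sorting each run before spilling is what makes the final merge cheap: only the head of each run needs to be in the heap at any time.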
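Unroll:
As seen earlier, when an iterator's data is cached in memory, the memory used while materializing (unrolling) it is Unroll Memory. If memory runs out and the data cannot be spilled to disk, the block is not cached: the values are returned directly, and the data has to be recomputed the next time it is used.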
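A minimal sketch of unrolling under a budget, assuming a caller-supplied size function in place of Spark's size estimation and a fixed byte budget in place of memory requested from the MemoryManager. `UnrollSketch.unroll` is a hypothetical name. If everything fits, the values are returned for caching; otherwise the values read so far are chained with the rest of the iterator, so the caller can still consume the data without caching it.

```scala
// Hypothetical sketch of unrolling an iterator under a fixed memory budget.
object UnrollSketch {
  /** Right(all values) if the whole iterator fits in `budget` bytes;
    * otherwise Left(values read so far followed by the unread rest). */
  def unroll[A](values: Iterator[A], budget: Long, sizeOf: A => Long): Either[Iterator[A], Vector[A]] = {
    val unrolled = Vector.newBuilder[A]
    var used = 0L
    var fits = true
    while (fits && values.hasNext) {
      val v = values.next()
      unrolled += v
      used += sizeOf(v)
      if (used > budget) fits = false // out of unroll memory: stop trying to cache
    }
    if (fits) Right(unrolled.result())
    else Left(unrolled.result().iterator ++ values)
  }
}
```

Returning the partially unrolled values together with the remaining iterator avoids reading the source twice, which is why a failed unroll still lets the current task finish.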
StaticMemoryManager
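Because this allocation is static, neither storage nor execution can use memory the other leaves idle, so memory is not used efficiently.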
Default split of the JVM heap:
storage memory: 54% usable (60% x 0.9), 6% reserved as a safety margin
  unroll memory: 10.8% (20% of the usable storage memory)
  storage memory proper: 43.2%
execution memory: 16% usable (20% x 0.8), 4% reserved as a safety margin
other: 20%

// Usable storage memory = heap x spark.storage.memoryFraction x spark.storage.safetyFraction
private def getMaxStorageMemory(conf: SparkConf): Long = {
  val systemMaxMemory = conf.getLong("spark.testing.memory", Runtime.getRuntime.maxMemory)
  val memoryFraction = conf.getDouble("spark.storage.memoryFraction", 0.6)
  val safetyFraction = conf.getDouble("spark.storage.safetyFraction", 0.9)
  (systemMaxMemory * memoryFraction * safetyFraction).toLong
}

// Usable execution memory = heap x spark.shuffle.memoryFraction x spark.shuffle.safetyFraction
private def getMaxExecutionMemory(conf: SparkConf): Long = {
  val systemMaxMemory = conf.getLong("spark.testing.memory", Runtime.getRuntime.maxMemory)
  val memoryFraction = conf.getDouble("spark.shuffle.memoryFraction", 0.2)
  val safetyFraction = conf.getDouble("spark.shuffle.safetyFraction", 0.8)
  (systemMaxMemory * memoryFraction * safetyFraction).toLong
}

// Unroll memory is carved out of the usable storage memory
private val maxUnrollMemory: Long = {
  (maxStorageMemory * conf.getDouble("spark.storage.unrollFraction", 0.2)).toLong
}
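The static formulas can be checked with a small standalone version. `StaticSplit` is a hypothetical helper, and a plain `Map[String, Double]` stands in for `SparkConf`; the keys and defaults are the ones in the excerpt.

```scala
// Hypothetical re-implementation of the StaticMemoryManager formulas; a Map stands in for SparkConf.
object StaticSplit {
  private def frac(conf: Map[String, Double], key: String, default: Double): Double =
    conf.getOrElse(key, default)

  // heap x memoryFraction x safetyFraction for storage
  def maxStorage(heap: Long, conf: Map[String, Double] = Map.empty): Long =
    (heap * frac(conf, "spark.storage.memoryFraction", 0.6) *
      frac(conf, "spark.storage.safetyFraction", 0.9)).toLong

  // heap x memoryFraction x safetyFraction for execution (shuffle)
  def maxExecution(heap: Long, conf: Map[String, Double] = Map.empty): Long =
    (heap * frac(conf, "spark.shuffle.memoryFraction", 0.2) *
      frac(conf, "spark.shuffle.safetyFraction", 0.8)).toLong

  // unrollFraction of the usable storage memory
  def maxUnroll(heap: Long, conf: Map[String, Double] = Map.empty): Long =
    (maxStorage(heap, conf) * frac(conf, "spark.storage.unrollFraction", 0.2)).toLong
}
```

For a 1,000,000,000-byte heap this gives about 540,000,000 bytes of storage memory (of which about 108,000,000 may be used for unrolling) and 160,000,000 bytes of execution memory, i.e. the 54% / 10.8% / 16% figures listed above.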
UnifiedMemoryManager
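300 MB is reserved, and the JVM heap must be at least 450 MB (1.5 x the reserved memory). MaxMemory = (JVM heap - 300 MB) x 75% (spark.memory.fraction); the remaining 25% is other (user) memory. By default storage gets 50% of MaxMemory (spark.memory.storageFraction). Storage and execution can each borrow memory the other is not using; execution can later evict cached blocks to reclaim memory that storage borrowed beyond its 50% share, but storage cannot force execution to give memory back.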
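The sizing rules and the borrowing behaviour can be sketched as follows, assuming the Spark 1.6 defaults (300 MB reserved, spark.memory.fraction = 0.75, spark.memory.storageFraction = 0.5). `UnifiedSketch` and its methods are hypothetical, and the borrowing functions simplify what UnifiedMemoryManager does when acquiring memory: they only report the upper bound each side could reach given the other side's current usage.

```scala
// Hypothetical sketch of unified memory sizing and cross-borrowing limits (Spark 1.6 defaults assumed).
object UnifiedSketch {
  val MB: Long = 1024L * 1024
  val reserved: Long = 300 * MB

  // (heap - reserved) x memoryFraction; heaps below 1.5 x reserved are rejected.
  def maxMemory(heap: Long, memoryFraction: Double = 0.75): Long = {
    val minHeap = (reserved * 1.5).toLong
    require(heap >= minHeap, s"heap must be at least ${minHeap / MB} MB")
    ((heap - reserved) * memoryFraction).toLong
  }

  // Part of maxMemory where cached blocks are protected from eviction by execution.
  def storageRegion(heap: Long, storageFraction: Double = 0.5): Long =
    (maxMemory(heap) * storageFraction).toLong

  // Storage may grow into any memory execution is not currently using.
  def storageCanGrowTo(heap: Long, executionUsed: Long): Long =
    maxMemory(heap) - executionUsed

  // Execution may take everything except cached data inside the storage region;
  // blocks storage borrowed beyond its region can be evicted.
  def executionCanGrowTo(heap: Long, storageUsed: Long): Long =
    maxMemory(heap) - math.min(storageUsed, storageRegion(heap))
}
```

With a 4096 MB heap, MaxMemory is (4096 - 300) x 0.75 = 2847 MB and the storage region is 1423.5 MB; if storage already holds 2000 MB of cached blocks, execution can still grow to 1423.5 MB by evicting the blocks beyond the storage region.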