Let's start with a diagram of the overall flow:
We start from RDD's iterator method, because reading an RDD's data always begins by iterating over it through iterator():
/**
 * Internal method to this RDD; will read from cache if applicable, or otherwise compute it.
 * This should ''not'' be called by users directly, but is available for implementors of custom
 * subclasses of RDD.
 *
 * The RDD's iterator method: the entry point for reading the data of one partition.
 */
final def iterator(split: Partition, context: TaskContext): Iterator[T] = {
  // If the storage level is not NONE, the RDD has been persisted before, so we do not go
  // straight to recomputing this partition from the parent RDD; instead we first try to
  // fetch the persisted data through the CacheManager.
  if (storageLevel != StorageLevel.NONE) {
    // CacheManager
    SparkEnv.get.cacheManager.getOrCompute(this, split, context, storageLevel)
  } else {
    computeOrReadCheckpoint(split, context)
  }
}
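As a quick illustration of what makes iterator() take the CacheManager branch, here is a minimal sketch (the RDD, app name, and master URL are made up for the example):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

object CacheManagerDemo {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("cache-demo").setMaster("local[*]"))

    // storageLevel is NONE here, so iterator() would go straight to computeOrReadCheckpoint()
    val raw = sc.parallelize(1 to 1000000).map(_ * 2)

    // After persist(), storageLevel != NONE, so iterator() delegates to CacheManager.getOrCompute()
    val cached = raw.persist(StorageLevel.MEMORY_AND_DISK)

    cached.count()  // first action: partitions are computed and handed to the BlockManager
    cached.count()  // second action: BlockManager.get() returns the cached blocks

    sc.stop()
  }
}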
1. First, let's dive into CacheManager's getOrCompute method. Its source is as follows:
def getOrCompute[T](
    rdd: RDD[T],
    partition: Partition,
    context: TaskContext,
    storageLevel: StorageLevel): Iterator[T] = {

  val key = RDDBlockId(rdd.id, partition.index)
  logDebug(s"Looking for partition $key")
  // First ask the BlockManager for the data; if it is found, simply return it.
  blockManager.get(key) match {
    case Some(blockResult) =>
      // Partition is already materialized, so just return its values
      val inputMetrics = blockResult.inputMetrics
      val existingMetrics = context.taskMetrics
        .getInputMetricsForReadMethod(inputMetrics.readMethod)
      existingMetrics.incBytesRead(inputMetrics.bytesRead)

      val iter = blockResult.data.asInstanceOf[Iterator[T]]
      new InterruptibleIterator[T](context, iter) {
        override def next(): T = {
          existingMetrics.incRecordsRead(1)
          delegate.next()
        }
      }

    // The BlockManager did not find the data: even though the RDD was persisted, for some
    // reason the data is neither in local memory/disk nor on a remote BlockManager,
    // so further handling is needed.
    case None =>
      // Try once more to obtain the data (another task may have loaded it in the meantime);
      // if it is now available, return it directly, otherwise keep going.
      val storedValues = acquireLockForPartition[T](key)
      if (storedValues.isDefined) {
        return new InterruptibleIterator[T](context, storedValues.get)
      }

      // Otherwise, we have to load the partition ourselves
      try {
        logInfo(s"Partition $key not found, computing it")
        // computeOrReadCheckpoint(): if the RDD was checkpointed before, try to read its
        // checkpoint; if it was not, there is no choice but to recompute this partition
        // from the parent RDD's data.
        val computedValues = rdd.computeOrReadCheckpoint(partition, context)

        // If the task is running locally, do not persist the result
        if (context.isRunningLocally) {
          return computedValues
        }

        // Otherwise, cache the values and keep track of any updates in block statuses
        val updatedBlocks = new ArrayBuffer[(BlockId, BlockStatus)]
        // Reaching the CacheManager means the RDD does have a persistence level set;
        // we only get here because the persisted data could not be found. So after reading
        // the checkpoint or recomputing the data, putInBlockManager() is used to persist
        // a copy of it in the BlockManager again.
        val cachedValues = putInBlockManager(key, computedValues, storageLevel, updatedBlocks)
        val metrics = context.taskMetrics
        val lastUpdatedBlocks = metrics.updatedBlocks.getOrElse(Seq[(BlockId, BlockStatus)]())
        metrics.updatedBlocks = Some(lastUpdatedBlocks ++ updatedBlocks.toSeq)
        new InterruptibleIterator(context, cachedValues)
      } finally {
        loading.synchronized {
          loading.remove(key)
          loading.notifyAll()
        }
      }
  }
}
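One detail worth noting is the finally block: getOrCompute registers the key in a loading set (via acquireLockForPartition) so that concurrent tasks on the same executor wait instead of computing the same partition twice, and it always removes the key and wakes any waiters when it finishes. A simplified, self-contained sketch of that coordination pattern follows (the names are illustrative, not Spark's internal API):

import scala.collection.mutable

object LoadingCoordinator {
  private val loading = new mutable.HashSet[String]

  // Returns true if the calling thread should compute the value for `key`;
  // returns false if another thread finished loading it while we waited.
  def acquireOrWait(key: String): Boolean = loading.synchronized {
    if (!loading.contains(key)) {
      loading += key
      true
    } else {
      // Another thread is loading this key: wait until it calls release().
      while (loading.contains(key)) loading.wait()
      false // caller should re-check the cache instead of recomputing
    }
  }

  def release(key: String): Unit = loading.synchronized {
    loading -= key
    loading.notifyAll()
  }
}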
getOrCompute thus works its way down through progressively less preferable ways of getting the data: memory first, then disk. If execution finally reaches the putInBlockManager call, it means this partition's data had to be recomputed from the parent RDD.
The data does have a persistence level set, but retrieving it through the CacheManager failed, so it has to be persisted once more.
The source of the putInBlockManager method is as follows:
private def putInBlockManager[T](
    key: BlockId,
    values: Iterator[T],
    level: StorageLevel,
    updatedBlocks: ArrayBuffer[(BlockId, BlockStatus)],
    effectiveStorageLevel: Option[StorageLevel] = None): Iterator[T] = {

  val putLevel = effectiveStorageLevel.getOrElse(level)
  // If the persistence level does not include memory, i.e. it is a disk-only level
  if (!putLevel.useMemory) {
    /*
     * This RDD is not to be cached in memory, so we can just pass the computed values as an
     * iterator directly to the BlockManager rather than first fully unrolling it in memory.
     */
    updatedBlocks ++=
      // Simply call BlockManager.putIterator() to write the data to disk
      blockManager.putIterator(key, values, level, tellMaster = true, effectiveStorageLevel)
    blockManager.get(key) match {
      case Some(v) => v.data.asInstanceOf[Iterator[T]]
      case None =>
        logInfo(s"Failure to store $key")
        throw new BlockException(key, s"Block manager failed to return cached value for $key!")
    }
  // A memory storage level was specified
  } else {
    /*
     * This RDD is to be cached in memory. In this case we cannot pass the computed values
     * to the BlockManager as an iterator and expect to read it back later. This is because
     * we may end up dropping a partition from memory store before getting it back.
     *
     * In addition, we must be careful to not unroll the entire partition in memory at once.
     * Otherwise, we may cause an OOM exception if the JVM does not have enough space for this
     * single partition. Instead, we unroll the values cautiously, potentially aborting and
     * dropping the partition to disk if applicable.
     */
    // Call MemoryStore.unrollSafely() to try to write the data into memory.
    // If unrollSafely() decides the data fits in memory, it is written there;
    // otherwise it can only be written to a file.
    blockManager.memoryStore.unrollSafely(key, values, updatedBlocks) match {
      case Left(arr) =>
        // We have successfully unrolled the entire partition, so cache it in memory
        updatedBlocks ++=
          blockManager.putArray(key, arr, level, tellMaster = true, effectiveStorageLevel)
        arr.iterator.asInstanceOf[Iterator[T]]
      case Right(it) =>
        // There is not enough space to cache this partition in memory
        val returnValues = it.asInstanceOf[Iterator[T]]
        // If the data really cannot fit in memory, check whether the level also allows disk;
        // if it does, write the data to disk instead.
        if (putLevel.useDisk) {
          logWarning(s"Persisting partition $key to disk instead.")
          val diskOnlyLevel = StorageLevel(useDisk = true, useMemory = false,
            useOffHeap = false, deserialized = false, putLevel.replication)
          putInBlockManager[T](key, returnValues, level, updatedBlocks, Some(diskOnlyLevel))
        } else {
          returnValues
        }
    }
  }
}
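Note the recursive call in the Right(...) branch: when unrolling into memory fails and the requested level also allows disk, putInBlockManager calls itself with an effective level that keeps only the disk flag and the replication factor. A small standalone sketch of that level rewrite (the requested level here is purely illustrative):

import org.apache.spark.storage.StorageLevel

// Assume the caller asked for a memory-and-disk, serialized level with 2 replicas (illustrative).
val requested = StorageLevel(useDisk = true, useMemory = true,
  useOffHeap = false, deserialized = false, replication = 2)

// The fallback keeps only the disk flag plus the original replication factor,
// mirroring the diskOnlyLevel built inside putInBlockManager.
val diskOnly = StorageLevel(useDisk = true, useMemory = false,
  useOffHeap = false, deserialized = false, replication = requested.replication)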
Next, let's dive into the unrollSafely() method, which tries to write the data into memory:
def unrollSafely(
    blockId: BlockId,
    values: Iterator[Any],
    droppedBlocks: ArrayBuffer[(BlockId, BlockStatus)]): Either[Array[Any], Iterator[Any]] = {

  // Number of elements unrolled so far
  var elementsUnrolled = 0
  // Whether there is still enough memory for us to continue unrolling this block
  var keepUnrolling = true
  // Initial per-thread memory to request for unrolling blocks (bytes). Exposed for testing.
  val initialMemoryThreshold = unrollMemoryThreshold
  // How often to check whether we need to request more memory
  val memoryCheckPeriod = 16
  // Memory currently reserved by this thread for this particular unrolling operation
  var memoryThreshold = initialMemoryThreshold
  // Memory to request as a multiple of current vector size
  val memoryGrowthFactor = 1.5
  // Previous unroll memory held by this thread, for releasing later (only at the very end)
  val previousMemoryReserved = currentUnrollMemoryForThisThread
  // Underlying vector for unrolling the block
  var vector = new SizeTrackingVector[Any]

  // Request enough memory to begin unrolling
  keepUnrolling = reserveUnrollMemoryForThisThread(initialMemoryThreshold)

  if (!keepUnrolling) {
    logWarning(s"Failed to reserve initial memory threshold of " +
      s"${Utils.bytesToString(initialMemoryThreshold)} for computing block $blockId in memory.")
  }

  // Unroll this block safely, checking whether we have exceeded our threshold periodically
  try {
    while (values.hasNext && keepUnrolling) {
      vector += values.next()
      if (elementsUnrolled % memoryCheckPeriod == 0) {
        // If our vector's size has exceeded the threshold, request more memory
        val currentSize = vector.estimateSize()
        if (currentSize >= memoryThreshold) {
          val amountToRequest = (currentSize * memoryGrowthFactor - memoryThreshold).toLong
          // Hold the accounting lock, in case another thread concurrently puts a block that
          // takes up the unrolling space we just ensured here
          accountingLock.synchronized {
            if (!reserveUnrollMemoryForThisThread(amountToRequest)) {
              // If the first request is not granted, try again after ensuring free space
              // If there is still not enough space, give up and drop the partition
              val spaceToEnsure = maxUnrollMemory - currentUnrollMemory
              // As long as there is data left to unroll and unrolling may continue, keep
              // checking whether enough memory is available; if not, call ensureFreeSpace()
              // to try to free up some memory.
              if (spaceToEnsure > 0) {
                val result = ensureFreeSpace(blockId, spaceToEnsure)
                droppedBlocks ++= result.droppedBlocks
              }
              keepUnrolling = reserveUnrollMemoryForThisThread(amountToRequest)
            }
          }
          // New threshold is currentSize * memoryGrowthFactor
          memoryThreshold += amountToRequest
        }
      }
      elementsUnrolled += 1
    }

    if (keepUnrolling) {
      // We successfully unrolled the entirety of this block
      Left(vector.toArray)
    } else {
      // We ran out of space while unrolling the values for this block
      logUnrollFailureMessage(blockId, vector.estimateSize())
      Right(vector.iterator ++ values)
    }

  } finally {
    // If we return an array, the values returned do not depend on the underlying vector and
    // we can immediately free up space for other threads. Otherwise, if we return an iterator,
    // we release the memory claimed by this thread later on when the task finishes.
    if (keepUnrolling) {
      val amountToRelease = currentUnrollMemoryForThisThread - previousMemoryReserved
      releaseUnrollMemoryForThisThread(amountToRelease)
    }
  }
}
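Stripped of Spark's MemoryStore bookkeeping, the pattern in unrollSafely() is: append elements one by one, estimate the size every memoryCheckPeriod elements, request more memory whenever the estimate crosses the current threshold, and fall back to returning an iterator if the request cannot be granted. The following is a simplified, self-contained sketch of that pattern (the byte budget and the 8-bytes-per-element size estimate are stand-ins, not Spark's APIs):

import scala.collection.mutable.ArrayBuffer
import scala.reflect.ClassTag

// Simplified stand-in for MemoryStore.unrollSafely: returns Left(materialized array)
// if the data fits under `budgetBytes`, otherwise Right(an iterator over what was
// buffered so far plus the remaining input).
def unrollWithBudget[T: ClassTag](values: Iterator[T], budgetBytes: Long,
                                  checkPeriod: Int = 16,
                                  growthFactor: Double = 1.5): Either[Array[T], Iterator[T]] = {
  val buffer = new ArrayBuffer[T]
  var threshold = 1024L                          // initial reservation, like unrollMemoryThreshold
  var reserved = threshold
  var keepUnrolling = reserved <= budgetBytes    // initial reservation may already fail
  var n = 0

  while (values.hasNext && keepUnrolling) {
    buffer += values.next()
    n += 1
    if (n % checkPeriod == 0) {
      val estimated = buffer.length * 8L         // crude size estimate (stand-in)
      if (estimated >= threshold) {
        val request = (estimated * growthFactor - threshold).toLong
        if (reserved + request <= budgetBytes) { // analogous to reserving more unroll memory
          reserved += request
          threshold += request
        } else {
          keepUnrolling = false                  // give up: not enough room under the budget
        }
      }
    }
  }

  if (keepUnrolling) Left(buffer.toArray)
  else Right(buffer.iterator ++ values)
}

For example, unrollWithBudget((1 to 1000).iterator, budgetBytes = 16 * 1024) returns Left with the fully materialized array, while a very small budget returns Right with an iterator over the partially buffered data plus the rest, mirroring the Left/Right contract of unrollSafely above.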
Finally, let's look at the putIterator method, which writes the data to disk:
def putIterator(
    blockId: BlockId,
    values: Iterator[Any],
    level: StorageLevel,
    tellMaster: Boolean = true,
    effectiveStorageLevel: Option[StorageLevel] = None): Seq[(BlockId, BlockStatus)] = {
  require(values != null, "Values is null")
  doPut(blockId, IteratorValues(values), level, tellMaster, effectiveStorageLevel)
}
It simply delegates to BlockManager's doPut method, and doPut is exactly the BlockManager machinery covered in the previous chapter, so it is not analyzed again here.
That is the whole CacheManager flow. From the moment we call persist on an RDD in our code, the CacheManager starts working to persist the data. Later, when that RDD is needed again, its iterator method is called to look up the persisted copy of its data; if that copy cannot be found, the partition is recomputed from the parent RDD and persisted once more. That is what the CacheManager does!
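One last point from getOrCompute worth illustrating: if the RDD was also checkpointed, computeOrReadCheckpoint() reads the checkpoint instead of recomputing from the parent once the cached blocks are gone. A hedged, self-contained example of setting that up (the checkpoint directory, app name, and master are made up):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

object CheckpointFallbackDemo {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("ckpt-demo").setMaster("local[*]"))
    sc.setCheckpointDir("/tmp/spark-checkpoints")  // illustrative path

    val cached = sc.parallelize(1 to 100).map(_ + 1).persist(StorageLevel.MEMORY_ONLY)
    cached.checkpoint()
    cached.count()  // materializes the cache and writes the checkpoint

    // If the cached blocks are later evicted, getOrCompute() ends up in
    // computeOrReadCheckpoint(), which reads the checkpoint rather than
    // recomputing the partition from the parent RDD.
    cached.count()
    sc.stop()
  }
}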
