Spark-storage

License: CC 4.0 BY-SA

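This article walks through the storage mechanism in Apache Spark, covering key concepts such as RDD, StorageLevel, and BlockManager. It looks at how to choose a storage strategy for different scenarios and at how data is stored and managed across heap memory, disk, and BigMemory.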

java.nio

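Java's New I/O (NIO) library. It is background knowledge for this topic, so read up on it first.
Recommended: an introductory tutorial, also available in a Chinese translation.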
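
As a quick refresher on the part of java.nio that the storage code leans on, here is a minimal sketch of the ByteBuffer write, flip, read cycle. The object and method names (NioBasics, roundTrip) are made up for illustration; only the java.nio.ByteBuffer calls are real API.

```scala
import java.nio.ByteBuffer

object NioBasics {
  /** Writes the given ints into a heap buffer, flips it, and reads them back. */
  def roundTrip(values: Seq[Int]): Seq[Int] = {
    val buf = ByteBuffer.allocate(values.length * 4) // heap buffer, 4 bytes per Int
    values.foreach(v => buf.putInt(v))                // writing advances the position
    buf.flip()                                        // limit = position, position = 0: ready to read
    val out = Seq.newBuilder[Int]
    while (buf.hasRemaining) out += buf.getInt()
    out.result()
  }
}
```

Forgetting the flip() between writing and reading is the classic mistake: the buffer then appears to contain nothing but the unwritten remainder.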

RDDInfo

A utility class that describes an RDD.

StorageLevel

/**                                                                                                                                                                     
 * :: DeveloperApi ::                                                                                                                                                   
 * Flags for controlling the storage of an RDD. Each StorageLevel records whether to use memory,                                                                        
 * or Tachyon, whether to drop the RDD to disk if it falls out of memory or Tachyon , whether to                                                                        
 * keep the data in memory in a serialized format, and whether to replicate the RDD partitions on                                                                       
 * multiple nodes.                                                                                                                                                      
 *                                                                                                                                                                      
 * The [[org.apache.spark.storage.StorageLevel$]] singleton object contains some static constants                                                                       
 * for commonly useful storage levels. To create your own storage level object, use the                                                                                 
 * factory method of the singleton object (`StorageLevel(...)`).                                                                                                        
 */                                                                                                                                                                     
@DeveloperApi                                                                                                                                                           
class StorageLevel private(                                                                                                                                             
    private var _useDisk: Boolean,                                                                                                                                      
    private var _useMemory: Boolean,                                                                                                                                    
    private var _useOffHeap: Boolean,                                                                                                                                   
    private var _deserialized: Boolean,                                                                                                                                 
    private var _replication: Int = 1)

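The heap is the region of memory where dynamically allocated objects live. When you create an object with new, it is allocated on the heap. This is in contrast to the stack, which is where local variables live.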

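BigMemory exists to avoid the GC overhead of a large heap, for data sizes ranging from a few MB to many GB. It uses the address space of the JVM process through direct ByteBuffers, whose contents, unlike ordinary Java objects, are not managed by the GC.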

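The off-heap store of EHCache (Terracotta BigMemory) takes your objects off the heap by serializing them and storing them in one large chunk of memory. This is much like storing them on disk, except that the data is still in RAM. Objects in this state cannot be used directly; they have to be deserialized first. They are also not subject to garbage collection. Serialization and deserialization cost performance (although FST-serialization is quite fast).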
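
The off-heap idea can be sketched with nothing but the JDK: serialize an object, park the bytes in a direct ByteBuffer, and deserialize when the object is needed again. This is an illustration of the principle, not how BigMemory or Spark implement it; OffHeapSketch, store, and load are hypothetical names, and plain Java serialization is used only because it needs no extra library.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}
import java.nio.ByteBuffer

object OffHeapSketch {
  /** Serializes a value and copies the bytes into a direct (off-heap) buffer. */
  def store(value: java.io.Serializable): ByteBuffer = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    out.writeObject(value)
    out.close()
    val arr = bytes.toByteArray
    val direct = ByteBuffer.allocateDirect(arr.length) // backing memory lives outside the GC heap
    direct.put(arr)
    direct.flip()
    direct
  }

  /** Copies the bytes back onto the heap and deserializes them. */
  def load(buf: ByteBuffer): AnyRef = {
    val arr = new Array[Byte](buf.remaining())
    buf.duplicate().get(arr) // duplicate so the caller's position is left untouched
    val in = new ObjectInputStream(new ByteArrayInputStream(arr))
    try in.readObject() finally in.close()
  }
}
```

Every read pays for a copy plus a deserialization, which is the performance cost mentioned above.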

In practice, the following predefined storage levels are used:

  val NONE = new StorageLevel(false, false, false, false)                                                                                                               
  val DISK_ONLY = new StorageLevel(true, false, false, false)                                                                                                           
  val DISK_ONLY_2 = new StorageLevel(true, false, false, false, 2)                                                                                                      
  val MEMORY_ONLY = new StorageLevel(false, true, false, true)                                                                                                          
  val MEMORY_ONLY_2 = new StorageLevel(false, true, false, true, 2)                                                                                                     
  val MEMORY_ONLY_SER = new StorageLevel(false, true, false, false)                                                                                                     
  val MEMORY_ONLY_SER_2 = new StorageLevel(false, true, false, false, 2)                                                                                                
  val MEMORY_AND_DISK = new StorageLevel(true, true, false, true)                                                                                                       
  val MEMORY_AND_DISK_2 = new StorageLevel(true, true, false, true, 2)                                                                                                  
  val MEMORY_AND_DISK_SER = new StorageLevel(true, true, false, false)                                                                                                  
  val MEMORY_AND_DISK_SER_2 = new StorageLevel(true, true, false, false, 2)                                                                                             
  val OFF_HEAP = new StorageLevel(false, false, true, false)
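
To make the positional flags in these constants easier to read, here is a small stand-in model. Level is a hypothetical case class that mirrors the five constructor parameters in the same order (useDisk, useMemory, useOffHeap, deserialized, replication); it is not Spark's StorageLevel class, and describe is an invented helper.

```scala
object StorageLevelSketch {
  // Hypothetical stand-in: same five flags as StorageLevel, in the same order.
  final case class Level(useDisk: Boolean, useMemory: Boolean, useOffHeap: Boolean,
                         deserialized: Boolean, replication: Int = 1) {
    /** Spells out what the flag combination means. */
    def describe: String = {
      val parts = Seq(
        if (useDisk) Some("disk") else None,
        if (useMemory) Some("memory") else None,
        if (useOffHeap) Some("off-heap") else None
      ).flatten
      val where = if (parts.isEmpty) "nowhere" else parts.mkString("+")
      val form = if (deserialized) "deserialized" else "serialized"
      s"$where, $form, replication=$replication"
    }
  }

  // Same flag values as the constants of the same name above.
  val MEMORY_ONLY = Level(false, true, false, true)
  val MEMORY_AND_DISK_SER_2 = Level(true, true, false, false, 2)
}
```

Read this way, the _SER suffix corresponds to deserialized = false, and the _2 suffix to replication = 2.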

BlockManagerId

/**                                                                                                                                                                     
 * :: DeveloperApi ::                                                                                                                                                   
 * This class represent an unique identifier for a BlockManager.                                                                                                        
 *                                                                                                                                                                      
 * The first 2 constructors of this class is made private to ensure that BlockManagerId objects                                                                         
 * can be created only using the apply method in the companion object. This allows de-duplication                                                                       
 * of ID objects. Also, constructor parameters are private to ensure that parameters cannot be                                                                          
 * modified from outside this class.                                                                                                                                    
 */                                                                                                                                                                     
@DeveloperApi                                                                                                                                                           
class BlockManagerId private (

BlockId

/**                                                                                                                                                                     
 * :: DeveloperApi ::                                                                                                                                                   
 * Identifies a particular Block of data, usually associated with a single file.                                                                                        
 * A Block can be uniquely identified by its filename, but each type of Block has a different                                                                           
 * set of keys which produce its unique name.                                                                                                                           
 *                                                                                                                                                                      
 * If your BlockId should be serializable, be sure to add it to the BlockId.apply() method.                                                                             
 */  

The Block types that are actually used:
  /** Converts a BlockId "name" String back into a BlockId. */                                                                                                          
  def apply(id: String) = id match {                                                                                                                                    
    case RDD(rddId, splitIndex) =>                                                                                                                                      
      RDDBlockId(rddId.toInt, splitIndex.toInt)                                                                                                                         
    case SHUFFLE(shuffleId, mapId, reduceId) =>                                                                                                                         
      ShuffleBlockId(shuffleId.toInt, mapId.toInt, reduceId.toInt)                                                                                                      
    case SHUFFLE_DATA(shuffleId, mapId, reduceId) =>                                                                                                                    
      ShuffleDataBlockId(shuffleId.toInt, mapId.toInt, reduceId.toInt)                                                                                                  
    case SHUFFLE_INDEX(shuffleId, mapId, reduceId) =>                                                                                                                   
      ShuffleIndexBlockId(shuffleId.toInt, mapId.toInt, reduceId.toInt)                                                                                                 
    case BROADCAST(broadcastId, field) =>                                                                                                                               
      BroadcastBlockId(broadcastId.toLong, field.stripPrefix("_"))                                                                                                      
    case TASKRESULT(taskId) =>                                                                                                                                          
      TaskResultBlockId(taskId.toLong)                                                                                                                                  
    case STREAM(streamId, uniqueId) =>                                                                                                                                  
      StreamBlockId(streamId.toInt, uniqueId.toLong)                                                                                                                    
    case TEST(value) =>                                                                                                                                                 
      TestBlockId(value)                                                                                                                                                
    case _ =>                                                                                                                                                           
      throw new IllegalStateException("Unrecognized BlockId: " + id)                                                                                                    
  }
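
The apply method above relies on Scala's regex extractors: each uppercase name is a Regex whose capture groups are bound in the case pattern. A reduced sketch with two block types follows. The name formats ("rdd_<id>_<split>" and "shuffle_<s>_<m>_<r>") and the class names RddBlock and ShuffleBlock are assumptions made for this example.

```scala
object BlockIdSketch {
  sealed trait Id { def name: String }
  final case class RddBlock(rddId: Int, splitIndex: Int) extends Id {
    def name: String = s"rdd_${rddId}_$splitIndex"
  }
  final case class ShuffleBlock(shuffleId: Int, mapId: Int, reduceId: Int) extends Id {
    def name: String = s"shuffle_${shuffleId}_${mapId}_$reduceId"
  }

  // Assumed name formats; a Regex used as a pattern must match the whole string.
  private val RDD = "rdd_([0-9]+)_([0-9]+)".r
  private val SHUFFLE = "shuffle_([0-9]+)_([0-9]+)_([0-9]+)".r

  /** Converts a block name back into its typed id. */
  def parse(name: String): Id = name match {
    case RDD(rddId, splitIndex) => RddBlock(rddId.toInt, splitIndex.toInt)
    case SHUFFLE(s, m, r)       => ShuffleBlock(s.toInt, m.toInt, r.toInt)
    case _ => throw new IllegalStateException("Unrecognized BlockId: " + name)
  }
}
```

Because name and parse are inverses, a block id can travel as a plain string (for example as a file name) and still be recovered with its type.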

PutResult

/**                                                                                                                                                                     
 * Result of adding a block into a BlockStore. This case class contains a few things:                                                                                   
 *   (1) The estimated size of the put,                                                                                                                                 
 *   (2) The values put if the caller asked for them to be returned (e.g. for chaining                                                                                  
 *       replication), and                                                                                                                                              
 *   (3) A list of blocks dropped as a result of this put. This is always empty for DiskStore.                                                                          
 */                                                                                                                                                                     
private[spark] case class PutResult(                                                                                                                                    
    size: Long,                                                                                                                                                         
    data: Either[Iterator[_], ByteBuffer],                                                                                                                              
    droppedBlocks: Seq[(BlockId, BlockStatus)] = Seq.empty)

BlockManagerMessages

Defines the messages exchanged between BlockManagers.

BlockManager

/**                                                                                                                                                                     
 * Manager running on every node (driver and executors) which provides interfaces for putting and                                                                       
 * retrieving blocks both locally and remotely into various stores (memory, disk, and off-heap).                                                                        
 *                                                                                                                                                                      
 * Note that #initialize() must be called before the BlockManager is usable.                                                                                            
 */                                                                                                                                                                     
private[spark] class BlockManager(

This is a fairly long file. It defines BlockManager, which in the current implementation is a member of SparkEnv.

BlockManagerMasterActor

/**                                                                                                                                                                     
 * BlockManagerMasterActor is an actor on the master node to track statuses of                                                                                          
 * all slaves' block managers.                                                                                                                                          
 */                                                                                                                                                                     
private[spark]                                                                                                                                                          
class BlockManagerMasterActor(val isLocal: Boolean, conf: SparkConf, listenerBus: LiveListenerBus)                                                                      
  extends Actor with ActorLogReceive with Logging {

At its core it is a set of HashMaps:

  // Mapping from block manager id to the block manager's information.                                                                                                  
  private val blockManagerInfo = new mutable.HashMap[BlockManagerId, BlockManagerInfo]                                                                                  

  // Mapping from executor ID to block manager ID.                                                                                                                      
  private val blockManagerIdByExecutor = new mutable.HashMap[String, BlockManagerId]                                                                                    

  // Mapping from block id to the set of block managers that have the block.                                                                                            
  private val blockLocations = new JHashMap[BlockId, mutable.HashSet[BlockManagerId]]
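
How such maps are kept in sync can be sketched for the block-location map alone. LocationIndex is a hypothetical class, ids are plain Strings for brevity, and the cleanup rule (drop a block once no manager holds it) is an assumption of this sketch rather than a statement about the actor's code.

```scala
import scala.collection.mutable

object MasterSketch {
  final class LocationIndex {
    // block id -> ids of the block managers that hold the block
    private val blockLocations = mutable.HashMap.empty[String, mutable.HashSet[String]]

    /** Records that a block manager now holds the block. */
    def addBlock(blockId: String, managerId: String): Unit =
      blockLocations.getOrElseUpdate(blockId, mutable.HashSet.empty[String]) += managerId

    /** Forgets a block manager, e.g. after its executor is lost. */
    def removeManager(managerId: String): Unit = {
      blockLocations.values.foreach(_ -= managerId)
      // Drop blocks that no longer live anywhere.
      val orphaned = blockLocations.collect { case (id, locs) if locs.isEmpty => id }.toList
      orphaned.foreach(id => blockLocations.remove(id))
    }

    def locations(blockId: String): Set[String] =
      blockLocations.get(blockId).map(_.toSet).getOrElse(Set.empty[String])
  }
}
```

An actor processes one message at a time, which is why plain mutable maps without extra locking are a reasonable fit for this kind of bookkeeping.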

BlockManagerSlaveActor

/**                                                                                                                                                                     
 * An actor to take commands from the master to execute options. For example,                                                                                           
 * this is used to remove blocks from the slave's BlockManager.                                                                                                         
 */                                                                                                                                                                     
private[storage]                                                                                                                                                        
class BlockManagerSlaveActor(                                                                                                                                           
    blockManager: BlockManager,                                                                                                                                         
    mapOutputTracker: MapOutputTracker)                                                                                                                                 
  extends Actor with ActorLogReceive with Logging {

The slave is much simpler: it basically just carries out the requested operations asynchronously.

ShuffleBlockFetcherIterator

/**                                                                                                                                                                     
 * An iterator that fetches multiple blocks. For local blocks, it fetches from the local block                                                                          
 * manager. For remote blocks, it fetches them using the provided BlockTransferService.                                                                                 
 *                                                                                                                                                                      
 * This creates an iterator of (BlockID, values) tuples so the caller can handle blocks in a                                                                            
 * pipelined fashion as they are received.                                                                                                                              
 *                                                                                                                                                                      
 * The implementation throttles the remote fetches so they don't exceed maxBytesInFlight to avoid
 * using too much memory.                                                                                                                                               
 *                                                                                                                                                                      
 * @param context [[TaskContext]], used for metrics update                                                                                                              
 * @param shuffleClient [[ShuffleClient]] for fetching remote blocks                                                                                                    
 * @param blockManager [[BlockManager]] for reading local blocks                                                                                                        
 * @param blocksByAddress list of blocks to fetch grouped by the [[BlockManagerId]].                                                                                    
 *                        For each block we also require the size (in bytes as a long field) in                                                                         
 *                        order to throttle the memory usage.                                                                                                           
 * @param serializer serializer used to deserialize the data.                                                                                                           
 * @param maxBytesInFlight max size (in bytes) of remote blocks to fetch at any given point.                                                                            
 */   
private[spark]                                                                                                                                                          
final class ShuffleBlockFetcherIterator(                                                                                                                                
    context: TaskContext,                                                                                                                                               
    shuffleClient: ShuffleClient,                                                                                                                                       
    blockManager: BlockManager,                                                                                                                                         
    blocksByAddress: Seq[(BlockManagerId, Seq[(BlockId, Long)])],                                                                                                       
    serializer: Serializer,                                                                                                                                             
    maxBytesInFlight: Long)                                                                                                                                             
  extends Iterator[(BlockId, Try[Iterator[Any]])] with Logging {

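A note on how Scala handles data passed into and out of functions in a functional style, for both parameters and return values:
Option: solves the null (null pointer) problem
Either: solves the problem of an uncertain return value (one of two possible values is returned)
Try: solves the problem of a function that may throw an exception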
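
A minimal sketch of the three types side by side, using only the Scala standard library. The function names (findPort, parsePort, divide) are invented for the example.

```scala
import scala.util.Try

object ResultTypes {
  // Option: absence is an ordinary value (None) instead of null.
  def findPort(conf: Map[String, String]): Option[String] = conf.get("port")

  // Either: exactly one of two outcomes, here an error message or a parsed number.
  def parsePort(raw: String): Either[String, Int] =
    if (raw.nonEmpty && raw.length <= 9 && raw.forall(_.isDigit)) Right(raw.toInt)
    else Left(s"not a valid port: $raw")

  // Try: a thrown exception is captured as a Failure value.
  def divide(a: Int, b: Int): Try[Int] = Try(a / b)
}
```

This is why the iterator above yields (BlockId, Try[Iterator[Any]]): a failed fetch is delivered to the caller as a value for that block rather than thrown from inside next(). PutResult uses Either in the same spirit, to carry either an iterator of values or a ByteBuffer.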

Basic logic:
1. Separate the blocks on the local node from those on remote nodes
2. Send requests to the remote nodes
3. While waiting for the results, fetch the local blocks
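
The three steps can be sketched with plain Futures. This is a simplified illustration, not the iterator's implementation: fetchAll and its parameters are hypothetical, blocks are (host, blockId) pairs instead of blocksByAddress, results are collected eagerly instead of streamed, and the maxBytesInFlight throttling is left out.

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object FetchSketch {
  /** Returns (blockId, data) pairs, local blocks first. */
  def fetchAll(localHost: String,
               blocks: Seq[(String, String)],
               readLocal: String => String,
               fetchRemote: String => String)
              (implicit ec: ExecutionContext): Seq[(String, String)] = {
    // 1. Separate local blocks from remote ones.
    val (local, remote) = blocks.partition { case (host, _) => host == localHost }
    // 2. Send the remote requests; they run in the background.
    val pending = remote.map { case (_, id) => Future(id -> fetchRemote(id)) }
    // 3. Read the local blocks while the remote requests are in flight.
    val localResults = local.map { case (_, id) => id -> readLocal(id) }
    localResults ++ Await.result(Future.sequence(pending), 10.seconds)
  }
}
```

Starting the remote requests before touching local blocks is the point of the ordering: network latency overlaps with local reads instead of adding to them.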

BlockManagerMaster

Controls the BlockManagers through the driverActor.

BlockStore

/**                                                                                                                                                                     
 * Abstract class to store blocks.                                                                                                                                      
 */                                                                                                                                                                     
private[spark] abstract class BlockStore(val blockManager: BlockManager) extends Logging {

Various concrete stores are built on top of BlockStore, each responsible for storing blocks on one kind of resource.

MemoryStore

/**                                                                                                                                                                     
 * Stores blocks in memory, either as Arrays of deserialized Java objects or as                                                                                         
 * serialized ByteBuffers.                                                                                                                                              
 */                                                                                                                                                                     
private[spark] class MemoryStore(blockManager: BlockManager, maxMemory: Long)                                                                                           
  extends BlockStore(blockManager) {

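MemoryStore can cache two kinds of data:
1. A byte stream, which has to be copied
2. An Array[Any], for which only the reference is cached

Internally it is a HashMap that caches the data. Note that the current implementation uses locking: access is synchronized on entries.
There are two groups of interfaces: get and put.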

get

override def getSize(blockId: BlockId): Long = {
override def getBytes(blockId: BlockId): Option[ByteBuffer] = {
override def getValues(blockId: BlockId): Option[Iterator[Any]] = {

put

override def putBytes(blockId: BlockId, _bytes: ByteBuffer, level: StorageLevel): PutResult = {
override def putArray(
override def putIterator(

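The basic logic of put is:
1. First check whether there is enough space
- If there is, cache the block
- If there is not, try to free some space
2. Blocks evicted to free space, and blocks that cannot be placed in the MemoryStore, are then tried against the DiskStore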
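
The put logic above can be sketched as a size-bounded map. Store is a hypothetical class; evicting the oldest inserted block first is an assumption of this sketch, and the return value simply lists the block ids a caller would hand on to the DiskStore.

```scala
import scala.collection.mutable

object MemoryStoreSketch {
  final class Store(maxBytes: Long) {
    // Insertion-ordered map standing in for the entries map; oldest entry first.
    private val entries = mutable.LinkedHashMap.empty[String, Array[Byte]]
    private var used = 0L

    /** Returns the ids that should go to disk: evicted blocks, or the new block if it cannot fit. */
    def put(id: String, bytes: Array[Byte]): Seq[String] = entries.synchronized {
      if (bytes.length > maxBytes) Seq(id) // can never fit in memory
      else {
        // Replacing an existing block frees its space first.
        entries.remove(id).foreach(old => used -= old.length)
        val dropped = mutable.ArrayBuffer.empty[String]
        // Not enough room: free space by evicting the oldest blocks.
        while (used + bytes.length > maxBytes) {
          val (oldId, oldBytes) = entries.head
          entries.remove(oldId)
          used -= oldBytes.length
          dropped += oldId
        }
        entries(id) = bytes
        used += bytes.length
        dropped.toSeq
      }
    }

    def contains(id: String): Boolean = entries.synchronized(entries.contains(id))
  }
}
```

The single lock on entries mirrors the note above about the implementation being lock-based: the space check, the eviction, and the insert have to happen as one step.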

There are also a few other methods, such as clear and remove.

DiskStore

DiskBlockManager

/**                                                                                                                                                                     
 * Creates and maintains the logical mapping between logical blocks and physical on-disk                                                                                
 * locations. By default, one block is mapped to one file with a name given by its BlockId.                                                                             
 * However, it is also possible to have a block map to only a segment of a file, by calling                                                                             
 * mapBlockToFileSegment().                                                                                                                                             
 *                                                                                                                                                                      
 * Block files are hashed among the directories listed in spark.local.dir (or in                                                                                        
 * SPARK_LOCAL_DIRS, if it's set).                                                                                                                                      
 */                                                                                                                                                                     
private[spark] class DiskBlockManager(blockManager: BlockManager, conf: SparkConf)                                                                                      
  extends Logging {

It is just a mapping. Note that "disk" here means the local disk, not HDFS, so on stop the data can simply be deleted with rm.

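DiskStore is actually even simpler than MemoryStore:
1. Everything is a byte stream, so there is no need to distinguish between Any and bytes as MemoryStore does
2. For a byte stream, if it is short it is read in full; otherwise channel.map is used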
TachyonStore

/**                                                                                                                                                                     
 * Stores BlockManager blocks on Tachyon.                                                                                                                               
 */                                                                                                                                                                     
private[spark] class TachyonStore(                                                                                                                                      
    blockManager: BlockManager,                                                                                                                                         
    tachyonManager: TachyonBlockManager)                                                                                                                                
  extends BlockStore(blockManager: BlockManager) with Logging {

Quite similar to DiskStore; it is a wrapper around the Tachyon API.

BlockManager

/**                                                                                                                                                                     
 * Manager running on every node (driver and executors) which provides interfaces for putting and                                                                       
 * retrieving blocks both locally and remotely into various stores (memory, disk, and off-heap).                                                                        
 *                                                                                                                                                                      
 * Note that #initialize() must be called before the BlockManager is usable.                                                                                            
 */                                                                                                                                                                     
private[spark] class BlockManager(                                                                                                                                      
    executorId: String,                                                                                                                                                 
    actorSystem: ActorSystem,                                                                                                                                           
    val master: BlockManagerMaster,                                                                                                                                     
    defaultSerializer: Serializer,                                                                                                                                      
    maxMemory: Long,                                                                                                                                                    
    val conf: SparkConf,                                                                                                                                                
    mapOutputTracker: MapOutputTracker,                                                                                                                                 
    shuffleManager: ShuffleManager,                                                                                                                                     
    blockTransferService: BlockTransferService,                                                                                                                         
    securityManager: SecurityManager,                                                                                                                                   
    numUsableCores: Int)                                                                                                                                                
  extends BlockDataManager with Logging {

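BlockManager is basically the combination of everything mentioned above. If you can make sense of that pile of parameters, you are doing better than I am.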

Logic for getting a block locally

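1. Check whether such a block exists at all
2. Check whether it is in the MemoryStore
3. Check whether it is in the TachyonStore
4. Check whether it is in the DiskStore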
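
The lookup order above maps naturally onto a chain of Option.orElse calls. In this sketch the three stores are modelled as plain Maps and getLocal is a hypothetical function; the real BlockManager has more to consider than this (for instance the block's storage level), which is left out.

```scala
object LocalGetSketch {
  type Store = Map[String, String]

  /** Returns the tier the block was found in together with its data. */
  def getLocal(blockId: String, known: Set[String],
               memory: Store, tachyon: Store, disk: Store): Option[(String, String)] =
    if (!known.contains(blockId)) None // no such block is registered
    else
      memory.get(blockId).map("memory" -> _)
        .orElse(tachyon.get(blockId).map("tachyon" -> _))
        .orElse(disk.get(blockId).map("disk" -> _))
}
```

Because orElse takes its argument by name, the slower tiers are only consulted when the faster ones miss.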

DoPut

The logic is almost the same as that of DoGet.

Replicate

  /**                                                                                                                                                                   
   * Replicate block to another node. Note that this is a blocking call that returns after
   * the block has been replicated.                                                                                                                                     
   */                                                                                                                                                                   
  private def replicate(blockId: BlockId, data: ByteBuffer, level: StorageLevel): Unit = {