Revisiting Alluxio
Alluxio is an open-source, memory-centric distributed caching project.
Alluxio architecture diagram:
The current state of caching
Cache terminology
- Cache penetration
Cache penetration refers to requests for data that exists neither in the cache nor in the database. A user (quite possibly an attacker) keeps issuing such requests, every one of which falls through to the database and puts it under excessive load.
- Cache breakdown
Cache breakdown refers to data that is missing from the cache but present in the database (usually because a hot entry has just expired). A large number of concurrent requests all miss the cache at the same moment and all go to the database, causing a sudden spike in load.
Solutions:
1. Mark hot data as never expiring;
2. Guard the rebuild with a mutex, so that only one request reloads the expired entry from the database while the others wait (see the sketch after this list).
- Cache avalanche
Cache avalanche refers to a large batch of cached entries reaching their expiration time together while query volume is huge, overwhelming the database and possibly bringing it down. The difference from cache breakdown is that breakdown is many concurrent queries for the same entry, whereas an avalanche is many different entries expiring at once, so a large fraction of queries miss the cache and hit the database.
Solutions:
1. Randomize cache expiration times so that large batches of entries do not expire at the same moment (see the sketch after this list);
2. If the cache is deployed as a distributed cluster, spread hot data evenly across the cache nodes;
3. Mark hot data as never expiring.
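Below is a minimal, self-contained sketch of the two mitigations above. It is not tied to any particular cache library; the class and method names (GuardedCache, loadFromDatabase, ...) are made up for illustration. It combines a per-key mutex, so only one caller rebuilds an expired hot entry (cache breakdown), with a randomized TTL, so entries written together do not expire together (cache avalanche).
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ThreadLocalRandom;
import java.util.function.Function;

/** Illustrative sketch only: a tiny in-process cache with breakdown and avalanche guards. */
public final class GuardedCache<K, V> {
  private static final class Entry<T> {
    final T value;
    final long expireAtMillis;
    Entry(T value, long expireAtMillis) { this.value = value; this.expireAtMillis = expireAtMillis; }
  }

  private final Map<K, Entry<V>> mEntries = new ConcurrentHashMap<>();
  private final Map<K, Object> mLocks = new ConcurrentHashMap<>();
  private final long mBaseTtlMillis;
  private final long mTtlJitterMillis;

  public GuardedCache(long baseTtlMillis, long ttlJitterMillis) {
    mBaseTtlMillis = baseTtlMillis;
    mTtlJitterMillis = ttlJitterMillis;
  }

  /** Returns the cached value, rebuilding it from the backing store on expiry. */
  public V get(K key, Function<K, V> loadFromDatabase) {
    Entry<V> entry = mEntries.get(key);
    if (entry != null && entry.expireAtMillis > System.currentTimeMillis()) {
      return entry.value; // cache hit
    }
    // Cache breakdown guard: one rebuild per key; concurrent callers wait on the same lock.
    Object lock = mLocks.computeIfAbsent(key, k -> new Object());
    synchronized (lock) {
      entry = mEntries.get(key); // re-check: another thread may have rebuilt it already
      if (entry != null && entry.expireAtMillis > System.currentTimeMillis()) {
        return entry.value;
      }
      V value = loadFromDatabase.apply(key);
      // Cache avalanche guard: add random jitter so keys loaded together expire apart.
      long ttl = mBaseTtlMillis + ThreadLocalRandom.current().nextLong(mTtlJitterMillis + 1);
      mEntries.put(key, new Entry<>(value, System.currentTimeMillis() + ttl));
      return value;
    }
  }
}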
Why use Alluxio
Because it is memory-centric, Alluxio escapes the I/O limits of traditional disk-based file systems and can therefore accelerate data access for services (memory I/O is, as is well known, far faster than disk I/O).
What caching designs does Alluxio provide
Alluxio caching scheme
Client read requests (a client-side read sketch follows the three cases below)
- Local hit
The data the compute service needs is already cached in the Alluxio worker running on the same machine, so the service reads it directly from the designated location in local memory, as shown below:
- Remote worker hit
The data the compute service needs is cached in the Alluxio cluster, but not on the same machine as the service; it sits in a remote worker's cache, so the service must fetch it over the network from that remote worker, as shown below:
- Cache breakdown / miss (read from the remote under file system)
The data the compute service needs is not cached anywhere in the Alluxio cluster. If the machine running the compute service is itself a worker node, Alluxio returns the data from the under file system (UFS) and caches it in the local worker; otherwise, the data is returned and cached on the worker closest to this machine, as shown below:
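From the client's point of view, all three read cases above go through the same API; which case applies depends on worker placement and cache state. Below is a minimal read sketch assuming the Alluxio 1.x Java client API (FileSystem.Factory, OpenFileOptions, ReadType); the file path is hypothetical.
import alluxio.AlluxioURI;
import alluxio.client.ReadType;
import alluxio.client.file.FileInStream;
import alluxio.client.file.FileSystem;
import alluxio.client.file.options.OpenFileOptions;

public final class AlluxioReadExample {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.Factory.get();
    AlluxioURI path = new AlluxioURI("/data/part-00000"); // hypothetical path

    // CACHE_PROMOTE: on a hit, promote the block to the top tier of the local worker;
    // on a miss, read from the UFS and cache the block in Alluxio.
    OpenFileOptions options = OpenFileOptions.defaults().setReadType(ReadType.CACHE_PROMOTE);

    byte[] buf = new byte[8 * 1024];
    try (FileInStream in = fs.openFile(path, options)) {
      int n;
      while ((n = in.read(buf)) != -1) {
        // process n bytes; whether this was a local hit, remote hit, or UFS read is transparent here
      }
    }
  }
}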
Client write requests (a client-side write sketch follows the four modes below)
- Write only to Alluxio
- Synchronous cache write
Data is written to Alluxio and the UFS synchronously.
- Asynchronous cache write
Data is written to Alluxio first; Alluxio then automatically syncs it to the UFS.
- Write only to the UFS
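These four modes correspond to the client-side WriteType values MUST_CACHE, CACHE_THROUGH, ASYNC_THROUGH, and THROUGH. Below is a minimal write sketch assuming the Alluxio 1.x Java client API (CreateFileOptions, WriteType); the file path is hypothetical.
import alluxio.AlluxioURI;
import alluxio.client.WriteType;
import alluxio.client.file.FileOutStream;
import alluxio.client.file.FileSystem;
import alluxio.client.file.options.CreateFileOptions;

public final class AlluxioWriteExample {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.Factory.get();
    AlluxioURI path = new AlluxioURI("/data/output.tmp"); // hypothetical path

    // Pick one of the four modes described above:
    //   MUST_CACHE    - write only to Alluxio worker storage
    //   CACHE_THROUGH - write to Alluxio and the UFS synchronously
    //   ASYNC_THROUGH - write to Alluxio, persist to the UFS asynchronously
    //   THROUGH       - write only to the UFS
    CreateFileOptions options = CreateFileOptions.defaults().setWriteType(WriteType.CACHE_THROUGH);

    try (FileOutStream out = fs.createFile(path, options)) {
      out.write("hello alluxio".getBytes());
    }
  }
}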
Alluxio cache eviction algorithms
Caller
The evictor is invoked mainly by TieredBlockStore.freeSpaceInternal(): when the top cache tier runs out of space, it asks the evictor to free the requested amount of space so the configured threshold is reached (a consumer sketch follows).
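As a rough orientation, here is a hypothetical sketch of how a caller such as TieredBlockStore might consume the EvictionPlan returned by an evictor; this is not the actual TieredBlockStore code, and the class name, log output, and package paths are assumptions based on the Alluxio 1.x source tree. Only APIs visible in the quoted sources below are used.
import alluxio.collections.Pair;
import alluxio.worker.block.BlockMetadataManagerView;
import alluxio.worker.block.BlockStoreLocation;
import alluxio.worker.block.evictor.EvictionPlan;
import alluxio.worker.block.evictor.Evictor;

public final class EvictionCallerSketch {
  static void freeSpace(Evictor evictor, long bytesToBeAvailable,
      BlockStoreLocation location, BlockMetadataManagerView view) {
    // Ask the evictor for a plan; null means the tier cannot free that much space.
    EvictionPlan plan = evictor.freeSpaceWithView(bytesToBeAvailable, location, view);
    if (plan == null) {
      throw new IllegalStateException("not enough evictable space in " + location);
    }
    // Blocks with no room in the next tier are simply deleted from the worker.
    for (Pair<Long, BlockStoreLocation> evict : plan.toEvict()) {
      System.out.println("evict block " + evict.getFirst() + " from " + evict.getSecond());
    }
    // Blocks that fit in the next tier are moved (demoted) instead of deleted.
    plan.toMove().forEach(transfer -> System.out.println("demote " + transfer));
  }
}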
- Eviction policy
The LRU and LRFU evictors below both extend AbstractEvictor, so by default they use the eviction logic implemented in AbstractEvictor.
The rough idea of this logic: evict blocks from the requested location, and move the blocks to be evicted into the next cache tier according to a specific allocation policy. If the next tier does not have enough space, blocks in that tier are evicted in turn, recursing tier by tier until enough space is found to hold the blocks displaced from the tier above.
Diagram:
Core class relationship diagram:
Core source: AbstractEvictor.java
/**
* A recursive implementation of cascading eviction.
*
* This method uses a specific eviction strategy to find blocks to evict in the requested
* location. After eviction, one {@link alluxio.worker.block.meta.StorageDir} in the location has
* the specific amount of free space. It then uses an allocation strategy to allocate space in the
* next tier to move each evicted blocks. If the next tier fails to allocate space for the evicted
* blocks, the next tier will continue to evict its blocks to free space.
*
* This method is only used in
* {@link #freeSpaceWithView(long, BlockStoreLocation, BlockMetadataManagerView)}.
*
* @param bytesToBeAvailable bytes to be available after eviction
* @param location target location to evict blocks from
* @param plan the plan to be recursively updated, is empty when first called in
* {@link #freeSpaceWithView(long, BlockStoreLocation, BlockMetadataManagerView)}
* @param mode the eviction mode
* @return the first {@link StorageDirView} in the range of location to evict/move bytes from, or
* null if there is no plan
*/
protected StorageDirView cascadingEvict(long bytesToBeAvailable, BlockStoreLocation location,
EvictionPlan plan, Mode mode) {
location = updateBlockStoreLocation(bytesToBeAvailable, location);
// 1. If bytesToBeAvailable can already be satisfied without eviction, return the eligible
// StorageDirView
StorageDirView candidateDirView =
EvictorUtils.selectDirWithRequestedSpace(bytesToBeAvailable, location, mManagerView);
if (candidateDirView != null) {
return candidateDirView;
}
// 2. Iterate over blocks in order until we find a StorageDirView that is in the range of
// location and can satisfy bytesToBeAvailable after evicting its blocks iterated so far
EvictionDirCandidates dirCandidates = new EvictionDirCandidates();
Iterator<Long> it = getBlockIterator();
while (it.hasNext() && dirCandidates.candidateSize() < bytesToBeAvailable) {
long blockId = it.next();
try {
BlockMeta block = mManagerView.getBlockMeta(blockId);
if (block != null) { // might not present in this view
if (block.getBlockLocation().belongsTo(location)) {
String tierAlias = block.getParentDir().getParentTier().getTierAlias();
int dirIndex = block.getParentDir().getDirIndex();
dirCandidates.add(mManagerView.getTierView(tierAlias).getDirView(dirIndex), blockId,
block.getBlockSize());
}
}
} catch (BlockDoesNotExistException e) {
LOG.warn("Remove block {} from evictor cache because {}", blockId, e);
it.remove();
onRemoveBlockFromIterator(blockId);
}
}
// 3. If there is no eligible StorageDirView, return null
if (mode == Mode.GUARANTEED && dirCandidates.candidateSize() < bytesToBeAvailable) {
return null;
}
// 4. cascading eviction: try to allocate space in the next tier to move candidate blocks
// there. If allocation fails, the next tier will continue to evict its blocks to free space.
// Blocks are only evicted from the last tier or when they cannot be moved to the next tier.
candidateDirView = dirCandidates.candidateDir();
if (candidateDirView == null) {
return null;
}
List<Long> candidateBlocks = dirCandidates.candidateBlocks();
StorageTierView nextTierView = mManagerView.getNextTier(candidateDirView.getParentTierView());
if (nextTierView == null) {
// This is the last tier, evict all the blocks.
for (Long blockId : candidateBlocks) {
try {
BlockMeta block = mManagerView.getBlockMeta(blockId);
if (block != null) {
candidateDirView.markBlockMoveOut(blockId, block.getBlockSize());
plan.toEvict().add(new Pair<>(blockId, candidateDirView.toBlockStoreLocation()));
}
} catch (BlockDoesNotExistException e) {
continue;
}
}
} else {
for (Long blockId : candidateBlocks) {
try {
BlockMeta block = mManagerView.getBlockMeta(blockId);
if (block == null) {
continue;
}
StorageDirView nextDirView = mAllocator.allocateBlockWithView(
Sessions.MIGRATE_DATA_SESSION_ID, block.getBlockSize(),
BlockStoreLocation.anyDirInTier(nextTierView.getTierViewAlias()), mManagerView);
if (nextDirView == null) {
nextDirView = cascadingEvict(block.getBlockSize(),
BlockStoreLocation.anyDirInTier(nextTierView.getTierViewAlias()), plan, mode);
}
if (nextDirView == null) {
// If we failed to find a dir in the next tier to move this block, evict it and
// continue. Normally this should not happen.
plan.toEvict().add(new Pair<>(blockId, block.getBlockLocation()));
candidateDirView.markBlockMoveOut(blockId, block.getBlockSize());
continue;
}
plan.toMove().add(new BlockTransferInfo(blockId, block.getBlockLocation(),
nextDirView.toBlockStoreLocation()));
candidateDirView.markBlockMoveOut(blockId, block.getBlockSize());
nextDirView.markBlockMoveIn(blockId, block.getBlockSize());
} catch (BlockDoesNotExistException e) {
continue;
}
}
}
return candidateDirView;
}
@Override
public EvictionPlan freeSpaceWithView(long bytesToBeAvailable, BlockStoreLocation location,
BlockMetadataManagerView view) {
return freeSpaceWithView(bytesToBeAvailable, location, view, Mode.GUARANTEED);
}
@Override
public EvictionPlan freeSpaceWithView(long bytesToBeAvailable, BlockStoreLocation location,
BlockMetadataManagerView view, Mode mode) {
mManagerView = view;
List<BlockTransferInfo> toMove = new ArrayList<>();
List<Pair<Long, BlockStoreLocation>> toEvict = new ArrayList<>();
EvictionPlan plan = new EvictionPlan(toMove, toEvict);
StorageDirView candidateDir = cascadingEvict(bytesToBeAvailable, location, plan, mode);
mManagerView.clearBlockMarks();
if (candidateDir == null) {
return null;
}
return plan;
}
Summary of AbstractEvictor's core role:
When a cache tier is asked to free a given amount of space,
AbstractEvictor.cascadingEvict() determines whether there are directories/blocks that satisfy the request:
- If some directory already has enough free space, its location is returned directly and the EvictionPlan stays empty, i.e. no blocks or directories need to be evicted.
- Otherwise blocks have to be evicted to obtain the target space, and this is where LRU and LRFU differ:
LRU iterates over an access-ordered map starting from the head of the list; every block it encounters within the requested location is treated as an evictable candidate, until enough candidate blocks have been collected;
LRFU iterates over blocks ordered by their CRF value, so the blocks with the smallest CRF become the eviction candidates (see the LRFU section below).
- If the candidate blocks collected in the previous step still do not add up to enough space, there is no eligible target directory to evict from, and null is returned.
- Reaching this step means there are enough candidate blocks to evict. Now,
if the selected tier has no lower tier, the candidate blocks are simply collected and marked for eviction;
if there is a lower tier, the candidate blocks are marked for eviction and, at the same time, destination locations in the lower tier are collected for the blocks to be moved (demoted).
- Finally, all of this confirmed information (the blocks to demote to the next tier, the reserved destinations in that tier, ...) is returned in the plan.
- freeSpaceWithView() (shared by LRU and LRFU via AbstractEvictor) builds the eviction plan through cascadingEvict() and, once the corresponding blocks have been marked, calls BlockMetadataManagerView.clearBlockMarks() to clear those move-in/move-out marks on the view; the plan itself is returned to the caller (TieredBlockStore), which carries out the actual eviction and demotion.
- LRU
Algorithm: least recently used; the block that has gone unused for the longest time is evicted first.
Diagram:
Core source: LRUEvictor.java
/**
* Implementation of an evictor which follows the least recently used algorithm. It discards the
* least recently used item based on its access.
*/
@NotThreadSafe
public class LRUEvictor extends AbstractEvictor {
private static final int LINKED_HASH_MAP_INIT_CAPACITY = 200;
private static final float LINKED_HASH_MAP_INIT_LOAD_FACTOR = 0.75f;
private static final boolean LINKED_HASH_MAP_ACCESS_ORDERED = true;
private static final boolean UNUSED_MAP_VALUE = true;
/**
* Access-ordered {@link java.util.LinkedHashMap} from blockId to {@link #UNUSED_MAP_VALUE}(just a
* placeholder to occupy the value), acts as a LRU double linked list where most recently accessed
* element is put at the tail while least recently accessed element is put at the head.
*/
protected Map<Long, Boolean> mLRUCache =
Collections.synchronizedMap(new LinkedHashMap<Long, Boolean>(LINKED_HASH_MAP_INIT_CAPACITY,
LINKED_HASH_MAP_INIT_LOAD_FACTOR, LINKED_HASH_MAP_ACCESS_ORDERED));
/**
* Creates a new instance of {@link LRUEvictor}.
*
* @param view a view of block metadata information
* @param allocator an allocation policy
*/
public LRUEvictor(BlockMetadataManagerView view, Allocator allocator) {
super(view, allocator);
// preload existing blocks loaded by StorageDir to Evictor
for (StorageTierView tierView : mManagerView.getTierViews()) {
for (StorageDirView dirView : tierView.getDirViews()) {
for (BlockMeta blockMeta : dirView.getEvictableBlocks()) { // all blocks with initial view
mLRUCache.put(blockMeta.getBlockId(), UNUSED_MAP_VALUE);
}
}
}
}
}
Summary of the source above:
1. Key data structure: Map<Long, Boolean> mLRUCache = Collections.synchronizedMap(new LinkedHashMap...
This is a Map that adds ordering on top of a HashMap. As the Javadoc comment above describes, with access ordering enabled the most recently accessed block sits at the tail of the linked list and the least recently accessed block sits at the head, so iterating from the head yields eviction candidates in LRU order (a small demo follows).
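A tiny, runnable demo (independent of Alluxio) of the access-ordered LinkedHashMap behaviour that mLRUCache relies on: iterating the key set walks from the least recently accessed entry to the most recently accessed one, which is exactly the LRU eviction order.
import java.util.LinkedHashMap;
import java.util.Map;

public class AccessOrderDemo {
  public static void main(String[] args) {
    // Third constructor argument = true enables access ordering, as in mLRUCache.
    Map<Long, Boolean> lru = new LinkedHashMap<>(16, 0.75f, true);
    lru.put(1L, Boolean.TRUE);
    lru.put(2L, Boolean.TRUE);
    lru.put(3L, Boolean.TRUE);
    lru.get(1L); // touching block 1 moves it to the tail of the internal linked list
    // Iteration now starts from the least recently accessed block:
    System.out.println(lru.keySet()); // prints [2, 3, 1] -> block 2 is the first eviction candidate
  }
}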
- GreedyEvictor
Algorithm: xxx
Diagram:
Core source: GreedyEvictor.java
...
- LRFU
Algorithm: a compromise between LRU and LFU (Least Frequently Used, which ranks blocks by how often they are accessed).
Diagram:
For the detailed reasoning behind LRFU, interested readers can look up the original LRFU paper.
Core source: LRFUEvictor.java
/**
* This class is used to evict blocks by LRFU. LRFU evict blocks with minimum CRF, where CRF of a
* block is the sum of F(t) = pow(1.0 / {@link #mAttenuationFactor}, t * {@link #mStepFactor}).
* Each access to a block has a F(t) value and t is the time interval since that access to current.
* As the formula of F(t) shows, when (1.0 / {@link #mStepFactor}) time units passed, F(t) will
* cut to the (1.0 / {@link #mAttenuationFactor}) of the old value. So {@link #mStepFactor}
* controls the step and {@link #mAttenuationFactor} controls the attenuation. Actually, LRFU
* combines LRU and LFU, it evicts blocks with small frequency or large recency. When
* {@link #mStepFactor} is close to 0, LRFU is close to LFU. Conversely, LRFU is close to LRU
* when {@link #mStepFactor} is close to 1.
*/
@NotThreadSafe
public final class LRFUEvictor extends AbstractEvictor {
/** Map from block id to the last updated logic time count. */
private final Map<Long, Long> mBlockIdToLastUpdateTime = new ConcurrentHashMap<>();
// Map from block id to the CRF value of the block
private final Map<Long, Double> mBlockIdToCRFValue = new ConcurrentHashMap<>();
/** In the range of [0, 1]. Closer to 0, LRFU closer to LFU. Closer to 1, LRFU closer to LRU. */
private final double mStepFactor;
/** The attenuation factor is in the range of [2, INF]. */
private final double mAttenuationFactor;
/** Logic time count. */
private AtomicLong mLogicTimeCount = new AtomicLong(0L);
/**
* Creates a new instance of {@link LRFUEvictor}.
*
* @param view a view of block metadata information
* @param allocator an allocation policy
*/
public LRFUEvictor(BlockMetadataManagerView view, Allocator allocator) {
super(view, allocator);
mStepFactor = Configuration.getDouble(PropertyKey.WORKER_EVICTOR_LRFU_STEP_FACTOR);
mAttenuationFactor =
Configuration.getDouble(PropertyKey.WORKER_EVICTOR_LRFU_ATTENUATION_FACTOR);
Preconditions.checkArgument(mStepFactor >= 0.0 && mStepFactor <= 1.0,
"Step factor should be in the range of [0.0, 1.0]");
Preconditions.checkArgument(mAttenuationFactor >= 2.0,
"Attenuation factor should be no less than 2.0");
// Preloading blocks
for (StorageTierView tier : mManagerView.getTierViews()) {
for (StorageDirView dir : tier.getDirViews()) {
for (BlockMeta block : dir.getEvictableBlocks()) {
mBlockIdToLastUpdateTime.put(block.getBlockId(), 0L);
mBlockIdToCRFValue.put(block.getBlockId(), 0.0);
}
}
}
}
}
Summary of LRFU:
On top of the CRF weight computation above, AbstractEvictor obtains a block map ordered by weight and frees blocks one by one, smallest CRF first, until the requested amount of space is available (a sketch of the CRF bookkeeping follows).
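The following is an illustrative sketch of the CRF bookkeeping described in the LRFUEvictor Javadoc above, not the evictor's actual code; the class and method names (CrfTracker, onAccess) are made up. It relies on the fact that F(a + b) = F(a) * F(b), so the sum over all past accesses can be maintained incrementally, which is what the last-update-time and CRF maps in LRFUEvictor are for.
import java.util.HashMap;
import java.util.Map;

/** Illustrative sketch only: incremental CRF maintenance for LRFU. */
public final class CrfTracker {
  private final double mStepFactor;        // in [0, 1]: close to 0 -> LFU-like, close to 1 -> LRU-like
  private final double mAttenuationFactor; // >= 2.0
  private final Map<Long, Long> mLastUpdateTime = new HashMap<>();
  private final Map<Long, Double> mCrf = new HashMap<>();
  private long mLogicTime = 0;

  public CrfTracker(double stepFactor, double attenuationFactor) {
    mStepFactor = stepFactor;
    mAttenuationFactor = attenuationFactor;
  }

  /** F(t) = (1 / attenuationFactor) ^ (t * stepFactor): the weight of an access t time units ago. */
  private double weight(long logicTimeInterval) {
    return Math.pow(1.0 / mAttenuationFactor, logicTimeInterval * mStepFactor);
  }

  /**
   * CRF(block) = sum over all past accesses of F(time since that access). Because
   * F(a + b) = F(a) * F(b), the sum can be updated incrementally: decay the old CRF by
   * F(elapsed) and add F(0) = 1.0 for the current access.
   */
  public void onAccess(long blockId) {
    mLogicTime++;
    long last = mLastUpdateTime.getOrDefault(blockId, mLogicTime);
    double old = mCrf.getOrDefault(blockId, 0.0);
    mCrf.put(blockId, old * weight(mLogicTime - last) + 1.0);
    mLastUpdateTime.put(blockId, mLogicTime);
  }

  /** Eviction candidates are the blocks with the smallest CRF (rarely and long-ago accessed). */
  public double crf(long blockId) {
    return mCrf.getOrDefault(blockId, 0.0);
  }
}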
Alluxio caching in practice
…
Appendix
- References
https://docs.alluxio.io/os/user/stable/en/overview/Architecture.html