HDFS Source Code Analysis (Part 2)

老乔家大哥

License: CC 4.0 BY-SA

Article link: https://blog.youkuaiyun.com/QIAOtinger/article/details/41775137

The previous post covered NameNode formatting. The format method contains the following:

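FSImage fsImage = new FSImage(conf, nameDirsToFormat, editDirsToFormat); // image over the name and edits directories being formatted
    try {
      FSNamesystem fsn = new FSNamesystem(conf, fsImage); // namesystem backed by that image
      // ... (rest of the method omitted)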

This post focuses on FSImage and FSNamesystem, covered in parts (1) and (2) respectively.

(1) First, FSImage. FSImage handles checkpointing and the logging of namespace edits.
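
To make "checkpointing" concrete: a checkpoint folds the accumulated edit log into a new image so that the log can start over. The following is a minimal Java sketch of that idea only, assuming a toy namespace that maps paths to block IDs; `CheckpointSketch`, `EditOp`, and `checkpoint()` are hypothetical names, not Hadoop APIs, and real HDFS edit operations are far richer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical toy model of checkpointing: fold the edit log into the image, then reset the log.
public class CheckpointSketch {
    // One logged namespace change (hypothetical; not an HDFS edit-log op class).
    static final class EditOp {
        final String kind;
        final String path;
        final long blockId;

        EditOp(String kind, String path, long blockId) {
            this.kind = kind;
            this.path = path;
            this.blockId = blockId;
        }
    }

    final Map<String, List<Long>> image = new TreeMap<>(); // namespace as of the last checkpoint
    final List<EditOp> editLog = new ArrayList<>();        // changes made since then

    void logAddFile(String path) { editLog.add(new EditOp("ADD_FILE", path, -1L)); }
    void logAddBlock(String path, long id) { editLog.add(new EditOp("ADD_BLOCK", path, id)); }
    void logDelete(String path) { editLog.add(new EditOp("DELETE", path, -1L)); }

    // Replay every logged change on top of the image, then start an empty log.
    void checkpoint() {
        for (EditOp op : editLog) {
            switch (op.kind) {
                case "ADD_FILE":
                    image.put(op.path, new ArrayList<Long>());
                    break;
                case "ADD_BLOCK":
                    image.get(op.path).add(op.blockId);
                    break;
                case "DELETE":
                    image.remove(op.path);
                    break;
                default:
                    throw new IllegalStateException("unknown op " + op.kind);
            }
        }
        editLog.clear();
    }

    public static void main(String[] args) {
        CheckpointSketch fs = new CheckpointSketch();
        fs.logAddFile("/a");
        fs.logAddBlock("/a", 1001L);
        fs.logAddFile("/tmp");
        fs.logDelete("/tmp");
        fs.checkpoint();
        System.out.println(fs.image + ", pending edits: " + fs.editLog.size());
    }
}
```

Until `checkpoint()` runs, rebuilding the namespace would require replaying every entry in `editLog`; merging the log into the image bounds that replay cost.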

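On disk, the fsimage corresponds to the /home/hadoop/dfs/name path mentioned in the previous post. That directory contains current, image, and in_use.lock. The current directory in turn holds the edits log, fsimage (the persisted image of the in-memory namespace), fstime (the time of the last checkpoint), and VERSION (version information).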
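
The VERSION file in current/ is a plain Java properties file, so it can be inspected with `java.util.Properties`. A minimal sketch follows; the keys used (`layoutVersion`, `namespaceID`, `cTime`, `storageType`) are the ones typically found in a NameNode VERSION file, while the sample values are made up for illustration and `VersionFileSketch` is a hypothetical helper.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

// Hypothetical helper: reads a NameNode VERSION file, which uses java.util.Properties format.
public class VersionFileSketch {
    static Properties parse(Reader reader) throws IOException {
        Properties props = new Properties();
        props.load(reader); // lines starting with '#' are comments; the rest are key=value
        return props;
    }

    // For a real name directory, e.g. <name dir>/current/VERSION.
    static Properties parseFile(Path versionFile) throws IOException {
        try (Reader r = Files.newBufferedReader(versionFile)) {
            return parse(r);
        }
    }

    public static void main(String[] args) throws IOException {
        // Made-up sample values, shaped like a NameNode VERSION file.
        String sample = "#Mon Dec 08 10:00:00 CST 2014\n"
                + "namespaceID=123456789\n"
                + "cTime=0\n"
                + "storageType=NAME_NODE\n"
                + "layoutVersion=-47\n";
        Properties v = parse(new StringReader(sample));
        int layoutVersion = Integer.parseInt(v.getProperty("layoutVersion"));
        System.out.println(v.getProperty("storageType") + " layoutVersion=" + layoutVersion);
    }
}
```

Negative layout versions are normal in HDFS: the number decreases as the on-disk format evolves, which is why load() below requires the image's version to match getLayoutVersion() exactly.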

The most frequently used FSImage operations are loadFSImage (load the file system image) and saveFSImage (save the file system image).

loadFSImage ultimately calls the load(File curFile) method of the FSImageFormat class:

public void load(File curFile) throws IOException {
      checkNotLoaded(); // make sure this loader has not already loaded an image
      assert curFile != null : "curFile is null";

      StartupProgress prog = NameNode.getStartupProgress(); // startup-progress tracker
      Step step = new Step(StepType.INODES);
      prog.beginStep(Phase.LOADING_FSIMAGE, step);
      long startTime = now(); // record when loading started

      //
      // Load in bits
      //
      MessageDigest digester = MD5Hash.getDigester();
      DigestInputStream fin = new DigestInputStream(
           new FileInputStream(curFile), digester); // open the image; an MD5 is computed as bytes are read

      DataInputStream in = new DataInputStream(fin); // wrap it to read Java primitive types
      try {
        // read image version: first appeared in version -1
        int imgVersion = in.readInt(); // image layout version
        if (getLayoutVersion() != imgVersion) { // refuse an image with a different layout version
          throw new InconsistentFSStateException(curFile, 
              "imgVersion " + imgVersion +
              " expected to be " + getLayoutVersion());
        }
        boolean supportSnapshot = NameNodeLayoutVersion.supports( // does this layout version support snapshots?
            LayoutVersion.Feature.SNAPSHOT, imgVersion);
        if (NameNodeLayoutVersion.supports(
            LayoutVersion.Feature.ADD_LAYOUT_FLAGS, imgVersion)) {
          LayoutFlags.read(in);
        }

        // read namespaceID: first appeared in version -2
        in.readInt(); // namespace ID (read and discarded here)

        long numFiles = in.readLong(); // number of files (inodes) stored in the image
        ......
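
The start of load() follows a common pattern: wrap the file in a DigestInputStream so an MD5 is computed as a side effect of reading, read a fixed header through a DataInputStream, and reject a version mismatch. Below is a simplified Java sketch of that pattern under stated assumptions: the three-field header (layout version, namespace ID, file count) only mirrors the first reads shown above and is not the full fsimage format, and `HeaderReadSketch` and `Header` are hypothetical names. The real loader keeps its digest for a later integrity check; here the digest is simply returned.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical, simplified header read mirroring the first steps of load(); NOT the full format.
public class HeaderReadSketch {
    static final class Header {
        final int layoutVersion;
        final int namespaceId;
        final long numFiles;
        final byte[] md5SoFar; // MD5 of the bytes consumed while reading the header

        Header(int layoutVersion, int namespaceId, long numFiles, byte[] md5SoFar) {
            this.layoutVersion = layoutVersion;
            this.namespaceId = namespaceId;
            this.numFiles = numFiles;
            this.md5SoFar = md5SoFar;
        }
    }

    // Writes a toy header in the same field order that load() reads.
    static byte[] writeHeader(int layoutVersion, int namespaceId, long numFiles) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(layoutVersion);
            out.writeInt(namespaceId);
            out.writeLong(numFiles);
        }
        return bytes.toByteArray();
    }

    static Header readHeader(InputStream raw, int expectedVersion)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest digester = MessageDigest.getInstance("MD5");
        // Every byte read through the DigestInputStream also updates the digest.
        try (DataInputStream in = new DataInputStream(new DigestInputStream(raw, digester))) {
            int version = in.readInt();
            if (version != expectedVersion) { // same idea as load(): exact match required
                throw new IOException("imgVersion " + version + " expected to be " + expectedVersion);
            }
            int namespaceId = in.readInt();
            long numFiles = in.readLong();
            return new Header(version, namespaceId, numFiles, digester.digest());
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] image = writeHeader(-47, 42, 3L);
        Header h = readHeader(new ByteArrayInputStream(image), -47);
        System.out.println("namespaceId=" + h.namespaceId + ", numFiles=" + h.numFiles);
    }
}
```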

saveFSImage ultimately calls save(File file, FSImageCompression compression) in FSImageFormatProtobuf:

void save(File file, FSImageCompression compression) throws IOException {
      FileOutputStream fout = new FileOutputStream(file); // open an output stream to the image file
      fileChannel = fout.getChannel(); // the file's java.nio FileChannel (a file channel, not a network socket)
      try {
        saveInternal(fout, compression, file.getAbsolutePath().toString()); // does the actual persistence, starting with underlyingOutputStream.write(FSImageUtil.MAGIC_HEADER)
      } finally {
        fout.close();
      }
    }
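
save() opens the target file, keeps its FileChannel, and hands off to saveInternal(), which begins by writing FSImageUtil.MAGIC_HEADER. A minimal sketch of that open/channel/magic-header pattern, under these assumptions: the magic bytes (the ASCII string `HDFSIMG1`) are assumed to match FSImageUtil.MAGIC_HEADER and should be treated as illustrative, the body is an arbitrary byte array rather than protobuf sections, and `ImageSaveSketch` is hypothetical. Here the channel is used only to read the current byte offset, since a FileOutputStream and its channel share one file position.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch of save(): open the file, keep its FileChannel, write a magic header, then the body.
public class ImageSaveSketch {
    // Assumption: mirrors FSImageUtil.MAGIC_HEADER; the exact bytes are illustrative.
    static final byte[] MAGIC = "HDFSIMG1".getBytes(StandardCharsets.UTF_8);

    // Returns the byte offset at which the body starts, as reported by the channel.
    static long save(File file, byte[] body) throws IOException {
        try (FileOutputStream fout = new FileOutputStream(file)) {
            FileChannel channel = fout.getChannel(); // shares its position with fout
            fout.write(MAGIC);
            long bodyOffset = channel.position();    // equals MAGIC.length at this point
            fout.write(body);
            return bodyOffset;
        }
    }

    // A reader can reject files that do not start with the magic header.
    static boolean hasMagic(File file) throws IOException {
        byte[] head = new byte[MAGIC.length];
        try (DataInputStream in = new DataInputStream(new FileInputStream(file))) {
            in.readFully(head);
        } catch (EOFException tooShort) {
            return false;
        }
        return Arrays.equals(head, MAGIC);
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("fsimage-sketch", ".bin");
        f.deleteOnExit();
        long offset = save(f, new byte[] {1, 2, 3});
        System.out.println("body offset=" + offset + ", length=" + f.length() + ", magic ok=" + hasMagic(f));
    }
}
```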

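(2) In a Hadoop cluster, the master node (the NameNode) stores three kinds of metadata: the namespace of files and blocks, the mapping from files to blocks, and the location of every block replica. All of this metadata is kept in memory. The first two kinds are also persisted by recording every change in the edit log; replica locations are not persisted and are instead rebuilt from the block reports that DataNodes send.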
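
This split (namespace and file-to-block mapping logged, replica locations kept only in memory) has a practical consequence: after a restart the first two come back from disk, while locations stay empty until DataNodes report in. A toy Java sketch of that behavior; `NameNodeStateSketch`, its methods, and the string-based log format are hypothetical simplifications, and a plain list stands in for the durable log.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical toy NameNode state: namespace changes are logged, replica locations are not.
public class NameNodeStateSketch {
    final List<String> editLog;                                    // stands in for the on-disk log
    final Map<String, List<Long>> fileToBlocks = new HashMap<>();  // rebuilt by replaying the log
    final Map<Long, Set<String>> blockLocations = new HashMap<>(); // memory only

    // "Starting" the NameNode replays whatever the durable log holds.
    NameNodeStateSketch(List<String> durableLog) {
        this.editLog = durableLog;
        for (String entry : durableLog) {
            apply(entry);
        }
    }

    // Hypothetical log entry format "ADD_BLOCK <path> <blockId>"; assumes paths contain no spaces.
    void addBlock(String path, long blockId) {
        String entry = "ADD_BLOCK " + path + " " + blockId;
        editLog.add(entry); // durable record of the change
        apply(entry);       // in-memory update
    }

    private void apply(String entry) {
        String[] parts = entry.split(" ");
        fileToBlocks.computeIfAbsent(parts[1], p -> new ArrayList<>()).add(Long.parseLong(parts[2]));
    }

    // Block reports repopulate locations; nothing is logged.
    void blockReport(String datanode, Collection<Long> blockIds) {
        for (long id : blockIds) {
            blockLocations.computeIfAbsent(id, b -> new HashSet<>()).add(datanode);
        }
    }

    public static void main(String[] args) {
        List<String> disk = new ArrayList<>();
        NameNodeStateSketch nn = new NameNodeStateSketch(disk);
        nn.addBlock("/data/a.txt", 1L);
        nn.blockReport("dn1", Arrays.asList(1L));

        NameNodeStateSketch restarted = new NameNodeStateSketch(disk); // simulated restart
        System.out.println(restarted.fileToBlocks + " " + restarted.blockLocations);
        restarted.blockReport("dn1", Arrays.asList(1L));
        System.out.println(restarted.blockLocations);
    }
}
```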

Storing and managing the file system is the job of the FSNamesystem class. Here is its class comment:

/***************************************************
 * FSNamesystem does the actual bookkeeping work for the // does the actual bookkeeping for the DataNodes
 * DataNode.
 *
 * It tracks several important tables.
 *
 * 1)  valid fsname --> blocklist  (kept on disk, logged) // file name -> block list; persisted on disk and recorded in the edit log
 * 2)  Set of all valid blocks (inverted #1) // set of all valid blocks; the inverse of #1
 * 3)  block --> machinelist (kept in memory, rebuilt dynamically from reports) // block -> DataNodes holding it; in memory only, rebuilt from DataNode block reports
 * 4)  machine --> blocklist (inverted #2) // DataNode -> blocks it stores; effectively the inverse of #3
 * 5)  LRU cache of updated-heartbeat machines // LRU cache of the DataNodes that have recently sent heartbeats
 ***************************************************/
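
Tables 3 and 4 are two views of the same relation, and table 5 behaves like an access-ordered cache. The toy Java sketch below keeps block→DataNodes and DataNode→blocks in sync and uses a `LinkedHashMap` in access order as a bounded LRU of heartbeat times; table 2 corresponds to `blockToNodes.keySet()` here. All names are hypothetical, and this is not how FSNamesystem or BlockManager actually store these structures.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical toy versions of tables 3-5: block -> nodes, node -> blocks, and an LRU of heartbeats.
public class BlockTablesSketch {
    final Map<Long, Set<String>> blockToNodes = new HashMap<>(); // table 3
    final Map<String, Set<Long>> nodeToBlocks = new HashMap<>(); // table 4, kept in sync with 3
    final LinkedHashMap<String, Long> heartbeats;                // table 5

    BlockTablesSketch(int maxTrackedNodes) {
        // accessOrder = true: iteration runs from least to most recently updated node.
        this.heartbeats = new LinkedHashMap<String, Long>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Long> eldest) {
                return size() > maxTrackedNodes; // bounded, like an LRU cache
            }
        };
    }

    void addReplica(long blockId, String node) {
        blockToNodes.computeIfAbsent(blockId, b -> new HashSet<>()).add(node);
        nodeToBlocks.computeIfAbsent(node, n -> new HashSet<>()).add(blockId);
    }

    // Dropping a node: the inverse map tells us which blocks just lost a replica.
    Set<Long> removeNode(String node) {
        heartbeats.remove(node);
        Set<Long> lost = nodeToBlocks.remove(node);
        if (lost == null) {
            return Collections.emptySet();
        }
        for (long blockId : lost) {
            Set<String> holders = blockToNodes.get(blockId);
            holders.remove(node);
            if (holders.isEmpty()) {
                blockToNodes.remove(blockId);
            }
        }
        return lost;
    }

    void heartbeat(String node, long timeMillis) {
        heartbeats.put(node, timeMillis); // re-putting moves the node to the most-recent end
    }

    String leastRecentlyHeard() {
        return heartbeats.isEmpty() ? null : heartbeats.keySet().iterator().next();
    }

    public static void main(String[] args) {
        BlockTablesSketch t = new BlockTablesSketch(100);
        t.addReplica(1L, "dn1");
        t.addReplica(1L, "dn2");
        t.addReplica(2L, "dn1");
        t.heartbeat("dn1", 1000L);
        t.heartbeat("dn2", 2000L);
        t.heartbeat("dn1", 3000L);
        System.out.println("stalest: " + t.leastRecentlyHeard());
        System.out.println("lost replicas of: " + t.removeNode("dn1") + ", remaining: " + t.blockToNodes);
    }
}
```

Keeping both directions in memory makes "which blocks does this node hold?" and "which nodes hold this block?" cheap lookups, at the cost of updating two maps on every change.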

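FSNamesystem has an FSDirectory member, which holds the mapping from file names to block lists. FSDirectory provides operations such as modifying the namespace, adding files, adding blocks, and creating directories.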
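
FSDirectory keeps the namespace as a tree of inodes rather than a flat map. Below is a toy Java sketch of mkdirs / add-file / add-block on such a tree. The class and method names are hypothetical, and real FSDirectory operations also deal with permissions, quotas, locking, and edit logging, none of which is modeled here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical toy namespace tree, loosely in the spirit of FSDirectory.
public class DirectoryTreeSketch {
    static final class Node {
        final String name;
        final Map<String, Node> children; // non-null only for directories
        final List<Long> blocks;          // non-null only for files

        Node(String name, boolean isDir) {
            this.name = name;
            this.children = isDir ? new TreeMap<String, Node>() : null;
            this.blocks = isDir ? null : new ArrayList<Long>();
        }

        boolean isDir() { return children != null; }
    }

    final Node root = new Node("", true);

    // Splits "/a/b/c" into ["a", "b", "c"]; "/" yields an empty array.
    private static String[] components(String path) {
        if (!path.startsWith("/")) {
            throw new IllegalArgumentException("absolute path required: " + path);
        }
        return Arrays.stream(path.split("/")).filter(s -> !s.isEmpty()).toArray(String[]::new);
    }

    // Creates every missing directory on the path, like "mkdir -p".
    Node mkdirs(String path) {
        Node cur = root;
        for (String c : components(path)) {
            Node next = cur.children.computeIfAbsent(c, n -> new Node(n, true));
            if (!next.isDir()) {
                throw new IllegalStateException(c + " is a file");
            }
            cur = next;
        }
        return cur;
    }

    // Creates an empty file, creating parent directories as needed.
    Node addFile(String path) {
        String[] parts = components(path);
        if (parts.length == 0) {
            throw new IllegalArgumentException("cannot create a file at /");
        }
        Node parent = mkdirs("/" + String.join("/", Arrays.copyOf(parts, parts.length - 1)));
        String name = parts[parts.length - 1];
        if (parent.children.containsKey(name)) {
            throw new IllegalStateException(path + " already exists");
        }
        Node file = new Node(name, false);
        parent.children.put(name, file);
        return file;
    }

    // Walks the tree; returns null if any component is missing.
    Node lookup(String path) {
        Node cur = root;
        for (String c : components(path)) {
            if (!cur.isDir()) {
                return null;
            }
            cur = cur.children.get(c);
            if (cur == null) {
                return null;
            }
        }
        return cur;
    }

    // Appends a block ID to an existing file's block list.
    void addBlock(String path, long blockId) {
        Node f = lookup(path);
        if (f == null || f.isDir()) {
            throw new IllegalArgumentException("not a file: " + path);
        }
        f.blocks.add(blockId);
    }

    public static void main(String[] args) {
        DirectoryTreeSketch fsd = new DirectoryTreeSketch();
        fsd.addFile("/user/hadoop/a.txt");
        fsd.addBlock("/user/hadoop/a.txt", 1L);
        fsd.addBlock("/user/hadoop/a.txt", 2L);
        System.out.println(fsd.lookup("/user/hadoop/a.txt").blocks + " " + fsd.lookup("/user").isDir());
    }
}
```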

Below are the block-related methods; each is a metrics getter that delegates to blockManager:

  @Override // FSNamesystemMBean
  @Metric
  public long getPendingReplicationBlocks() { // number of blocks whose replication is in progress (pending)
    return blockManager.getPendingReplicationBlocksCount();
  }

  @Override // FSNamesystemMBean
  @Metric
  public long getUnderReplicatedBlocks() { // number of blocks with fewer replicas than required, i.e. still needing replication
    return blockManager.getUnderReplicatedBlocksCount();
  }

  /** Returns number of blocks with corrupt replicas */
  @Metric({"CorruptBlocks", "Number of blocks with corrupt replicas"})
  public long getCorruptReplicaBlocks() { // number of blocks that have at least one corrupt replica
    return blockManager.getCorruptReplicaBlocksCount();
  }

  @Override // FSNamesystemMBean
  @Metric
  public long getScheduledReplicationBlocks() { // number of block replications currently scheduled
    return blockManager.getScheduledReplicationBlocksCount();
  }

  @Override
  @Metric
  public long getPendingDeletionBlocks() { // number of blocks waiting to be deleted
    return blockManager.getPendingDeletionBlocksCount();
  }

  @Metric
  public long getExcessBlocks() { // number of excess replicas, i.e. replicas beyond the replication factor
    return blockManager.getExcessBlocksCount();
  }
  
  // HA-only metric
  @Metric
  public long getPostponedMisreplicatedBlocks() { // number of mis-replicated blocks whose processing has been postponed (HA only)
    return blockManager.getPostponedMisreplicatedBlocksCount();
  }
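
The getters above only read counters maintained by blockManager. To illustrate what "under-replicated" and "excess" mean, here is a toy Java classifier that compares each block's live replica count with its target replication. The names and the simple rules (under-replicated if live < target; excess = live − target when positive) are illustrative assumptions, not BlockManager's actual logic, which also accounts for decommissioning nodes, corrupt replicas, and pending work.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy classification of blocks by live replica count vs. target replication.
public class ReplicationStatsSketch {
    static final class Stats {
        long underReplicatedBlocks; // blocks with fewer live replicas than their target
        long excessReplicas;        // replicas beyond the target, summed over all blocks
    }

    // live.get(id): live replicas of block id (absent means 0); target.get(id): its replication factor.
    static Stats compute(Map<Long, Integer> live, Map<Long, Integer> target) {
        Stats s = new Stats();
        for (Map.Entry<Long, Integer> e : target.entrySet()) {
            int have = live.getOrDefault(e.getKey(), 0);
            int want = e.getValue();
            if (have < want) {
                s.underReplicatedBlocks++;
            } else if (have > want) {
                s.excessReplicas += have - want;
            }
        }
        return s;
    }

    public static void main(String[] args) {
        Map<Long, Integer> target = new HashMap<>();
        Map<Long, Integer> live = new HashMap<>();
        target.put(1L, 3); live.put(1L, 3); // exactly replicated
        target.put(2L, 3); live.put(2L, 1); // under-replicated
        target.put(3L, 2); live.put(3L, 4); // two excess replicas
        Stats s = compute(live, target);
        System.out.println("under=" + s.underReplicatedBlocks + ", excess=" + s.excessReplicas);
    }
}
```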