Introduction to Hadoop Metadata, and How to Obtain Block File Information

License: CC 4.0 BY-SA

Tags: #hadoop #hdfs

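This article takes a close look at how Hadoop manages metadata: how data is stored, how it is read and written, and how updates and deletes can be approached. It describes the backup mechanism built on the metadata image file (FSImage) and the log file (edits), and walks through NameNode initialization and the initialization of the in-memory image data structures. It also covers the situations in which metadata is used: at format time, at Hadoop startup, during metadata update operations, and during checkpoint operations carried out together with a SecondaryNameNode, BackupNode, or CheckpointNode.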

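How Hadoop manages data

Hadoop is well suited to storing files, but updating or deleting files that are already stored is less straightforward. If you really have to do it, how should you go about it?

a. First look at how Hadoop stores files and how it reads and writes them.

b. Based on the conclusions from (a): once you know how reads and writes work, you also know how to delete and update.

All Hadoop data is managed by the NameNode, both in memory and on the file system. Hadoop metadata records where the data is stored, together with names, directories, and related information.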

An example of writing data to Hadoop: https://sites.google.com/site/hadoopandhive/home/how-to-write-a-file-in-hdfs-using-hadoop

An example of reading data from Hadoop: https://sites.google.com/site/hadoopandhive/home/hadoop-how-to-read-a-file-from-hdfs

Introductions to Hadoop metadata:

http://www.dummies.com/how-to/content/hadoop-distributed-file-system-hdfs-for-big-data-p.html
http://zh.hortonworks.com/blog/hdfs-metadata-directories-explained/

The following code prints information about every Block File under a given directory on HDFS:

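// Needs java.net.URI, org.apache.hadoop.conf.Configuration and, from
// org.apache.hadoop.fs: FileSystem, Path, LocatedFileStatus, RemoteIterator.
// myPath is the org.apache.hadoop.fs.Path of the directory to list.
FileSystem myFs = FileSystem.get(URI.create("hdfs://localhost:9000"), new Configuration());

// get all files in the directory (false = do not recurse into subdirectories)
RemoteIterator<LocatedFileStatus> localFiles = myFs.listFiles(myPath, false);
while (localFiles.hasNext()) {
    LocatedFileStatus localFile = localFiles.next();
    String tagPath = localFile.getPath().toString();
    // split the full path into parent directory and file name
    int index = tagPath.lastIndexOf("/");
    String blockFileName = tagPath.substring(index + 1);
    String blockPath = tagPath.substring(0, index);
    System.err.println(blockPath);
    System.err.println(blockFileName);
}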

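This lists every Block File in the directory. The size of each block is available from the same objects: LocatedFileStatus.getBlockLocations() returns one BlockLocation per block, and getOffset() and getLength() on each of them give the block's position and size.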
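As a rough illustration of what those per-block sizes look like, the sketch below computes the block lengths of a file by hand. It relies only on the fact that HDFS stores a file as a sequence of fixed-size blocks in which only the last block may be shorter. The class name `ToyBlockSplit` and the numbers are made up for the example; it does not call any Hadoop API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Hadoop: pure arithmetic on file and block sizes.
class ToyBlockSplit {
    // Lengths of the blocks that a file of fileLen bytes occupies for a given block size.
    static List<Long> blockLengths(long fileLen, long blockSize) {
        if (fileLen < 0 || blockSize <= 0) {
            throw new IllegalArgumentException("fileLen must be >= 0 and blockSize > 0");
        }
        List<Long> lengths = new ArrayList<>();
        for (long offset = 0; offset < fileLen; offset += blockSize) {
            // Every block is full except possibly the last one.
            lengths.add(Math.min(blockSize, fileLen - offset));
        }
        return lengths;
    }

    public static void main(String[] args) {
        // Illustrative numbers only: a 300-byte file with a 128-byte block size.
        System.out.println(blockLengths(300, 128));
    }
}
```

On a real cluster the same numbers would come from BlockLocation.getLength() rather than from a calculation like this.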

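Analysis: HDFS works on the same principle as the Windows file system. Both are file systems, so the underlying theory is the same. However much the details vary, the fundamentals do not change.

The rest of this article is reposted material that has been edited and reorganized.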

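1. Metadata

Metadata maintains the information about files and directories in the HDFS file system. It comes in two forms: in-memory metadata and metadata files. The NameNode maintains all of the metadata.

The HDFS implementation does not export the metadata periodically. Instead it uses a backup mechanism that combines a metadata image file (FSImage) with a log file (edits).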
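The idea behind "image plus log" can be sketched in a few lines. The toy model below is an assumption-laden simplification: the class `ToyNamespace`, the `MKDIR`/`DELETE` text operations, and the in-memory collections are all invented for illustration, whereas real HDFS keeps binary fsimage and edits files on disk. It only shows why changes can be appended to a log and later folded into the image.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Toy model only; names and operation format are hypothetical.
class ToyNamespace {
    private final TreeSet<String> image = new TreeSet<>(); // last checkpointed paths ("FSImage")
    private final List<String> edits = new ArrayList<>();  // operations logged since then ("edits")

    // Record each change in the log instead of rewriting the whole image.
    void mkdir(String path)  { edits.add("MKDIR " + path); }
    void delete(String path) { edits.add("DELETE " + path); }

    // Current namespace = image with the logged operations replayed in order.
    TreeSet<String> currentView() {
        TreeSet<String> view = new TreeSet<>(image);
        for (String op : edits) {
            String[] parts = op.split(" ", 2);
            if (parts[0].equals("MKDIR")) {
                view.add(parts[1]);
            } else {
                view.remove(parts[1]);
            }
        }
        return view;
    }

    // Checkpoint: fold the log into the image and start again with an empty log.
    void checkpoint() {
        TreeSet<String> merged = currentView();
        image.clear();
        image.addAll(merged);
        edits.clear();
    }

    int pendingEdits() { return edits.size(); }

    public static void main(String[] args) {
        ToyNamespace ns = new ToyNamespace();
        ns.mkdir("/user");
        ns.mkdir("/tmp");
        ns.delete("/tmp");
        System.out.println(ns.currentView() + " pending=" + ns.pendingEdits());
        ns.checkpoint();
        System.out.println(ns.currentView() + " pending=" + ns.pendingEdits());
    }
}
```

The view of the namespace is identical before and after `checkpoint()`; only the number of pending log entries changes, which is the point of the mechanism.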

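2. Block: this concerns the contents of a file.

Path lookup flow:

Client --(path information)--> INode --(blocks[])--> BlockInfo --(triplets[])--> DataNode

INode: the basic element of the file system, either a file or a directory

BlockInfo: the object that represents file contents

DatanodeDescriptor: the object that represents the actual storage location.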
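The lookup chain above can be modeled with a few small classes. This is a sketch under stated assumptions: `ToyINodeFile`, `ToyBlockInfo`, and `ToyLookup` are hypothetical stand-ins, a plain map replaces the directory tree, and a list of DataNode names replaces the `triplets[]` array. The NameNode's real classes are considerably more involved.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-ins for the NameNode's internal classes.
class ToyBlockInfo {
    final long blockId;
    final List<String> dataNodes; // plays the role of triplets[]: where the replicas live

    ToyBlockInfo(long blockId, List<String> dataNodes) {
        this.blockId = blockId;
        this.dataNodes = dataNodes;
    }
}

class ToyINodeFile {
    final List<ToyBlockInfo> blocks; // plays the role of blocks[]

    ToyINodeFile(List<ToyBlockInfo> blocks) {
        this.blocks = blocks;
    }
}

class ToyLookup {
    private final Map<String, ToyINodeFile> namespace = new HashMap<>();

    void addFile(String path, ToyINodeFile file) {
        namespace.put(path, file);
    }

    // path -> INode -> BlockInfo -> DataNodes holding the block at blockIndex
    List<String> locate(String path, int blockIndex) {
        ToyINodeFile inode = namespace.get(path);
        if (inode == null) {
            throw new IllegalArgumentException("no such file: " + path);
        }
        return inode.blocks.get(blockIndex).dataNodes;
    }

    public static void main(String[] args) {
        ToyLookup nn = new ToyLookup();
        nn.addFile("/data/a.txt", new ToyINodeFile(List.of(
                new ToyBlockInfo(1L, List.of("dn1", "dn2")),
                new ToyBlockInfo(2L, List.of("dn2", "dn3")))));
        System.out.println(nn.locate("/data/a.txt", 1));
    }
}
```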

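3. Checkpointing FSImage and edits.

FSImage has two states, FsImage and FsImage.ckpt. The latter means that a checkpoint is in progress; once the upload has finished, it is renamed to the FSImage file. In the same way, edits has two states, edits and edits.new.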
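The final step of that state change is essentially a pair of renames. The sketch below imitates it in a temporary directory with plain `java.nio.file` calls. The class `ToyCheckpoint`, the lower-case file names, and the file contents are assumptions for illustration; this is not how the NameNode code itself is written.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Toy illustration of the two-state naming scheme; not Hadoop code.
class ToyCheckpoint {
    // Once the checkpoint is complete, promote the in-progress files to their final names.
    static void finishCheckpoint(Path dir) {
        try {
            Files.move(dir.resolve("fsimage.ckpt"), dir.resolve("fsimage"),
                    StandardCopyOption.REPLACE_EXISTING);
            Files.move(dir.resolve("edits.new"), dir.resolve("edits"),
                    StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Build a throwaway directory that looks like a checkpoint in progress.
    static Path prepareDemoDir() {
        try {
            Path dir = Files.createTempDirectory("toy-name-dir");
            write(dir.resolve("fsimage"), "old image");
            write(dir.resolve("edits"), "old edits");
            write(dir.resolve("fsimage.ckpt"), "merged image");
            write(dir.resolve("edits.new"), "edits logged during the checkpoint");
            return dir;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static String read(Path file) {
        try {
            return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static void write(Path file, String text) throws IOException {
        Files.write(file, text.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        Path dir = prepareDemoDir();
        finishCheckpoint(dir);
        System.out.println(read(dir.resolve("fsimage")));
        System.out.println(Files.exists(dir.resolve("fsimage.ckpt")));
    }
}
```

Writing the new image under a temporary name and renaming it afterwards means a reader never sees a half-written fsimage under the final name.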

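4. Walkthrough of the NameNode format scenario:

Iterate over the metadata storage directories and ask the user whether to format them (the format function in NameNode.java).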

  private static boolean format(Configuration conf,
                                boolean isConfirmationNeeded)
      throws IOException {
    Collection<URI> dirsToFormat = FSNamesystem.getNamespaceDirs(conf);
    Collection<URI> editDirsToFormat =
        FSNamesystem.getNamespaceEditsDirs(conf);
    // Ask for confirmation before re-formatting any directory that already exists.
    for (Iterator<URI> it = dirsToFormat.iterator(); it.hasNext();) {
      File curDir = new File(it.next().getPath());
      if (!curDir.exists())
        continue;
      if (isConfirmationNeeded) {
        System.err.print("Re-format filesystem in " + curDir + " ? (Y or N) ");
        if (!(System.in.read() == 'Y')) {
          System.err.println("Format aborted in " + curDir);
          return true;  // true means the format was aborted
        }
        while (System.in.read() != '\n'); // discard the enter-key
      }
    }

    FSNamesystem nsys = new FSNamesystem(new FSImage(dirsToFormat,
                                         editDirsToFormat), conf);
    nsys.dir.fsImage.format();
    return false;  // false means the format went ahead
  }
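The confirmation check in that function is strict: only an upper-case `Y` as the first byte counts as consent, so a lower-case `y` aborts the format. The sketch below isolates the rule so it can be tried against arbitrary input. `ToyConfirm` is a hypothetical helper, and unlike the listing it also stops discarding input when the stream ends instead of waiting for a newline.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical helper that mirrors the 'Y' check in format(); not Hadoop code.
class ToyConfirm {
    static boolean confirmed(InputStream in) {
        try {
            int first = in.read();
            // Discard the rest of the line, stopping at end of input as well.
            int c = first;
            while (c != '\n' && c != -1) {
                c = in.read();
            }
            return first == 'Y';
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(confirmed(new ByteArrayInputStream("Y\n".getBytes())));
        System.out.println(confirmed(new ByteArrayInputStream("y\n".getBytes())));
    }
}
```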

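Create the in-memory metadata image, which includes an instance of class FSNamesystem, an instance of class FSDirectory, an FSImage object, and an Edits object. Creating the FSNamesystem object mainly sets up the BlockManager and FSDirectory objects and initializes the member variables. The FSImage object mainly sets layoutVersion, namespaceID, and CTime to 0 and instantiates FSEditLog. Class FSDirectory creates rootDir, the root directory node of HDFS.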

  FSNamesystem(FSImage fsImage, Configuration conf) throws IOException {
    this.blockManager = new BlockManager(this, conf);
    setConfigurationParameters(conf);
    this.dir = new FSDirectory(fsImage, this, conf);
    dtSecretManager = createDelegationTokenSecretManager(conf);
  }

  FSImage(Collection<URI> fsDirs, Collection<URI> fsEditsDirs)
      throws IOException {
    this();
    setStorageDirectories(fsDirs, fsEditsDirs);
  }

  void setStorageDirectories(Collection<URI> fsNameDirs,
                             Collection<URI> fsEditsDirs) throws IOException {
    this.storageDirs = new ArrayList<StorageDirectory>();
    this.removedStorageDirs = new ArrayList<StorageDirectory>();

    // Add all name dirs with appropriate NameNodeDirType
    for (URI dirName : fsNameDirs) {
      checkSchemeConsistency(dirName);
      boolean isAlsoEdits = false;
      for (URI editsDirName : fsEditsDirs) {
        if (editsDirName.compareTo(dirName) == 0) {
          isAlsoEdits = true;
          fsEditsDirs.remove(editsDirName);
          break;
        }
      }
      NameNodeDirType dirType = (isAlsoEdits) ?
                          NameNodeDirType.IMAGE_AND_EDITS :
                          NameNodeDirType.IMAGE;
      // Add to the list of storage directories, only if the
      // URI is of type file://
      if (dirName.getScheme().compareTo(JournalType.FILE.name().toLowerCase())
          == 0) {
        this.addStorageDir(new StorageDirectory(new File(dirName.getPath()),
            dirType));
      }
    }

    // Add edits dirs if they are different from name dirs
    for (URI dirName : fsEditsDirs) {
      checkSchemeConsistency(dirName);
      // Add to the list of storage directories, only if the
      // URI is of type file://
      if (dirName.getScheme().compareTo(JournalType.FILE.name().toLowerCase())
          == 0)
        this.addStorageDir(new StorageDirectory(new File(dirName.getPath()),
                    NameNodeDirType.EDITS));
    }
  }
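The classification rule in setStorageDirectories is easier to see without the surrounding plumbing: a directory listed only as a name directory holds the image, one listed only as an edits directory holds the log, and one listed in both holds both. The sketch below restates that rule with plain strings. `ToyDirClassifier` and its `DirType` enum are hypothetical, and the sketch works on a copy of the edits list rather than removing entries from the caller's collection as the listing does.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical restatement of the directory classification rule; not Hadoop code.
class ToyDirClassifier {
    enum DirType { IMAGE, EDITS, IMAGE_AND_EDITS }

    static Map<String, DirType> classify(List<String> nameDirs, List<String> editsDirs) {
        Map<String, DirType> result = new LinkedHashMap<>();
        // Work on a copy so the caller's list is left untouched.
        List<String> remainingEdits = new ArrayList<>(editsDirs);
        for (String dir : nameDirs) {
            // remove() reports whether the directory was also listed for edits.
            boolean alsoEdits = remainingEdits.remove(dir);
            result.put(dir, alsoEdits ? DirType.IMAGE_AND_EDITS : DirType.IMAGE);
        }
        // Whatever is left was listed for edits only.
        for (String dir : remainingEdits) {
            result.putIfAbsent(dir, DirType.EDITS);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(classify(
                List.of("/name1", "/shared"),
                List.of("/shared", "/edits1")));
    }
}
```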

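Initialize the data structures in the in-memory image. This is done mainly by the format function of FSImage, which sets the following fields:

layoutVersion: the version of the software.
namespaceID: generated at format time. When a DataNode registers with the NameNode, it obtains that NameNode's namespaceID and uses it as its identity in all later communication with the NameNode. The NameNode refuses to communicate with a DataNode whose identity it does not recognize.
CTime: the time at which the FSImage was created.
checkpointTime: the time of the namespace's first checkpoint.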

  public void format() throws IOException {
    this.layoutVersion = FSConstants.LAYOUT_VERSION;
    this.namespaceID = newNamespaceID();
    this.cTime = 0L;
    this.checkpointTime = FSNamesystem.now();
    for (Iterator<StorageDirectory> it =
                           dirIterator(); it.hasNext();) {
      StorageDirectory sd = it.next();
      format(sd);
    }
  }
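The role of namespaceID as an identity token can be shown with a very small model. In the sketch below, `ToyNameNode` and its two methods are invented for illustration and the ID value is arbitrary. The only behavior taken from the description is that a DataNode receives the ID when it registers and that requests carrying any other ID are refused.

```java
// Hypothetical model of the namespaceID handshake; not Hadoop code.
class ToyNameNode {
    private final int namespaceID; // fixed when the NameNode is formatted

    ToyNameNode(int namespaceID) {
        this.namespaceID = namespaceID;
    }

    // Registration hands the NameNode's namespaceID to the DataNode.
    int register() {
        return namespaceID;
    }

    // Later requests are accepted only if they carry the matching ID.
    boolean acceptsRequestFrom(int dataNodeNamespaceID) {
        return dataNodeNamespaceID == namespaceID;
    }

    public static void main(String[] args) {
        ToyNameNode nn = new ToyNameNode(463031076); // arbitrary illustrative ID
        int id = nn.register();
        System.out.println(nn.acceptsRequestFrom(id));
        System.out.println(nn.acceptsRequestFrom(id + 1));
    }
}
```

Because format() generates a new namespaceID, a DataNode that still carries the ID from before a re-format no longer matches in this model.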

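Write the in-memory image to the metadata backup directories. The format method of FSImage iterates over all of the directories. For an FSImage directory it calls saveFSImage to save the FSImage; for an Edits directory it calls editLog.createEditLogFile; finally it calls sd.write to create the fstime and VERSION files. The VERSION file is usually written last.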

  void format(StorageDirectory sd) throws IOException {
    sd.clearDirectory(); // create current dir
    sd.lock();
    try {
      saveCurrent(sd);
    } finally {
      sd.unlock();
    }
    LOG.info("Storage directory " + sd.getRoot()
             + " has been successfully formatted.");
  }
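The write order described above (data files first, VERSION last) can be imitated with ordinary files. In the sketch below, `ToyFormat`, the empty file contents, and the `looksFormatted` check are assumptions made for illustration. The reading that a present VERSION file signals a completely written directory is an interpretation of "written last", not something the listing states.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Toy imitation of the write order; not Hadoop code.
class ToyFormat {
    // Write the data files first and VERSION last.
    static void format(Path current, boolean holdsImage, boolean holdsEdits) {
        try {
            Files.createDirectories(current);
            if (holdsImage) {
                Files.write(current.resolve("fsimage"), new byte[0]);
            }
            if (holdsEdits) {
                Files.write(current.resolve("edits"), new byte[0]);
            }
            Files.write(current.resolve("fstime"), new byte[0]);
            Files.write(current.resolve("VERSION"), new byte[0]);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Assumption for this sketch: VERSION present means every earlier write finished.
    static boolean looksFormatted(Path current) {
        return Files.exists(current.resolve("VERSION"));
    }

    static Path newTempDir() {
        try {
            return Files.createTempDirectory("toy-storage-dir");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path current = newTempDir().resolve("current");
        System.out.println(looksFormatted(current));
        format(current, true, true);
        System.out.println(looksFormatted(current));
    }
}
```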