NameNode & DataNode-优快云博客

本文详细介绍了Hadoop分布式文件系统（DFS）中的核心组件——NameNode与DataNode的工作原理及作用，包括它们如何管理目录命名空间、存储数据块以及与其他节点的交互。

NameNode类位于org.apache.hadoop.hdfs.server.namenode包下。

NameNode serves as both directory namespace manager and "inode table" for the Hadoop DFS. There is a single NameNode running in any DFS deployment. (Well, except when there is a second backup/failover NameNode.)

The NameNode controls two critical tables:
1) filename->blocksequence (namespace)
2) block->machinelist ("inodes")

The first table is stored on disk and is very precious. The second table is rebuilt every time the NameNode comes up.

'NameNode' refers to both this class as well as the 'NameNode server'. The 'FSNamesystem' class actually performs most of the filesystem management. The majority of the 'NameNode' class itself is concerned with exposing the IPC interface and the http server to the outside world, plus some configuration management.

NameNode implements the ClientProtocol interface, which allows clients to ask for DFS services. ClientProtocol is not designed for direct use by authors of DFS client code. End-users should instead use the org.apache.nutch.hadoop.fs.FileSystem class.

NameNode also implements the DatanodeProtocol interface, used by DataNode programs that actually store DFS data blocks. These methods are invoked repeatedly and automatically by all the DataNodes in a DFS deployment.

NameNode also implements the NamenodeProtocol interface, used by secondary namenodes or rebalancing processes to get partial namenode's state, for example partial blocksMap etc.

DataNode 类位于org.apache.hadoop.hdfs.server.datanode包下。
DataNode is a class (and program) that stores a set of blocks for a DFS deployment. A single deployment can have one or many DataNodes. Each DataNode communicates regularly with a single NameNode. It also communicates with client code and other DataNodes from time to time.

DataNodes store a series of named blocks. The DataNode allows client code to read these blocks, or to write new block data. The DataNode may also, in response to instructions from its NameNode, delete blocks or copy blocks to/from other DataNodes.

The DataNode maintains just one critical table:
block-> stream of bytes (of BLOCK_SIZE or less)

This info is stored on a local disk. The DataNode reports the table's contents to the NameNode upon startup and every so often afterwards.

DataNodes spend their lives in an endless loop of asking the NameNode for something to do. A NameNode cannot connect to a DataNode directly; a NameNode simply returns values from functions invoked by a DataNode.

DataNodes maintain an open server socket so that client code or other DataNodes can read/write data. The host/port for this server is reported to the NameNode, which then sends that information to clients or other DataNodes that might be interested.

查找工程里的类或者是资源文件：Ctrl + Shift + R。
查找jar包里的类：Ctrl + Shift + T。

转载于:https://www.cnblogs.com/tsiangleo/p/4200789.html