NameNode & DataNode

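This article describes how the core components of the Hadoop distributed file system (DFS), the NameNode and the DataNode, work and what they do: how they manage the directory namespace, store data blocks, and interact with other nodes.

The NameNode class lives in the org.apache.hadoop.hdfs.server.namenode package.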

NameNode serves as both directory namespace manager and "inode table" for the Hadoop DFS. There is a single NameNode running in any DFS deployment. (Well, except when there is a second backup/failover NameNode.)

The NameNode controls two critical tables:
1) filename->blocksequence (namespace)
2) block->machinelist ("inodes")

The first table is stored on disk and is very precious. The second table is rebuilt every time the NameNode comes up.
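
To make the two tables concrete, here is a minimal toy model in Java. It is only a sketch and not the real FSNamesystem: the class ToyNameNode and all of its methods are hypothetical names invented for illustration. The namespace map stands in for the table that the real system persists to disk, and the location map is the one that is lost on restart and rebuilt from DataNode reports.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical toy model of the two NameNode tables; not Hadoop code.
class ToyNameNode {
    // Table 1: filename -> ordered block IDs (persisted on disk in the real system).
    private final Map<String, List<Long>> namespace = new HashMap<>();
    // Table 2: block ID -> machines holding a replica (kept in memory only).
    private final Map<Long, Set<String>> locations = new HashMap<>();

    void addFile(String name, List<Long> blocks) {
        namespace.put(name, new ArrayList<>(blocks));
    }

    // Simulates a restart: the namespace survives, the location table is lost.
    void restart() {
        locations.clear();
    }

    // Called when a DataNode reports the blocks it stores.
    void blockReport(String dataNode, List<Long> blocks) {
        for (long b : blocks) {
            locations.computeIfAbsent(b, k -> new TreeSet<>()).add(dataNode);
        }
    }

    // Resolves a file to the machines that hold each of its blocks, in order.
    List<Set<String>> locate(String name) {
        List<Set<String>> result = new ArrayList<>();
        for (long b : namespace.getOrDefault(name, List.of())) {
            result.add(locations.getOrDefault(b, Set.of()));
        }
        return result;
    }

    public static void main(String[] args) {
        ToyNameNode nn = new ToyNameNode();
        nn.addFile("/logs/a.txt", List.of(1L, 2L));
        nn.blockReport("dn1", List.of(1L, 2L));
        nn.restart();                        // table 2 is now empty
        nn.blockReport("dn2", List.of(2L));  // rebuilt only from fresh reports
        System.out.println(nn.locate("/logs/a.txt"));
    }
}
```

After the simulated restart, block 1 has no known location until some DataNode reports it again, which is why the first table is the precious one: it cannot be reconstructed from reports.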

'NameNode' refers to both this class and the 'NameNode server'. The 'FSNamesystem' class actually performs most of the filesystem management. Most of the 'NameNode' class itself is concerned with exposing the IPC interface and the HTTP server to the outside world, plus some configuration management.

NameNode implements the ClientProtocol interface, which allows clients to ask for DFS services. ClientProtocol is not designed for direct use by authors of DFS client code. End users should instead use the org.apache.hadoop.fs.FileSystem class.

NameNode also implements the DatanodeProtocol interface, used by DataNode programs that actually store DFS data blocks. These methods are invoked repeatedly and automatically by all the DataNodes in a DFS deployment.

NameNode also implements the NamenodeProtocol interface, used by secondary NameNodes or rebalancing processes to fetch part of the NameNode's state, for example part of the blocksMap.

The DataNode class lives in the org.apache.hadoop.hdfs.server.datanode package.
DataNode is a class (and program) that stores a set of blocks for a DFS deployment. A single deployment can have one or many DataNodes. Each DataNode communicates regularly with a single NameNode. It also communicates with client code and other DataNodes from time to time.

DataNodes store a series of named blocks. The DataNode allows client code to read these blocks, or to write new block data. The DataNode may also, in response to instructions from its NameNode, delete blocks or copy blocks to/from other DataNodes.

The DataNode maintains just one critical table:
block-> stream of bytes (of BLOCK_SIZE or less)

This info is stored on a local disk. The DataNode reports the table's contents to the NameNode upon startup and every so often afterwards.
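
The DataNode's single table can be pictured with a small Java sketch. This is an illustration under stated assumptions, not the real DataNode: ToyDataNode, its methods, and the tiny BLOCK_SIZE value are all hypothetical, blocks are held in memory rather than on local disk, and the report lists block IDs only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical toy model of a DataNode's block table; not Hadoop code.
class ToyDataNode {
    // Assumed tiny limit for the demo; real block sizes are far larger.
    static final int BLOCK_SIZE = 8;
    // block ID -> bytes; in memory here, on local disk in the real system.
    private final Map<Long, byte[]> blocks = new TreeMap<>();

    // Stores a block, rejecting anything longer than BLOCK_SIZE.
    void write(long blockId, byte[] data) {
        if (data.length > BLOCK_SIZE) {
            throw new IllegalArgumentException("block exceeds BLOCK_SIZE");
        }
        blocks.put(blockId, data.clone());
    }

    // Returns a copy of the block's bytes, or null if the block is unknown.
    byte[] read(long blockId) {
        byte[] data = blocks.get(blockId);
        return data == null ? null : data.clone();
    }

    // The table's contents as sent to the NameNode (IDs only in this sketch).
    List<Long> blockReport() {
        return new ArrayList<>(blocks.keySet());
    }

    public static void main(String[] args) {
        ToyDataNode dn = new ToyDataNode();
        dn.write(2L, "world".getBytes());
        dn.write(1L, "hello".getBytes());
        System.out.println(dn.blockReport());
    }
}
```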

DataNodes spend their lives in an endless loop of asking the NameNode for something to do. A NameNode cannot connect to a DataNode directly; a NameNode simply returns values from functions invoked by a DataNode.
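
The pull model described above can be sketched as follows. The names ToyCommandServer, schedule, and heartbeat are hypothetical and do not match the real DatanodeProtocol signatures; the point is only that the NameNode side queues work and hands it back as the return value of a call the DataNode initiates.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

// Hypothetical sketch of the pull model; names do not match DatanodeProtocol.
class ToyCommandServer {
    // Pending instructions per DataNode, queued until that node calls in.
    private final Map<String, Queue<String>> pending = new HashMap<>();

    // The NameNode side never opens a connection; it only queues work.
    void schedule(String dataNode, String command) {
        pending.computeIfAbsent(dataNode, k -> new ArrayDeque<>()).add(command);
    }

    // Invoked by a DataNode; queued commands travel back as the return value.
    List<String> heartbeat(String dataNode) {
        Queue<String> q = pending.remove(dataNode);
        return q == null ? new ArrayList<>() : new ArrayList<>(q);
    }

    public static void main(String[] args) {
        ToyCommandServer nn = new ToyCommandServer();
        nn.schedule("dn1", "DELETE block 7");
        nn.schedule("dn1", "COPY block 9 to dn2");
        // A real DataNode loops forever; three rounds are enough here.
        for (int round = 0; round < 3; round++) {
            for (String cmd : nn.heartbeat("dn1")) {
                System.out.println("round " + round + ": " + cmd);
            }
        }
    }
}
```

Both commands are delivered on the first call; later calls return an empty list until more work is scheduled.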

DataNodes maintain an open server socket so that client code or other DataNodes can read/write data. The host/port for this server is reported to the NameNode, which then sends that information to clients or other DataNodes that might be interested.

 

To find a class or resource file in the project (Eclipse): Ctrl + Shift + R.
To find a class inside a JAR (Eclipse): Ctrl + Shift + T.

Reposted from: https://www.cnblogs.com/tsiangleo/p/4200789.html
