hadoop 2.x-HDFS federation

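This article focuses on HDFS Federation in Hadoop 2.5.1 and covers its abstraction, its implementation, and the scalability approaches of other distributed namespaces. Federation introduces multiple namespaces to address the problems of a single NameNode, providing multi-level namespace management and improving availability and horizontal scalability. The article also shows how a configuration file gives clients a global view, and how adding standby NameNodes strengthens high availability. Finally, it compares HDFS with other distributed namespace designs, such as dynamic subtree partitioning and hash-based partitioning, and stresses the importance of sharing across multiple data centers.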


 Note: this article focuses on hadoop-2.5.1, so it may differ slightly from hadoop-0.23.x.

Agenda:

 I:hdfs federation abstract

 II:implementation

 III:other distributed namespaces

 IV:advantages vs shortcomings

 -------------------

I:hdfs federation abstract

  In hadoop-1.x and earlier (0.23.x excepted), a namenode has several limitations:

  1. single point of failure (SPOF, addressed by HA)

  2. too many files/blocks make the metadata outgrow the namenode's memory

  3. a single master (namenode) undertakes many tasks: metadata management, resource assignment, datanode heartbeats, file locations etc., all of which reduce the namenode's throughput

  4. only one namespace, with no real separation/isolation between tenants, e.g. development apps and production apps interfere with each other.

   ....
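  Limitation 2 can be made concrete with a back-of-the-envelope estimate. The sketch below assumes the commonly quoted rule of thumb of roughly 150 bytes of namenode heap per namespace object (file or block); that figure, the function name `estimate_heap_gib` and the sample file counts are assumptions for illustration, not measurements from this article.

```python
# Rough namenode heap estimate.
# BYTES_PER_OBJECT is an assumed rule of thumb, not a value read from Hadoop.
BYTES_PER_OBJECT = 150


def estimate_heap_gib(num_files, blocks_per_file=1.5,
                      bytes_per_object=BYTES_PER_OBJECT):
    """Estimate heap needed for file and block objects, in GiB.

    Directories are ignored to keep the arithmetic simple.
    """
    num_blocks = num_files * blocks_per_file
    total_bytes = (num_files + num_blocks) * bytes_per_object
    return total_bytes / 2**30


for files in (10_000_000, 100_000_000, 1_000_000_000):
    print(f"{files:>13,} files -> ~{estimate_heap_gib(files):.1f} GiB of heap")
```

  Under these assumptions the heap grows linearly with the number of files, which is why splitting the namespace over several namenodes helps: each one holds only its own share of the objects.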

  So multiple namespaces are introduced to address all of the above issues (except case 1). Multiple namespaces can also be described as 'multi-hierarchy name management', and they bring several kinds of scalability. HDFS federation is Hadoop's implementation of this idea:

feature      resolution     scaling      abstract
federation   scalability    horizontal   union of multiple namespaces from multiple clusters
ha           availability   horizontal   by adding some standby namenodes

  Integrating federation with HA gives this architecture:

                 [figure: hdfs federation + ha schema]

 

 II.implementation

  Much like mounting disks on linux, this schema is easy to achieve: a Client Mount Table gives clients a global view of all federation namenodes. It is simple, and it introduces no single point of failure for client access. Only a few properties need to change, like this:

core-site.xml

 

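<configuration xmlns:xi="http://www.w3.org/2001/XInclude">
    <!-- pull in the client mount table defined in cmt.xml -->
    <xi:include href="cmt.xml"/>
    <property>
        <name>fs.defaultFS</name>
        <value>viewfs://nsX</value>
        <description>default file system: the client mount table named nsX</description>
    </property>
</configuration>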
   note: the URI scheme here is 'viewfs' rather than 'hdfs'.

and cmt.xml:

<configuration>
    <property>
        <name>fs.viewfs.mounttable.nsX.link./share</name>
        <value>hdfs://ns1/real_share</value>
    </property>
    <property>
        <name>fs.viewfs.mounttable.nsX.link./user</name>
        <value>hdfs://ns2/real_user</value>
    </property>
</configuration>

 

  In this mount table the client-side path '/share' maps to the real directory '/real_share' on 'ns1', and '/user' likewise maps to '/real_user' on 'ns2'. The real directories must therefore be created first:

hdfs dfs -mkdir hdfs://ns1/real_share

 

hdfs dfs -mkdir hdfs://ns2/real_user
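  The way a client mount table turns a client-side path into a target on a specific namespace can be sketched in a few lines of Python. This is a simplified illustration, not the ViewFs implementation: `load_mount_table` and `resolve` are hypothetical helper names, the sample paths are made up, and the only things taken from this article are the mount table entries and the `fs.viewfs.mounttable.<table>.link.<path>` naming pattern.

```python
import xml.etree.ElementTree as ET

# Same entries as the cmt.xml shown in this article.
CMT_XML = """<configuration>
  <property>
    <name>fs.viewfs.mounttable.nsX.link./share</name>
    <value>hdfs://ns1/real_share</value>
  </property>
  <property>
    <name>fs.viewfs.mounttable.nsX.link./user</name>
    <value>hdfs://ns2/real_user</value>
  </property>
</configuration>"""


def load_mount_table(xml_text, table_name):
    """Return {mount_point: target_uri} for one named mount table."""
    prefix = f"fs.viewfs.mounttable.{table_name}.link."
    links = {}
    for prop in ET.fromstring(xml_text).iter("property"):
        name = prop.findtext("name", "").strip()
        if name.startswith(prefix):
            links[name[len(prefix):]] = prop.findtext("value", "").strip()
    return links


def resolve(links, path):
    """Map a client-side path to a target URI via the longest matching mount point."""
    for mount in sorted(links, key=len, reverse=True):
        if path == mount or path.startswith(mount.rstrip("/") + "/"):
            return links[mount] + path[len(mount):]
    raise FileNotFoundError(f"no mount point covers {path}")


links = load_mount_table(CMT_XML, "nsX")
print(resolve(links, "/share/jars/app.jar"))
print(resolve(links, "/user/alice"))
```

  Because the table lives entirely in client configuration, resolving a path needs no extra server, which is why this scheme adds no single point of failure.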

  hdfs-site.xml differs a little from the HA setup; see "Hadoop 2.0 NameNode HA和Federation实践" (Hadoop 2.0 NameNode HA and Federation in practice).

 

 III.other distributed namespaces

  1. dynamic subtree partitioning (Ceph)

  2. hash-based partitioning (Lustre)
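  The two partitioning ideas can be contrasted with a small conceptual sketch. It does not reflect how Ceph or Lustre actually implement them; the server names, the function names and the sample paths are all hypothetical. Hash-based partitioning picks a metadata server from a hash of the path, while subtree partitioning assigns whole directory subtrees to servers and can reassign a subtree at runtime.

```python
import hashlib

SERVERS = ["mds0", "mds1", "mds2"]  # hypothetical metadata server names


def hash_partition(path, servers=SERVERS):
    """Pick a metadata server from a stable hash of the full path."""
    digest = hashlib.sha256(path.encode("utf-8")).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]


def subtree_partition(path, subtrees, default):
    """Pick the server that owns the deepest subtree containing the path."""
    best = ""
    for root in subtrees:
        inside = path == root or path.startswith(root + "/")
        if inside and len(root) > len(best):
            best = root
    return subtrees[best] if best else default


# Hash-based: placement depends only on the path string.
print(hash_partition("/user/alice/data.csv"))

# Subtree-based: placement follows directory ownership...
subtrees = {"/user": "mds0", "/share": "mds1"}
print(subtree_partition("/user/alice/data.csv", subtrees, "mds0"))

# ...and a busy subtree can be handed to another server ("dynamic").
subtrees["/user/alice"] = "mds2"
print(subtree_partition("/user/alice/data.csv", subtrees, "mds0"))
```

  By comparison, the client mount table used by federation is a static, manually configured form of subtree partitioning: the mapping changes only when the configuration is edited.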

 

 IV.advantages vs shortcomings

  Besides the advantages mentioned at the beginning, the ability to share a deployment across multiple datacenters is an important reason to use federation. The shortcomings are also clear:

  a. data may end up unevenly distributed among the different namespaces.

 

ref:

hadoop 2.x-HDFS HA --Part I: abstraction

hortonworks hdfs federation

HDFS scalability with multiple namenodes

Hadoop 2.0 NameNode HA和Federation实践 (Hadoop 2.0 NameNode HA and Federation in practice)

Scaling HDFS Namenode using Multiple Namespace (Namenodes) and Block Pools

 
