HDFS

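This article takes an in-depth look at the concept of a distributed file system. Starting from the simple, easy-to-use NFS, it describes NFS's limitations and then introduces the more powerful HDFS. It covers HDFS's design principles and key features, including data distribution, the replication mechanism, the synchronization process, and the roles of the NameNode and DataNode, with the aim of giving readers a well-rounded understanding of distributed file systems.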
------------------------------------------------------------------------------------
Introduction
------------------------------------------------------------------------------------
(1) Files are stored redundantly across multiple machines, which keeps them durable when machines fail and highly available to highly parallel applications.
------------------------------------------------------------------------------------
Distributed File System Basics
------------------------------------------------------------------------------------
(2)NFS
NFS (the Network File System) has a straightforward design, but it is also very constrained. NFS provides remote access to a single logical volume stored on a single machine. An NFS server makes a portion of its local file system visible to external clients. The clients can then mount this remote file system directly into their own Linux file system and interact with it as though it were part of the local drive.
One of the primary advantages of this model is its transparency: clients work with remote files exactly as they would with local ones.
Disadvantages: (a) All files in an NFS volume reside on a single machine.
               (b) All clients must go to this machine to retrieve their data, which can overload the server when a large number of clients must be served.
------------------------------------------------------------------------------------
(3)HDFS
  (a)HDFS is designed to store a very large amount of information (terabytes or petabytes). This requires spreading the data across a large number of machines. It also supports much larger file sizes than NFS.
  (b)HDFS should store data reliably. If individual machines in the cluster malfunction, data should still be available.
  (c)HDFS should provide fast, scalable access to this information. It should be possible to serve a larger number of clients by simply adding more machines to the cluster.
  (d)HDFS should integrate well with Hadoop MapReduce, allowing data to be read and computed upon locally when possible.
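(4) Three key features of HDFS: splitting files into blocks (64 MB per block), replication (3 replicas per block), and synchronization.
(5) The NameNode stores the metadata, kept mainly in main memory; the DataNodes store the data blocks.
(6) The reliability of the NameNode is critical, and it is achieved through redundancy.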
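
To make the block, replica, and NameNode/DataNode ideas above concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the function names (split_into_blocks, place_replicas, file_is_readable), the DataNode names dn1..dn4, and the round-robin placement are all hypothetical and are not HDFS's actual API or placement policy. Only the 64 MB block size and the 3 replicas come from the notes above.

```python
BLOCK_SIZE = 64 * 1024 * 1024  # 64 MB per block, the figure quoted in the notes
REPLICATION = 3                # 3 replicas per block, the figure quoted in the notes
MB = 1024 * 1024


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return the byte sizes of the blocks a file of `file_size` bytes is cut into."""
    full_blocks, remainder = divmod(file_size, block_size)
    return [block_size] * full_blocks + ([remainder] if remainder else [])


def place_replicas(num_blocks, datanodes, replication=REPLICATION):
    """Hypothetical round-robin placement (NOT the real HDFS policy).

    The returned dict plays the role of the NameNode's in-memory metadata:
    block id -> list of DataNodes that hold a replica of that block.
    """
    if replication > len(datanodes):
        raise ValueError("need at least as many DataNodes as replicas")
    return {
        block_id: [datanodes[(block_id + r) % len(datanodes)] for r in range(replication)]
        for block_id in range(num_blocks)
    }


def file_is_readable(block_map, failed_nodes):
    """A file stays readable while every block has at least one live replica."""
    failed = set(failed_nodes)
    return all(any(node not in failed for node in nodes) for nodes in block_map.values())


# Usage: a 200 MB file stored on four made-up DataNodes.
blocks = split_into_blocks(200 * MB)
block_map = place_replicas(len(blocks), ["dn1", "dn2", "dn3", "dn4"])

print([size // MB for size in blocks])                      # block sizes in MB
print(block_map)                                            # block id -> DataNodes
print(file_is_readable(block_map, ["dn1", "dn2"]))          # two nodes down
print(file_is_readable(block_map, ["dn1", "dn2", "dn3"]))   # three nodes down
```

In this toy layout every block lives on three distinct machines, so the file survives the loss of any two DataNodes. The block map itself exists in only one place, which is why item (6) stresses making the NameNode redundant.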