HDFS

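This post looks at HDFS (the Hadoop Distributed File System) and its similarities to the Google File System (GFS). It describes HDFS's master-slave architecture: one NameNode manages metadata, and multiple DataNodes store the actual data. It stresses the design emphasis on high throughput rather than low latency, which suits Map/Reduce applications and web crawlers. It also covers the data replication strategy, which improves write performance while preserving data reliability and read performance.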

Master-slave architecture (same as Google File System):
One NameNode (metadata) + N DataNodes (actual data).
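
To make the metadata/data split concrete, here is a minimal toy sketch in Python. It is not HDFS code: the class names, the `write_file` and `read_file` helpers, the round-robin placement, and the tiny block size are all assumptions chosen for illustration. The only point it shows is that the NameNode keeps the file-to-block map while the DataNodes keep the bytes.

```python
class DataNode:
    """Toy DataNode: stores block contents, knows nothing about file names."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}  # block_id -> bytes


class NameNode:
    """Toy NameNode: stores only metadata, never file contents."""

    def __init__(self):
        self.files = {}      # path -> ordered list of block ids
        self.locations = {}  # block_id -> list of DataNodes holding it


def write_file(namenode, datanodes, path, data, block_size=4):
    """Split data into blocks and spread them round-robin (an assumption)."""
    block_ids = []
    for offset in range(0, len(data), block_size):
        index = offset // block_size
        block_id = f"{path}#{index}"
        node = datanodes[index % len(datanodes)]
        node.blocks[block_id] = data[offset:offset + block_size]
        namenode.locations[block_id] = [node]
        block_ids.append(block_id)
    namenode.files[path] = block_ids


def read_file(namenode, path):
    """Ask the NameNode where blocks live, then fetch bytes from DataNodes."""
    parts = []
    for block_id in namenode.files[path]:
        node = namenode.locations[block_id][0]
        parts.append(node.blocks[block_id])
    return b"".join(parts)
```

In this sketch the client touches the NameNode only for lookups; the data itself comes from the DataNodes.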

Emphasis: high throughput, not low latency.

Simple Coherency Model: write-once-read-many, which fits Map/Reduce and web crawler applications perfectly. Support for appending writes is planned for the future.
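
A rough way to picture write-once-read-many is a store that accepts a file once and then only serves reads. The `WriteOnceStore` class below is a hypothetical illustration, not an HDFS interface.

```python
class WriteOnceStore:
    """Toy store: each path can be written once, then read any number of times."""

    def __init__(self):
        self._files = {}  # path -> immutable bytes

    def create(self, path, data):
        # A second write to the same path is rejected instead of overwriting.
        if path in self._files:
            raise FileExistsError(path)
        self._files[path] = bytes(data)

    def read(self, path):
        return self._files[path]
```

Because content never changes after creation, readers need no coordination with writers, which is what keeps the coherency model simple.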

"Moving Computation is Cheaper than Moving Data": HDFS provides interfaces for applications to move themselves closer to where the data is located. (How?)
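
One possible answer to the "how": Hadoop's Java `FileSystem` API has `getFileBlockLocations`, which reports the hosts holding each block of a file, so a scheduler can try to run a task on one of those hosts. The Python sketch below only imitates that idea with plain dictionaries; `pick_worker`, `assign_tasks`, and the host names are hypothetical, and the real scheduler is more involved.

```python
def pick_worker(replica_hosts, workers):
    """Prefer a worker that sits on a host already holding the block.

    Returns (worker, is_local). Falls back to the first worker when no
    worker is co-located with a replica, which means the data must move.
    """
    for worker in workers:
        if worker in replica_hosts:
            return worker, True
    return workers[0], False


def assign_tasks(block_hosts, workers):
    """Map each block id to the worker chosen for it."""
    return {
        block_id: pick_worker(hosts, workers)
        for block_id, hosts in block_hosts.items()
    }
```

A block whose replicas all sit on hosts without a worker is the only case where data has to cross the network in this sketch.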

Data Replication: When the replication factor is 3: one replica on node 1 in rack A + one replica on node 2 in rack A + one replica on node 3 in rack B. (Using two racks instead of three cuts inter-rack write traffic, which improves write performance without compromising data reliability or read performance.)
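
The placement described above can be sketched as a small function. This is a simplified illustration of the layout in this note only: `place_replicas` and the `topology` dictionary are hypothetical, the choice of "first other node" and "first other rack" is an assumption, and actual HDFS releases may differ in placement details.

```python
def place_replicas(topology, writer_rack, writer_node):
    """Pick three replica locations as (rack, node) pairs.

    topology maps a rack name to the list of node names in that rack.
    Replica 1: the writer's node. Replica 2: another node in the same rack.
    Replica 3: a node in a different rack.
    """
    local_nodes = topology[writer_rack]
    second = next(n for n in local_nodes if n != writer_node)
    remote_rack = next(r for r in topology if r != writer_rack)
    third = topology[remote_rack][0]
    return [
        (writer_rack, writer_node),
        (writer_rack, second),
        (remote_rack, third),
    ]
```

Two of the three copies stay inside one rack, so only one copy crosses the rack boundary during a write, while the third copy still survives the loss of the whole local rack.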

In all, HDFS is similar to GFS: simply designed but hugely scalable. Compared with what I know about distributed file systems (DFS), I think engineering creates something useful and simple, while research makes it complex and impractical.
