Master/slave architecture (same as the Google File System).
One NameNode (metadata) + N DataNodes (actual data)
Emphasis: high throughput, not low latency.
Simple coherency model: write-once-read-many. (A MapReduce or web-crawler application fits this perfectly.) Appending writes may be supported in the future.
“Moving computation is cheaper than moving data”: HDFS provides interfaces for applications to move themselves closer to where the data is located. (How? The NameNode knows which DataNodes hold each block's replicas and exposes those locations to clients — e.g. Hadoop's FileSystem#getFileBlockLocations — so a scheduler like MapReduce can launch a task on a node that already stores its input split.)
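A minimal sketch of the idea, in Python with hypothetical data structures (`block_locations`, `pick_worker` are invented for illustration; the real mechanism is the NameNode's block-location metadata feeding the job scheduler):

```python
# Hypothetical map: block_id -> DataNode hostnames holding a replica.
# In HDFS this information comes from the NameNode.
block_locations = {
    "blk_001": ["node1.rackA", "node2.rackA", "node3.rackB"],
    "blk_002": ["node2.rackA", "node4.rackB", "node5.rackB"],
}

def pick_worker(block_id, idle_workers):
    """Prefer an idle worker that already stores the block (data-local);
    otherwise fall back to any idle worker, and the data must move."""
    replicas = set(block_locations.get(block_id, []))
    for w in idle_workers:
        if w in replicas:
            return w, "data-local"
    return idle_workers[0], "remote"

print(pick_worker("blk_001", ["node9.rackC", "node2.rackA"]))
# -> ('node2.rackA', 'data-local')
```

This is the whole trick: instead of streaming a 128 MB block across the network to an arbitrary worker, the scheduler ships the (small) computation to a replica holder.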
Data replication: with a replication factor of 3, the default policy puts one replica on the writer's node in rack A, one on a node in a remote rack B, and one on a different node in that same rack B. (This improves write performance — the pipeline crosses racks only once — without compromising data reliability or read performance.)
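The default placement policy can be sketched like this (a toy model with an invented `topology` map; real HDFS resolves racks via its configurable rack-awareness script):

```python
import random

# Hypothetical cluster map: rack name -> nodes in that rack.
topology = {
    "rackA": ["node1", "node2"],
    "rackB": ["node3", "node4"],
}

def rack_of(node):
    return next(r for r, nodes in topology.items() if node in nodes)

def place_replicas(writer_node):
    """Default HDFS policy for replication factor 3: first replica on the
    writer's node, second on a node in a remote rack, third on a different
    node in that same remote rack."""
    local_rack = rack_of(writer_node)
    remote_rack = random.choice([r for r in topology if r != local_rack])
    second, third = random.sample(topology[remote_rack], 2)
    return [writer_node, second, third]

print(place_replicas("node1"))
# e.g. ['node1', 'node3', 'node4'] -- three nodes spanning exactly two racks
```

Note the trade-off this encodes: only two racks are used (the write pipeline crosses the rack switch once), yet losing any single rack still leaves at least one replica.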
In all, HDFS is similar to GFS: simply designed, but with huge scalability. Comparing this with the DFS work I know from research, I think engineering tends to create something useful and simple, while research often makes it complex and impractical.