Master/slave architecture (same as the Google File System).
One NameNode (metadata) + N DataNodes (actual data)
Emphasis: high throughput, not low latency.
Simple coherency model: write-once-read-many. (A MapReduce or web-crawler application fits this perfectly.) Appending writes may be supported in the future.
“Moving computation is cheaper than moving data”: HDFS provides interfaces for applications to move themselves closer to where the data is located. (How? The NameNode knows which DataNodes hold each block's replicas and exposes those locations to clients — e.g. Hadoop's FileSystem#getFileBlockLocations — so a scheduler like MapReduce can launch a task on a node that already stores its input split.)
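A minimal sketch of the idea, in Python with hypothetical data structures (`block_locations`, `pick_worker` are invented for illustration; the real mechanism is the NameNode's block-location metadata feeding the job scheduler):

```python
# Hypothetical map: block_id -> DataNode hostnames holding a replica.
# In HDFS this information comes from the NameNode.
block_locations = {
    "blk_001": ["node1.rackA", "node2.rackA", "node3.rackB"],
    "blk_002": ["node2.rackA", "node4.rackB", "node5.rackB"],
}

def pick_worker(block_id, idle_workers):
    """Prefer an idle worker that already stores the block (data-local);
    otherwise fall back to any idle worker, and the data must move."""
    replicas = set(block_locations.get(block_id, []))
    for w in idle_workers:
        if w in replicas:
            return w, "data-local"
    return idle_workers[0], "remote"

print(pick_worker("blk_001", ["node9.rackC", "node2.rackA"]))
# -> ('node2.rackA', 'data-local')
```

This is the whole trick: instead of streaming a 128 MB block across the network to an arbitrary worker, the scheduler ships the (small) computation to a replica holder.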
Data replication: with a replication factor of 3, the default policy puts one replica on the writer's node in rack A, one on a node in a remote rack B, and one on a different node in that same rack B. (This improves write performance — the pipeline crosses racks only once — without compromising data reliability or read performance.)
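The default placement policy can be sketched like this (a toy model with an invented `topology` map; real HDFS resolves racks via its configurable rack-awareness script):

```python
import random

# Hypothetical cluster map: rack name -> nodes in that rack.
topology = {
    "rackA": ["node1", "node2"],
    "rackB": ["node3", "node4"],
}

def rack_of(node):
    return next(r for r, nodes in topology.items() if node in nodes)

def place_replicas(writer_node):
    """Default HDFS policy for replication factor 3: first replica on the
    writer's node, second on a node in a remote rack, third on a different
    node in that same remote rack."""
    local_rack = rack_of(writer_node)
    remote_rack = random.choice([r for r in topology if r != local_rack])
    second, third = random.sample(topology[remote_rack], 2)
    return [writer_node, second, third]

print(place_replicas("node1"))
# e.g. ['node1', 'node3', 'node4'] -- three nodes spanning exactly two racks
```

Note the trade-off this encodes: only two racks are used (the write pipeline crosses the rack switch once), yet losing any single rack still leaves at least one replica.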
In all, HDFS is similar to GFS: simply designed, but with huge scalability. Comparing this with the DFS work I know from research, I think engineering tends to create something useful and simple, while research often makes it complex and impractical.