Replica Placement in DFS

This article examines how the Hadoop namenode chooses which datanodes store a block's replicas, balancing reliability against bandwidth. By walking through the placement strategy, it explains how Hadoop achieves two-rack redundancy, limits write-bandwidth consumption, and preserves read performance.

Hadoop:
How does the namenode choose which datanodes to store replicas on? There’s a tradeoff between reliability and write bandwidth and read bandwidth here.

For example, placing all replicas on a single node incurs the lowest write bandwidth penalty since the replication pipeline runs on a single node, but this offers no real redundancy (if the node fails, the data for that block is lost). Also, the read bandwidth is high for off-rack reads. At the other extreme, placing replicas in different data centers may maximize redundancy, but at the cost of bandwidth. Even in the same data center (which is what all Hadoop clusters to date have run in), there are a variety of placement strategies.
Indeed, Hadoop changed its placement strategy in release 0.17.0 to one that helps keep a fairly even distribution of blocks across the cluster. (See “balancer” on page 284 for details on keeping a cluster balanced.)
Hadoop’s strategy is to place the first replica on the same node as the client (for clients running outside the cluster, a node is chosen at random, although the system tries not to pick nodes that are too full or too busy). The second replica is placed on a different rack from the first (off-rack), chosen at random. The third replica is placed on the same rack as the second, but on a different node chosen at random. Further replicas are placed on random nodes on the cluster, although the system tries to avoid placing too many replicas on the same rack.
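The rules above can be sketched in a few lines. This is a simplified illustration, not Hadoop's actual `BlockPlacementPolicyDefault` (which also checks node load, free space, and per-rack limits); the function and parameter names are invented for the example.

```python
import random

def choose_targets(client_node, nodes, rack_of, replication=3):
    """Sketch of HDFS's default placement: first replica on the
    client's node, second off-rack, third on the second replica's
    rack but a different node, extras chosen at random."""
    targets = []
    # First replica: the writer's own datanode, if it runs one;
    # otherwise a random node.
    first = client_node if client_node in nodes else random.choice(nodes)
    targets.append(first)
    if replication > 1:
        # Second replica: any node on a different rack (off-rack).
        off_rack = [n for n in nodes if rack_of[n] != rack_of[first]]
        second = random.choice(off_rack)
        targets.append(second)
    if replication > 2:
        # Third replica: same rack as the second, different node.
        same_rack = [n for n in nodes
                     if rack_of[n] == rack_of[second] and n != second]
        targets.append(random.choice(same_rack))
    # Further replicas: random nodes not already chosen.
    while len(targets) < replication:
        candidate = random.choice(nodes)
        if candidate not in targets:
            targets.append(candidate)
    return targets
```

Note that the result always spans exactly two racks for the common replication factor of 3, which is the reliability/bandwidth compromise described above.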
Once the replica locations have been chosen, a pipeline is built, taking network topology into account. For a replication factor of 3, the pipeline might look like Figure 3-4.
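Pipeline construction can be illustrated with a greedy nearest-neighbor ordering: each node forwards to the remaining target closest to it in the topology. This is a sketch under the assumption of a flat one-level rack topology (HDFS-style distances: 0 same node, 2 same rack, 4 off-rack); the real ordering logic lives in the namenode and is more involved.

```python
def topology_distance(a, b, rack_of):
    """HDFS-style distance for a flat, one-level rack topology:
    0 for the same node, 2 within a rack, 4 across racks."""
    if a == b:
        return 0
    return 2 if rack_of[a] == rack_of[b] else 4

def order_pipeline(client, targets, rack_of):
    """Greedy sketch: repeatedly append the remaining target
    closest to the pipeline's current tail, starting at the client."""
    pipeline, remaining, tail = [], list(targets), client
    while remaining:
        nxt = min(remaining,
                  key=lambda n: topology_distance(tail, n, rack_of))
        pipeline.append(nxt)
        remaining.remove(nxt)
        tail = nxt
    return pipeline
```

For the default placement this yields client → local-rack node → off-rack node → same-rack node, matching the shape of the pipeline in Figure 3-4: only one hop crosses a rack switch.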
Overall, this strategy gives a good balance between reliability (blocks are stored on two racks), write bandwidth (writes only have to traverse a single network switch), read performance (there’s a choice of two racks to read from), and block distribution across the cluster (clients only write a single block on the local rack).

 

// some other issues with the placement algorithm

1. The namenode should record (or predict) how much data it has recently assigned to each machine, and lower that machine's selection probability once the amount crosses a threshold, since the machine's network link would otherwise become a bottleneck.

2. When weighing a machine's free disk size, a better approach may be to use the disk-utilization ratio rather than the absolute free space (the lower a node's utilization, the more likely it should be chosen), since disks within the cluster may be heterogeneous.
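Both suggestions amount to a weighted random choice. The sketch below is hypothetical (the names, the 10 GiB cap, and the weighting formula are all illustrative, not from Hadoop): recent traffic fades a node's weight toward zero, and the free fraction of its disk (rather than absolute free bytes) scales it.

```python
import random

def pick_datanode(nodes, recent_bytes, used_frac, load_cap=10 * 2**30):
    """Hypothetical weighted selection: down-weight nodes that
    received a lot of data recently (network bottleneck) and prefer
    nodes with a lower disk-utilization ratio (robust to
    heterogeneous disk sizes)."""
    def weight(n):
        # Load factor: drops to 0 as recent traffic reaches the cap.
        load = max(0.0, 1.0 - recent_bytes[n] / load_cap)
        # Space factor: fraction of the disk still free.
        space = 1.0 - used_frac[n]
        return load * space
    weights = [weight(n) for n in nodes]
    return random.choices(nodes, weights=weights, k=1)[0]
```

Using the utilization ratio means a half-full 1 TB disk and a half-full 100 GB disk get the same space factor, even though their absolute free sizes differ by an order of magnitude.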

 
