Hadoop: Rack Awareness Topology

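Hadoop's Rack Awareness mechanism ensures that data blocks are replicated to nodes on different racks, so that a single point of failure cannot cause data loss. The mechanism also takes network bandwidth and latency into account, optimizing how data flows are distributed across the network and improving overall performance.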


Quoted from http://bradhedlund.com/2011/09/10/understanding-hadoop-clusters-and-the-network/ 

 

Hadoop has the concept of “Rack Awareness”. The Hadoop administrator can manually define the rack number of each slave Data Node in the cluster. Why would you go through the trouble of doing this? There are two key reasons: data loss prevention and network performance. Remember that each block of data will be replicated to multiple machines to prevent the failure of one machine from losing all copies of data. Wouldn’t it be unfortunate if all copies of data happened to be located on machines in the same rack, and that rack experienced a failure, such as a switch failure or power failure? That would be a mess. So to avoid this, somebody needs to know where Data Nodes are located in the network topology and use that information to make an intelligent decision about where data replicas should exist in the cluster. That “somebody” is the Name Node.

 

There is also an assumption that two machines in the same rack have more bandwidth and lower latency between each other than two machines in two different racks. This is true most of the time. The rack switch uplink bandwidth is usually (but not always) less than its downlink bandwidth. Furthermore, in-rack latency is usually (but not always) lower than cross-rack latency. If at least one of those two basic assumptions is true, wouldn’t it be cool if Hadoop could use the same Rack Awareness that protects data to also optimally place work streams in the cluster, improving network performance? Well, it does!

 

Quoted from http://developer.yahoo.com/hadoop/tutorial/module2.html#rack

Rack Awareness

For small clusters in which all servers are connected by a single switch, there are only two levels of locality: "on-machine" and "off-machine." When loading data from a DataNode's local drive into HDFS, the NameNode will schedule one copy to go into the local DataNode, and will pick two other machines at random from the cluster.

For larger Hadoop installations which span multiple racks, it is important to ensure that replicas of data exist on multiple racks. This way, the loss of a switch does not render portions of the data unavailable due to all replicas being underneath it.
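The multi-rack requirement can be checked mechanically. The sketch below is only an illustration and not part of Hadoop: the hypothetical functions `count_distinct_racks` and `replicas_survive_rack_loss` take the rack ids of the nodes that hold a block's replicas and report whether they cover more than one rack.

```shell
# Hypothetical helpers for illustration; these are not Hadoop commands.

# Prints the number of distinct rack ids among its arguments.
count_distinct_racks() {
  if [ $# -eq 0 ]; then
    echo 0
    return
  fi
  # One rack id per line, de-duplicated, then counted.
  printf '%s\n' "$@" | sort -u | wc -l | tr -d ' '
}

# Succeeds only if the replicas cover at least two racks,
# i.e. losing a single rack cannot remove every copy.
replicas_survive_rack_loss() {
  [ "$(count_distinct_racks "$@")" -ge 2 ]
}
```

For example, `replicas_survive_rack_loss /dc1/rack1 /dc1/rack1 /dc1/rack1` fails, because all three replicas sit underneath the same switch.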

HDFS can be made rack-aware by the use of a script which allows the master node to map the network topology of the cluster. While alternate configuration strategies can be used, the default implementation allows you to provide an executable script which returns the "rack address" of each of a list of IP addresses.

The network topology script receives as arguments one or more IP addresses of nodes in the cluster. It returns on stdout a list of rack names, one for each input. The input and output order must be consistent.

To set the rack mapping script, specify the key topology.script.file.name in conf/hadoop-site.xml. This provides a command to run to return a rack id; it must be an executable script or program. By default, Hadoop will attempt to send a set of IP addresses to the file as several separate command line arguments. You can control the maximum acceptable number of arguments with the topology.script.number.args key.
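One way to picture the argument limit: a caller holding more addresses than the limit has to split them across several invocations of the script. The sketch below simulates that calling pattern. It is an assumption made for illustration, not Hadoop's actual code; `run_in_batches` is a hypothetical helper whose first parameter plays the role of topology.script.number.args.

```shell
# Hypothetical helper: call a command with at most MAX addresses at a time.
# Usage: run_in_batches MAX COMMAND ADDRESS...
run_in_batches() {
  max=$1
  cmd=$2
  shift 2
  # A limit below 1 would never consume any address, so reject it.
  if [ "$max" -lt 1 ]; then
    echo "limit must be at least 1" >&2
    return 1
  fi
  while [ $# -gt 0 ]; do
    batch=""
    n=0
    # Collect up to $max addresses for this invocation.
    while [ $# -gt 0 ] && [ "$n" -lt "$max" ]; do
      batch="$batch $1"
      n=$((n + 1))
      shift
    done
    # Word splitting on $batch is intentional: addresses contain no spaces.
    $cmd $batch
  done
}
```

With a limit of 2 and three addresses, the command runs twice: once with two arguments and once with one. A script that only reads its first argument therefore needs the limit set to 1.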

Rack ids in Hadoop are hierarchical and look like path names. By default, every node has a rack id of /default-rack. You can set rack ids for nodes to any arbitrary path, e.g., /foo/bar-rack. Path elements further to the left are higher up the tree. Thus a reasonable structure for a large installation may be /top-switch-name/rack-name.

Hadoop rack ids are not currently expressive enough to handle an unusual routing topology such as a 3-d torus; they assume that each node is connected to a single switch which in turn has a single upstream switch. This is not usually a problem, however. Actual packet routing will be directed using the topology discovered by or set in switches and routers. The Hadoop rack ids will be used to find "near" and "far" nodes for replica placement (and in 0.17, MapReduce task placement).
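Because rack ids look like paths, "near" and "far" can be estimated by comparing them element by element. The sketch below is a simplified illustration, not Hadoop's implementation: the hypothetical function `rack_distance` counts the path elements that two rack ids do not share, so 0 means the same rack and larger values mean the racks meet higher up the tree.

```shell
# Hypothetical helpers for illustration; not Hadoop's implementation.

# Prints how many path elements two rack ids do NOT share.
rack_distance() {
  a=${1#/}
  b=${2#/}
  # Drop leading path elements for as long as they are identical.
  while [ -n "$a" ] && [ -n "$b" ] && [ "${a%%/*}" = "${b%%/*}" ]; do
    case $a in */*) a=${a#*/} ;; *) a="" ;; esac
    case $b in */*) b=${b#*/} ;; *) b="" ;; esac
  done
  count_elements "$a" "$b"
}

# Prints the total number of '/'-separated elements in the given paths.
count_elements() {
  total=0
  for path in "$@"; do
    while [ -n "$path" ]; do
      total=$((total + 1))
      case $path in */*) path=${path#*/} ;; *) path="" ;; esac
    done
  done
  echo "$total"
}
```

Under this measure, `/dc1/rack1` and `/dc1/rack2` are at distance 2 (they share the top-level switch), while `/dc1/rack1` and `/dc2/rack1` are at distance 4.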

The following example script performs rack identification based on IP addresses given a hierarchical IP addressing scheme enforced by the network administrator. This may work directly for simple installations; more complex network configurations may require a file- or table-based lookup process. Care should be taken in that case to keep the table up-to-date as nodes are physically relocated, etc. This script requires that the maximum number of arguments be set to 1.

#!/bin/bash
# Set rack id based on IP address.
# Assumes network administrator has complete control
# over IP addresses assigned to nodes and they are
# in the 10.x.y.z address space. Assumes that
# IP addresses are distributed hierarchically (e.g.,
# 10.1.y.z is one data center segment and 10.2.y.z is another;
# 10.1.1.z is one rack, 10.1.2.z is another rack in
# the same segment, etc.).
#
# This is invoked with an IP address as its only argument

# get IP address from the first argument ($0 would be the script's own name)
ipaddr=$1

# select "x.y" and convert it to "x/y"
segments=$(echo "$ipaddr" | cut --delimiter=. --fields=2-3 --output-delimiter=/)
echo "/${segments}"
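The script above handles one address per call. A variant that loops over all of its arguments would not need the argument limit set to 1. The following is a minimal sketch under the same 10.x.y.z addressing assumption; the function name `ip_to_rack` is hypothetical.

```shell
# Hypothetical variant of the script above: prints one rack id per
# argument, in input order. Assumes every argument is a dotted IPv4
# address such as 10.x.y.z.
ip_to_rack() {
  for ipaddr in "$@"; do
    # Select fields 2 and 3 ("x" and "y") and join them as "/x/y".
    echo "$ipaddr" | awk -F. '{ print "/" $2 "/" $3 }'
  done
}
```

Used as a topology script, the file would end with a call such as `ip_to_rack "$@"`, so that every address passed on the command line produces exactly one line of output.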

 

 

Topology script sample from the Hadoop Wiki.

 

Topology scripts are used by Hadoop to determine the rack location of nodes. Hadoop uses this information to replicate block data to redundant racks. Here is a sample script that uses a separate data file. You can specify the rack mapping script via the key topology.script.file.name in conf/hadoop-site.xml; it must be an executable script or program.

Topology Script

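#!/bin/bash
# Map each host name or IP address given as an argument to its rack id,
# using the lookup table in topology.data.
HADOOP_CONF=/etc/hadoop/conf

while [ $# -gt 0 ] ; do
  nodeArg=$1
  result=""
  # Scan the whole table; if a node is listed twice, the last entry wins.
  while read -r line ; do
    ar=( $line )
    if [ "${ar[0]}" = "$nodeArg" ] ; then
      result="${ar[1]}"
    fi
  done < "${HADOOP_CONF}/topology.data"
  shift
  # Fall back to a default rack for nodes missing from the table.
  if [ -z "$result" ] ; then
    echo -n "/default/rack "
  else
    echo -n "$result "
  fi
done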

 

Data file topology.data used by above topology script.

hadoopdata1.ec.com     /dc1/rack1
hadoopdata1            /dc1/rack1
10.1.1.1               /dc1/rack2
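Since the lookup only works if the table is well formed, a quick format check can catch mistakes before the file is deployed. The sketch below is an illustration under assumed rules (two whitespace-separated fields per line, rack id starting with "/"); the function name `check_topology_file` is hypothetical and not part of Hadoop.

```shell
# Hypothetical helper: report malformed entries in a topology data file.
# A valid line has exactly two whitespace-separated fields, and the
# second field (the rack id) starts with "/". Blank lines are ignored.
# Returns non-zero if any malformed line is found.
check_topology_file() {
  awk '
    NF == 0 { next }
    NF != 2 || $2 !~ /^\// {
      printf "line %d is malformed: %s\n", NR, $0
      bad = 1
    }
    END { exit bad }
  ' "$1"
}
```

Running `check_topology_file /etc/hadoop/conf/topology.data` prints nothing and succeeds when every entry has the expected shape.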

 

OpenFlow

Even more interesting would be an OpenFlow network, where the Name Node could query the OpenFlow controller about a node’s location in the topology. Refer to http://bradhedlund.com/2011/04/21/data-center-scale-openflow-sdn/

 
