spark 2.1 spark executor topology information

本文介绍了Apache Spark中用于块复制的拓扑映射机制。详细解析了DefaultTopologyMapper和FileBasedTopologyMapper的工作原理,以及如何通过配置选择不同的拓扑映射器来获取节点的拓扑信息。

摘要生成于 C知道 ,由 DeepSeek-R1 满血版支持, 前往体验 >

BlockManagerMasterEndpoint return block manager id with topology information.
In register method, calls topologyMapper.getTopologyForHost(idWithoutTopologyInfo.host)

val id = BlockManagerId(
      idWithoutTopologyInfo.executorId,
      idWithoutTopologyInfo.host,
      idWithoutTopologyInfo.port,
      topologyMapper.getTopologyForHost(idWithoutTopologyInfo.host))

The defaule class of topologyMapper is DefaultTopologyMapper.

  private val topologyMapper = {
    val topologyMapperClassName = conf.get(
      "spark.storage.replication.topologyMapper", classOf[DefaultTopologyMapper].getName)
    val clazz = Utils.classForName(topologyMapperClassName)
    val mapper =
      clazz.getConstructor(classOf[SparkConf]).newInstance(conf).asInstanceOf[TopologyMapper]
    logInfo(s"Using $topologyMapperClassName for getting topology information")
    mapper
  }

DefaultTopologyMapper. It assumes all nodes are in the same rack

@DeveloperApi
class DefaultTopologyMapper(conf: SparkConf) extends TopologyMapper(conf) with Logging {
  override def getTopologyForHost(hostname: String): Option[String] = {
    logDebug(s"Got a request for $hostname")
    None
  }
}

TopologyMapper

/**
 * ::DeveloperApi::
 * TopologyMapper provides topology information for a given host
 * @param conf SparkConf to get required properties, if needed
 */
@DeveloperApi
abstract class TopologyMapper(conf: SparkConf) {
  /**
   * Gets the topology information given the host name
   *
   * @param hostname Hostname
   * @return topology information for the given hostname. One can use a 'topology delimiter'
   *         to make this topology information nested.
   *         For example : ‘/myrack/myhost’, where ‘/’ is the topology delimiter,
   *         ‘myrack’ is the topology identifier, and ‘myhost’ is the individual host.
   *         This function only returns the topology information without the hostname.
   *         This information can be used when choosing executors for block replication
   *         to discern executors from a different rack than a candidate executor, for example.
   *
   *         An implementation can choose to use empty strings or None in case topology info
   *         is not available. This would imply that all such executors belong to the same rack.
   */
  def getTopologyForHost(hostname: String): Option[String]
}

FileBasedTopologyMapper

/**
 * A simple file based topology mapper. This expects topology information provided as a
 * [[java.util.Properties]] file. The name of the file is obtained from SparkConf property
 * `spark.storage.replication.topologyFile`. To use this topology mapper, set the
 * `spark.storage.replication.topologyMapper` property to
 * [[org.apache.spark.storage.FileBasedTopologyMapper]]
 * @param conf SparkConf object
 */
@DeveloperApi
class FileBasedTopologyMapper(conf: SparkConf) extends TopologyMapper(conf) with Logging {
  val topologyFile = conf.getOption("spark.storage.replication.topologyFile")
  require(topologyFile.isDefined, "Please specify topology file via " +
    "spark.storage.replication.topologyFile for FileBasedTopologyMapper.")
  val topologyMap = Utils.getPropertiesFromFile(topologyFile.get)

  override def getTopologyForHost(hostname: String): Option[String] = {
    val topology = topologyMap.get(hostname)
    if (topology.isDefined) {
      logDebug(s"$hostname -> ${topology.get}")
    } else {
      logWarning(s"$hostname does not have any topology information")
    }
    topology
  }
}
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值