Overview
Locality percentage = logical size of this region's blocks stored on the current machine / total logical size of all file blocks under the region.
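For example, if the files under a region add up to 10 GB of blocks, and blocks totaling 8 GB have a replica on the RegionServer's own host, the locality percentage is 8 GB / 10 GB = 80% (illustrative numbers).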
Where to start
We know that the heartbeat an HRegionServer sends to the HMaster carries RegionLoad, which includes the data-locality percentage metric, so we need to look at how this RegionLoad data is produced and computed.
Into the source
1. This HRegionServer method is called periodically to build the latest RegionLoad
--> the RegionLoad it builds is then sent to the HMaster in the heartbeat
private RegionLoad createRegionLoad(final HRegion r, RegionLoad.Builder regionLoadBldr,
    RegionSpecifier.Builder regionSpecifier)
2. The dataLocality variable here is the region's locality percentage
float dataLocality =
    r.getHDFSBlocksDistribution().getBlockLocalityIndex(serverName.getHostname());
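For context, here is a minimal sketch of where the value likely ends up inside the same method; the protobuf setter setDataLocality matches 1.x-era naming, but treat the exact call as an assumption:

// Sketch: tail of createRegionLoad, assuming the 1.x RegionLoad protobuf builder API
regionLoadBldr.setDataLocality(dataLocality);
return regionLoadBldr.build();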
Next, let's look at both halves of that call: getHDFSBlocksDistribution (which returns an HDFSBlocksDistribution) and getBlockLocalityIndex.
3. What is HDFSBlocksDistribution?
It has two member variables:
private Map<String,HostAndWeight> hostAndWeights = null; // key: hostname, value: HostAndWeight(private String host; private long weight;), i.e. the logical file size attributed to that hostname
private long uniqueBlocksTotalWeight = 0;
The getHDFSBlocksDistribution method iterates over all StoreFiles under the region and merges each store file's getHDFSBlockDistribution result into the returned HDFSBlocksDistribution:
/**
 * This function will return the HDFS blocks distribution based on the data
 * captured when HFile is created
 * @return The HDFS blocks distribution for the region.
 */
public HDFSBlocksDistribution getHDFSBlocksDistribution() {
  HDFSBlocksDistribution hdfsBlocksDistribution =
    new HDFSBlocksDistribution();
  synchronized (this.stores) {
    // iterate over every StoreFile of every Store
    for (Store store : this.stores.values()) {
      for (StoreFile sf : store.getStorefiles()) {
        HDFSBlocksDistribution storeFileBlocksDistribution =
          sf.getHDFSBlockDistribution();
        hdfsBlocksDistribution.add(storeFileBlocksDistribution);
      }
    }
  }
  return hdfsBlocksDistribution;
}
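A side note on synchronized (this.stores): flushes, compactions, and other region operations can change the set of stores and store files concurrently, so the iteration presumably takes the lock to see a consistent snapshot.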
4. StoreFile's getHDFSBlockDistribution method (HRegionServer --> HRegion --> StoreFile)
// StoreFile's getHDFSBlockDistribution method
/**
 * @return the cached value of HDFS blocks distribution. The cached value is
 * calculated when store file is opened.
 */
public HDFSBlocksDistribution getHDFSBlockDistribution() {
  return this.fileInfo.getHDFSBlockDistribution();
}
5. What is actually used is the information from fileInfo.getHDFSBlockDistribution() (fileInfo is a StoreFileInfo)
/** @return the HDFS block distribution */
public HDFSBlocksDistribution getHDFSBlockDistribution() {
  return this.hdfsBlocksDistribution;
}
Question:
Where does fileInfo's hdfsBlocksDistribution get assigned?
In the IDE, right-click the variable and choose Find Usages (to see where it is used).
The Value Read entries only read the field, so they can be skipped.
The Value Write entries show where it is assigned: it turns out to be StoreFileInfo's open method!
if (this.reference != null) { // a reference is produced by a split; an hfile-link is produced by a snapshot
  hdfsBlocksDistribution = computeRefFileHDFSBlockDistribution(fs, reference, status);
} else {
  // regular data files take this branch
  hdfsBlocksDistribution = FSUtils.computeHDFSBlocksDistribution(fs, status, 0, length);
}
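(For the reference branch: as I understand it, a reference created by a split covers only the top or bottom half of the parent HFile, so computeRefFileHDFSBlockDistribution computes the distribution for just that portion of the file.)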
6. hdfsBlocksDistribution = FSUtils.computeHDFSBlocksDistribution
/**
 * Compute HDFS blocks distribution of a given file, or a portion of the file
 * @param fs file system
 * @param status file status of the file
 * @param start start position of the portion
 * @param length length of the portion
 * @return The HDFS blocks distribution
 */
static public HDFSBlocksDistribution computeHDFSBlocksDistribution(
    final FileSystem fs, FileStatus status, long start, long length)
    throws IOException {
  HDFSBlocksDistribution blocksDistribution = new HDFSBlocksDistribution();
  // fs.getFileBlockLocations is an HDFS API that returns the block locations of the file
  BlockLocation[] blockLocations =
    fs.getFileBlockLocations(status, start, length);
  // iterate over all blocks of the file; each block exposes the hosts of its N replicas
  for (BlockLocation bl : blockLocations) {
    String[] hosts = bl.getHosts();
    long len = bl.getLength();
    // addHostsAndBlockWeight records the replica hosts and block size into
    // uniqueBlocksTotalWeight and hostAndWeights
    blocksDistribution.addHostsAndBlockWeight(hosts, len);
  }
  return blocksDistribution;
}
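As a standalone illustration of the HDFS API call used above, here is a minimal sketch that prints the block locations of a file. The path /tmp/demo is a placeholder, and the Configuration is assumed to pick up the cluster's core-site.xml/hdfs-site.xml from the classpath:

import java.util.Arrays;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockLocationDemo {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    // placeholder path; point it at any HDFS file
    FileStatus status = fs.getFileStatus(new Path("/tmp/demo"));
    // the same call the HBase code above makes, here for the whole file
    BlockLocation[] locations = fs.getFileBlockLocations(status, 0, status.getLen());
    for (BlockLocation bl : locations) {
      System.out.println("offset=" + bl.getOffset()
          + " len=" + bl.getLength()
          + " hosts=" + Arrays.toString(bl.getHosts()));
    }
  }
}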
7. addHostsAndBlockWeight
The implementation behind the blocksDistribution.addHostsAndBlockWeight(hosts, len) call above:
/**
 * add some weight to a list of hosts, update the value of unique block weight
 * @param hosts the list of the host
 * @param weight the weight
 */
public void addHostsAndBlockWeight(String[] hosts, long weight) {
  if (hosts == null || hosts.length == 0) {
    // erroneous data
    return;
  }
  addUniqueWeight(weight);
  for (String hostname : hosts) {
    addHostAndBlockWeight(hostname, weight);
  }
}
Notes
uniqueBlocksTotalWeight is the logical size of the file (each block counted exactly once).
Map<String,HostAndWeight> hostAndWeights holds, per host, the logical size of this region's file blocks on that host.
(The code does iterate over the hosts of all three replicas, but since the map is keyed by host, each hostname accumulates a given block only once: under HDFS's three-replica placement no two replicas of the same block land on the same host. See the sketch below.)
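To make the bookkeeping concrete, here is a small self-contained model (my own sketch, not the HBase class) that mimics addHostsAndBlockWeight and getBlockLocalityIndex for two 128 MB blocks with three replicas each; hostA through hostE are made-up hostnames:

import java.util.HashMap;
import java.util.Map;

public class LocalityModel {
  // host -> bytes of this region's blocks that have a replica on that host
  static Map<String, Long> hostAndWeights = new HashMap<>();
  // each block counted once, no matter how many replicas it has
  static long uniqueBlocksTotalWeight = 0;

  static void addHostsAndBlockWeight(String[] hosts, long weight) {
    uniqueBlocksTotalWeight += weight;               // one increment per block
    for (String host : hosts) {
      hostAndWeights.merge(host, weight, Long::sum); // one increment per replica host
    }
  }

  static float getBlockLocalityIndex(String host) {
    Long w = hostAndWeights.get(host);
    return (w == null || uniqueBlocksTotalWeight == 0) ? 0f
        : (float) w / (float) uniqueBlocksTotalWeight;
  }

  public static void main(String[] args) {
    long blockSize = 128L * 1024 * 1024;
    addHostsAndBlockWeight(new String[] {"hostA", "hostB", "hostC"}, blockSize);
    addHostsAndBlockWeight(new String[] {"hostA", "hostD", "hostE"}, blockSize);
    // uniqueBlocksTotalWeight = 256 MB; hostA holds both blocks, hostB holds one
    System.out.println(getBlockLocalityIndex("hostA")); // 1.0
    System.out.println(getBlockLocalityIndex("hostB")); // 0.5
  }
}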
8. Finally, getBlockLocalityIndex
So far we have only covered the first half of the RegionServer's call r.getHDFSBlocksDistribution().getBlockLocalityIndex(...), namely r.getHDFSBlocksDistribution().
What does the second half, getBlockLocalityIndex(serverName.getHostname()), compute?
/**
 * return the locality index of a given host
 * @param host the host name
 * @return the locality index of the given host
 */
public float getBlockLocalityIndex(String host) {
  float localityIndex = 0;
  // look up this region's HostAndWeight for the given server
  HostAndWeight hostAndWeight = this.hostAndWeights.get(host);
  if (hostAndWeight != null && uniqueBlocksTotalWeight != 0) {
    // logical size of this region's blocks on this machine /
    // total logical size of all file blocks under the region
    localityIndex = (float) hostAndWeight.weight / (float) uniqueBlocksTotalWeight;
  }
  return localityIndex;
}
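Plugging in the numbers from the model sketch above: a RegionServer on hostA would report a locality index of 256 MB / 256 MB = 1.0, while one on hostB would report 128 MB / 256 MB = 0.5.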
Further reading
HDFS is critically important to HBase; if you have the time, it is worth digging deeper into HDFS's APIs and features.
What does HDFS's BlockLocation store?
/**
 * Represents the network location of a block, information about the hosts
 * that contain block replicas, and other block metadata (E.g. the file
 * offset associated with the block, length, whether it is corrupt, etc).
 */
@InterfaceAudience.Public
@InterfaceStability.Stable
public class BlockLocation {
  private String[] hosts; // Datanode hostnames
  private String[] cachedHosts; // Datanode hostnames with a cached replica
  private String[] names; // Datanode IP:xferPort for accessing the block
  private String[] topologyPaths; // Full path name in network topology
  private long offset; // Offset of the block in the file
  private long length;
  private boolean corrupt;
  // ... constructors and getters omitted
}