Overview
Locality percentage = logical size of this region's blocks stored on the current machine / total logical size of all file blocks under the region.
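For example, if the files under a region add up to 10 GB of blocks, and blocks totaling 8 GB have a replica on the RegionServer's own host, the locality percentage is 8 GB / 10 GB = 80% (illustrative numbers).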
Where to start
We know that the heartbeat an HRegionServer sends to the HMaster carries RegionLoad, which includes the data-locality percentage metric, so we need to look at how this RegionLoad data is produced and computed.
Into the source
1. This HRegionServer method is called periodically to build the latest RegionLoad
--> the RegionLoad it builds is then sent to the HMaster in the heartbeat
private RegionLoad createRegionLoad(final HRegion r, RegionLoad.Builder regionLoadBldr,
    RegionSpecifier.Builder regionSpecifier)
2. The dataLocality variable here is the region's locality percentage
float dataLocality =
    r.getHDFSBlocksDistribution().getBlockLocalityIndex(serverName.getHostname());
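For context, here is a minimal sketch of where the value likely ends up inside the same method; the protobuf setter setDataLocality matches 1.x-era naming, but treat the exact call as an assumption:

// Sketch: tail of createRegionLoad, assuming the 1.x RegionLoad protobuf builder API
regionLoadBldr.setDataLocality(dataLocality);
return regionLoadBldr.build();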
Next, let's look at both halves of that call: getHDFSBlocksDistribution (which returns an HDFSBlocksDistribution) and getBlockLocalityIndex.
3. What is HDFSBlocksDistribution?
It has two member variables:
private Map<String,HostAndWeight> hostAndWeights = null; // key: hostname, value: HostAndWeight(private String host; private long weight;), i.e. the logical file size attributed to that hostname
private long uniqueBlocksTotalWeight = 0;
The getHDFSBlocksDistribution method iterates over all StoreFiles under the region and merges each store file's getHDFSBlockDistribution result into the returned HDFSBlocksDistribution:
/**
 * This function will return the HDFS blocks distribution based on the data
 * captured when HFile is created
 * @return The HDFS blocks distribution for the region.
 */
public HDFSBlocksDistribution getHDFSBlocksDistribution() {
  HDFSBlocksDistribution hdfsBlocksDistribution =
    new HDFSBlocksDistribution();
  synchronized (this.stores) {
    // iterate over every StoreFile of every Store
    for (Store store : this.stores.values()) {
      for (StoreFile sf : store.getStorefiles()) {
        HDFSBlocksDistribution storeFileBlocksDistribution =
          sf.getHDFSBlockDistribution();
        hdfsBlocksDistribution.add(storeFileBlocksDistribution);
      }
    }
  }
  return hdfsBlocksDistribution;
}
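A side note on synchronized (this.stores): flushes, compactions, and other region operations can change the set of stores and store files concurrently, so the iteration presumably takes the lock to see a consistent snapshot.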
4. StoreFile's getHDFSBlockDistribution method (HRegionServer --> HRegion --> StoreFile)
// StoreFile's getHDFSBlockDistribution method
/**
 * @return the cached value of HDFS blocks distribution. The cached value is
 * calculated when store file is opened.
 */
public HDFSBlocksDistribution getHDFSBlockDistribution() {
  return this.fileInfo.getHDFSBlockDistribution();
}
5. What is actually used is the information from fileInfo.getHDFSBlockDistribution() (fileInfo is a StoreFileInfo)
/** @return the HDFS block distribution */
public HDFSBlocksDistribution getHDFSBlockDistribution() {
  return this.hdfsBlocksDistribution;
}
Question:
Where does fileInfo's hdfsBlocksDistribution get assigned?
In the IDE, right-click the variable and choose Find Usages (to see where it is used).
The Value Read entries only read the field, so they can be skipped.
The Value Write entries show where it is assigned: it turns out to be StoreFileInfo's open method!
if (this.reference != null) { // a reference is produced by a split; an hfile-link is produced by a snapshot
  hdfsBlocksDistribution = computeRefFileHDFSBlockDistribution(fs, reference, status);
} else {
  // regular data files take this branch
  hdfsBlocksDistribution = FSUtils.computeHDFSBlocksDistribution(fs, status, 0, length);
}
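(For the reference branch: as I understand it, a reference created by a split covers only the top or bottom half of the parent HFile, so computeRefFileHDFSBlockDistribution computes the distribution for just that portion of the file.)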
6. hdfsBlocksDistribution = FSUtils.computeHDFSBlocksDistribution
/**
 * Compute HDFS blocks distribution of a given file, or a portion of the file
 * @param fs file system
 * @param status file status of the file
 * @param start start position of the portion
 * @param length length of the portion
 * @return The HDFS blocks distribution
 */
static public HDFSBlocksDistribution computeHDFSBlocksDistribution(
    final FileSystem fs, FileStatus status, long start, long length)
    throws IOException {
  HDFSBlocksDistribution blocksDistribution = new HDFSBlocksDistribution();
  // fs.getFileBlockLocations is an HDFS API that returns the block locations of the file
  BlockLocation[] blockLocations =
    fs.getFileBlockLocations(status, start, length);
  // iterate over all blocks of the file; each block exposes the hosts of its N replicas
  for (BlockLocation bl : blockLocations) {
    String[] hosts = bl.getHosts();
    long len = bl.getLength();
    // addHostsAndBlockWeight records the replica hosts and block size into
    // uniqueBlocksTotalWeight and hostAndWeights
    blocksDistribution.addHostsAndBlockWeight(hosts, len);
  }
  return blocksDistribution;
}
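As a standalone illustration of the HDFS API call used above, here is a minimal sketch that prints the block locations of a file. The path /tmp/demo is a placeholder, and the Configuration is assumed to pick up the cluster's core-site.xml/hdfs-site.xml from the classpath:

import java.util.Arrays;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockLocationDemo {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    // placeholder path; point it at any HDFS file
    FileStatus status = fs.getFileStatus(new Path("/tmp/demo"));
    // the same call the HBase code above makes, here for the whole file
    BlockLocation[] locations = fs.getFileBlockLocations(status, 0, status.getLen());
    for (BlockLocation bl : locations) {
      System.out.println("offset=" + bl.getOffset()
          + " len=" + bl.getLength()
          + " hosts=" + Arrays.toString(bl.getHosts()));
    }
  }
}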
7. addHostsAndBlockWeight
The implementation behind the blocksDistribution.addHostsAndBlockWeight(hosts, len) call above:
/**
 * add some weight to a list of hosts, update the value of unique block weight
 * @param hosts the list of the host
 * @param weight the weight
 */
public void addHostsAndBlockWeight(String[] hosts, long weight) {
  if (hosts == null || hosts.length == 0) {
    // erroneous data
    return;
  }
  addUniqueWeight(weight);
  for (String hostname : hosts) {
    addHostAndBlockWeight(hostname, weight);
  }
}
Notes
uniqueBlocksTotalWeight is the logical size of the file (each block counted exactly once).
Map<String,HostAndWeight> hostAndWeights holds, per host, the logical size of this region's file blocks on that host.
(The code does iterate over the hosts of all three replicas, but since the map is keyed by host, each hostname accumulates a given block only once: under HDFS's three-replica placement no two replicas of the same block land on the same host. See the sketch below.)
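To make the bookkeeping concrete, here is a small self-contained model (my own sketch, not the HBase class) that mimics addHostsAndBlockWeight and getBlockLocalityIndex for two 128 MB blocks with three replicas each; hostA through hostE are made-up hostnames:

import java.util.HashMap;
import java.util.Map;

public class LocalityModel {
  // host -> bytes of this region's blocks that have a replica on that host
  static Map<String, Long> hostAndWeights = new HashMap<>();
  // each block counted once, no matter how many replicas it has
  static long uniqueBlocksTotalWeight = 0;

  static void addHostsAndBlockWeight(String[] hosts, long weight) {
    uniqueBlocksTotalWeight += weight;               // one increment per block
    for (String host : hosts) {
      hostAndWeights.merge(host, weight, Long::sum); // one increment per replica host
    }
  }

  static float getBlockLocalityIndex(String host) {
    Long w = hostAndWeights.get(host);
    return (w == null || uniqueBlocksTotalWeight == 0) ? 0f
        : (float) w / (float) uniqueBlocksTotalWeight;
  }

  public static void main(String[] args) {
    long blockSize = 128L * 1024 * 1024;
    addHostsAndBlockWeight(new String[] {"hostA", "hostB", "hostC"}, blockSize);
    addHostsAndBlockWeight(new String[] {"hostA", "hostD", "hostE"}, blockSize);
    // uniqueBlocksTotalWeight = 256 MB; hostA holds both blocks, hostB holds one
    System.out.println(getBlockLocalityIndex("hostA")); // 1.0
    System.out.println(getBlockLocalityIndex("hostB")); // 0.5
  }
}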
8. Finally, getBlockLocalityIndex
So far we have only covered the first half of the RegionServer's call r.getHDFSBlocksDistribution().getBlockLocalityIndex(...), namely r.getHDFSBlocksDistribution().
What does the second half, getBlockLocalityIndex(serverName.getHostname()), compute?
/**
 * return the locality index of a given host
 * @param host the host name
 * @return the locality index of the given host
 */
public float getBlockLocalityIndex(String host) {
  float localityIndex = 0;
  // look up this region's HostAndWeight for the given server
  HostAndWeight hostAndWeight = this.hostAndWeights.get(host);
  if (hostAndWeight != null && uniqueBlocksTotalWeight != 0) {
    // logical size of this region's blocks on this machine /
    // total logical size of all file blocks under the region
    localityIndex = (float) hostAndWeight.weight / (float) uniqueBlocksTotalWeight;
  }
  return localityIndex;
}
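Plugging in the numbers from the model sketch above: a RegionServer on hostA would report a locality index of 256 MB / 256 MB = 1.0, while one on hostB would report 128 MB / 256 MB = 0.5.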
Further reading
HDFS is critically important to HBase; if you have the time, it is worth digging deeper into HDFS's APIs and features.
What does HDFS's BlockLocation store?
/**
 * Represents the network location of a block, information about the hosts
 * that contain block replicas, and other block metadata (E.g. the file
 * offset associated with the block, length, whether it is corrupt, etc).
 */
@InterfaceAudience.Public
@InterfaceStability.Stable
public class BlockLocation {
  private String[] hosts; // Datanode hostnames
  private String[] cachedHosts; // Datanode hostnames with a cached replica
  private String[] names; // Datanode IP:xferPort for accessing the block
  private String[] topologyPaths; // Full path name in network topology
  private long offset; // Offset of the block in the file
  private long length;
  private boolean corrupt;
  // ... constructors and getters omitted
}