HBase tuning: hotspots

Article link: https://blog.youkuaiyun.com/qq_34969081/article/details/79111945

1. The relationship between row keys and hotspots

1.1 region

Regions are the basic element of availability and distribution for tables, and are comprised of a Store per Column Family.

In other words, a region is a table's basic unit of availability and distribution, and it is made up of one Store per column family.

1.2 rowkey

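Row keys are uninterpreted bytes. Rows are lexicographically sorted with the lowest order appearing first in a table.

In other words, a row key is just an uninterpreted byte array. Rows are stored in lexicographic order, and the lowest key appears at the start of the table.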
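Byte-wise ordering is easy to get wrong with numeric suffixes, so here is a minimal sketch in plain Python (no HBase involved). The key names are made up for illustration; the only assumption is that Python's comparison of `bytes` objects matches the byte-by-byte lexicographic order described above.

```python
# Row keys compared as raw bytes, byte by byte.
keys = [b"row10", b"row9", b"row1", b"row2"]
sorted_keys = sorted(keys)
print(sorted_keys)   # "row10" sorts before "row2" and "row9"

# Zero-padding to a fixed width makes byte order agree with numeric order.
padded = sorted(b"row%04d" % n for n in (10, 9, 1, 2))
print(padded)
```

This is why fixed-width or zero-padded fields are commonly used in row keys when scan order matters.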

1.3 The relationship between regions and row keys

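You can think of the row key as the index into the column families. To see which region holds a given row key, run locate_region '[table_name]', '[rowkey_name]' in the HBase shell.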

HBase also attempts to store rows near each other in the same region, on the same region server.
In other words, HBase tries to keep rows with nearby keys in the same region, on the same region server.

1.4 What causes hotspots

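If row keys are designed to follow lexicographic (sequential) order:
- Pros: the keys are readable, and scans bounded by a start key and a stop key read data efficiently. As the data grows, regions are split automatically and spread across region servers.
- Cons: writes create a hotspot. Because new rows arrive in lexicographic order, consecutive row keys tend to land in the same region, so only one region is doing work while the others sit idle. This is unacceptable.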
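The write-side problem can be illustrated with a small simulation. This is only a sketch: the split points and the `region_for` helper are hypothetical, and a real cluster assigns regions by start and end keys managed by HBase itself.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical split points: region 0 holds keys below "g",
# region 1 holds "g" up to "n", region 2 "n" up to "t", region 3 the rest.
SPLIT_POINTS = [b"g", b"n", b"t"]

def region_for(row_key: bytes) -> int:
    """Return the index of the (simulated) region whose key range contains row_key."""
    return bisect_right(SPLIT_POINTS, row_key)

# Sequentially generated keys, as a naive design would produce them.
sequential_keys = [b"foo%04d" % i for i in range(1, 1001)]
load = Counter(region_for(k) for k in sequential_keys)
print(load)   # every write falls into the same simulated region
```

All 1000 keys share the prefix "foo", so they all sort below the first split point and hit one region.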

1.5 How to resolve hotspots

There are two ways to resolve hotspots: salting and hashing.

Salting means adding a prefix to the row key so that row keys are spread across different regions.

a-foo0003
b-foo0001
c-foo0004
d-foo0002
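A minimal sketch of how keys like the ones above could be produced, assuming a randomly assigned salt drawn from four prefixes. The names `SALTS`, `salt_key`, and `candidate_keys` are hypothetical and not part of any HBase API.

```python
import random

SALTS = [b"a", b"b", b"c", b"d"]   # assumed: roughly one prefix per target region

def salt_key(row_key: bytes, rng: random.Random) -> bytes:
    """Prepend a randomly chosen salt so consecutive keys scatter across regions."""
    return rng.choice(SALTS) + b"-" + row_key

def candidate_keys(row_key: bytes):
    """A random salt cannot be recomputed, so a reader has to try every prefix."""
    return [s + b"-" + row_key for s in SALTS]

rng = random.Random(0)
for name in (b"foo0001", b"foo0002", b"foo0003", b"foo0004"):
    print(salt_key(name, rng))
```

The trade-off shows up in `candidate_keys`: writes spread out, but a point read may need one lookup per salt value.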

Hashing means using the output of a one-way hash algorithm as the key. The drawback is poor readability.
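As a rough illustration, the sketch below replaces a natural key with its digest. SHA-256 is chosen only for the example; the original text does not name a specific hash function, and `hashed_row_key` is a hypothetical helper.

```python
import hashlib

def hashed_row_key(natural_key: str) -> str:
    """Return a fixed-length hex digest to use in place of the natural key."""
    return hashlib.sha256(natural_key.encode("utf-8")).hexdigest()

for name in ("foo0001", "foo0002", "foo0003"):
    # Neighbouring natural keys map to unrelated digests.
    print(name, "->", hashed_row_key(name)[:16])
```

Because the hash is deterministic, a reader can recompute the key for a point lookup, but the digests are unreadable and range scans over the natural order are no longer possible.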

The best approach is to combine the two: a hash followed by a suffix, e.g. hashcode + timestamp.
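One possible way to build such a combined key is sketched below. The 4-byte hash prefix, the `device_id` parameter, and the `composite_key` helper are assumptions made for illustration only.

```python
import hashlib
import struct

def composite_key(device_id: str, ts: int) -> bytes:
    """Hash prefix (spreads entities across regions) + big-endian timestamp suffix."""
    prefix = hashlib.sha256(device_id.encode("utf-8")).digest()[:4]
    # Big-endian packing keeps byte order identical to chronological order.
    return prefix + struct.pack(">I", ts)

keys = [composite_key("sensor-42", ts) for ts in (1515729600, 1515729660, 1515729720)]
for k in keys:
    print(k.hex())
```

Rows for one entity share a prefix and stay contiguous in time order, so a per-entity time-range scan still works, while different entities are scattered by their hash.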

OpenTSDB combines the two in its row key to avoid hotspots:

<salt><metric_uid><timestamp><tagk1><tagv1>[...<tagkN><tagvN>]

With salting enabled (as of OpenTSDB 2.2) the first byte (or bytes) are a hashed salt ID to better distribute data across multiple regions and/or region servers.

00000150E22700000001000001000002000004
'----''------''----''----''----''----'
metric  time   tagk  tagv  tagk  tagv
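The layout in the diagram can be checked by decoding the example key. This sketch assumes the field widths implied by the diagram (3-byte UIDs and a 4-byte time field holding Unix epoch seconds) and an unsalted key; `parse_row_key` is a hypothetical helper, not an OpenTSDB API.

```python
from datetime import datetime, timezone

ROW_KEY_HEX = "00000150E22700000001000001000002000004"
UID_WIDTH = 3    # bytes per metric/tagk/tagv UID (assumed from the diagram)
TIME_WIDTH = 4   # bytes for the time field (assumed from the diagram)

def parse_row_key(hex_key):
    """Split an unsalted row key into metric UID, time, and (tagk, tagv) pairs."""
    raw = bytes.fromhex(hex_key)
    metric = raw[:UID_WIDTH].hex()
    time_field = int.from_bytes(raw[UID_WIDTH:UID_WIDTH + TIME_WIDTH], "big")
    rest = raw[UID_WIDTH + TIME_WIDTH:]
    tags = []
    for i in range(0, len(rest), 2 * UID_WIDTH):
        tagk = rest[i:i + UID_WIDTH].hex()
        tagv = rest[i + UID_WIDTH:i + 2 * UID_WIDTH].hex()
        tags.append((tagk, tagv))
    return metric, time_field, tags

metric, time_field, tags = parse_row_key(ROW_KEY_HEX)
print(metric, time_field, datetime.fromtimestamp(time_field, tz=timezone.utc), tags)
```

The decoded time field is an exact multiple of 3600 seconds, which is consistent with it being aligned to the hour.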

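The second field is "time", not the full timestamp of each data point: it is the time rounded down to the hour, e.g. 2018/1/12 04:00:00, while the remaining minutes and seconds are written into the column qualifiers.
This is done to optimize the index.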
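The hour alignment can be sketched as a simple split of a timestamp into a base hour (for the row key) and an offset in seconds (for the column qualifier). The `split_timestamp` helper is hypothetical, and the real OpenTSDB qualifier also packs extra flag bits alongside the offset, which this sketch leaves out.

```python
from datetime import datetime, timezone

def split_timestamp(ts: int):
    """Return (base_hour, offset_seconds) for a Unix timestamp in seconds."""
    base = ts - ts % 3600   # round down to the start of the hour
    return base, ts - base

# A data point recorded at 2018/1/12 04:25:17 UTC.
ts = int(datetime(2018, 1, 12, 4, 25, 17, tzinfo=timezone.utc).timestamp())
base, offset = split_timestamp(ts)
print(datetime.fromtimestamp(base, tz=timezone.utc), offset)
```

All data points of one series within the same hour therefore share a single row, with one column per offset.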