I tuned a few HBase parameters over the past few days and got some interesting results. See the email thread below for details.
For example, given a fixed total amount of data, I can tune hbase.hregion.max.filesize to increase or decrease the total region count, right?
I want to know whether the region count has a performance impact on random read tests. In my YCSB tests I observed that with a larger HFile size I got better throughput and lower latency.
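(The relationship Tao describes can be sketched with back-of-envelope arithmetic; `estimated_region_count` is an illustrative helper, not an HBase API:)

```python
import math

def estimated_region_count(total_bytes, max_filesize_bytes):
    # A region splits once its store files exceed hbase.hregion.max.filesize,
    # so a fixed data set ends up spread over roughly
    # total_size / max_filesize regions.
    return math.ceil(total_bytes / max_filesize_bytes)

total = 100 * 1024**3                                 # say, 100 GB of data
print(estimated_region_count(total, 256 * 1024**2))   # 256 MB regions -> 400
print(estimated_region_count(total, 1024 * 1024**2))  # 1 GB regions   -> 100
```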
Can anybody give me hints? Thanks.
Tao
Jan 18 (2 days ago)
Hi Tao,
I think the number of regions won't have much impact on random read throughput and latency, but the number of generations (HFiles) per region will.
If that's the case, try running a major compaction on the table. This merges the HFile generations, so read throughput and latency will recover. You can do this from the hbase shell.
Also, you might want to increase hbase.hregion.memstore.flush.size to keep the number of HFile generations smaller.
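(For reference, that setting is hbase.hregion.memstore.flush.size and lives in hbase-site.xml; a sketch with an illustrative value -- 256 MB here is just an example, tune it to your heap:)

```xml
<!-- hbase-site.xml: flush each region's memstore at 256 MB instead of the
     64 MB default of the 0.20 line, producing fewer, larger HFiles between
     compactions. The value below is illustrative only. -->
<property>
  <name>hbase.hregion.memstore.flush.size</name>
  <value>268435456</value> <!-- 256 MB, in bytes -->
</property>
```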
Thanks,
--
Tatsuya Kawano (Mr.)
Tokyo, Japan
Thanks for the response.
I tuned the values of dfs.block.size and hbase.hregion.max.filesize for my tests (pure read tests) and got the results below:
Test  dfs.block.size (MB)  hbase.hregion.max.filesize (MB)  requests/sec  latency (ms)
1     32                   1024                             ~4000         24
2     256                  256                              ~4500         22
3     1024                 1024                             ~5000         20
My understanding of the results: with fewer HDFS blocks, an HFile lookup for a random row speeds up because it avoids jumping from one block to another (Test 1 vs. Test 2); and with fewer but bigger regions, performance is also better? (Test 2 vs. Test 3).
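(The Test 1 vs. Test 2 reading can be made concrete with ceiling arithmetic; this sketch assumes the table's sizes are in MB and that each major-compacted region holds a single HFile:)

```python
def blocks_per_region(region_size_mb, block_size_mb):
    # How many HDFS blocks a fully-grown region's single HFile spans,
    # using ceiling division.
    return -(-region_size_mb // block_size_mb)

print(blocks_per_region(1024, 32))    # Test 1: 32 blocks per region
print(blocks_per_region(256, 256))    # Test 2: 1 block per region
print(blocks_per_region(1024, 1024))  # Test 3: 1 block per region
```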
Sure, I believe the number of HFiles per region has an impact, but I did run a major compaction from the command line:
major_compact 'mytable'
and checked that each region has only one storefile.
Is that correct?
Hi Tao,
Thanks for sharing the test result.
> but I did run a major compaction from the command line:
> major_compact 'mytable'
> and checked that each region has only one storefile.
> My understanding of the results: with fewer HDFS blocks, an HFile lookup
> for a random row speeds up because it avoids jumping from one block to
> another (Test 1 vs. Test 2)
Thanks,
--
Tatsuya Kawano (Mr.)
Tokyo, Japan
Along with Tatsuya, I thank you for sharing this interesting result.
I too wonder why the bigger block size makes a difference -- a 25%
improvement is a bunch -- since we set up a socket on each random read
and seek the block (we do not currently reuse the connection if the
correct block is already in the breach)?
Thanks for trying this experiment.
St.Ack