I tuned a few HBase parameters over the past few days and got some interesting results. See the email thread below for details.
For example, given a fixed total amount of data, I can tune hbase.hregion.max.filesize to increase or decrease the total region count, right?
I want to know whether the region count has a performance impact on random read tests. In my YCSB tests I observed that with a larger HFile size I got better throughput and lower latency.
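(The relationship Tao describes can be sketched with back-of-envelope arithmetic; `estimated_region_count` is an illustrative helper, not an HBase API:)

```python
import math

def estimated_region_count(total_bytes, max_filesize_bytes):
    # A region splits once its store files exceed hbase.hregion.max.filesize,
    # so a fixed data set ends up spread over roughly
    # total_size / max_filesize regions.
    return math.ceil(total_bytes / max_filesize_bytes)

total = 100 * 1024**3                                 # say, 100 GB of data
print(estimated_region_count(total, 256 * 1024**2))   # 256 MB regions -> 400
print(estimated_region_count(total, 1024 * 1024**2))  # 1 GB regions   -> 100
```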
Can anybody give me hints? Thanks.
Tao
Jan 18 (2 days ago)
Hi Tao,
I think the number of regions won't have much impact on random read throughput and latency, but the number of generations (HFiles) per region will.
If that's the case, try running a major compaction on the table. This merges the HFile generations, so read throughput and latency will recover. You can do this from the hbase shell.
Also, you might want to increase hbase.hregion.memstore.flush.size to keep the number of HFile generations smaller.
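(For reference, that setting is hbase.hregion.memstore.flush.size and lives in hbase-site.xml; a sketch with an illustrative value -- 256 MB here is just an example, tune it to your heap:)

```xml
<!-- hbase-site.xml: flush each region's memstore at 256 MB instead of the
     64 MB default of the 0.20 line, producing fewer, larger HFiles between
     compactions. The value below is illustrative only. -->
<property>
  <name>hbase.hregion.memstore.flush.size</name>
  <value>268435456</value> <!-- 256 MB, in bytes -->
</property>
```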
Thanks,
--
Tatsuya Kawano (Mr.)
Tokyo, Japan
Thanks for the response.
I tuned the values of dfs.block.size and hbase.hregion.max.filesize for my tests (pure read tests) and got the results below:
Test  dfs.block.size (MB)  hbase.hregion.max.filesize (MB)  requests/sec  latency (ms)
1     32                   1024                             ~4000         24
2     256                  256                              ~4500         22
3     1024                 1024                             ~5000         20
My understanding of the results: with fewer HDFS blocks, an HFile lookup for a random row speeds up because it avoids jumping from one block to another (Test 1 vs. Test 2); and with fewer but bigger regions, performance is also better? (Test 2 vs. Test 3).
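(The Test 1 vs. Test 2 reading can be made concrete with ceiling arithmetic; this sketch assumes the table's sizes are in MB and that each major-compacted region holds a single HFile:)

```python
def blocks_per_region(region_size_mb, block_size_mb):
    # How many HDFS blocks a fully-grown region's single HFile spans,
    # using ceiling division.
    return -(-region_size_mb // block_size_mb)

print(blocks_per_region(1024, 32))    # Test 1: 32 blocks per region
print(blocks_per_region(256, 256))    # Test 2: 1 block per region
print(blocks_per_region(1024, 1024))  # Test 3: 1 block per region
```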
Sure, I believe the number of HFiles per region has an impact, but I did run a major compaction from the command line:
major_compact 'mytable'
and checked that each region has only one storefile.
Is that correct?
Hi Tao,
Thanks for sharing the test result.
> but I did run a major compaction from the command line:
> major_compact 'mytable'
> and checked that each region has only one storefile.
> My understanding of the results: with fewer HDFS blocks, an HFile lookup
> for a random row speeds up because it avoids jumping from one block to
> another (Test 1 vs. Test 2)
Thanks,
--
Tatsuya Kawano (Mr.)
Tokyo, Japan
Along with Tatsuya, I thank you for sharing this interesting result.
I too wonder why the bigger block size makes a difference -- a 25%
improvement is a bunch -- since we set up a socket on each random read
and seek the block (we do not currently reuse the connection if the
correct block is already in the breach)?
Thanks for trying this experiment.
St.Ack