HBase Compression

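Compression trades CPU time for I/O throughput and disk space. Unless there is a specific reason not to, it is recommended to set compression on each column family. There are three main algorithms: GZIP, LZO, and Snappy. The author recommends Snappy because it offers good encoding/decoding speed and an acceptable compression ratio.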

HBase comes with support for a number of compression algorithms that can be enabled at the column family level. Enabling compression is recommended unless you have a reason not to do so, for example, when storing already compressed content such as JPEG images. For every other use case, compression usually yields better overall performance, because the CPU overhead of compressing and decompressing is less than the cost of reading more data from disk.
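The exception for already compressed content can be seen with a small experiment. The following is a minimal sketch, not HBase code: it uses Python's standard `zlib` module (the DEFLATE algorithm that GZIP is built on), a made-up repetitive payload as a stand-in for typical cell data, and random bytes as a stand-in for JPEG-like data. The helper name `remaining_fraction` and both payloads are assumptions chosen for illustration.

```python
import os
import zlib


def remaining_fraction(data: bytes) -> float:
    """Return compressed size divided by original size (lower is better)."""
    return len(zlib.compress(data)) / len(data)


# Hypothetical repetitive payload: stands in for typical row/column data.
text_like = b"row-0001,colfam1:qual,some repeated value\n" * 2000

# Random bytes: stand in for already compressed content such as JPEG data.
already_compressed_like = os.urandom(len(text_like))

if __name__ == "__main__":
    print(f"text-like: {remaining_fraction(text_like):.1%} remaining")
    print(f"random:    {remaining_fraction(already_compressed_like):.1%} remaining")
```

The repetitive payload shrinks to a small fraction of its size, while the random payload stays at roughly its original size, so the CPU time spent compressing it buys nothing.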

Available Codecs

You can choose from a fixed list of supported compression algorithms. They have different qualities when it comes to compression ratio, as well as CPU and installation requirements.

Table 11.1. Comparison between compression algorithms

Algorithm      % remaining   Encoding    Decoding
GZIP           13.4%         21 MB/s     118 MB/s
LZO            20.5%         135 MB/s    410 MB/s
Zippy/Snappy   22.2%         172 MB/s    409 MB/s

Note that some of the algorithms have a better compression ratio, while others are faster at encoding and much faster at decoding. Depending on your use case, you can choose the one that suits you best.
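To get a feel for this trade-off, the figures from Table 11.1 can be plugged into a rough back-of-the-envelope model. This is only a sketch under stated assumptions: the codec throughput is taken to be measured in uncompressed MB/s, disk I/O and codec work are assumed not to overlap, and the 100 MB/s disk throughput in the usage section is a hypothetical value, not a measurement. The function names are made up for this example.

```python
# Figures copied from Table 11.1:
# (fraction remaining after compression, encoding MB/s, decoding MB/s)
CODECS = {
    "GZIP": (0.134, 21.0, 118.0),
    "LZO": (0.205, 135.0, 410.0),
    "Snappy": (0.222, 172.0, 409.0),
}


def read_seconds(logical_mb, disk_mb_per_s, codec=None):
    """Estimate seconds to read logical_mb of data from disk and decode it."""
    if codec is None:
        return logical_mb / disk_mb_per_s
    remaining, _encode, decode = CODECS[codec]
    # Less data comes off the disk, but every logical MB must be decoded.
    return logical_mb * remaining / disk_mb_per_s + logical_mb / decode


def write_seconds(logical_mb, disk_mb_per_s, codec=None):
    """Estimate seconds to encode logical_mb of data and write it to disk."""
    if codec is None:
        return logical_mb / disk_mb_per_s
    remaining, encode, _decode = CODECS[codec]
    return logical_mb / encode + logical_mb * remaining / disk_mb_per_s


if __name__ == "__main__":
    disk = 100.0  # hypothetical disk throughput in MB/s
    for name in (None, "GZIP", "LZO", "Snappy"):
        label = name or "none"
        print(f"{label:7s} read {read_seconds(1000, disk, name):6.2f} s"
              f"  write {write_seconds(1000, disk, name):6.2f} s")
```

With the hypothetical 100 MB/s disk, this model has the slow GZIP encoder dominating write time, while LZO and Snappy come out ahead of uncompressed I/O for both reads and writes. A faster disk or overlapping I/O and CPU work would shift the balance, so treat the output as an illustration rather than a benchmark.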

Enabling Compression

Enabling compression requires the installation of the JNI and native compression libraries (unless you only want to use the Java-based GZIP compression), as described above, and specifying the chosen algorithm in the column family schema.

One way to accomplish this is during the creation of the table. The possible values are listed in the section called “Column Families”:

hbase(main):001:0> create 'testtable', { NAME => 'colfam1', COMPRESSION => 'GZ' }  
0 row(s) in 1.1920 seconds

hbase(main):012:0> describe 'testtable'                                            
DESCRIPTION                                                 ENABLED
{NAME => 'testtable', FAMILIES => [{NAME => 'colfam1',      true 
BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', VERSIONS 
=> '3', COMPRESSION => 'GZ', TTL => '2147483647', BLOCKSIZE
=> '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}
1 row(s) in 0.0400 seconds

The describe shell command is used to read back the schema of the newly created table. You can see that the compression is set to GZIP (using the shorter "GZ" value, as required). Another way to enable, change, or disable the compression algorithm is to use the alter command on an existing table:

hbase(main):013:0> create 'testtable2', 'colfam1'
0 row(s) in 1.1920 seconds

hbase(main):014:0> disable 'testtable2'
0 row(s) in 2.0650 seconds

hbase(main):016:0> alter 'testtable2', { NAME => 'colfam1', COMPRESSION => 'GZ' }
0 row(s) in 0.2190 seconds

hbase(main):017:0> enable 'testtable2'
0 row(s) in 2.0410 seconds

Note how the table was first disabled. This is necessary to perform the alteration of the column family definition. The final enable command brings the table back online.