HBase compaction

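This article describes in detail the algorithm HBase uses to select files for compaction, and illustrates how it works with three concrete examples. It also discusses several important configuration options and their effect on the compaction process.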

http://hbase.apache.org/book/regions.arch.html#compaction

 

http://hbase.apache.org/book/

http://hbase.apache.org/book.html

 

Excerpt:

9.7.5.5.1. Compaction File Selection

To understand the core algorithm for StoreFile selection, there is some ASCII-art in the Store source code that will serve as useful reference. It has been copied below:

/* normal skew:
 *
 *         older ----> newer
 *     _
 *    | |   _
 *    | |  | |   _
 *  --|-|- |-|- |-|---_-------_-------  minCompactSize
 *    | |  | |  | |  | |  _  | |
 *    | |  | |  | |  | | | | | |
 *    | |  | |  | |  | | | | | |
 */

Important knobs:

  • hbase.hstore.compaction.ratio Ratio used in the compaction file selection algorithm (default 1.2f).
  • hbase.hstore.compaction.min (hbase.hstore.compactionThreshold in 0.90) (files) Minimum number of StoreFiles per Store to be selected for a compaction to occur (default 2).
  • hbase.hstore.compaction.max (files) Maximum number of StoreFiles to compact per minor compaction (default 10).
  • hbase.hstore.compaction.min.size (bytes) Any StoreFile smaller than this setting will automatically be a candidate for compaction. Defaults to hbase.hregion.memstore.flush.size (128 MB).
  • hbase.hstore.compaction.max.size (since 0.92) (bytes) Any StoreFile larger than this setting will automatically be excluded from compaction (default Long.MAX_VALUE).

 

The minor compaction StoreFile selection logic is size based: a file is selected for compaction when file size <= sum(smaller_files) * hbase.hstore.compaction.ratio. In the examples that follow, the sum is taken over the files newer than the one being tested.
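The rule above, combined with the knobs listed earlier, can be sketched roughly as follows. This is a simplified reading of the text, not a copy of the HBase Store source: the function name and parameter names are hypothetical, the sum is taken over all newer files, and the way max_size is applied (skipping oversized files while scanning from the oldest) is an assumption.

```python
def select_for_minor_compaction(sizes, ratio, min_files, max_files,
                                min_size, max_size):
    """Return the StoreFile sizes picked for a minor compaction.

    `sizes` is ordered oldest to newest. Hypothetical helper that follows
    the rule described in the text, not the actual HBase implementation.
    """
    start = 0
    while start < len(sizes):
        size = sizes[start]
        newer_sum = sum(sizes[start + 1:])
        # Files above max_size are excluded outright (assumed handling).
        too_large = size > max_size
        # Files at or below min_size always qualify; larger files must
        # satisfy size <= ratio * sum(newer files).
        fails_ratio = size > max(min_size, newer_sum * ratio)
        if not (too_large or fails_ratio):
            break
        start += 1

    # Everything newer than the first qualifying file is included,
    # capped at max_files.
    selected = sizes[start:start + max_files]

    # Too few files: no compaction is started.
    return selected if len(selected) >= min_files else []


# Parameter values taken from the examples in this article.
print(select_for_minor_compaction([100, 50, 23, 12, 12], ratio=1.0,
                                  min_files=3, max_files=5,
                                  min_size=10, max_size=1000))
```

The scan starts at the oldest file because older files tend to be larger; once one file qualifies, all newer files ride along until the max-file cap is hit.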

9.7.5.5.2. Minor Compaction File Selection - Example #1 (Basic Example)

This example mirrors an example from the unit test TestCompactSelection.

  • hbase.hstore.compaction.ratio = 1.0f
  • hbase.hstore.compaction.min = 3 (files)
  • hbase.hstore.compaction.max = 5 (files)
  • hbase.hstore.compaction.min.size = 10 (bytes)
  • hbase.hstore.compaction.max.size = 1000 (bytes)

The following StoreFiles exist: 100, 50, 23, 12, and 12 bytes apiece (oldest to newest). With the above parameters, the files that would be selected for minor compaction are 23, 12, and 12.

Why?

  • 100 --> No, because sum(50, 23, 12, 12) * 1.0 = 97, and 100 > 97.
  • 50 --> No, because sum(23, 12, 12) * 1.0 = 47, and 50 > 47.
  • 23 --> Yes, because sum(12, 12) * 1.0 = 24, and 23 <= 24.
  • 12 --> Yes, because the previous file has been included, and because this does not exceed the max-file limit of 5.
  • 12 --> Yes, because the previous file has been included, and because this does not exceed the max-file limit of 5.
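The arithmetic in this walkthrough can be reproduced with a few lines of code. The sketch below only tabulates the ratio check for each file; `ratio_checks` is a hypothetical helper, and it deliberately ignores the min-size, max-size, and file-count limits.

```python
def ratio_checks(sizes, ratio):
    """For each file (oldest to newest), compare its size with
    ratio * sum(sizes of all newer files)."""
    rows = []
    for i, size in enumerate(sizes):
        threshold = sum(sizes[i + 1:]) * ratio
        rows.append((size, threshold, size <= threshold))
    return rows


rows = ratio_checks([100, 50, 23, 12, 12], ratio=1.0)
for size, threshold, passes in rows:
    print(f"{size:>4} <= {threshold:>5.1f} ? {passes}")

# Selection starts at the oldest file that passes the check.
first = next(i for i, row in enumerate(rows) if row[2])
print("selection starts at index", first)
```

Files after the first passing one are not re-tested against the ratio in the walkthrough; they are included because the previous file was, which is why the final 12-byte file is selected even though nothing newer exists to sum.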

 

9.7.5.5.3. Minor Compaction File Selection - Example #2 (Not Enough Files To Compact)

This example mirrors an example from the unit test TestCompactSelection.

  • hbase.hstore.compaction.ratio = 1.0f
  • hbase.hstore.compaction.min = 3 (files)
  • hbase.hstore.compaction.max = 5 (files)
  • hbase.hstore.compaction.min.size = 10 (bytes)
  • hbase.hstore.compaction.max.size = 1000 (bytes)

 

The following StoreFiles exist: 100, 25, 12, and 12 bytes apiece (oldest to newest). With the above parameters, no files would be selected for minor compaction, so no compaction is started.

Why?

  • 100 --> No, because sum(25, 12, 12) * 1.0 = 47, and 100 > 47.
  • 25 --> No, because sum(12, 12) * 1.0 = 24, and 25 > 24.
  • 12 --> No. It is a candidate because sum(12) * 1.0 = 12, but there are only 2 files to compact, which is less than the threshold of 3.
  • 12 --> No. It is a candidate because the previous StoreFile was, but there are not enough files to compact.

 

9.7.5.5.4. Minor Compaction File Selection - Example #3 (Limiting Files To Compact)

This example mirrors an example from the unit test TestCompactSelection.

  • hbase.hstore.compaction.ratio = 1.0f
  • hbase.hstore.compaction.min = 3 (files)
  • hbase.hstore.compaction.max = 5 (files)
  • hbase.hstore.compaction.min.size = 10 (bytes)
  • hbase.hstore.compaction.max.size = 1000 (bytes)

The following StoreFiles exist: 7, 6, 5, 4, 3, 2, and 1 bytes apiece (oldest to newest). With the above parameters, the files that would be selected for minor compaction are 7, 6, 5, 4, 3.

Why?

  • 7 --> Yes, because sum(6, 5, 4, 3, 2, 1) * 1.0 = 21. Also, 7 is less than the min-size.
  • 6 --> Yes, because sum(5, 4, 3, 2, 1) * 1.0 = 15. Also, 6 is less than the min-size.
  • 5 --> Yes, because sum(4, 3, 2, 1) * 1.0 = 10. Also, 5 is less than the min-size.
  • 4 --> Yes, because sum(3, 2, 1) * 1.0 = 6. Also, 4 is less than the min-size.
  • 3 --> Yes, because sum(2, 1) * 1.0 = 3. Also, 3 is less than the min-size.
  • 2 --> No. Candidate because previous file was selected and 2 is less than the min-size, but the max-number of files to compact has been reached.
  • 1 --> No. Candidate because previous file was selected and 1 is less than the min-size, but the max-number of files to compact has been reached.

 

9.7.5.5.5. Impact of Key Configuration Options

hbase.hstore.compaction.ratio. A large ratio (e.g., 10) will produce a single giant file. Conversely, a value of .25 will produce behavior similar to the BigTable compaction algorithm, resulting in 4 StoreFiles.
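To see how the ratio changes which files get pulled in, the sketch below applies a simplified version of the selection rule to a single file set under three ratios. The `select` helper and its defaults are hypothetical and mirror the example parameters used earlier; it does not simulate repeated compactions, so it illustrates the direction of the effect rather than the final StoreFile counts quoted above.

```python
def select(sizes, ratio, min_files=3, max_files=5, min_size=10):
    """Simplified selection: skip old files until one satisfies
    size <= max(min_size, ratio * sum(newer files))."""
    start = 0
    while start < len(sizes) and sizes[start] > max(
            min_size, ratio * sum(sizes[start + 1:])):
        start += 1
    picked = sizes[start:start + max_files]
    return picked if len(picked) >= min_files else []


files = [100, 50, 23, 12, 12]  # oldest to newest, in bytes
for ratio in (0.25, 1.0, 10.0):
    print(ratio, select(files, ratio))
```

With a ratio of 10 even the oldest, largest file passes the check and is rewritten together with everything newer, which is how repeated compactions drift toward one giant file. With a ratio of .25 nothing in this set qualifies, so files are left alone until enough newer data accumulates.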

hbase.hstore.compaction.min.size. Because this limit represents the "automatic include" limit for all StoreFiles smaller than this value, it may need to be adjusted downwards in write-heavy environments where many 1 or 2 MB StoreFiles are being flushed. Otherwise every file will be targeted for compaction, and the resulting files may still be under the min-size and require further compaction, and so on.
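A toy calculation makes the cascade concrete. The model below is an assumption for illustration only: it supposes that each compaction merges the running output file with just enough fresh flushes to reach the minimum file count, and it counts how many rounds pass before the output grows past the min-size. It ignores the ratio rule entirely, and `auto_include_rounds` is a hypothetical name.

```python
def auto_include_rounds(flush_mb, min_size_mb, min_files=3):
    """Count compaction rounds while the merged file stays below min-size.

    Returns (rounds, total MB rewritten). Toy model: the first round merges
    `min_files` fresh flushes; later rounds merge the previous output with
    `min_files - 1` fresh flushes. Assumes flush_mb > 0.
    """
    merged = 0       # size of the running merged file, in MB
    rounds = 0
    rewritten = 0    # MB written by compactions so far
    while merged < min_size_mb:
        fresh = min_files if merged == 0 else min_files - 1
        merged += fresh * flush_mb
        rewritten += merged
        rounds += 1
    return rounds, rewritten


print(auto_include_rounds(flush_mb=2, min_size_mb=128))
print(auto_include_rounds(flush_mb=2, min_size_mb=16))
```

Under these assumptions, 2 MB flushes with a 128 MB min-size go through 32 automatic rounds and rewrite 2176 MB, while a 16 MB min-size leaves the automatic-include path after 4 rounds and 48 MB. Real behavior also depends on the ratio check, which this sketch leaves out.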
