Compaction magic in Couchbase Server 2.0

http://blog.couchbase.com/compaction-magic-couchbase-server-20

With Couchbase’s append-only storage design, it’s impossible to corrupt data and index files, because updates go only to the end of the file: there are no in-place file updates, and the files are never in an inconsistent state. But writing to an ever-expanding file will eventually eat up all your disk space, so Couchbase Server has a process called compaction. Compaction reclaims disk space by removing stale data and index values, keeping the data and index files from growing unnecessarily. If your app’s use case is mostly reads, the defaults may be fine, but if you have write-heavy workloads, you will want to understand how auto-compaction works in Couchbase Server.

 
By design, documents in Couchbase Server are partitioned into vBuckets (or partitions). Several kinds of files are used for storage: a data file per partition (the “data files”), multiple index files (active, replica, and temp) per design document, and a master file that holds metadata about the design documents and view definitions. For example, on Mac OS X (as shown below), the sample ‘gamesim’ bucket has 64 individual data files, one per partition (0.couch.1 to 63.couch.1), plus a master file containing the design documents and other view metadata (master.couch.1).

Couchbase Data and Master File

The index files live in the @indexes folder and consist of the active index file (names starting with main_), the replica index file (present if index replication is enabled, starting with replica_), and a temporary file used while building and updating the index (starting with tmp_).

 

Index Files in Couchbase Server
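If you want to see this layout on your own machine, a minimal sketch along these lines walks a bucket’s data directory and groups the files by the naming conventions just described. The DATA_DIR path is an assumption for a default Mac OS X install; adjust it for your environment.

```python
import os
import re

# Assumed path for a default Mac OS X install; adjust for your machine.
DATA_DIR = os.path.expanduser(
    "~/Library/Application Support/Couchbase/var/lib/couchdb")

# File-name conventions described above.
PATTERNS = [
    ("partition data file", re.compile(r"^\d+\.couch\.\d+$")),    # 0.couch.1 .. 63.couch.1
    ("master file",         re.compile(r"^master\.couch\.\d+$")), # design docs, view metadata
    ("active index",        re.compile(r"^main_")),               # under @indexes
    ("replica index",       re.compile(r"^replica_")),
    ("temp index",          re.compile(r"^tmp_")),
]

for root, _dirs, files in os.walk(DATA_DIR):
    for name in files:
        for kind, pattern in PATTERNS:
            if pattern.match(name):
                print(f"{kind:20s} {os.path.join(root, name)}")
                break
```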

 

Data and index files in Couchbase Server are organized as b-trees. The root nodes (shown in red) contain pointers to the intermediate nodes, which contain pointers to the leaf nodes (shown in blue). In data files, the root and intermediate nodes track the sizes of the documents under their sub-trees, while the leaf nodes store the document id, document metadata, and pointers to the document content. In index files, the root and intermediate nodes track the emitted map-function results and the reduce values under their sub-trees.

B-tree storage for data files (shown on the left) and index files (shown on the right)
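As a toy model of that bookkeeping (illustrative only, not Couchbase’s storage code), inner nodes can expose the aggregate size of the documents under their subtree, so the root always knows how much live data the file holds without scanning the leaves:

```python
# Toy model of the size-tracking described above; the real on-disk
# format is considerably more involved.
class Leaf:
    def __init__(self, doc_id, metadata, content_ptr, size):
        self.doc_id = doc_id
        self.metadata = metadata        # rev, flags, expiration, ...
        self.content_ptr = content_ptr  # file offset of the document body
        self.size = size                # bytes of live document data

class Inner:
    def __init__(self, children):
        self.children = children

    @property
    def size(self):
        # An inner node reports the total size under its subtree.
        return sum(child.size for child in self.children)

root = Inner([
    Inner([Leaf("A", {}, 0x10, 2048), Leaf("B", {}, 0x90, 512)]),
    Inner([Leaf("C", {}, 0xE0, 1024)]),
])
print(root.size)  # 3584 bytes of live data, known at the root
```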

 

All mutations, including inserts, updates, and deletes, are written to the end of the file, leaving old data in place. Add a new document? The b-tree grows at the end. Delete a document? That, too, is recorded at the end of the b-tree. For example, in the figure below, document A is mutated, followed by a mutation to document B; then a new document D is added, followed by another mutation to document A. Stale data is shown by the red crossed-out nodes.

Logical data layout for data and index files
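A runnable toy of this append-only behavior (a plain list standing in for the real b-tree file) makes the effect easy to see: replaying the mutations from the figure leaves one stale entry behind, and that stale space is exactly what compaction reclaims.

```python
# Toy append-only log; Couchbase's real format is a b-tree, not a list.
log = []

def mutate(doc_id, value):
    log.append((doc_id, value))  # inserts, updates, and deletes all append

mutate("A", "a1")  # first write of A
mutate("B", "b1")  # mutation to B
mutate("D", "d1")  # new document D grows the end of the file
mutate("A", "a2")  # A mutated again: ("A", "a1") is now stale

live = dict(log)              # last write per document wins
stale = len(log) - len(live)  # entries superseded by later appends
print(live)   # {'A': 'a2', 'B': 'b1', 'D': 'd1'}
print(stale)  # 1 -- the disk space compaction will reclaim
```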

 

By examining the size of the documents tracked by the root node of the b-tree, Couchbase calculates the ratio of actual (live) data to the current size of the file. When this ratio hits a configurable threshold (shown in the figure below), an online compaction process is triggered. Compaction scans through the current data and index files and creates new data and index files without the items marked for cleanup. During compaction, the b-trees are balanced and the reduce values are re-computed for the new tree. Additionally, data that does not belong to that particular node is also cleaned up.
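In pseudocode terms, the trigger boils down to comparing live data size (as tracked at the b-tree root) against the physical file size; the sketch below is illustrative, not Couchbase’s actual implementation.

```python
def fragmentation(live_bytes: int, file_bytes: int) -> float:
    """Fraction of the file occupied by stale data."""
    return 1.0 - live_bytes / file_bytes

THRESHOLD = 0.30  # the 30% default discussed below

def maybe_compact(live_bytes, file_bytes, compact):
    # Trigger online compaction once stale data crosses the threshold.
    if fragmentation(live_bytes, file_bytes) >= THRESHOLD:
        compact()

maybe_compact(live_bytes=70_000_000, file_bytes=100_000_000,
              compact=lambda: print("compaction triggered"))
```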
 
Finally, to catch up with the active workload that may have modified the old partition data file during compaction, Couchbase copies over the data appended since the start of the compaction process to the new partition data file so that it is up to date. The new index file is updated in the same way. The old partition data and index files are then deleted and the new files are put into use.
 

Compaction is normally an all-or-nothing operation, but because compaction in Couchbase runs per partition (vBucket), the dataset is compacted incrementally, and the work already completed is not lost if the operation is aborted.
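Reusing the toy log from earlier, the whole cycle (copy live data, catch up on the tail, swap files) can be sketched in a few lines; the two-phase structure and the per-partition granularity are the point, not the details.

```python
def compact_vbucket(old_log, appended_during_compaction):
    # Phase 1: copy only the live version of each document to a new file.
    new_log = list(dict(old_log).items())
    # Phase 2 (catch-up): replay mutations appended after compaction began.
    new_log.extend(appended_during_compaction)
    return new_log  # the old file is deleted; the new file goes live

# Because compaction runs one vBucket at a time, aborting after partition k
# still keeps the work already finished on partitions 0..k.
log = [("A", "a1"), ("B", "b1"), ("A", "a2")]
print(compact_vbucket(log, appended_during_compaction=[("C", "c1")]))
# [('A', 'a2'), ('B', 'b1'), ('C', 'c1')]
```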

Configuring the compaction thresholds in the settings UI tab for data and index files

Compaction in Couchbase Server is an online operation. By default, auto-compaction kicks in when fragmentation reaches 30%, but you should test which settings work well for your workload and tune them accordingly.

Because compaction is resource-intensive, you can also schedule it during off-peak hours. To prevent auto-compaction from taking place when your database is in heavy use, you can configure a time period during which compaction is allowed, using the UI shown above. For example, here I’ve set compaction to run between 12am and 1am server time every day. If the compaction operation does not complete within this period it will keep running, but you can check the box to have it aborted instead.
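The same thresholds and off-peak window can be set through the REST API instead of the UI. The sketch below uses the endpoint and parameter names from the Couchbase Server 2.x REST documentation; verify them against the docs for your version and substitute your own host and credentials.

```python
import requests

resp = requests.post(
    "http://localhost:8091/controller/setAutoCompaction",
    auth=("Administrator", "password"),  # your cluster credentials
    data={
        "databaseFragmentationThreshold[percentage]": 30,  # data files
        "viewFragmentationThreshold[percentage]": 30,      # index files
        "allowedTimePeriod[fromHour]": 0,                  # start at 12am...
        "allowedTimePeriod[fromMinute]": 0,
        "allowedTimePeriod[toHour]": 1,                    # ...stop by 1am
        "allowedTimePeriod[toMinute]": 0,
        "allowedTimePeriod[abortOutside]": "true",  # abort if it overruns
        "parallelDBAndViewCompaction": "false",     # sequential (see below)
    },
)
resp.raise_for_status()
```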
 
Compaction can also be triggered manually, per bucket or per design document, as shown in the figures below; a REST-based equivalent follows them.
 

Manually compacting a data bucket in Couchbase

Manually compacting a design document in Couchbase
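Both of these manual operations are also exposed over the REST API. The endpoints below follow the Couchbase Server 2.x REST documentation, and the bucket and design-document names are examples; check your version’s reference before relying on them.

```python
import requests

AUTH = ("Administrator", "password")  # substitute your cluster credentials
BASE = "http://localhost:8091"

# Compact an entire data bucket (the bucket-level button above).
requests.post(f"{BASE}/pools/default/buckets/gamesim/controller/compactBucket",
              auth=AUTH).raise_for_status()

# Compact the index files of a single design document; note the
# URL-encoded name (_design/players -> _design%2Fplayers).
requests.post(f"{BASE}/pools/default/buckets/gamesim/ddocs/"
              "_design%2Fplayers/controller/compactView",
              auth=AUTH).raise_for_status()
```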

 
Compaction performance in Couchbase Server depends on I/O capacity and proper cluster sizing. Your cluster must be sized with enough capacity to support everything else the system is doing while maintaining the required level of performance. So, how do you tune compaction in Couchbase Server?
 
There is no magic bullet here. Depending on your application’s IOPS requirements, you need to size your cluster properly, and you may want to test your workload across a variety of storage hardware. If your app is write-heavy, SSDs might be the best option, while for heavy read ratios EBS might be a good solution at a low cost.
 
By default, if both data and view indexes are configured for auto-compaction, compaction operates sequentially, first on the database and then on the views. With parallel compaction enabled, the databases and views can be compacted at the same time. This requires more CPU and disk I/O, but if the database and view index files are stored on different physical disk devices (as our best practices recommend anyway), the two can complete in parallel so that neither the index nor the data files grow extremely large.
 
Conclusion
 
At the end of the day, every database needs regular maintenance. Online compaction is a huge plus, but you have to test your system and configure your compaction settings appropriately so that compaction does not strain your system under load.
