The difference between the file-system cache parameters dirty_ratio and dirty_background_ratio

This article examines the roles of the vm.dirty_background_ratio and vm.dirty_ratio parameters on Linux, explains how they control file-system cache usage and data-persistence behavior, and offers guidance on tuning them to improve database performance.


This article is reposted from http://blog.sina.com.cn/s/blog_448574810101k1va.html

Please respect the original author's intellectual property!



While tuning database performance over the past couple of days, I needed to reduce the impact of the operating system's file cache on the database, so I looked into ways of shrinking the file-system cache. One of them is to adjust the two parameters /proc/sys/vm/dirty_background_ratio and /proc/sys/vm/dirty_ratio. I had read quite a few blog posts on the subject but could never pin down the difference between the two; the English article below finally made it clear.

vm.dirty_background_ratio: this parameter specifies the percentage of system memory that dirty pages in the file-system cache may occupy (e.g. 5%) before the background writeback threads (pdflush/flush/kdmflush) are woken up to flush some of those dirty pages to disk asynchronously.

vm.dirty_ratio: this parameter specifies the percentage of system memory that dirty pages may occupy (e.g. 10%) before the system is forced to deal with them; at that point there are enough dirty pages that, to avoid data loss, they must be flushed to disk, and many application processes may block while the system turns its attention to this file I/O.

I used to believe, wrongly, that the vm.dirty_ratio threshold could never be reached, since the vm.dirty_background_ratio threshold would always be hit first. It turns out I had misunderstood: the vm.dirty_background_ratio threshold is indeed reached first and triggers asynchronous writeback by the flush threads, but applications can keep writing during that time. If the applications write faster than the flush threads can drain the cache, the dirty pages eventually hit the vm.dirty_ratio threshold, at which point the system switches to handling the dirty pages synchronously and blocks the writing processes.
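A rough way to watch this behavior on a live machine (the file path and size are placeholders; use a disk-backed filesystem and adjust the size to your RAM) is to start a large buffered write and watch the dirty-page counters climb and then fall as writeback kicks in:

# start a large buffered write in the background (placeholder path/size)
$ dd if=/dev/zero of=/tmp/dirty-test bs=1M count=4096 &
# watch nr_dirty grow, then shrink as the flush threads (and, past vm.dirty_ratio, the blocked writer itself) push data to disk
$ watch -n 1 'grep -E "^(nr_dirty|nr_writeback) " /proc/vmstat'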


The original article follows:

Better Linux Disk Caching & Performance with vm.dirty_ratio & vm.dirty_background_ratio

by BOB PLANKERS on DECEMBER 22, 2013

in BEST PRACTICES, CLOUD, SYSTEM ADMINISTRATION, VIRTUALIZATION

This is post #16 in my December 2013 series about Linux Virtual Machine Performance Tuning. For more, please see the tag “Linux VM Performance Tuning.”

In previous posts on vm.swappiness and using RAM disks we talked about how the memory on a Linux guest is used for the OS itself (the kernel, buffers, etc.), applications, and also for file cache. File caching is an important performance improvement, and read caching is a clear win in most cases, balanced against applications using the RAM directly. Write caching is trickier. The Linux kernel stages disk writes into cache, and over time asynchronously flushes them to disk. This has a nice effect of speeding disk I/O but it is risky. When data isn’t written to disk there is an increased chance of losing it.

There is also the chance that a lot of I/O will overwhelm the cache, too. Ever written a lot of data to disk all at once, and seen large pauses on the system while it tries to deal with all that data? Those pauses are a result of the cache deciding that there’s too much data to be written asynchronously (as a non-blocking background operation, letting the application process continue), and switches to writing synchronously (blocking and making the process wait until the I/O is committed to disk). Of course, a filesystem also has to preserve write order, so when it starts writing synchronously it first has to destage the cache. Hence the long pause.

The nice thing is that these are controllable options, and based on your workloads & data you can decide how you want to set them up. Let’s take a look:

$ sysctl -a | grep dirty
vm.dirty_background_ratio = 10
vm.dirty_background_bytes = 0
vm.dirty_ratio = 20
vm.dirty_bytes = 0
vm.dirty_writeback_centisecs = 500
vm.dirty_expire_centisecs = 3000

vm.dirty_background_ratio is the percentage of system memory that can be filled with “dirty” pages — memory pages that still need to be written to disk — before the pdflush/flush/kdmflush background processes kick in to write it to disk. My example is 10%, so if my virtual server has 32 GB of memory that’s 3.2 GB of data that can be sitting in RAM before something is done.

vm.dirty_ratio is the absolute maximum amount of system memory that can be filled with dirty pages before everything must get committed to disk. When the system gets to this point all new I/O blocks until dirty pages have been written to disk. This is often the source of long I/O pauses, but is a safeguard against too much data being cached unsafely in memory.

vm.dirty_background_bytes and vm.dirty_bytes are another way to specify these parameters. If you set the _bytes version the _ratio version will become 0, and vice-versa.
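For example (the value here is purely illustrative), setting the _bytes variant immediately zeroes the corresponding _ratio:

$ sysctl -w vm.dirty_bytes=268435456    # 256 MiB hard limit
$ sysctl vm.dirty_ratio
vm.dirty_ratio = 0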

vm.dirty_expire_centisecs is how long something can be in cache before it needs to be written. In this case it’s 30 seconds. When the pdflush/flush/kdmflush processes kick in they will check to see how old a dirty page is, and if it’s older than this value it’ll be written asynchronously to disk. Since holding a dirty page in memory is unsafe this is also a safeguard against data loss.

vm.dirty_writeback_centisecs is how often the pdflush/flush/kdmflush processes wake up and check to see if work needs to be done.
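Both timers are expressed in hundredths of a second, so 3000 means 30 seconds and 500 means 5 seconds; the current values can be read together:

$ sysctl vm.dirty_expire_centisecs vm.dirty_writeback_centisecs
vm.dirty_expire_centisecs = 3000
vm.dirty_writeback_centisecs = 500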

You can also see statistics on the page cache in /proc/vmstat:

$ cat /proc/vmstat | egrep "dirty|writeback"
nr_dirty 878
nr_writeback 0
nr_writeback_temp 0

In my case I have 878 dirty pages waiting to be written to disk.
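To put that in bytes (assuming the usual 4 KiB page size, which getconf PAGESIZE will confirm), 878 pages is only about 3.4 MiB:

$ awk -v ps=$(getconf PAGESIZE) '/^nr_dirty /{printf "%.1f MiB dirty\n", $2*ps/1048576}' /proc/vmstat
3.4 MiB dirty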

Approach 1: Decreasing the Cache

As with most things in the computer world, how you adjust these depends on what you're trying to do. In many cases we have fast disk subsystems with their own big, battery-backed NVRAM caches, so keeping things in the OS page cache is risky. Let's try to send I/O to the array in a more timely fashion and reduce the chance our local OS will, to borrow a phrase from the service industry, be "in the weeds." To do this we lower vm.dirty_background_ratio and vm.dirty_ratio by adding new numbers to /etc/sysctl.conf and reloading with "sysctl -p":

vm.dirty_background_ratio = 5
vm.dirty_ratio = 10

This is a typical approach on virtual machines, as well as Linux-based hypervisors. I wouldn’t suggest setting these parameters to zero, as some background I/O is nice to decouple application performance from short periods of higher latency on your disk array & SAN (“spikes”).
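One minimal way to apply and persist those values (this assumes a classic /etc/sysctl.conf; on distributions that use /etc/sysctl.d/, drop a .conf file in that directory instead, and run as root):

$ echo 'vm.dirty_background_ratio = 5' >> /etc/sysctl.conf
$ echo 'vm.dirty_ratio = 10' >> /etc/sysctl.conf
$ sysctl -p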

Approach 2: Increasing the Cache

There are scenarios where raising the cache dramatically has positive effects on performance. These situations are where the data contained on a Linux guest isn’t critical and can be lost, and usually where an application is writing to the same files repeatedly or in repeatable bursts. In theory, by allowing more dirty pages to exist in memory you’ll rewrite the same blocks over and over in cache, and just need to do one write every so often to the actual disk. To do this we raise the parameters:

vm.dirty_background_ratio = 50
vm.dirty_ratio = 80

Sometimes folks also increase the vm.dirty_expire_centisecs parameter to allow more time in cache. Beyond the increased risk of data loss, you also run the risk of long I/O pauses if that cache gets full and needs to destage, because on large VMs there will be a lot of data in cache.
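For illustration only, letting dirty pages age for 60 seconds instead of 30 before the timer-based writeback kicks in would look like this:

$ sysctl -w vm.dirty_expire_centisecs=6000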

Approach 3: Both Ways

There are also scenarios where a system has to deal with infrequent, bursty traffic to slow disk (batch jobs at the top of the hour, midnight, writing to an SD card on a Raspberry Pi, etc.). In that case an approach might be to allow all that write I/O to be deposited in the cache so that the background flush operations can deal with it asynchronously over time:

vm.dirty_background_ratio = 5
vm.dirty_ratio = 80

Here the background processes will start writing right away when it hits that 5% ceiling but the system won’t force synchronous I/O until it gets to 80% full. From there you just size your system RAM and vm.dirty_ratio to be able to consume all the written data. Again, there are tradeoffs with data consistency on disk, which translates into risk to data. Buy a UPS and make sure you can destage cache before the UPS runs out of power. :)
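As a rough sizing example: if an hourly batch job writes about 2 GB in a burst, a machine with 4 GB of RAM and vm.dirty_ratio = 80 can absorb the whole burst in cache (0.8 × 4 GB = 3.2 GB) and let the flush threads drain it to the slow disk over the rest of the hour.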

No matter the route you choose you should always be gathering hard data to support your changes and help you determine if you are improving things or making them worse. In this case you can get data from many different places, including the application itself, /proc/vmstat, /proc/meminfo, iostat, vmstat, and many of the things in /proc/sys/vm. Good luck!
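A few quick ways to collect that data before and after a change, using the standard tools named above:

$ grep -E 'Dirty|Writeback' /proc/meminfo    # dirty/writeback amounts in kB
$ vmstat 5                                   # overall memory and I/O activity
$ iostat -x 5                                # per-device utilization and latency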
