
Performance tuning: HugePages in Linux

Nov 10, 2008 / By Riyaj Shamsudeen

Tags: DBA Lounge, Oracle

Recently we quickly and efficiently resolved a major performance issue for one of our New York clients. In this blog, I will discuss this performance issue and its solution.

Problem statement

The client’s central database was intermittently freezing because of high CPU usage, and their business was severely affected. They had already worked with vendor support, but the problem remained unresolved.

Symptoms

The symptom was intermittent high kernel-mode CPU usage. The server hardware was 4 dual-core CPUs with hyperthreading enabled and 20GB of RAM, running Red Hat Linux with a 2.6 kernel.

During these database freezes, all CPUs were busy in kernel mode and the database was almost unusable. Even logins and simple SQL such as SELECT * FROM DUAL; took a few seconds to complete. A review of the AWR report did not help much, as expected, since the problem was outside the database.

Analyzing the situation with system activity reporter (sar) data, we could see that at 08:32 and again at 08:40, CPU usage in kernel mode was almost at 70%. It is also interesting to note that SADC (the sar data collector) itself suffered from this CPU spike: the sar collection scheduled for 08:30 completed two minutes late, at 08:32, as shown below.

A similar issue repeated at 10:50AM:

07:20:01 AM CPU   %user     %nice   %system   %iowait     %idle
07:30:01 AM all    4.85      0.00     77.40      4.18     13.58
07:40:01 AM all   16.44      0.00      2.11     22.21     59.24
07:50:01 AM all   23.15      0.00      2.00     21.53     53.32
08:00:01 AM all   30.16      0.00      2.55     15.87     51.41
08:10:01 AM all   32.86      0.00      3.08     13.77     50.29
08:20:01 AM all   27.94      0.00      2.07     12.00     58.00
08:32:50 AM all   25.97      0.00     25.42     10.73     37.88 <--
08:40:02 AM all   16.40      0.00     69.21      4.11     10.29 <--
08:50:01 AM all   35.82      0.00      2.10     12.76     49.32
09:00:01 AM all   35.46      0.00      1.86      9.46     53.22
09:10:01 AM all   31.86      0.00      2.71     14.12     51.31
09:20:01 AM all   26.97      0.00      2.19      8.14     62.70
09:30:02 AM all   29.56      0.00      3.02     16.00     51.41
09:40:01 AM all   29.32      0.00      2.62     13.43     54.62
09:50:01 AM all   21.57      0.00      2.23     10.32     65.88
10:00:01 AM all   16.93      0.00      3.59     14.55     64.92
10:10:01 AM all   11.07      0.00     71.88      8.21      8.84
10:30:01 AM all   43.66      0.00      3.34     13.80     39.20
10:41:54 AM all   38.15      0.00     17.54     11.68     32.63 <--
10:50:01 AM all   16.05      0.00     66.59      5.38     11.98 <--
11:00:01 AM all   39.81      0.00      2.99     12.36     44.85

Performance forensic analysis

The client had access to a few tools, none of which were very effective. We knew that there was excessive kernel-mode CPU usage. To understand why, we needed to look at various metrics at 8:40 and 10:10.

Fortunately, sar data was handy. Looking at free memory, we saw something odd. At 08:32, free memory was 86MB; by 08:40 it had climbed to 1.1GB. Likewise, free memory was 78MB at 10:41 and 4.7GB at 10:50. So, within a span of about ten minutes, free memory climbed to 4.7GB.

07:40:01 AM kbmemfree kbmemused  %memused kbbuffers  kbcached
07:50:01 AM    225968  20323044     98.90    173900   7151144
08:00:01 AM    206688  20342324     98.99    127600   7084496
08:10:01 AM    214152  20334860     98.96    109728   7055032
08:20:01 AM    209920  20339092     98.98     21268   7056184
08:32:50 AM     86176  20462836     99.58      8240   7040608
08:40:02 AM   1157520  19391492     94.37     79096   7012752
08:50:01 AM   1523808  19025204     92.58    158044   7095076
09:00:01 AM    775916  19773096     96.22    187108   7116308
09:10:01 AM    430100  20118912     97.91    218716   7129248
09:20:01 AM    159700  20389312     99.22    239460   7124080
09:30:02 AM    265184  20283828     98.71    126508   7090432
10:41:54 AM     78588  20470424     99.62      4092   6962732  <--
10:50:01 AM   4787684  15761328     76.70     77400   6878012  <--
11:00:01 AM   2636892  17912120     87.17    143780   6990176
11:10:01 AM   1471236  19077776     92.84    186540   7041712

This tells us that there is a correlation between the CPU usage and the increase in free memory. If free memory goes from 78MB to 4.7GB, then the paging and swapping daemons must be working very hard. Releasing 4.7GB of memory to the free pool sharply increases paging/swapping activity, which in turn leads to a massive increase in kernel-mode CPU usage.

Most likely, many SGA pages were also paged out, since the SGA was not locked in memory.

Memory breakdown

The client’s question was: if paging/swapping is indeed the issue, then what is using all the memory? It’s a 20GB server, the SGA size is 10GB, and no other application is running. It gets a few hundred connections at a time, and pga_aggregate_target is set to 2GB. So why would it be suffering from memory starvation? And if memory is the issue, how can there be 4.7GB of free memory at 10:50AM?

Recent OS architectures are designed to use all available memory, so the paging daemons don’t wake up until free memory falls below a certain threshold. It’s possible for free memory to drop near zero and then climb quickly as the paging/swapping daemons start working harder and harder. This explains why free memory went down to 78MB and rose to 4.7GB ten minutes later.

What is using the memory, though? /proc/meminfo is useful for understanding that, and it shows that the page table size is 5GB. How interesting!

Essentially, the page table is a mapping mechanism between virtual and physical addresses. With the default OS page size of 4KB and an SGA size of 10GB, there will be 2.6 million OS pages for the SGA alone. (Read Wikipedia’s entry on page tables for more information.) On this server, there will be over 5 million OS pages for the 20GB of total memory. Managing all of these pages is an enormous workload for the paging/swapping daemon.
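As a sanity check, these page counts follow from simple arithmetic (a sketch using shell arithmetic; the 4KB page size and the 10GB/20GB figures are the ones above):

```shell
# Number of 4KB OS pages needed to map the SGA and total RAM.
page_kb=4                                  # default x86 page size in KB
sga_kb=$(( 10 * 1024 * 1024 ))             # 10GB SGA, expressed in KB
ram_kb=$(( 20 * 1024 * 1024 ))             # 20GB RAM, expressed in KB
echo "SGA pages: $(( sga_kb / page_kb ))"  # 2621440 (~2.6 million)
echo "RAM pages: $(( ram_kb / page_kb ))"  # 5242880 (~5 million)
```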

cat /proc/meminfo

MemTotal:     20549012 kB
MemFree:        236668 kB
Buffers:         77800 kB
Cached:        7189572 kB
...
PageTables:    5007924 kB  <--- 5GB!
...
HugePages_Total:     0
HugePages_Free:      0
Hugepagesize:     2048 kB

HugePages

Fortunately, we can use HugePages in this version of Linux. There are a couple of important benefits to HugePages:

  1. The page size is 2MB instead of 4KB.
  2. Memory used by HugePages is locked and cannot be paged out.

With a page size of 2MB, a 10GB SGA needs only about 5,000 pages, compared to 2.6 million pages without HugePages. This drastically reduces the page table size. Also, HugePages memory is locked, so the SGA cannot be swapped out, and the working set of pages the paging/swapping daemon must manage becomes far smaller.
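The same arithmetic with a 2MB huge page size shows the reduction:

```shell
# Number of 2MB huge pages needed to map a 10GB SGA.
hugepage_kb=2048                           # 2MB huge page, in KB
sga_kb=$(( 10 * 1024 * 1024 ))             # 10GB SGA, in KB
echo "Huge pages for SGA: $(( sga_kb / hugepage_kb ))"  # 5120, versus ~2.6 million 4KB pages
```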

To set up HugePages, the following changes must be completed:

  1. Set the vm.nr_hugepages kernel parameter to a suitable value. In this case, we decided to use 12GB and set the parameter to 6144 (6144*2M=12GB). You can run:
    echo 6144 > /proc/sys/vm/nr_hugepages

    or

    sysctl -w vm.nr_hugepages=6144

    Of course, you must make sure this setting persists across reboots too.

  2. The oracle user needs to be able to lock a greater amount of memory, so /etc/security/limits.conf must be updated to increase the soft and hard memlock values (in KB) for the oracle user:
    oracle          soft    memlock        12582912
    oracle          hard    memlock        12582912
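The two steps above can be sketched as shell commands (run as root; file locations assume a standard Red Hat-style layout):

```shell
# Step 1: reserve 6144 x 2MB huge pages now, and persist the setting.
sysctl -w vm.nr_hugepages=6144
echo "vm.nr_hugepages=6144" >> /etc/sysctl.conf

# Step 2: let the oracle user lock 12GB of memory (memlock values are in KB),
# then verify the limit from a fresh oracle login.
cat >> /etc/security/limits.conf <<'EOF'
oracle          soft    memlock        12582912
oracle          hard    memlock        12582912
EOF
su - oracle -c 'ulimit -l'   # should print 12582912
```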

After setting this up, we need to make sure that the SGA is indeed using HugePages. The value (HugePages_Total - HugePages_Free) * 2MB will approximate the size of the SGA (or it will equal the total size of the shared memory segments shown in the output of ipcs -ma).

cat /proc/meminfo |grep HugePages
HugePages_Total:  6144
HugePages_Free:   1655 <-- Free pages are less than total pages.
Hugepagesize:     2048 kB
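Plugging the numbers from this output into that formula gives the huge page usage (a sketch using the values shown: 6144 total, 1655 free, 2MB pages):

```shell
# (HugePages_Total - HugePages_Free) * 2MB, using the meminfo values above.
total=6144
free_pages=1655
echo "Huge pages in use: $(( (total - free_pages) * 2 ))MB"  # 8978MB, in the ballpark of the 10GB SGA
```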

Summary

Using HugePages resolved our client’s performance issues. The page table size also went down to a few hundred MB. If your database runs on Linux and the system supports HugePages, there is no reason not to use them.
