- Enable the kernel's SLUB debug options
+CONFIG_SLUB_DEBUG=y
+CONFIG_SLUB_DEBUG_ON=y
- Watch slabinfo
cat /proc/slabinfo
Record slabinfo right after boot. Let the system run for a while, then look at slabinfo again.
Find the slabs that have grown the most.
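The comparison of two slabinfo snapshots can be sketched as a small shell function. This is only an illustration: the function name `slab_growth` and the snapshot file names are hypothetical, and it assumes the usual /proc/slabinfo column order (name, active_objs, num_objs, objsize).

```shell
# Hypothetical helper: list caches whose active object count grew between
# two saved copies of /proc/slabinfo, largest growth first.
#   cat /proc/slabinfo > /tmp/slab.boot     (right after boot)
#   cat /proc/slabinfo > /tmp/slab.later    (after the system has run a while)
#   slab_growth /tmp/slab.boot /tmp/slab.later
slab_growth() {
    awk '
        # Skip the two header lines of slabinfo.
        /^slabinfo/ || /^#/ { next }
        # First file: remember active_objs for each cache name.
        NR == FNR { before[$1] = $2; next }
        # Second file: print growth in objects, growth in bytes, cache name.
        {
            delta = $2 - before[$1]
            if (delta > 0) printf "%d %d %s\n", delta, delta * $4, $1
        }
    ' "$1" "$2" | sort -rn
}
```

Each output line is "objects bytes cache". The byte figure is just the object delta multiplied by objsize, so it ignores per-slab overhead.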
- Turn on slab trace
echo 1 > /sys/kernel/slab/<leaking_slab>/trace
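Once enabled, slab trace prints to the console.
If the console is a serial port, the volume of output can easily make the system unresponsive. It is best to write a script that turns the trace off again after it has run for a while: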
echo 1 > /sys/kernel/slab/<leaking_slab>/trace
sleep 60
echo 0 > /sys/kernel/slab/<leaking_slab>/trace
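If the three-line script is interrupted during the sleep, tracing stays on. A slightly more defensive variant might look like the sketch below. The function name `trace_slab` is hypothetical, and the trace file path is passed in rather than hard-coded.

```shell
# Hypothetical helper: enable tracing for a limited time and make sure it is
# switched off again, even if the script is interrupted.
#   trace_slab /sys/kernel/slab/<leaking_slab>/trace 60
trace_slab() (
    trace_file=$1
    duration=${2:-60}
    # The body runs in a subshell, so these traps do not leak into the caller.
    trap 'echo 0 > "$trace_file"' EXIT
    trap 'exit 130' INT TERM
    echo 1 > "$trace_file"
    sleep "$duration"
    echo 0 > "$trace_file"
)
```

The final `echo 0` covers the normal path and the EXIT trap covers Ctrl-C or a kill. Writing 0 twice is harmless.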
- Analysis
The printed slab trace looks roughly like this:
[47744.480000] TRACE kmalloc-128 alloc 0x83df8300 inuse=16 fp=0x (null)
[47744.480000] Call Trace:
[47744.480000] [<8027c4b4>] dump_stack+0x8/0x34
[47744.480000] [<8027d5fc>] alloc_debug_processing+0xf8/0x17c
[47744.480000] [<8027decc>] __slab_alloc.constprop.65+0x2e0/0x350
[47744.480000] [<800df2c0>] __kmalloc+0x98/0x148
[47744.480000] [<8308ad74>] amalloc_private+0x38/0x13c [asf]
[47744.480000] [<82aba2a8>] osif_forward_mgmt_to_app+0xa0/0x280 [umac]
[47744.480000] [<82aba478>] osif_forward_mgmt_to_app+0x270/0x280 [umac]
[47744.480000]
[47744.530000] TRACE kmalloc-128 free 0x83df8300 inuse=16 fp=0x (null)
[47744.530000] Object 83df8300: 4d 61 6e 61 67 65 2e 70 72 6f 62 5f 72 65 71 20 Manage.prob_req
[47744.530000] Object 83df8310: 35 30 00 00 00 00 00 00 00 00 00 00 00 00 40 00 50............@.
[47744.530000] Object 83df8320: 00 00 ff ff ff ff ff ff 78 11 dc 0c 55 34 ff ff ........x...U4..
[47744.530000] Object 83df8330: ff ff ff ff 70 ad 00 08 63 68 5f 42 38 5f 32 47 ....p...ch_B8_2G
[47744.530000] Object 83df8340: 01 08 8b 96 82 84 0c 18 30 60 32 04 6c 12 24 48 ........0`2.l.$H
[47744.530000] Object 83df8350: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
[47744.530000] Object 83df8360: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
[47744.530000] Object 83df8370: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
[47744.530000] Call Trace:
[47744.530000] [<8027c4b4>] dump_stack+0x8/0x34
[47744.530000] [<8027d81c>] free_debug_processing+0x19c/0x218
[47744.530000] [<8027d8dc>] __slab_free+0x44/0x280
[47744.530000] [<82aba324>] osif_forward_mgmt_to_app+0x11c/0x280 [umac]
[47744.530000] [<82aba478>] osif_forward_mgmt_to_app+0x270/0x280 [umac]
[47744.530000]
[47744.650000] TRACE kmalloc-128 alloc 0x830e0b00 inuse=16 fp=0x (null)
[47744.650000] Call Trace:
[47744.650000] [<8027c4b4>] dump_stack+0x8/0x34
[47744.650000] [<8027d5fc>] alloc_debug_processing+0xf8/0x17c
[47744.650000] [<8027decc>] __slab_alloc.constprop.65+0x2e0/0x350
[47744.650000] [<800df2c0>] __kmalloc+0x98/0x148
[47744.650000] [<8308ad74>] amalloc_private+0x38/0x13c [asf]
[47744.650000] [<82aba2a8>] osif_forward_mgmt_to_app+0xa0/0x280 [umac]
[47744.650000] [<82aba478>] osif_forward_mgmt_to_app+0x270/0x280 [umac]
[47744.650000]
[47744.700000] TRACE kmalloc-128 free 0x830e0b00 inuse=10 fp=0x830e0300
[47744.700000] Object 830e0b00: 4d 61 6e 61 67 65 2e 70 72 6f 62 5f 72 65 71 20 Manage.prob_req
[47744.700000] Object 830e0b10: 38 36 00 00 00 00 00 00 00 00 00 00 00 00 40 00 86............@.
[47744.700000] Object 830e0b20: 00 00 ff ff ff ff ff ff 78 11 dc 32 e2 53 ff ff ........x..2.S..
[47744.700000] Object 830e0b30: ff ff ff ff f0 8f 00 0d 58 69 61 6f 6d 69 5f 46 ........Xiaomi_F
[47744.700000] Object 830e0b40: 61 6d 69 6c 79 01 08 02 04 0b 0c 12 16 18 24 03 amily.........$.
[47744.700000] Object 830e0b50: 01 04 2d 1a 00 00 03 ff 00 00 00 00 00 00 00 00 ..-.............
[47744.700000] Object 830e0b60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 32 04 ..............2.
[47744.700000] Object 830e0b70: 30 48 60 6c 00 00 00 00 00 00 00 00 00 00 00 00 0H`l............
[47744.700000] Call Trace:
[47744.700000] [<8027c4b4>] dump_stack+0x8/0x34
[47744.700000] [<8027d81c>] free_debug_processing+0x19c/0x218
[47744.700000] [<8027d8dc>] __slab_free+0x44/0x280
[47744.700000] [<82aba324>] osif_forward_mgmt_to_app+0x11c/0x280 [umac]
[47744.700000] [<82aba478>] osif_forward_mgmt_to_app+0x270/0x280 [umac]
[47744.700000]
[47744.810000] TRACE kmalloc-128 alloc 0x830e0b00 inuse=16 fp=0x (null)
[47744.810000] Call Trace:
[47744.810000] [<8027c4b4>] dump_stack+0x8/0x34
[47744.810000] [<8027d5fc>] alloc_debug_processing+0xf8/0x17c
[47744.810000] [<8027decc>] __slab_alloc.constprop.65+0x2e0/0x350
[47744.810000] [<800dec80>] kmem_cache_alloc+0x3c/0xe4
[47744.810000] [<801c53b8>] sock_alloc_inode+0x4c/0xc4
[47744.810000] [<800f9080>] alloc_inode+0x28/0xac
[47744.810000] [<800fa328>] new_inode_pseudo+0x10/0x30
[47744.810000] [<801c6560>] sock_alloc+0x1c/0x80
[47744.810000] [<801c6b30>] __sock_create+0x8c/0x1cc
[47744.810000] [<801c6cec>] sock_create+0x38/0x44
[47744.810000] [<801c7294>] sys_socket+0x38/0x7c
[47744.810000] [<8006d8c4>] stack_done+0x20/0x40
[47744.810000]
[47744.900000] TRACE kmalloc-128 alloc 0x830e0500 inuse=16 fp=0x (null)
[47744.900000] Call Trace:
[47744.900000] [<8027c4b4>] dump_stack+0x8/0x34
[47744.900000] [<8027d5fc>] alloc_debug_processing+0xf8/0x17c
[47744.900000] [<8027decc>] __slab_alloc.constprop.65+0x2e0/0x350
[47744.900000] [<800df2c0>] __kmalloc+0x98/0x148
[47744.900000] [<8308ad74>] amalloc_private+0x38/0x13c [asf]
[47744.900000] [<82aba2a8>] osif_forward_mgmt_to_app+0xa0/0x280 [umac]
[47744.900000] [<82aba478>] osif_forward_mgmt_to_app+0x270/0x280 [umac]
[47744.900000]
[47744.950000] TRACE kmalloc-128 free 0x830e0500 inuse=11 fp=0x830e0300
[47744.950000] Object 830e0500: 4d 61 6e 61 67 65 2e 70 72 6f 62 5f 72 65 71 20 Manage.prob_req
[47744.950000] Object 830e0510: 37 39 00 00 00 00 00 00 00 00 00 00 00 00 40 00 79............@.
[47744.950000] Object 830e0520: 00 00 ff ff ff ff ff ff f0 b4 29 07 10 22 ff ff ..........).."..
[47744.950000] Object 830e0530: ff ff ff ff 00 9f 00 06 4d 49 2d 4d 41 43 01 08 ........MI-MAC..
[47744.950000] Object 830e0540: 02 04 0b 0c 12 16 18 24 03 01 03 2d 1a 00 00 03 .......$...-....
[47744.950000] Object 830e0550: ff 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
[47744.950000] Object 830e0560: 00 00 00 00 00 00 00 32 04 30 48 60 6c 00 00 00 .......2.0H`l...
[47744.950000] Object 830e0570: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
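This is hard to analyze by hand, so I wrote a crude script. Save the trace as kmalloc-t.txt:
grep "TRACE kmalloc-128 alloc" kmalloc-t.txt | awk '{print $5}' | sort > alloc.txt
grep "TRACE kmalloc-128 free" kmalloc-t.txt | awk '{print $5}' | sort > free.txt
This simply sorts the alloc and free addresses. Then use bcompare or vimdiff to check whether the alloc and free of each object address appear in pairs.
It is fairly easy to see which memory block was never freed.
Then go back to kmalloc-t.txt, look up the blocks that are missing from free.txt, and check by hand whether each one is a plausible leak.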
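The pairing can also be done in one pass instead of diffing two sorted lists, which copes with an address being freed and then reused. The following is a minimal sketch, not the script used above: the function name `unfreed_objects` is hypothetical, and it assumes the field layout shown in the trace (timestamp, TRACE, cache name, alloc or free, address).

```shell
# Hypothetical helper: print addresses whose most recent event in the saved
# trace is an alloc with no later free.
#   unfreed_objects kmalloc-t.txt kmalloc-128
unfreed_objects() {
    awk -v cache="$2" '
        # Assumed layout: [timestamp] TRACE <cache> alloc|free <addr> ...
        $2 == "TRACE" && $3 == cache && $4 == "alloc" { live[$5] = $1 }
        $2 == "TRACE" && $3 == cache && $4 == "free"  { delete live[$5] }
        # Whatever is still recorded at the end was not freed in this window.
        END { for (addr in live) print addr, "allocated at", live[addr] }
    ' "$1" | sort
}
```

The timestamp in the output makes it easy to find the matching Call Trace in kmalloc-t.txt. An object allocated just before tracing stopped will also be listed, so the result is a list of candidates rather than confirmed leaks.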
SLAB
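With SLAB there are two common approaches: use the slab leak helper in a debug kernel, or use a tool such as systemtap. See https://access.redhat.com/solutions/358933
Using the kernel's DEBUG_SLAB_LEAK feature
This requires a kernel built with the "CONFIG_DEBUG_SLAB_LEAK" option, which is off by default.
On RHEL or CentOS the debug kernel has this option enabled: install the kernel-debug-* rpm packages, reboot, and select the debug kernel.
After that, a file named slab_allocators appears under /proc. It records slab allocation information like the following. Pay attention to which code is allocating from each slab, since that helps locate suspicious leak points. The drawback is that only the immediate caller is recorded, not a full backtrace: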
buffer_head: 2555 alloc_buffer_head+0x20/0x75
mm_struct: 9 mm_alloc+0x1e/0x42
mm_struct: 20 dup_mm+0x36/0x370
vm_area_struct: 384 dup_mm+0x18f/0x370
vm_area_struct: 151 do_mmap_pgoff+0x2e0/0x7c3
fs_cache: 8 copy_fs_struct+0x21/0x133
fs_cache: 29 copy_process+0xf38/0x10e3
files_cache: 30 alloc_files+0x1b/0xcf
signal_cache: 81 copy_process+0xbaa/0x10e3
sighand_cache: 77 copy_process+0xe65/0x10e3
anon_vma: 241 anon_vma_prepare+0xd9/0xf3
size-2048: 1 add_sect_attrs+0x5f/0x145
size-2048: 2 journal_init_revoke+0x99/0x302
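Since a leak shows up as a count that keeps rising, comparing two copies of this file taken some time apart narrows the search quickly. A minimal sketch follows. The function name `slab_allocators_growth` and the snapshot file names are hypothetical, and it assumes the three-column "cache: count caller" layout shown above.

```shell
# Hypothetical helper: show which (cache, caller) pairs grew between two
# saved copies of /proc/slab_allocators, largest growth first.
#   cat /proc/slab_allocators > /tmp/alloc.1
#   ... wait ...
#   cat /proc/slab_allocators > /tmp/alloc.2
#   slab_allocators_growth /tmp/alloc.1 /tmp/alloc.2
slab_allocators_growth() {
    awk '
        # First file: remember the count for each cache/caller pair.
        NR == FNR { before[$1 " " $3] = $2; next }
        # Second file: print pairs whose count increased.
        {
            delta = $2 - before[$1 " " $3]
            if (delta > 0) print delta, $1, $3
        }
    ' "$1" "$2" | sort -rn
}
```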
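Using systemtap
Besides a debug kernel, another option is systemtap: placing probes at suitable points in the kernel helps find suspicious slab allocations. This requires some knowledge of the kernel.
Ordinary slab caches are allocated through kmem_cache_alloc and can be observed with the ready-made systemtap probe vm.kmem_cache_alloc. That does not apply in this case, because "size-4096" is one of the slab general purpose caches used by kmalloc(), so systemtap should probe kmalloc() instead. There is a ready-made script, "kmalloc-top", which works by probing __kmalloc() and recording backtraces. __kmalloc is the core function that implements kmalloc(), and some code calls __kmalloc directly, so probing it rather than kmalloc() avoids missing anything. The script as shipped does not record the kmalloc size, so I modified it to add the size. The modified part is: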
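#The systemtap script that instruments the kmalloc.
#It is embedded in a Perl double-quoted string, so quotes, $ and backslashes are escaped.
$script="
global kmalloc_stack

# Count each (allocation size, call path) pair.
probe kernel.function(\"__kmalloc\") { kmalloc_stack[\$size, backtrace()]++ }

# Every 100 ms and at exit, dump the counters and reset them.
probe timer.ms(100), end
{
  foreach ([size, stack] in kmalloc_stack) {
    printf(\"\\n\")
    printf(\" kmalloc size %d\\n\", size)
    print_syms(stack)
    printf(\"\\n\")
    printf(\"%d\\n\", kmalloc_stack[size, stack])
  }
  delete kmalloc_stack
}
";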
Run as root:
./kmalloc-top -o '--all-modules' > /tmp/kmtop.out
After a while, press Ctrl-C to exit. The result looks like this:
…
This path seen 1021 times:
kmalloc size 4096
0xffffffff811783e0 : __kmalloc+0x0/0x230 [kernel]
0xffffffffa022401e : 0xffffffffa022401e [sisips]
0xffffffffa024d46f : 0xffffffffa024d46f [sisips]
0xffffffffa023b763 : 0xffffffffa023b763 [sisips]
0xffffffffa022abca : 0xffffffffa022abca [sisips]
0xffffffffa022d51a : 0xffffffffa022d51a [sisips]
0xffffffff81290745 : _atomic_dec_and_lock+0x55/0x80 [kernel]
0xffffffff81193611 : __fput+0x1a1/0x210 [kernel]
0xffffffff810e884e : __audit_syscall_exit+0x25e/0x290 [kernel]
0xffffffff8100b0d2 : system_call_fastpath+0x16/0x1b [kernel]
This path seen 1021 times:
kmalloc size 4096
0xffffffff811783e0 : __kmalloc+0x0/0x230 [kernel]
0xffffffffa022401e : 0xffffffffa022401e [sisips]
0xffffffffa0224a32 : 0xffffffffa0224a32 [sisips]
0xffffffffa022abac : 0xffffffffa022abac [sisips]
0xffffffffa024f5c8 : 0xffffffffa024f5c8 [sisips]
0xffffffffa022d51a : 0xffffffffa022d51a [sisips]
0xffffffff81290745 : _atomic_dec_and_lock+0x55/0x80 [kernel]
0xffffffff81193611 : __fput+0x1a1/0x210 [kernel]
0xffffffff810e884e : __audit_syscall_exit+0x25e/0x290 [kernel]
0xffffffff8100b0d2 : system_call_fastpath+0x16/0x1b [kernel]
This path seen 853 times:
kmalloc size 4096
0xffffffff811783e0 : __kmalloc+0x0/0x230 [kernel]
0xffffffffa022401e : 0xffffffffa022401e [sisips]
0xffffffffa0222c30 : 0xffffffffa0222c30 [sisips]
0xffffffffa024d46f : 0xffffffffa024d46f [sisips]
0xffffffffa024dd49 : 0xffffffffa024dd49 [sisips]
0xffffffff81178001 : s_show+0x2c1/0x330 [kernel]
0xffffffffa02240bc : 0xffffffffa02240bc [sisips]
0xffffffffa023b783 : 0xffffffffa023b783 [sisips]
0xffffffffa022abca : 0xffffffffa022abca [sisips]
0xffffffffa022d51a : 0xffffffffa022d51a [sisips]
0xffffffff81290745 : _atomic_dec_and_lock+0x55/0x80 [kernel]
0xffffffff81193611 : __fput+0x1a1/0x210 [kernel]
0xffffffff810e884e : __audit_syscall_exit+0x25e/0x290 [kernel]
0xffffffff8100b0d2 : system_call_fastpath+0x16/0x1b [kernel]
…
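As you can see, a large number of the size-4096 allocations come from the kernel module "sisips", which makes it a reasonable suspect. (This is a Symantec kernel module and its debuginfo is not installed on the system, so systemtap cannot resolve the symbols in its backtrace and shows only hexadecimal addresses.) To verify whether the module really causes the leak, temporarily disable it and watch /proc/slabinfo to see whether size-4096 stops growing rapidly. If it does, the module is clearly at fault.
Another method: kmemleak
Another way to detect kernel memory leaks is the kmemleak tool. It does not target one particular slab but covers all kernel memory. For details see:
Detecting kernel memory leaks with KMEMLEAK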