Today a system I was booting under QEMU crashed as soon as it came up. The log:
Unable to handle kernel NULL pointer dereference at virtual address 0000000c
pgd = eeb98000
[0000000c] *pgd=8e54d831, *pte=00000000, *ppte=00000000
Internal error: Oops: 17 [#1] PREEMPT SMP ARM
Modules linked in: lzo_compress(+)
CPU: 1 PID: 597 Comm: modprobe Not tainted 3.10.28 #22
task: ee999400 ti: ee470000 task.ti: ee470000
PC is at load_module+0x1724/0x1fa4
LR is at load_module+0x1710/0x1fa4
pc : [<c0074ce4>] lr : [<c0074cd0>] psr: a00e0013
sp : ee471ea8 ip : bf0009a8 fp : bf00089c
r10: c03c5628 r9 : bf000854 r8 : 00000000
r7 : ee47fb28 r6 : fffffff8 r5 : bf000860 r4 : ee471f58
r3 : ee999400 r2 : ee471ea0 r1 : ee44fa30 r0 : c057d520
Flags: NzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user
Control: 10c53c7d Table: 8eb9806a DAC: 00000015
Process modprobe (pid: 597, stack limit = 0xee470238)
Stack: (0xee471ea8 to 0xee472000)
1ea0: bf000860 00007fff c0070ef4 00000000 00000000 f0251000
1ec0: 00000000 ee470028 c0572434 000dc0eb bf0009a8 ee471ef4 c0043f20 00000010
1ee0: ee8bcbf8 00000002 000000d2 f0251000 ee4e1d00 00000000 00000000 bf000828
1f00: 00000001 00000000 00000000 00000000 00000000 00000000 00000000 00000000
1f20: 00000000 00000000 00000000 00000000 000dc0eb 00009d7a b6de8008 000dc0eb
1f40: 00000080 c000dd28 ee470000 00000000 000e81b0 c0075640 f0251000 00009d7a
1f60: f0258884 f0258741 f025ac5c 000009a4 00000a54 00000000 00000000 00000000
1f80: 0000001e 0000001f 0000000a 00000000 00000007 00000000 000f01fa b6de8008
1fa0: 00000003 c000db80 000f01fa b6de8008 b6de8008 00009d7a 000dc0eb 00000000
1fc0: 000f01fa b6de8008 00000003 00000080 000f0268 bef1bf56 000f01fa 000e81b0
1fe0: b6ec5140 bef1b930 00025f6c b6ec5150 600e0010 b6de8008 73206574 696c6d79
[<c0074ce4>] (load_module+0x1724/0x1fa4) from [<c0075640>] (SyS_init_module+0xdc/0xf0)
[<c0075640>] (SyS_init_module+0xdc/0xf0) from [<c000db80>] (ret_fast_syscall+0x0/0x30)
Code: e59dc028 e15c0006 e2466008 0a000009 (e5963014)
---[ end trace a1e68de68539cdc2 ]---
Segmentation fault
The PC value pinpoints the code where the fault occurred:
(gdb) list *(load_module+0x1724)
0xc0074ce4 is in load_module (kernel/module.c:1546).
1541 struct module_use *use;
1542 int nowarn;
1543
1544 mutex_lock(&module_mutex);
1545 list_for_each_entry(use, &mod->target_list, target_list) {
1546 nowarn = sysfs_create_link(use->target->holders_dir,
1547 &mod->mkobj.kobj, mod->name);
1548 }
1549 mutex_unlock(&module_mutex);
1550 #endif
(gdb)
The problem is reproducible, so we can set a breakpoint at this location and reboot the system.
Once the breakpoint is hit, inspect mod->target_list:
(gdb) p mod->target_list
$7 = {next = 0x0 <__vectors_start>, prev = 0x0 <__vectors_start>}
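Both pointers of this list head are NULL. Note that this is a zeroed list head, not a properly initialized empty list, whose next and prev would point back at the head itself.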
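To make the difference between a zeroed head and a real empty list concrete, here is a minimal user-space sketch. It is not the kernel code: init_list_head and loop_body_would_run are hypothetical names of mine, and the check only mirrors the termination test that list_for_each_entry performs (start at head->next, stop when back at head).

```c
#include <stddef.h>

/* User-space stand-in for the kernel's struct list_head. */
struct list_head {
    struct list_head *next, *prev;
};

/* Same effect as the kernel's INIT_LIST_HEAD(): an empty list points at itself. */
void init_list_head(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

/* list_for_each_entry() starts at head->next and stops only once it is back
 * at head, so the body runs at least once whenever head->next != head. */
int loop_body_would_run(const struct list_head *head)
{
    return head->next != head;
}
```

For a head set up by init_list_head the body is skipped; for a head whose pointers are both NULL, as in the gdb output above, the body is entered with a bogus entry pointer.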
When the head's next pointer is NULL, the loop does not terminate immediately, and use ends up as the negative of the offset of target_list within struct module_use:
struct module_use {
struct list_head source_list;
struct list_head target_list;
struct module *source, *target;
};
That is, -8 (0xfffffff8).
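The offsets can be double-checked with a small sketch. It assumes the 32-bit ARM layout (4-byte pointers) and therefore models every pointer as a uint32_t so that it also compiles on a 64-bit host; the *32 type names and entry_from_target_list are hypothetical, used for illustration only.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 32-bit model: pointers are stored as uint32_t so the layout
 * matches ARM32 regardless of the host's pointer size. */
struct list_head32 {
    uint32_t next;
    uint32_t prev;
};

struct module_use32 {
    struct list_head32 source_list;  /* offset 0  */
    struct list_head32 target_list;  /* offset 8  */
    uint32_t source;                 /* offset 16 */
    uint32_t target;                 /* offset 20 */
};

/* Compile-time checks of the offsets used in the analysis. */
_Static_assert(offsetof(struct module_use32, target_list) == 8,
               "target_list is expected at offset 8");
_Static_assert(offsetof(struct module_use32, target) == 20,
               "target is expected at offset 20");

/* What container_of()/list_entry() boils down to for this member:
 * the list pointer minus the member offset, in 32-bit arithmetic. */
uint32_t entry_from_target_list(uint32_t list_ptr)
{
    return list_ptr - (uint32_t)offsetof(struct module_use32, target_list);
}
```

Passing 0 (the NULL next pointer) to entry_from_target_list yields 0xfffffff8, the unsigned 32-bit representation of -8.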
(gdb) p use
$9 = (struct module_use *) 0xfffffff8
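Clearly, the observed value matches the analysis. Disassembling around the faulting PC shows the instruction that dereferences it: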
(gdb) disass 0xc0074ce0,+20
Dump of assembler code from 0xc0074ce0 to 0xc0074cf4:
0xc0074ce0 <load_module+5920>: beq 0xc0074d0c <load_module+5964>
=> 0xc0074ce4 <load_module+5924>: ldr r3, [r6, #20]
0xc0074ce8 <load_module+5928>: mov r1, r11
0xc0074cec <load_module+5932>: mov r2, r5
0xc0074cf0 <load_module+5936>: ldr r0, [r3, #132] ; 0x84
(gdb) p /x $r6
$2 = 0xfffffff8
(gdb) p /x $r6+20
$3 = 0xc
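This explains the address in the log at the very beginning: the ldr at the faulting PC loads use->target, which sits at offset 20 in struct module_use, so the access goes to 0xfffffff8 + 20, which wraps around to 0x0000000c in 32-bit arithmetic.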
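The wraparound can be checked with a couple of lines of C. This is only an illustrative sketch of the address arithmetic; ldr_effective_address is a made-up helper name, not something from the kernel or gdb.

```c
#include <stdint.h>

/* Effective address of an immediate-offset load such as
 * "ldr r3, [r6, #20]" on a 32-bit core: base register plus offset,
 * truncated to 32 bits (unsigned arithmetic wraps modulo 2^32). */
uint32_t ldr_effective_address(uint32_t base, uint32_t imm)
{
    return base + imm;
}
```

With base = 0xfffffff8 (the value of r6) and imm = 20, the result is 0x0000000c, the same value gdb prints for $r6+20 and the address reported in the oops.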