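Preface
Today I want to talk about an error caused by misusing pointers in driver development: Unable to handle kernel NULL pointer dereference at virtual address xxxxxxxx. I ran into it back when I was writing an LCD driver that used DMA: while setting up the memory for the DMA transfer, the code dereferenced a NULL pointer. The kernel printed the following: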
[ 72.820000] Unable to handle kernel NULL pointer dereference at virtual address 00000000
[ 72.820000] pgd = c0004000
[ 72.820000] [00000000] *pgd=00000000
[ 72.830000] Internal error: Oops: 817 [#1] ARM
[ 72.830000] Modules linked in: disp_tft(O) sec_mmap(O)
[ 72.830000] CPU: 0 Tainted: G O (3.6.5 #55)
[ 72.830000] PC is at __memzero+0x4c/0x80
[ 72.830000] LR is at 0x0
[ 72.830000] pc : [<c0167a0c>] lr : [<00000000>] psr: 00000113
[ 72.830000] sp : c0407db4 ip : 00000000 fp : c0407dcc
[ 72.830000] r10: 00000140 r9 : 00200000 r8 : 000000f0
[ 72.830000] r7 : dfcd0000 r6 : de2c0000 r5 : 00000000 r4 : 00000001
[ 72.830000] r3 : 00000000 r2 : 00000000 r1 : ffffffd0 r0 : 00000000
[ 72.830000] Flags: nzcv IRQs on FIQs on Mode SVC_32 ISA ARM Segment kernel
[ 72.830000] Control: 10c53c7d Table: 7f1cc059 DAC: 00000015
[ 72.830000] Process swapper (pid: 0, stack limit = 0xc04062e8)
[ 72.830000] Stack: (0xc0407db4 to 0xc0408000)
......
[ 72.830000] Backtrace:
[ 72.830000] [<c0172c68>] (sg_init_table+0x0/0x38) from [<bf020324>] (lcd_flash_timer+0x144/0x340 [disp_tft])
[ 72.830000] r5:de81a0d4 r4:de240340
[ 72.830000] [<bf0201e0>] (lcd_flash_timer+0x0/0x340 [disp_tft]) from [<c0057e60>] (run_timer_softirq+0x134/0x1d4)
[ 72.830000] [<c0057d2c>] (run_timer_softirq+0x0/0x1d4) from [<c005236c>] (__do_softirq+0xa4/0x164)
[ 72.830000] r8:c046f6c0 r7:00000100 r6:c0406000 r5:c046f708 r4:00000001
[ 72.830000] [<c00522c8>] (__do_softirq+0x0/0x164) from [<c00527bc>] (irq_exit+0x48/0x94)
[ 72.830000] [<c0052774>] (irq_exit+0x0/0x94) from [<c000ed10>] (handle_IRQ+0x6c/0x8c)
[ 72.830000] [<c000eca4>] (handle_IRQ+0x0/0x8c) from [<c0008530>] (gic_handle_irq+0x40/0x58)
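After searching online, I found that this error usually has one of the following causes:
1. The driver dereferences a NULL pointer. The paging mechanism cannot map that virtual address to a physical address, so the processor raises a page fault to the operating system. Because the address is invalid, the kernel cannot "page in" the missing address; if this happens while the processor is in supervisor mode, the kernel (usually) generates an oops.
2. A kernel configuration option that the driver depends on is missing. Check the options your driver needs; you may have left a key one unselected.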
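The "Internal error: Oops: 817" line in the log above already says something about the page fault described in cause 1. On 32-bit ARM the number after "Oops:" is the fault status register (FSR) value printed in hexadecimal. The following is a minimal user-space sketch for decoding it, assuming the ARMv6/ARMv7 short-descriptor FSR layout (bit 11 marks a write, bits 10 and 3..0 form the fault status). The helper names are my own and are not kernel APIs, and only a few common status codes are named.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed layout: ARMv6/ARMv7 short-descriptor fault status register. */
#define FSR_WRITE  (1u << 11)  /* set when the faulting access was a write */
#define FSR_FS4    (1u << 10)  /* bit 4 of the fault status code */
#define FSR_FS3_0  0xfu        /* bits 3..0 of the fault status code */

/* Return true if the faulting access was a write. */
static bool fsr_is_write(uint32_t fsr)
{
    return (fsr & FSR_WRITE) != 0;
}

/* Assemble the 5-bit fault status code from FS[4] and FS[3:0]. */
static unsigned fsr_status(uint32_t fsr)
{
    return (fsr & FSR_FS3_0) | ((fsr & FSR_FS4) >> 6);
}

/* Name a few common status codes; everything else is left generic. */
static const char *fsr_status_name(unsigned fs)
{
    switch (fs) {
    case 0x05: return "section translation fault";
    case 0x07: return "page translation fault";
    case 0x0d: return "section permission fault";
    case 0x0f: return "page permission fault";
    default:   return "other fault";
    }
}

/* Write a one-line description of the FSR value into buf. */
static void describe_fsr(uint32_t fsr, char *buf, size_t len)
{
    snprintf(buf, len, "%s, %s",
             fsr_is_write(fsr) ? "write" : "read",
             fsr_status_name(fsr_status(fsr)));
}
```

Calling describe_fsr(0x817, buf, sizeof(buf)) yields "write, page translation fault": the code tried to write through an address that has no mapping, which is what a store through a NULL pointer looks like.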
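Solution
In most cases this error is caused by the driver dereferencing an invalid (or NULL) pointer. To fix it, locate the faulting spot, either by adding debug prints step by step or by reading the error information the kernel prints, and then correct the code.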
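When locating the fault from the kernel output, the most useful line is usually "PC is at symbol+0xoffset/0xsize [module]". As a rough illustration, here is a small user-space sketch that splits such a line into its parts. The struct and function names are hypothetical, and the line format is simply the one seen in the logs in this post.

```c
#include <stdio.h>
#include <string.h>

/* Parts of a "PC is at sym+0xoff/0xsize [module]" oops line. */
struct oops_pc {
    char symbol[64];       /* function containing the faulting instruction */
    unsigned long offset;  /* byte offset of that instruction in the function */
    unsigned long size;    /* total size of the function in bytes */
    char module[64];       /* module name; empty if the symbol is built in */
};

/* Parse one oops line. Returns 0 on success, -1 if it is not a PC line. */
static int parse_pc_line(const char *line, struct oops_pc *out)
{
    const char *p = strstr(line, "PC is at ");
    int n;

    if (!p)
        return -1;
    p += strlen("PC is at ");

    memset(out, 0, sizeof(*out));
    /* The trailing " [module]" part is optional, so 3 matches are enough. */
    n = sscanf(p, "%63[^+]+0x%lx/0x%lx [%63[^]]]",
               out->symbol, &out->offset, &out->size, out->module);
    return n >= 3 ? 0 : -1;
}
```

For the line "PC is at disp_init+0x28/0x818 [disp_tft]" this gives symbol "disp_init", offset 0x28, size 0x818 and module "disp_tft". With a module built with debug information, that symbol and offset can then be handed to a tool such as gdb (for example list *(disp_init+0x28)) to find the source line.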
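Example
To show more concretely how to track down this kind of problem, here is an example based on the error I originally hit. The code is simplified to make it easier to follow:
1. Suppose the following two lines are added to a driver's initialization function: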
static int __init disp_init(void)
{
    int *ptr = NULL;   /* ptr points at address 0 */
    *ptr = 0x123456;   /* deliberate bug: write through the NULL pointer */
    ...........
}
Once the driver is loaded, the following error appears:
[ 101.650000] Unable to handle kernel NULL pointer dereference at virtual address 00000000
[ 101.660000] pgd = de0ac000
[ 101.660000] [00000000] *pgd=7f21e831, *pte=00000000, *ppte=00000000
[ 101.660000] Internal error: Oops: 817 [#1] ARM
[ 101.660000] Modules linked in: disp_tft(O+) prn_ltp02245(O) ope_gpio_tft(O) buzz(O) sec_mmap(O) [last unloaded: disp_tft]
[ 101.660000] CPU: 0 Tainted: G O (3.6.5 #56)
[ 101.660000] PC is at disp_init+0x28/0x818 [disp_tft]
[ 101.660000] LR is at do_one_initcall+0x9c/0x16c
[ 101.660000] pc : [<bf07d028>] lr : [<c0008658>] psr: 60000013
[ 101.660000] sp : de2bfe58 ip : de2bfeb0 fp : de2bfeac
[ 101.660000] r10: bf07d000 r9 : 00000000 r8 : 00000001
[ 101.660000] r7 : debd1080 r6 : bf079fe8 r5 : 00000000 r4 : bf07a0e0
[ 101.660000] r3 : 00123456 r2 : 00000000 r1 : 00000fff r0 : 18045000
[ 101.660000] Backtrace:
[ 101.660000] [<bf07d000>] (disp_init+0x0/0x818 [disp_tft]) from [<c0008658>] (do_one_initcall+0x9c/0x16c)
[ 101.660000] r8:00000001 r7:debd1080 r6:bf079fe8 r5:00000000 r4:bf079fa0
[ 101.660000] [<c00085bc>] (do_one_initcall+0x0/0x16c) from [<c0078968>] (sys_init_module+0x1590/0x171c)
[ 101.660000] [<c00773d8>] (sys_init_module+0x0/0x171c) from [<c000de20>] (ret_fast_syscall+0x0/0x30)
[ 101.660000] Code: e59f4774 e3001fff e1a02005 e59f076c (e5853000)
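From this output, in particular the line "PC is at disp_init+0x28/0x818 [disp_tft]", we can tell that the fault happened inside disp_init(), where a NULL pointer was dereferenced.
For a detailed walkthrough of how to debug a driver from Linux Oops information, please read the following blog post carefully:
http://blog.youkuaiyun.com/kangear/article/details/8217329
(Note: the linked post was written by someone else.)
That post explains in detail how to use the error information printed by the kernel to fix invalid-pointer bugs in driver development. It was only after reading it that things clicked for me, so I am learning by standing on the shoulders of giants too!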
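The "Code:" line of the oops backs this up. The word in parentheses, e5853000, is the instruction that faulted. Below is a minimal user-space sketch that decodes it, assuming the classic ARM (A32) encoding of LDR/STR with an immediate offset and handling only the plain offset addressing form (P=1, W=0). The struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Fields of an A32 LDR/STR instruction with a 12-bit immediate offset. */
struct arm_ldst_imm {
    bool is_load;    /* L bit: true for LDR, false for STR */
    bool is_byte;    /* B bit: true for a byte access */
    bool add;        /* U bit: true if the offset is added to the base */
    unsigned rn;     /* base register number */
    unsigned rd;     /* register being loaded or stored */
    unsigned imm12;  /* immediate offset */
};

/* Decode insn. Returns 0 on success, -1 if it is not a plain
 * "ldr/str rd, [rn, #imm]" instruction. */
static int decode_ldst_imm(uint32_t insn, struct arm_ldst_imm *out)
{
    if ((insn >> 28) == 0xf)              /* unconditional space, not LDR/STR */
        return -1;
    if (((insn >> 25) & 0x7) != 0x2)      /* bits 27..25 must be 0b010 */
        return -1;
    if (!((insn >> 24) & 1) || ((insn >> 21) & 1))
        return -1;                        /* only P=1, W=0 is handled here */

    out->add     = (insn >> 23) & 1;
    out->is_byte = (insn >> 22) & 1;
    out->is_load = (insn >> 20) & 1;
    out->rn      = (insn >> 16) & 0xf;
    out->rd      = (insn >> 12) & 0xf;
    out->imm12   = insn & 0xfff;
    return 0;
}

/* Render the decoded instruction as assembly text. */
static void format_ldst_imm(const struct arm_ldst_imm *d, char *buf, size_t len)
{
    snprintf(buf, len, "%s%s r%u, [r%u, #%s%u]",
             d->is_load ? "ldr" : "str", d->is_byte ? "b" : "",
             d->rd, d->rn, d->add ? "" : "-", d->imm12);
}
```

For 0xe5853000 the result is "str r3, [r5, #0]". In the register dump r3 is 00123456 and r5 is 00000000, so the faulting instruction is storing 0x123456 to address 0, which is exactly the *ptr = 0x123456 line in the simplified code.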