Understanding a Kernel Oops!

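This article takes an in-depth look at how Linux kernel Oops errors work, how they are generated, and how to decode them, with a worked example. It aims to help Linux kernel developers understand and debug such errors.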

http://opensourceforu.efytimes.com/2011/01/understanding-a-kernel-oops/

By Surya Prabhakar on January 1, 2011 in Coding, Developers



Understanding a kernel panic and doing the forensics to trace the bug is considered a hacker’s job. This is a complex task that requires sound knowledge of both the architecture you are working on, and the internals of the Linux kernel. Depending on the type of error detected by the kernel, panics in the Linux kernel are classified as hard panics (Aiee!) and soft panics (Oops!). This article explains the workings of a Linux kernel ‘Oops’, helps you create a simple one, and then shows how to debug it. It is mainly intended for beginners getting into Linux kernel development, who need to debug the kernel. Knowledge of the Linux kernel, and C programming, is assumed.

An “Oops” is what the kernel throws at us when it finds something faulty, or an exception, in the kernel code. It’s somewhat like the segfaults of user-space. An Oops dumps its message on the console; it contains the processor status and the contents of the CPU registers at the time the fault occurred. The offending process that triggered this Oops gets killed without releasing locks or cleaning up structures. The system may not even resume its normal operations sometimes; this is called an unstable state. Once an Oops has occurred, the system cannot be trusted any further.

Let’s try to generate an Oops message with sample code, and try to understand the dump.

Setting up the machine to capture an Oops

The running kernel should be compiled with CONFIG_DEBUG_INFO, and syslogd should be running. To generate and understand an Oops message, let’s write a sample kernel module, oops.c:

#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/init.h>

static void create_oops() {
    *(int *)0 = 0;
}

static int __init my_oops_init(void) {
    printk("oops from the module\n");
    create_oops();
    return 0;
}

static void __exit my_oops_exit(void) {
    printk("Goodbye world\n");
}

module_init(my_oops_init);
module_exit(my_oops_exit);

The associated Makefile for this module is as follows:

obj-m   := oops.o
# Build tree of the running kernel (no spaces inside the path)
KDIR    := /lib/modules/$(shell uname -r)/build
PWD     := $(shell pwd)
SYM=$(PWD)

all:
	$(MAKE) -C $(KDIR) M=$(PWD) modules

Once loaded with insmod, the module generates the following Oops:

BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<ffffffffa03e1012>] my_oops_init+0x12/0x21 [oops]
PGD 7a719067 PUD 7b2b3067 PMD 0
Oops: 0002 [#1] SMP
last sysfs file: /sys/devices/virtual/misc/kvm/uevent
CPU 1
Pid: 2248, comm: insmod Tainted: P           2.6.33.3-85.fc13.x86_64
RIP: 0010:[<ffffffffa03e1012>]  [<ffffffffa03e1012>] my_oops_init+0x12/0x21 [oops]
RSP: 0018:ffff88007ad4bf08  EFLAGS: 00010292
RAX: 0000000000000018 RBX: ffffffffa03e1000 RCX: 00000000000013b7
RDX: 0000000000000000 RSI: 0000000000000046 RDI: 0000000000000246
RBP: ffff88007ad4bf08 R08: ffff88007af1cba0 R09: 0000000000000004
R10: 0000000000000000 R11: ffff88007ad4bd68 R12: 0000000000000000
R13: 00000000016b0030 R14: 0000000000019db9 R15: 00000000016b0010
FS:  00007fb79dadf700(0000) GS:ffff880001e80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 000000007a0f1000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process insmod (pid: 2248, threadinfo ffff88007ad4a000, task ffff88007a222ea0)
Stack:
ffff88007ad4bf38 ffffffff8100205f ffffffffa03de060 ffffffffa03de060
  0000000000000000 00000000016b0030 ffff88007ad4bf78 ffffffff8107aac9
  ffff88007ad4bf78 00007fff69f3e814 0000000000019db9 0000000000020000
Call Trace:
[<ffffffff8100205f>] do_one_initcall+0x59/0x154
[<ffffffff8107aac9>] sys_init_module+0xd1/0x230
[<ffffffff81009b02>] system_call_fastpath+0x16/0x1b
Code: <c7> 04 25 00 00 00 00 00 00 00 00 31 c0 c9 c3 00 00 00 00 00 00 00
RIP  [<ffffffffa03e1012>] my_oops_init+0x12/0x21 [oops]
RSP <ffff88007ad4bf08>
CR2: 0000000000000000

Understanding the Oops dump

Let’s have a closer look at the above dump, to understand some of the important bits of information.

BUG: unable to handle kernel NULL pointer dereference at (null)

The first line tells us that the kernel tried to dereference a pointer with a NULL value; the faulting address is printed as (null).

IP: [<ffffffffa03e1012>] my_oops_init+0x12/0x21 [oops]

IP is the instruction pointer.

Oops: 0002 [#1] SMP

This is the error code value in hex. Each bit has a significance of its own:

  • bit 0 == 0 means no page found, 1 means a protection fault
  • bit 1 == 0 means read, 1 means write
  • bit 2 == 0 means kernel, 1 means user-mode
  • [#1] — this value is the number of times the Oops occurred. Multiple Oops can be triggered as a cascading effect of the first one.

CPU 1

This denotes on which CPU the error occurred.

Pid: 2248, comm: insmod Tainted: P           2.6.33.3-85.fc13.x86_64

The Tainted flag points to P here. Each flag has its own meaning. A few other flags, and their meanings, picked up from kernel/panic.c:

  • P — Proprietary module has been loaded.
  • F — Module has been forcibly loaded.
  • S — SMP with a CPU not designed for SMP.
  • R — User forced a module unload.
  • M — System experienced a machine check exception.
  • B — System has hit bad_page.
  • U — Userspace-defined naughtiness.
  • A — ACPI table overridden.
  • W — Taint on warning.

RIP: 0010:[<ffffffffa03e1012>]  [<ffffffffa03e1012>] my_oops_init+0x12/0x21 [oops]

RIP is the CPU register containing the address of the instruction that is being executed. 0010 comes from the code segment register. my_oops_init+0x12/0x21 is the <symbol> + the offset/length: the fault happened 0x12 bytes into my_oops_init, which is 0x21 bytes long.

RSP: 0018:ffff88007ad4bf08  EFLAGS: 00010292
RAX: 0000000000000018 RBX: ffffffffa03e1000 RCX: 00000000000013b7
RDX: 0000000000000000 RSI: 0000000000000046 RDI: 0000000000000246
RBP: ffff88007ad4bf08 R08: ffff88007af1cba0 R09: 0000000000000004
R10: 0000000000000000 R11: ffff88007ad4bd68 R12: 0000000000000000
R13: 00000000016b0030 R14: 0000000000019db9 R15: 00000000016b0010

This is a dump of the contents of some of the CPU registers.

Stack:
ffff88007ad4bf38 ffffffff8100205f ffffffffa03de060 ffffffffa03de060
  0000000000000000 00000000016b0030 ffff88007ad4bf78 ffffffff8107aac9
  ffff88007ad4bf78 00007fff69f3e814 0000000000019db9 0000000000020000

The above is a raw dump of the kernel stack contents at the time of the fault.

Call Trace:
[<ffffffff8100205f>] do_one_initcall+0x59/0x154
[<ffffffff8107aac9>] sys_init_module+0xd1/0x230
[<ffffffff81009b02>] system_call_fastpath+0x16/0x1b

The above is the call trace — the list of functions being called just before the Oops occurred.

Code: <c7> 04 25 00 00 00 00 00 00 00 00 31 c0 c9 c3 00 00 00 00 00 00 00

The Code is a hex-dump of the section of machine code that was being run at the time the Oops occurred.

Debugging an Oops dump

The first step is to load the offending module into the GDB debugger, as follows:

[root@DELL-RnD-India oops]# gdb oops.ko
GNU gdb (GDB) Fedora (7.1-18.fc13)
Reading symbols from /code/oops/oops.ko...done.

Next, add the symbol file to the debugger. The add-symbol-file command’s first argument is oops.o and the second argument is the address of the text section of the module. You can obtain this address from /sys/module/oops/sections/.init.text (where oops is the module name); .init.text is the section to use here because my_oops_init is marked __init:

(gdb) add-symbol-file oops.o 0xffffffffa03e1000
add symbol table from file "oops.o" at
     .text_addr = 0xffffffffa03e1000
(y or n) y
Reading symbols from /code/oops/oops.o...done.

From the RIP instruction line, we can get the name of the offending function, and disassemble it.

(gdb) disassemble my_oops_init
Dump of assembler code for function my_oops_init:
    0x0000000000000038 <+0>:    push   %rbp
    0x0000000000000039 <+1>:    mov    $0x0,%rdi
    0x0000000000000040 <+8>:    xor    %eax,%eax
    0x0000000000000042 <+10>:    mov    %rsp,%rbp
    0x0000000000000045 <+13>:    callq  0x4a <my_oops_init+18>
    0x000000000000004a <+18>:    movl   $0x0,0x0
    0x0000000000000055 <+29>:    xor    %eax,%eax
    0x0000000000000057 <+31>:    leaveq
    0x0000000000000058 <+32>:    retq
End of assembler dump.

Now, to pinpoint the actual line of offending code, we add the function’s starting address and the offset. The offset is available in the same RIP instruction line. In our case, we are adding 0x0000000000000038 + 0x12 = 0x000000000000004a. This points to the movl instruction.

(gdb) list *0x000000000000004a
0x4a is in my_oops_init (/code/oops/oops.c:6).
1    #include <linux/kernel.h>
2    #include <linux/module.h>
3    #include <linux/init.h>
4    
5    static void create_oops() {
6        *(int *)0 = 0;
7    }

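GDB maps the address to line 6 of oops.c, the write through a NULL pointer in create_oops(). It reports the location as being in my_oops_init because the compiler inlined create_oops() into it.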

References

The kerneloops.org website can be used to pick up a lot of Oops messages to debug. The Linux kernel documentation directory also has information about Oops messages, in kernel/Documentation/oops-tracing.txt. This, and numerous other online resources, were used while creating this article.
