[Repost] How to analyze a kernel panic

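This article looks at how to analyze a kernel crash log. It explains the key information in the log, such as the error code, the call stack, and the register state, and gives a troubleshooting flow and analysis tips to help developers locate and fix kernel problems.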


Reposted from http://blog.chinaunix.net/uid-22415790-id-1774104.html. The analysis is very detailed; thanks to the original author!

[ 875.497504] Unable to handle kernel paging request at virtual address 4042a00c
[ 875.499110] pgd = c0004000
[ 875.501797] [4042a00c] *pgd=00000000
[ 875.505364] Internal error: Oops: 5 [#1] PREEMPT
[ 875.509957] Modules linked in: dhd
[ 875.513350] CPU: 0 Not tainted (2.6.29 #21)
[ 875.517877] PC is at lock_acquire+0x60/0x74
[ 875.522024] LR is at 0x0
[ 875.524544] pc : [<c0090e10>] lr : [<00000000>] psr: 60000093
[ 875.524557] sp : c042bea0 ip : c042a000 fp : c042becc
[ 875.536004] r10: 0000e0cd r9 : 00000000 r8 : 16027ae8
[ 875.541210] r7 : 0000036b r6 : 00000000 r5 : 4042a000 r4 : 60000093
[ 875.547717] r3 : 00000000 r2 : 00000000 r1 : 00000000 r0 : 00000000
[ 875.554230] Flags: nZCv IRQs off FIQs on Mode SVC_32 ISA ARM Segment kernel
[ 875.561610] Control: 10c5787d Table: 1c0ec019 DAC: 00000017
[ 875.567337] Process swapper (pid: 0, stack limit = 0xc042a2e0)
[ 875.573157] Stack: (0xc042bea0 to 0xc042c000)
[ 875.577490] bea0: 00000002 00000080 00000000 c00735e4 c0036c9c c04d4b40 c00735e4 1d34ce80
[ 875.585737] bec0: c042bef4 c042bed0 c02fcef8 c0090dbc 00000002 00000000 c00735e4 c04d6010
[ 875.593984] bee0: 0000e0cd c04d4b40 c042bf3c c042bef8 c00735e4 c02fcebc c0082d34 c006d870
[ 875.602230] bf00: c042bf60 00000001 c002e518 c045e1a8 c042bf3c 0000e0cd c04d49a0 1d34ce80
[ 875.610477] bf20: 0000036b 16027ae8 00000000 0000001f c042bf94 c042bf40 c008b1f8 c00735cc
[ 875.618724] bf40: 00000000 c999a0a0 1d9e6627 0000036b 0000e0cd c00622a4 60000013 c042f318
[ 875.626970] bf60: 1d9e6627 0000036b 0000001f c002e518 c042a000 c002e518 c045e1a8 16027ae8
[ 875.635217] bf80: 510f00f2 0000001f c042bfb4 c042bf98 c002e418 c008b028 c0889a90 c045dd64
[ 875.643464] bfa0: c0029ea4 c042ec08 c042bfc4 c042bfb8 c02f76a4 c002e3f0 c042bff4 c042bfc8
[ 875.651710] bfc0: c0008bc0 c02f7644 c0008514 00000000 00000000 c0029ea4 00000000 10c5387d
[ 875.659957] bfe0: c045e25c c002a2a8 00000000 c042bff8 16008034 c000894c 00000000 00000000
[ 875.668204] Backtrace:
[ 875.670637] [<c0090db0>] (lock_acquire+0x0/0x74) from [<c02fcef8>] (_spin_lock+0x48/0x58)
[ 875.678797] r6:1d34ce80 r5:c00735e4 r4:c04d4b40
[ 875.683397] [<c02fceb0>] (_spin_lock+0x0/0x58) from [<c00735e4>] (get_next_timer_interrupt+0x24/0x224)
[ 875.692684] r5:c04d4b40 r4:0000e0cd
[ 875.696244] [<c00735c0>] (get_next_timer_interrupt+0x0/0x224) from [<c008b1f8>] (tick_nohz_stop_sched_tick+0x1dc/0x434)
[ 875.707004] [<c008b01c>] (tick_nohz_stop_sched_tick+0x0/0x434) from [<c002e418>] (cpu_idle+0x34/0x6c)
[ 875.716204] [<c002e3e4>] (cpu_idle+0x0/0x6c) from [<c02f76a4>] (rest_init+0x6c/0x80)
[ 875.723937] r7:c042ec08 r6:c0029ea4 r5:c045dd64 r4:c0889a90
[ 875.729577] [<c02f7638>] (rest_init+0x0/0x80) from [<c0008bc0>] (start_kernel+0x280/0x2e4)
[ 875.737824] [<c0008940>] (start_kernel+0x0/0x2e4) from [<16008034>] (0x16008034)
[ 875.745197] Code: e58dc008 e59bc00c e58dc00c ebfffdc6 (e595300c)
[ 875.751446] Kernel panic - not syncing: Fatal exception


The analysis of the log follows:
1. 
[ 875.497504] Unable to handle kernel paging request at virtual address 4042a00c
// This line is the one-line summary of the panic: the kernel accessed virtual address 0x4042a00c, which has no valid mapping.
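As a rough illustration of reading that first line, the sketch below pulls the faulting address out of the message and classifies it. The `PAGE_OFFSET` and `PAGE_SIZE` values are assumptions (the usual 3G/1G split and 4 KiB pages on 32-bit ARM); the log itself does not state them, and the helper names are made up for this example.

```python
import re

# Assumption: the common 3G/1G split on 32-bit ARM, where kernel virtual
# addresses start at 0xC0000000. This is not stated in the log itself.
PAGE_OFFSET = 0xC0000000
PAGE_SIZE = 4096  # assumption: 4 KiB pages


def parse_fault_address(line):
    """Return the faulting virtual address from an 'Unable to handle ...' line."""
    match = re.search(r"at virtual address ([0-9a-fA-F]+)", line)
    if match is None:
        raise ValueError("no fault address in line")
    return int(match.group(1), 16)


def classify(addr):
    """Rough, heuristic classification of a faulting address."""
    if addr < PAGE_SIZE:
        return "null-page"     # typical of a NULL pointer dereference
    if addr < PAGE_OFFSET:
        return "user-range"    # below the kernel mapping
    return "kernel-range"


line = "[ 875.497504] Unable to handle kernel paging request at virtual address 4042a00c"
addr = parse_fault_address(line)
print(hex(addr), classify(addr))
```

Under those assumptions the address falls below the kernel mapping, which is unusual for code running in the idle task and suggests a corrupted pointer rather than a legitimate access.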

2.
Modules linked in: dhd

// This only means that one loadable module, dhd (dhd.ko), was loaded when the panic happened.
// It does not mean that the panic occurred inside this module.

3.
Process swapper (pid: 0, stack limit = 0xc042a2e0)
// The process that was running when the panic happened is swapper (the idle task), whose pid is 0.
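The stack limit on that line can be related to the register dump with a little arithmetic. The sketch below assumes 8 KiB kernel stacks (`THREAD_SIZE = 8192`, a common value for ARM kernels of this era but not something the log states) and that the stack limit marks the lowest usable stack address above the per-thread bookkeeping data at the bottom of the stack.

```python
THREAD_SIZE = 8192  # assumption: 8 KiB kernel stack per thread


def stack_bounds(sp):
    """Return (base, top) of the kernel stack that contains sp."""
    base = sp & ~(THREAD_SIZE - 1)  # round down to a THREAD_SIZE boundary
    return base, base + THREAD_SIZE


sp = 0xc042bea0           # sp from the register dump
stack_limit = 0xc042a2e0  # from the "Process swapper" line

base, top = stack_bounds(sp)
print("stack base :", hex(base))
print("stack top  :", hex(top))
print("bytes used :", top - sp)
print("headroom   :", sp - stack_limit)
```

The computed top, 0xc042c000, agrees with the end address of the stack dump, and the base, 0xc042a000, is the value held in ip. With several kilobytes of headroom left, a stack overflow looks unlikely here.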

4.
[ 875.517877] PC is at lock_acquire+0x60/0x74
[ 875.522024] LR is at 0x0
[ 875.524544] pc : [<c0090e10>] lr : [<00000000>] psr: 60000093
[ 875.524557] sp : c042bea0 ip : c042a000 fp : c042becc
[ 875.536004] r10: 0000e0cd r9 : 00000000 r8 : 16027ae8
[ 875.541210] r7 : 0000036b r6 : 00000000 r5 : 4042a000 r4 : 60000093
[ 875.547717] r3 : 00000000 r2 : 00000000 r1 : 00000000 r0 : 00000000

// These lines show the values of the ARM registers at the moment of the fault.
// In lock_acquire+0x60/0x74, 0x74 is the total size in bytes of the compiled lock_acquire() function.
// 0x60 is the offset, from the start of the function, of the instruction that faulted.
// To find the source line at fault, load the object file that contains lock_acquire() into gdb and disassemble the function.
// For example:
// arm-linux-gdb lockdep.o
// disassemble lock_acquire
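The symbol+offset/size notation can be checked against the raw pc value. The following is a minimal sketch; `parse_symbol` is a helper made up for this example, not part of any kernel tool.

```python
import re


def parse_symbol(text):
    """Split 'name+0xOFFSET/0xSIZE' into (name, offset, size)."""
    match = re.fullmatch(r"(\w+)\+0x([0-9a-fA-F]+)/0x([0-9a-fA-F]+)", text.strip())
    if match is None:
        raise ValueError("not in symbol+offset/size form: " + text)
    name, offset, size = match.groups()
    return name, int(offset, 16), int(size, 16)


name, offset, size = parse_symbol("lock_acquire+0x60/0x74")
pc = 0xc0090e10  # pc from the register dump

func_start = pc - offset       # address of the first instruction of the function
func_end = func_start + size   # first address past the function

print(name, hex(func_start), hex(func_end))
```

The computed start address, 0xc0090db0, is the same address the backtrace prints for lock_acquire+0x0/0x74, which confirms that the offset is counted from the function entry.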

5.
[ 875.573157] Stack: (0xc042bea0 to 0xc042c000)
[ 875.577490] bea0: 00000002 00000080 00000000 c00735e4 c0036c9c c04d4b40 c00735e4 1d34ce80
[ 875.585737] bec0: c042bef4 c042bed0 c02fcef8 c0090dbc 00000002 00000000 c00735e4 c04d6010
[ 875.593984] bee0: 0000e0cd c04d4b40 c042bf3c c042bef8 c00735e4 c02fcebc c0082d34 c006d870
[ 875.602230] bf00: c042bf60 00000001 c002e518 c045e1a8 c042bf3c 0000e0cd c04d49a0 1d34ce80
[ 875.610477] bf20: 0000036b 16027ae8 00000000 0000001f c042bf94 c042bf40 c008b1f8 c00735cc
[ 875.618724] bf40: 00000000 c999a0a0 1d9e6627 0000036b 0000e0cd c00622a4 60000013 c042f318
[ 875.626970] bf60: 1d9e6627 0000036b 0000001f c002e518 c042a000 c002e518 c045e1a8 16027ae8
[ 875.635217] bf80: 510f00f2 0000001f c042bfb4 c042bf98 c002e418 c008b028 c0889a90 c045dd64
[ 875.643464] bfa0: c0029ea4 c042ec08 c042bfc4 c042bfb8 c02f76a4 c002e3f0 c042bff4 c042bfc8
[ 875.651710] bfc0: c0008bc0 c02f7644 c0008514 00000000 00000000 c0029ea4 00000000 10c5387d
[ 875.659957] bfe0: c045e25c c002a2a8 00000000 c042bff8 16008034 c000894c 00000000 00000000
// This is a raw dump of the kernel stack contents, from 0xc042bea0 (the current sp) up to 0xc042c000 (the top of the stack).

6.
[ 875.668204] Backtrace:
[ 875.670637] [<c0090db0>] (lock_acquire+0x0/0x74) from [<c02fcef8>] (_spin_lock+0x48/0x58)
[ 875.678797] r6:1d34ce80 r5:c00735e4 r4:c04d4b40
[ 875.683397] [<c02fceb0>] (_spin_lock+0x0/0x58) from [<c00735e4>] (get_next_timer_interrupt+0x24/0x224)
[ 875.692684] r5:c04d4b40 r4:0000e0cd
[ 875.696244] [<c00735c0>] (get_next_timer_interrupt+0x0/0x224) from [<c008b1f8>] (tick_nohz_stop_sched_tick+0x1dc/0x434)
[ 875.707004] [<c008b01c>] (tick_nohz_stop_sched_tick+0x0/0x434) from [<c002e418>] (cpu_idle+0x34/0x6c)
[ 875.716204] [<c002e3e4>] (cpu_idle+0x0/0x6c) from [<c02f76a4>] (rest_init+0x6c/0x80)
[ 875.723937] r7:c042ec08 r6:c0029ea4 r5:c045dd64 r4:c0889a90
[ 875.729577] [<c02f7638>] (rest_init+0x0/0x80) from [<c0008bc0>] (start_kernel+0x280/0x2e4)
[ 875.737824] [<c0008940>] (start_kernel+0x0/0x2e4) from [<16008034>] (0x16008034)
// This is the function call chain. Reading the log above from bottom to top, the call path is:
// start_kernel() -> rest_init() -> cpu_idle() -> tick_nohz_stop_sched_tick() -> get_next_timer_interrupt() -> _spin_lock() -> lock_acquire().
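The backtrace can be reproduced by hand from the stack dump in item 5. The sketch below assumes the kernel was built with frame pointers and uses the classic ARM APCS frame layout, where fp points at the saved pc, with the saved lr, sp, and caller's fp in the three words below it. The layout is an assumption, although every frame in this dump is consistent with it.

```python
# Stack words copied from the dump in item 5 (timestamps removed).
STACK_DUMP = """
bea0: 00000002 00000080 00000000 c00735e4 c0036c9c c04d4b40 c00735e4 1d34ce80
bec0: c042bef4 c042bed0 c02fcef8 c0090dbc 00000002 00000000 c00735e4 c04d6010
bee0: 0000e0cd c04d4b40 c042bf3c c042bef8 c00735e4 c02fcebc c0082d34 c006d870
bf00: c042bf60 00000001 c002e518 c045e1a8 c042bf3c 0000e0cd c04d49a0 1d34ce80
bf20: 0000036b 16027ae8 00000000 0000001f c042bf94 c042bf40 c008b1f8 c00735cc
bf40: 00000000 c999a0a0 1d9e6627 0000036b 0000e0cd c00622a4 60000013 c042f318
bf60: 1d9e6627 0000036b 0000001f c002e518 c042a000 c002e518 c045e1a8 16027ae8
bf80: 510f00f2 0000001f c042bfb4 c042bf98 c002e418 c008b028 c0889a90 c045dd64
bfa0: c0029ea4 c042ec08 c042bfc4 c042bfb8 c02f76a4 c002e3f0 c042bff4 c042bfc8
bfc0: c0008bc0 c02f7644 c0008514 00000000 00000000 c0029ea4 00000000 10c5387d
bfe0: c045e25c c002a2a8 00000000 c042bff8 16008034 c000894c 00000000 00000000
"""

SP = 0xc042bea0  # sp from the register dump
FP = 0xc042becc  # fp from the register dump


def load_stack(dump, sp):
    """Map each stack address to the 32-bit word stored there."""
    high = sp & 0xFFFF0000  # the dump prints only the low 16 address bits
    words = {}
    for line in dump.strip().splitlines():
        offset, values = line.split(":")
        base = high | int(offset, 16)
        for i, value in enumerate(values.split()):
            words[base + 4 * i] = int(value, 16)
    return words


def walk_frames(words, fp):
    """Follow the saved frame pointers and collect the return addresses.

    Assumed layout: [fp] saved pc, [fp-4] saved lr,
    [fp-8] saved sp, [fp-12] caller's fp.
    """
    return_addresses = []
    while fp in words and fp - 12 in words:
        return_addresses.append(words[fp - 4])
        caller_fp = words[fp - 12]
        if caller_fp <= fp:  # frames must move toward the top of the stack
            break
        fp = caller_fp
    return return_addresses


chain = walk_frames(load_stack(STACK_DUMP, SP), FP)
for address in chain:
    print(hex(address))
```

The seven addresses it prints are the same "from" addresses listed in the Backtrace section, ending at 0x16008034, where the saved frame pointer is zero and the chain stops.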

7.
[ 875.745197] Code: e58dc008 e59bc00c e58dc00c ebfffdc6 (e595300c)
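// The Code line lists the instruction words executed just before the fault; the word in parentheses is the instruction at pc, the one that faulted.
// e58dc008 str ip, [sp, #8]
// e59bc00c ldr ip, [fp, #12]
// e58dc00c str ip, [sp, #12]
// ebfffdc6 ......
// (e595300c) ldr r3, [r5, #12]
// r5 holds 0x4042a000, so this load reads from 0x4042a00c, which is exactly the address reported in item 1.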
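The instruction words can also be decoded without a cross toolchain. The sketch below is a hand-written decoder for only the two ARM (A32) encodings that appear on this Code line: word-sized ldr/str with an immediate offset, and bl. It is an illustration, not a general disassembler; a real analysis would use objdump or gdb.

```python
REGS = ["r0", "r1", "r2", "r3", "r4", "r5", "r6", "r7",
        "r8", "r9", "r10", "fp", "ip", "sp", "lr", "pc"]


def decode(word):
    """Decode a tiny subset of A32 instructions; return None for anything else."""
    if word >> 28 != 0xE:  # only the 'always' condition is handled
        return None
    top = (word >> 24) & 0xF
    # LDR/STR, immediate offset, word access (B=0), no writeback (W=0)
    if top == 0x5 and not word & 0x00600000:
        op = "ldr" if word & 0x00100000 else "str"  # L bit selects load/store
        rn = (word >> 16) & 0xF                     # base register
        rd = (word >> 12) & 0xF                     # source/destination register
        imm = word & 0xFFF
        if not word & 0x00800000:                   # U bit clear: subtract
            imm = -imm
        return f"{op} {REGS[rd]}, [{REGS[rn]}, #{imm}]"
    if top == 0xB:  # BL: signed 24-bit word offset, relative to pc + 8
        offset = word & 0xFFFFFF
        if offset & 0x800000:
            offset -= 0x1000000
        delta = offset * 4 + 8  # byte distance from this instruction
        return f"bl {delta:+#x}"
    return None


CODE = [0xe58dc008, 0xe59bc00c, 0xe58dc00c, 0xebfffdc6, 0xe595300c]
for word in CODE:
    print(f"{word:08x}  {decode(word)}")

r5 = 0x4042a000        # r5 from the register dump
effective = r5 + 12    # address read by the faulting ldr
print("faulting load address:", hex(effective))
```

The bl is printed as a byte distance from the branch instruction itself, because the target symbol cannot be resolved without the kernel image.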


I am still learning how to analyze kernel panics. If you have read this, please leave a comment with any advice.

P.S. This is my first time using this blog, so I am not used to it yet.

Analyzing a **kernel panic** requires a systematic search for the root cause. The following is a step-by-step guide.

---

### 1. Collect key information

1. **Console output**
   A kernel panic prints the key error information, including:
   - **The call stack that triggered the panic** (Call Trace), for example `[<ffffffff810a3e50>] __handle_mm_fault+0x123/0x5d0`
   - **The error type** (such as `NULL pointer dereference` or `general protection fault`)
   - **The register state** (RIP, RAX, and so on)

2. **System logs**
   Check the log files (such as `/var/log/messages` and `/var/log/kern.log`):
   ```bash
   dmesg | grep -i "panic"
   journalctl -k --since "10 minutes ago"
   ```

3. **Crash dump**
   Configure `kdump` to capture a memory image (`vmcore`) at crash time, or `netconsole` to send the console output to another machine. Analyze the `vmcore` with the `crash` tool:
   ```bash
   crash /usr/lib/debug/lib/modules/$(uname -r)/vmlinux /var/crash/vmcore
   ```

---

### 2. Common causes

| Cause | How to investigate |
|----------------|--------------------------------------------------------------------------|
| **Hardware fault** | Run `memtest86` to test memory; check the disk SMART status; swap out suspect hardware (such as CPU or memory) |
| **Driver problem** | Roll back or update the driver; look for driver-related errors in the kernel log (such as `Failed to load module`) |
| **Kernel bug** | Search the Linux kernel mailing list (LKML) or Bugzilla for known issues; upgrade the kernel |
| **Memory corruption** | Use `kmemleak` to detect memory leaks; check the state of the `slab` allocator |
| **File system error** | Run `fsck` to repair the file system; check whether a power loss damaged the metadata |

---

### 3. Analyze the stack trace (Call Trace)

1. **Locate the triggering function**
   Look through the Call Trace for the first function that belongs to a kernel module or driver (such as `nvidia` or `ext4`), starting from the most recent frame.
   For example, given `[<ffffffffc0a12345>] nvidia_modeset_ioctl+0x123/0x456`, the problem may lie in the NVIDIA graphics driver.

2. **Map the address to a source line**
   Use `addr2line` to map the address to a line of code:
   ```bash
   addr2line -e /usr/lib/debug/vmlinux ffffffffc0a12345
   ```

---

### 4. Reproduce and debug

1. **Reproduce the problem**
   - Minimize the environment (stop unrelated services and unload unrelated drivers)
   - Enable kernel debugging options:
   ```bash
   # Set when building the kernel
   CONFIG_DEBUG_KERNEL=y
   CONFIG_DEBUG_INFO=y
   CONFIG_KASAN=y # detects out-of-bounds memory accesses
   ```

2. **Dynamic debugging tools**
   - `trace-cmd`: trace kernel function calls
   ```bash
   trace-cmd record -p function_graph -g schedule
   ```
   - `kgdb`: debug the kernel remotely

---

### 5. Fix and verify

1. **Temporary workarounds**
   - Add a kernel boot parameter (such as `nomodeset` to disable kernel mode setting for the graphics driver)
   - Remove the offending module (`modprobe -r <module>`)

2. **Long-term fixes**
   - File a bug report (attach the Call Trace, kernel version, and hardware information)
   - Apply the official patch or upgrade the kernel

---

### 6. Tool reference

| Tool | Purpose |
|---------------|--------------------------------|
| `crash` | Analyze vmcore files |
| `objdump` | Disassemble kernel functions |
| `strace`/`perf` | Trace system calls / performance events |
| `kdump` | Capture a memory image at crash time |

---

These steps should narrow down the root cause one stage at a time. If hardware compatibility is involved, verify on a stable kernel version first.