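To analyze why an operating system running critical workloads went down unexpectedly, Kdump is typically configured in advance to capture a memory image at the moment the kernel crashes. Analyzing that image shows what the system was doing just before the crash, which helps locate the root cause and fix the fault.
Test environment
Operating system: RHEL 6.10
Installation procedure
Step 1: Install Kdump and Crash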
# yum -y install kexec-tools crash
# chkconfig kdump on
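Step 2: Configure crashkernel (optional)
crashkernel defaults to auto, which is sufficient for the vast majority of scenarios. The reserved memory can also be set explicitly, based on the server's physical memory size.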
# sed -i 's/crashkernel=auto/crashkernel=128M/' /boot/grub/grub.conf
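The sed command above changes nothing if grub.conf does not contain the exact string crashkernel=auto, so it can be worth checking the result before rebooting. A minimal sketch, assuming a helper named list_crashkernel (a hypothetical name, not part of kexec-tools) that prints every crashkernel= setting found in a file:

```shell
# Print every distinct crashkernel= setting found in a file.
# $1: file to inspect, e.g. /boot/grub/grub.conf or /proc/cmdline.
# list_crashkernel is a hypothetical helper name used for illustration.
list_crashkernel() {
    grep -o 'crashkernel=[^ ]*' "$1" | sort -u
}

# Example (as root, on the RHEL 6 host):
#   cp -p /boot/grub/grub.conf /boot/grub/grub.conf.bak
#   list_crashkernel /boot/grub/grub.conf
```

If nothing is printed, the kernel lines in the file carry no crashkernel parameter at all and the substitution had nothing to replace.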
Step 3: Reboot the system
# reboot
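After the reboot, it can help to confirm that the capture kernel was actually loaded before triggering a test crash. The sketch below assumes the kernel exposes /sys/kernel/kexec_crash_loaded (a value of 1 means a crash kernel is loaded). The function name crash_kernel_loaded is hypothetical.

```shell
# Return 0 if the kdump capture kernel is loaded, non-zero otherwise.
# $1: path to the flag file; defaults to the sysfs location and is
#     parameterized only so the check can be exercised on a test file.
crash_kernel_loaded() {
    local flag_file="${1:-/sys/kernel/kexec_crash_loaded}"
    [ -r "$flag_file" ] && [ "$(cat "$flag_file")" = "1" ]
}

if crash_kernel_loaded; then
    echo "kdump capture kernel is loaded"
else
    echo "kdump capture kernel is NOT loaded; check 'service kdump status'"
fi
```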
Step 4: Simulate a kernel crash, then confirm after the system comes back up that a kernel dump file was generated
# echo c>/proc/sysrq-trigger
# ls /var/crash
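Each crash is saved in its own subdirectory under /var/crash, so after several tests it is convenient to pick the newest dump automatically. A minimal sketch, assuming the default dump location and a hypothetical helper name latest_vmcore:

```shell
# Print the most recently modified vmcore under a crash directory.
# $1: crash directory (defaults to /var/crash, the location used above).
# latest_vmcore is a hypothetical helper name used for illustration.
latest_vmcore() {
    local crash_dir="${1:-/var/crash}"
    # ls -t sorts newest first; the glob matches <dir>/<dump-subdir>/vmcore
    ls -t "$crash_dir"/*/vmcore 2>/dev/null | head -n 1
}

# Example:
#   latest_vmcore            # searches /var/crash
#   latest_vmcore /mnt/dumps # searches a custom directory
```

The function prints nothing when no vmcore is found, which is itself a sign that the dump was not written.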
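Step 5: Install the kernel debugging packages
Analyzing a kernel dump file requires the debuginfo packages that match the kernel version exactly. The kernel version used in this tutorial is 2.6.32-754.el6.x86_64.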
# rpm -ivh http://debuginfo.centos.org/6/x86_64/kernel-debuginfo-common-x86_64-2.6.32-754.el6.x86_64.rpm
# rpm -ivh http://debuginfo.centos.org/6/x86_64/kernel-debuginfo-2.6.32-754.el6.x86_64.rpm
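For other kernel versions, the two package URLs can be derived from the kernel release string instead of being typed by hand. This sketch only generalizes the naming pattern visible in the two URLs above. Whether the site hosts packages for a given release is an assumption, and debuginfo_urls is a hypothetical helper name.

```shell
# Print the two debuginfo package URLs for a kernel release.
# $1: kernel release string (defaults to the running kernel, `uname -r`).
# The URL layout is inferred from the two URLs used in this tutorial.
debuginfo_urls() {
    local release="${1:-$(uname -r)}"
    local arch="${release##*.}"          # text after the last dot, e.g. x86_64
    local base="http://debuginfo.centos.org/6/$arch"
    echo "$base/kernel-debuginfo-common-$arch-$release.rpm"
    echo "$base/kernel-debuginfo-$release.rpm"
}

# Example:
#   debuginfo_urls 2.6.32-754.el6.x86_64
```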
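Step 6: Analyze the kernel dump file
A full analysis is fairly involved. Non-specialists are mainly interested in the type of kernel crash and the name of the process that triggered it.
Further reading
KERNEL: path and version of the debug kernel (vmlinux)
DUMPFILE: the kernel dump file being analyzed
CPUS: number of CPUs on the machine
DATE: time at which the kernel crashed
UPTIME: how long the kernel had been running before the crash
LOAD AVERAGE: system load at the time of the crash
TASKS: number of tasks on the system at the time of the crash
NODENAME: hostname of the crashed machine
RELEASE: kernel release
VERSION: additional kernel version information (build number and build date)
MACHINE: CPU architecture and clock speed
MEMORY: system memory size at the time of the crash
PANIC: type of kernel crash (SysRq, Oops, Panic)
PID: PID of the process that triggered the kernel crash
COMMAND: name of the process that triggered the kernel crash
TASK: address of the task structure (task_struct) of the process that triggered the kernel crash
CPU: number of the CPU on which that process was running
STATE: run state of the process that triggered the kernel crash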
# crash /usr/lib/debug/lib/modules/2.6.32-754.el6.x86_64/vmlinux /var/crash/127.0.0.1-2020-04-04-22\:14\:57/vmcore
crash 7.1.0-8.el6
Copyright (C) 2002-2014 Red Hat, Inc.
Copyright (C) 2004, 2005, 2006, 2010 IBM Corporation
Copyright (C) 1999-2006 Hewlett-Packard Co
Copyright (C) 2005, 2006, 2011, 2012 Fujitsu Limited
Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
Copyright (C) 2005, 2011 NEC Corporation
Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions. Enter "help copying" to see the conditions.
This program has absolutely no warranty. Enter "help warranty" for details.
GNU gdb (GDB) 7.6
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu"...
WARNING: kernel version inconsistency between vmlinux and dumpfile
KERNEL: /usr/lib/debug/lib/modules/2.6.32-754.el6.x86_64/vmlinux
DUMPFILE: /var/crash/127.0.0.1-2020-04-04-22:14:57/vmcore [PARTIAL DUMP]
CPUS: 1
DATE: Sat Apr 4 22:14:53 2020
UPTIME: 00:01:01
LOAD AVERAGE: 0.09, 0.04, 0.01
TASKS: 244
NODENAME: wanghualang
RELEASE: 2.6.32-754.el6.x86_64
VERSION: #1 SMP Thu May 24 18:18:25 EDT 2018
MACHINE: x86_64 (2807 Mhz)
MEMORY: 4 GB
PANIC: "SysRq : Trigger a crash"
PID: 2917
COMMAND: "bash"
TASK: ffff880138056ab0 [THREAD_INFO: ffff88013be1c000]
CPU: 0
STATE: TASK_RUNNING (SYSRQ)
crash>
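Since the crash type and the process name are the two values most readers need, they can be pulled out of the startup summary with a small filter. The sketch below assumes the summary shown above has already been saved to a text file (how it is captured is left open here). crash_field is a hypothetical helper name, not a crash command.

```shell
# Print the value of one field from a saved crash startup summary.
# $1: field name exactly as printed by crash (e.g. PANIC, COMMAND, "LOAD AVERAGE")
# $2: text file containing the saved summary
# The trailing colon in the pattern keeps CPU from matching CPUS, and TASK from TASKS.
crash_field() {
    sed -n "s/^[[:space:]]*$1:[[:space:]]*//p" "$2" | head -n 1
}

# Example, with the summary saved as summary.txt:
#   crash_field PANIC summary.txt
#   crash_field COMMAND summary.txt
```

For the dump analyzed above, the two fields identify a crash triggered through SysRq by the bash process, which matches the echo c test in step 4.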