Linux kernel crash cases: a summary

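This article summarizes three common classes of problems encountered in Linux kernel development: 1) lockups, both soft and hard, for which the kernel ships a detection tool; 2) memory faults, which are usually caused by access to an illegal address; 3) invalid opcode, with an analysis of possible causes such as a corrupted function pointer. A solid understanding of these exceptions helps kernel developers locate and fix problems.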


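In kernel development, apart from working with the kernel's many data structures and APIs, the thing you run into most often is probably the kernel crash. This article summarizes a number of problems encountered in real projects, as a note for future reference.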

1. lockup

The Linux kernel divides lockups into two classes:

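soft lockup: the current CPU keeps running in kernel mode, so other tasks get no chance to run;
hard lockup: the current CPU keeps running in kernel mode, so interrupts get no chance to run.
For a detailed description, see Documentation/lockup-watchdogs.txt in the kernel source tree.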

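In kernel development, misusing a spinlock, misusing an API that takes a spinlock internally, or calling disable_irq and forgetting the matching enable_irq can easily lead to a lockup. This is fairly common: for example, a given code path may run in process context in some cases and in interrupt or exception context in others, and one careless mistake is enough to cause trouble. Besides relying on developers to stay alert, the Linux kernel ships its own lockup detection tool: the watchdog. For the two classes of lockup above, the kernel uses a high-resolution timer (soft lockup) and the NMI interrupt of the perf subsystem (hard lockup), respectively, to detect and report them.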

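soft lockup detector
This works much like the watchdog on a microcontroller: the dog has to be touched periodically, otherwise it barks. In the kernel, the watchdog kernel thread does the periodic touch and the hrtimer interrupt handler does the periodic check. The pseudocode is as follows: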

/* (1). watchdog thread */
static int watchdog(void *unused)
{
    while (true) {
        /* --------- touch -------- */
        watchdog_touch_ts = current_timestamp;
        /* wait for hrtimer to wakeup */
        sleep();
        schedule();
    }
}

/* (2). hrtimer handler */
static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
{
    /* --------- check -------- */
    if ((current_timestamp - watchdog_touch_ts) > softlockup_thresh)
        warn_or_panic();
    /* used for hard lockup detecting */
    hrtimer_interrupts++;
    wake_up_process(watchdog_thread);
    /* keep the periodic timer running */
    return HRTIMER_RESTART;
}

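The principle is not complicated. Note that everything above is per-CPU, both the variables and the kernel thread. Consider this case: a spinlock bug leaves a CPU looping forever while interrupts on that CPU remain enabled. The hrtimer can still fire, but the watchdog kernel thread never gets a chance to run, so the timestamp goes stale and the warning above is triggered. Also, the hrtimer_interrupts counter above is there to assist hard lockup detection.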
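The touch/check split described above can be modelled in user space. The following is only a minimal sketch, not kernel code: the struct and the sim_* function names are hypothetical, timestamps are plain integers in seconds, and there is a single simulated CPU.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-CPU state for a user-space model of the soft lockup check. */
struct soft_watchdog {
    uint64_t touch_ts;  /* last time the watchdog thread ran, in seconds */
    uint64_t thresh;    /* soft lockup threshold, in seconds */
};

/* Models the watchdog thread: it records "I got to run" each time it is scheduled. */
void sim_touch(struct soft_watchdog *wd, uint64_t now)
{
    wd->touch_ts = now;
}

/* Models the hrtimer handler: returns true when the thread has not run for too long. */
bool sim_check(const struct soft_watchdog *wd, uint64_t now)
{
    return (now - wd->touch_ts) > wd->thresh;
}
```

If sim_touch stops being called (the thread is starved) while sim_check keeps running (the timer interrupt still fires), sim_check eventually returns true, which corresponds to the warn_or_panic() branch in the pseudocode.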

hard lockup detector
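Clearly, a hard lockup means that interrupts are masked, so the timer approach above cannot detect it. One kind of interrupt, however, cannot be masked: the NMI. Here the kernel uses the interrupt from the perf subsystem. (Why this particular interrupt? My personal guess is that it simply fits the requirements: the interrupt also has to fire periodically, and perf appears to meet that need.) The principle is as follows: since we need to detect whether interrupts on a CPU have stayed disabled, we check whether that CPU has taken any interrupt within a given period. Going one step further, the hrtimer above is itself an interrupt, so it is enough to count the hrtimer interrupts within the period. The pseudocode is as follows: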

/* perf nmi handler */
static void watchdog_overflow_callback(struct perf_event *event, int nmi, ...)
{
    if (hrtimer_interrupts_saved == hrtimer_interrupts)
        warn_or_panic();
    hrtimer_interrupts_saved = hrtimer_interrupts;
}
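The counter comparison in the NMI handler can also be tried out in user space. This is a sketch under stated assumptions: the struct and the sim_* names are hypothetical, and "interrupts" are just function calls on one simulated CPU.

```c
#include <stdbool.h>

/* Hypothetical per-CPU state for a user-space model of the hard lockup check. */
struct hard_watchdog {
    unsigned long hrtimer_interrupts;        /* bumped by the timer handler */
    unsigned long hrtimer_interrupts_saved;  /* snapshot taken by the last NMI */
};

/* Models one hrtimer interrupt being delivered. */
void sim_hrtimer_tick(struct hard_watchdog *wd)
{
    wd->hrtimer_interrupts++;
}

/* Models the periodic NMI: returns true if no timer interrupt arrived since the last NMI. */
bool sim_nmi_check(struct hard_watchdog *wd)
{
    if (wd->hrtimer_interrupts_saved == wd->hrtimer_interrupts)
        return true;   /* counter did not move: interrupts look stuck */
    wd->hrtimer_interrupts_saved = wd->hrtimer_interrupts;
    return false;
}
```

As long as at least one tick lands between two NMI checks the model reports nothing; two consecutive checks with no tick in between report a lockup.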

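Related parameters
The detection interval mentioned above, whether to panic, and so on can all be changed through sysctl. Some of the relevant parameters are: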

kernel.watchdog = 1
kernel.watchdog_thresh = 60
kernel.softlockup_panic = 0
kernel.nmi_watchdog = 1
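Each sysctl name corresponds to a file under /proc/sys with the dots replaced by slashes, so kernel.watchdog_thresh is /proc/sys/kernel/watchdog_thresh. As a small sketch, a program can read such a value directly; the helper name read_sysctl_long is hypothetical, and which of these files exist depends on the kernel version and configuration.

```c
#include <stdio.h>

/*
 * Read one integer value from a sysctl file such as
 * /proc/sys/kernel/watchdog_thresh.
 * Returns 0 on success, -1 if the file is missing or does not hold a number.
 */
int read_sysctl_long(const char *path, long *out)
{
    FILE *f = fopen(path, "r");
    int ok;

    if (!f)
        return -1;
    ok = (fscanf(f, "%ld", out) == 1);
    fclose(f);
    return ok ? 0 : -1;
}
```

A caller would pass the path of the parameter it is interested in and treat a -1 return as "not available on this kernel".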

example
Here is a simple soft lockup example: during module initialization, spin_lock is called twice in a row on the same spinlock, producing a deadlock.

/*file: soft_lockup.c */
#include <linux/module.h>
#include <linux/spinlock.h>

static int __init lockup_init(void)
{
    spinlock_t lock;

    spin_lock_init(&lock);

    spin_lock(&lock);
    /* -- BUG -- */
    spin_lock(&lock);

    return 0;
}

static void __exit lockup_exit(void)
{
    /* do nothing */
}

module_init(lockup_init);
module_exit(lockup_exit);
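Why the second spin_lock never returns can be seen with a toy lock in user space. This is a sketch only, built on C11 atomics rather than the kernel's spinlock implementation; the toy_* names are hypothetical, and a trylock is used so that the program itself does not hang.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* A toy, non-recursive spinlock: one flag, set while the lock is held. */
struct toy_spinlock {
    atomic_flag locked;
};

/* Try to take the lock once; returns false if it is already held. */
bool toy_spin_trylock(struct toy_spinlock *l)
{
    return !atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire);
}

/* Release the lock. */
void toy_spin_unlock(struct toy_spinlock *l)
{
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}
```

A blocking lock is just this trylock retried in a loop. When the holder is the same CPU that is retrying, nobody is left to call the unlock, so the loop spins forever.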

Makefile:

#file: Makefile
ifneq ($(KERNELRELEASE),)

obj-m   += soft_lockup.o
ccflags-y := -Wall

else

KDIR := /lib/modules/`uname -r`/build
PWD := $(shell pwd)

all:
	$(MAKE) -C $(KDIR) M=$(PWD) modules

clean:
	$(MAKE) -C $(KDIR) M=$(PWD) clean

endif

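Run sudo insmod soft_lockup.ko in a virtual machine. The process hangs, of course, and after a while the kernel prints the messages below; from the call stack it is easy to locate the problem. In real development, however, the call stack in a soft lockup report is often not the root cause. In that case it usually pays to wait patiently: if a hard lockup panic follows, the call stack at that point deserves close attention, since that is usually where the bug is.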

[  348.143996] BUG: soft lockup - CPU#7 stuck for 67s! [insmod:777]
[  348.144442] Modules linked in: lockup(P+)
[  348.144442] CPU 7 
[  348.144442] Modules linked in: lockup(P+)
[  348.144442] 
[  348.144442] Pid: 777, comm: insmod Tainted: P           ---------------    2.6.32 #249 QEMU Standard PC (i440FX + PIIX, 1996)
[  348.144442] RIP: 0010:[<ffffffff815825ce>]  [<ffffffff815825ce>] _spin_lock+0x1e/0x30
[  348.144442] RSP: 0018:ffff8809ef8c3ee8  EFLAGS: 00000206
[  348.144442] RAX: 0000000000000000 RBX: ffff8809ef8c3ee8 RCX: 0000000000000000
[  348.144442] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff8809ef8c3ef8
[  348.144442] RBP: ffffffff8100da8e R08: 0000000000000000 R09: 0000000000000000
[  348.144442] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[  348.144442] R13: 0000000000000001 R14: ffffffff8158107e R15: ffff8809ef8c3e58
[  348.144442] FS:  00007fac98f46700(0000) GS:ffff8800283c0000(0000) knlGS:0000000000000000
[  348.144442] CS:  0010 DS: 0000 ES: 