When a command-line process on Linux dies from a segmentation fault, the message “Segmentation fault (core dumped)” is typically printed to the terminal (on stderr rather than stdout). This assumes the process did not register a handler for the SIGSEGV signal. I was curious about who actually prints the message, so I started digging.
My first hypothesis was that libc installs a default handler for this signal. But running the application under strace showed no write system call: neither the application nor its libraries were printing anything.
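For contrast, here is a minimal sketch of what a process-installed SIGSEGV handler could look like. If the application or libc had one, the `write` call inside it would show up in the strace output. The names `on_segv`, `report_fd` and `run_child_with_handler` are hypothetical, chosen only for this illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int report_fd = STDERR_FILENO; /* where the handler writes */

/* Only async-signal-safe calls are used here: write() and _exit(). */
static void on_segv(int signo)
{
    static const char msg[] = "caught SIGSEGV\n";
    (void)signo;
    if (write(report_fd, msg, sizeof msg - 1) < 0)
        _exit(3);
    _exit(1);
}

/*
 * Fork a child that installs the handler and then raises SIGSEGV.
 * Whatever the handler writes is copied into out (NUL-terminated).
 * Returns the child's wait status, or -1 on error.
 */
int run_child_with_handler(char *out, size_t outlen)
{
    int fds[2];
    int status = -1;
    ssize_t n;
    pid_t pid;

    if (outlen == 0 || pipe(fds) != 0)
        return -1;

    pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        struct sigaction sa;
        close(fds[0]);
        report_fd = fds[1];
        memset(&sa, 0, sizeof sa);
        sa.sa_handler = on_segv;
        sigemptyset(&sa.sa_mask);
        sigaction(SIGSEGV, &sa, NULL);
        raise(SIGSEGV);
        _exit(2); /* not reached: the handler exits first */
    }

    close(fds[1]);
    n = read(fds[0], out, outlen - 1);
    out[n > 0 ? n : 0] = '\0';
    close(fds[0]);
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return status;
}
```

With a handler like this the child exits normally (status 1) instead of being killed by the signal, so the behaviour differs visibly from the unhandled case investigated here.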
If a process registers no handler for SIGSEGV, the kernel executes the do_coredump function (fs/coredump.c in the Linux kernel). It caught my attention that the kernel creates a new task and launches a user-mode application. On Fedora this application is /usr/libexec/abrt-hook-ccpp, whose job is to record and report the crash. I ran strings over its binary and its dynamically linked libraries (libc, libreport, libabrt, etc.) but found no clues.
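The helper the kernel launches is configured through /proc/sys/kernel/core_pattern: when the value starts with `|`, the rest of the line is run as a user-mode program that receives the core dump on its standard input. The following is a small sketch for checking that setting on a given machine. The function names are hypothetical, and the value on any particular Fedora installation is not something this post recorded.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>

/* Returns 1 if the pattern asks the kernel to pipe the dump to a helper. */
int core_pattern_is_pipe(const char *pattern)
{
    return pattern != NULL && pattern[0] == '|';
}

/* Reads /proc/sys/kernel/core_pattern into buf; returns 0 on success. */
int read_core_pattern(char *buf, size_t len)
{
    FILE *f;

    if (len == 0)
        return -1;
    f = fopen("/proc/sys/kernel/core_pattern", "r");
    if (f == NULL)
        return -1;
    if (fgets(buf, (int)len, f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);
    buf[strcspn(buf, "\n")] = '\0'; /* strip the trailing newline */
    return 0;
}
```

Calling `read_core_pattern` and then `core_pattern_is_pipe` on the result tells you whether a crash handler such as abrt-hook-ccpp or systemd-coredump is in the loop, or whether the kernel writes a plain core file.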
After grepping the source code of other potential candidates, such as systemd-coredump, I was even more curious and finally opted for the hard way: monitoring every single write system call in the kernel.
The first place to insert monitoring code was the n_tty_write function (drivers/tty/n_tty.c in the Linux kernel), since every TTY write passes through it. I set a breakpoint in the monitoring code and it triggered immediately. Looking at the task's memory map, I saw that the writer was bash. But then I thought: what if it is not bash itself, but a child process sending the message through a pipe?
So I placed a second monitor a bit higher in the call stack, in the __vfs_write function (fs/read_write.c in the Linux kernel). The breakpoint there was hit, and it was bash again. I downloaded the bash source code and confirmed it.
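The reason a shell is able to print this message at all is that the parent of a crashed process learns how it died from the status returned by `waitpid`. The sketch below illustrates that mechanism under my own assumptions. It is not bash's actual code, and `describe_termination` and `crash_child_status` are hypothetical names.

```c
#define _GNU_SOURCE /* for strsignal() and WCOREDUMP on glibc */
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Format a wait status the way a shell might. Returns 1 if the child
 * was killed by a signal and a message was written to buf, else 0.
 */
int describe_termination(int status, char *buf, size_t len)
{
    const char *suffix = "";

    if (!WIFSIGNALED(status))
        return 0;
#ifdef WCOREDUMP
    if (WCOREDUMP(status))
        suffix = " (core dumped)";
#endif
    snprintf(buf, len, "%s%s", strsignal(WTERMSIG(status)), suffix);
    return 1;
}

/* Fork a child that dies from SIGSEGV with the default disposition. */
int crash_child_status(void)
{
    int status = -1;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        signal(SIGSEGV, SIG_DFL);
        raise(SIGSEGV);
        _exit(0); /* not reached */
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return status;
}
```

Whether the " (core dumped)" suffix appears depends on whether the kernel actually produced a dump, which in turn depends on settings such as the core size limit. The child itself never writes anything, which matches the empty strace output.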
Here is the monitor:
```c
/* Monitor inserted into the write path; p and count are the user
 * buffer and length arguments of the enclosing function. */
unsigned char* kBuf = NULL;
if (count > 0) {
    /* One extra byte so the copy is always NUL-terminated. */
    kBuf = (unsigned char*)kmalloc(count + 1, GFP_KERNEL);
    if (kBuf != NULL) {
        memset(kBuf, 0, count + 1);
        if (copy_from_user(kBuf, p, count) == 0) {
            if (strstr(kBuf, "core dumped")) {
                asm("nop\n\tnop\n\tnop\n\tnop\n\tnop\n\tnop\n\tnop\n\t");
                printk("PID: %d\n", current->pid);
                asm("nop\n\tnop\n\tnop\n\tnop\n\tnop\n\tnop\n\tnop\n\t");
            }
        }
        kfree(kBuf);
    }
}
```
Notes: 1) the nops are there to make it easier to find the address for the breakpoint, and 2) the memset could be dropped by allocating zeroed memory instead, for example with kzalloc or by passing __GFP_ZERO to kmalloc.
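To make the role of the zeroed extra byte concrete, here is a userspace analogue of the same filter, with `calloc` standing in for a zeroing kernel allocation. It is only a sketch, and `write_contains` is a hypothetical name. It also exposes one limitation of this approach: `strstr` stops at the first NUL byte, so a match located after an embedded NUL in the written data would be missed.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Copy count bytes into a zero-filled buffer of count + 1 bytes, then
 * search the copy as a C string. Returns 1 if needle is found and 0
 * otherwise (including on allocation failure).
 */
int write_contains(const void *data, size_t count, const char *needle)
{
    char *copy;
    int found;

    if (count == 0)
        return 0;
    copy = calloc(count + 1, 1); /* zeroed, so no separate memset */
    if (copy == NULL)
        return 0;
    memcpy(copy, data, count);
    found = strstr(copy, needle) != NULL;
    free(copy);
    return found;
}
```

For plain text messages like the one hunted here, the NUL limitation does not matter, which is presumably why the simple `strstr` check was good enough.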
Here is how it looks in assembly:
```
0xffffffff81271cae <__vfs_write+510>: nop
0xffffffff81271caf <__vfs_write+511>: nop
0xffffffff81271cb0 <__vfs_write+512>: nop
0xffffffff81271cb1 <__vfs_write+513>: nop
0xffffffff81271cb2 <__vfs_write+514>: nop
0xffffffff81271cb3 <__vfs_write+515>: nop
0xffffffff81271cb4 <__vfs_write+516>: nop
0xffffffff81271cb5 <__vfs_write+517>: mov %gs:0xd300,%rax
0xffffffff81271cbe <__vfs_write+526>: mov 0x900(%rax),%esi
0xffffffff81271cc4 <__vfs_write+532>: mov $0xffffffff81ca0551,%rdi
0xffffffff81271ccb <__vfs_write+539>: callq 0xffffffff811065c1 <printk>
0xffffffff81271cd0 <__vfs_write+544>: nop
0xffffffff81271cd1 <__vfs_write+545>: nop
0xffffffff81271cd2 <__vfs_write+546>: nop
0xffffffff81271cd3 <__vfs_write+547>: nop
0xffffffff81271cd4 <__vfs_write+548>: nop
0xffffffff81271cd5 <__vfs_write+549>: nop
0xffffffff81271cd6 <__vfs_write+550>: nop
```
After the first mov, the RAX register holds a pointer to the current task struct.
This is how we can look at the task's memory map and determine the file backing its first segment:
```
(gdb) x/s ((struct task_struct*)$rax)->mm->mmap->vm_file->f_path->dentry->d_name->name
0xffff88007b7de278: "bash"
```
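Since the monitor also prints the PID through printk, a lighter cross-check from user space, as long as the process is still alive, is to resolve the `/proc/<pid>/exe` symlink. This is a minimal Linux-only sketch, and `exe_path_of` is a hypothetical helper name.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Resolve the executable behind a PID via /proc/<pid>/exe.
 * Returns the path length, or -1 on error.
 */
long exe_path_of(pid_t pid, char *buf, size_t len)
{
    char link[64];
    ssize_t n;

    if (len == 0)
        return -1;
    snprintf(link, sizeof link, "/proc/%ld/exe", (long)pid);
    n = readlink(link, buf, len - 1); /* readlink does not NUL-terminate */
    if (n < 0)
        return -1;
    buf[n] = '\0';
    return (long)n;
}
```

Reading another user's process this way may require elevated privileges, so the debugger route remains the more general one.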
The end!




