<Debugging Techniques> LDD3 Study Notes

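These notes cover the debugging challenges specific to kernel programming and the common kernel debugging techniques: printing with printk, controlling where messages are routed, limiting the message rate, and watching an application's behavior from user space. They also cover practical points such as avoiding excessive printing and throttling output with printk_ratelimit.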


Debugging Techniques


The challenges of kernel debugging:

Kernel programming brings its own, unique debugging challenges. Kernel code cannot be easily executed under a debugger, nor can it be easily traced, because it is a set of functionalities not related to a specific process. Kernel code errors can also be exceedingly hard to reproduce and can bring down the entire system with them, thus destroying much of the evidence that could be used to track them down.


-------------------------------------------------------------------- Cut line -------------------------------------------------------------------------


Debugging by Printing

                The most common debugging technique is monitoring, which in applications programming is done by calling printf at suitable points. When you are debugging kernel code, you can accomplish the same goal with printk.


printk

printk lets you classify messages according to their severity by associating different loglevels, or priorities, with the messages. You usually indicate the loglevel with a macro. For example, KERN_INFO, which we saw prepended to some of the earlier print statements, is one of the possible loglevels of the message. The loglevel macro expands to a string, which is concatenated to the message text at compile time; that's why there is no comma between the priority and the format string in the following examples. Here are two examples of printk commands, a debug message and a critical message:

printk(KERN_DEBUG "Here I am: %s:%i\n", __FILE__, __LINE__);
printk(KERN_CRIT "I'm trashed; giving up on %p\n", ptr);



There are eight possible loglevel strings, defined in the header <linux/kernel.h> (recent kernels keep them in <linux/kern_levels.h>); we list them in order of decreasing severity:

KERN_EMERG
Used for emergency messages, usually those that precede a crash.
KERN_ALERT
A situation requiring immediate action.
KERN_CRIT
Critical conditions, often related to serious hardware or software failures.
KERN_ERR
Used to report error conditions; device drivers often use KERN_ERR to report hardware difficulties.
KERN_WARNING
Warnings about problematic situations that do not, in themselves, create serious problems with the system.
KERN_NOTICE
Situations that are normal, but still worthy of note. A number of security-related
conditions are reported at this level.
KERN_INFO
Informational messages. Many drivers print information about the hardware
they find at startup time at this level.
KERN_DEBUG
Used for debugging messages.

Each string (in the macro expansion) represents an integer in angle brackets. Integers range from 0 to 7, with smaller values representing higher priorities.
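To make the "string in angle brackets" idea concrete, here is a minimal user-space sketch. The DEMO_ macros and both helper functions are hypothetical stand-ins written for these notes, not the kernel's definitions; they only mimic the "<n>" prefix form described above, and the fallback level of 4 for unprefixed messages is an assumption.

```c
/* User-space stand-ins for the kernel macros, in the "<n>" form the text
 * describes. Illustrative only; these are not the kernel headers. */
#define DEMO_KERN_CRIT  "<2>"
#define DEMO_KERN_DEBUG "<7>"
#define DEMO_DEFAULT_LOGLEVEL 4  /* assumed fallback for unprefixed messages */

/* Return the loglevel encoded at the start of msg, or the fallback. */
int demo_loglevel(const char *msg)
{
    if (msg[0] == '<' && msg[1] >= '0' && msg[1] <= '7' && msg[2] == '>')
        return msg[1] - '0';
    return DEMO_DEFAULT_LOGLEVEL;
}

/* Return the message text that follows the loglevel prefix, if any. */
const char *demo_message_text(const char *msg)
{
    if (msg[0] == '<' && msg[1] >= '0' && msg[1] <= '7' && msg[2] == '>')
        return msg + 3;
    return msg;
}
```

Because adjacent string literals are merged at compile time, DEMO_KERN_DEBUG "Here I am\n" is a single string starting with "<7>", so demo_loglevel() reports 7 for it. This is the same reason the printk examples have no comma after the loglevel.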


Redirecting Console Messages



To select a different virtual terminal to receive messages, you can issue ioctl(TIOCLINUX) on any console device. The following program, setconsole, can be used to choose which console receives kernel messages; it must be run by the superuser and is available in the misc-progs directory.




setconsole.c
/*
 * setconsole.c -- choose a console to receive kernel messages
 *
 * Copyright (C) 1998,2000,2001 Alessandro Rubini
 * 
 *   This program is free software; you can redistribute it and/or modify
 *   it under the terms of the GNU General Public License as published by
 *   the Free Software Foundation; either version 2 of the License, or
 *   (at your option) any later version.
 *
 *   This program is distributed in the hope that it will be useful,
 *   but WITHOUT ANY WARRANTY; without even the implied warranty of
 *   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 *   GNU General Public License for more details.
 *
 *   You should have received a copy of the GNU General Public License
 *   along with this program; if not, write to the Free Software
 *   Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.
 */

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <unistd.h>
#include <sys/ioctl.h>

int main(int argc, char **argv)
{
    char bytes[2] = {11,0}; /* 11 is the TIOCLINUX cmd number */

    if (argc==2) bytes[1] = atoi(argv[1]); /* the chosen console */
    else {
        fprintf(stderr, "%s: need a single arg\n",argv[0]); exit(1);
    }
    if (ioctl(STDIN_FILENO, TIOCLINUX, bytes)<0) {    /* use stdin */
        fprintf(stderr,"%s: ioctl(stdin, TIOCLINUX): %s\n",
                argv[0], strerror(errno));
        exit(1);
    }
    exit(0);
}


Testing setconsole.c held me up for quite a while...
Here is the link to my test notes




Rate Limiting


              If you are not careful, you can find yourself generating thousands of messages with printk, overwhelming the console and, possibly, overflowing the system log file. When using a slow console device (e.g., a serial port), an excessive message rate can also slow down the system or just make it unresponsive.


Therefore, you should be very careful about what you print, especially in production versions of drivers and especially once initialization is complete. In general, production code should never print anything during normal operation; printed output should be an indication of an exceptional situation requiring attention.




The kernel has provided a function that can be helpful in such cases:
int printk_ratelimit(void);


                 This function should be called before you consider printing a message that could be repeated often. If the function returns a nonzero value, go ahead and print your message, otherwise skip it. Thus, typical calls look like this:
if (printk_ratelimit())
    printk(KERN_NOTICE "The printer is still on fire\n");

printk_ratelimit works by tracking how many messages are sent to the console. When the level of output exceeds a threshold, printk_ratelimit starts returning 0 and causing messages to be dropped.
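The threshold behavior can be pictured with a small user-space sketch. This is not the kernel's implementation: struct demo_ratelimit, demo_ratelimit_ok(), and the explicit now argument are all hypothetical, chosen so that the logic can be exercised without a real clock. The idea is simply "allow at most burst messages per interval-second window".

```c
#include <stdbool.h>

struct demo_ratelimit {
    long interval;   /* window length, in seconds */
    int  burst;      /* messages allowed per window */
    long begin;      /* start time of the current window */
    int  printed;    /* messages allowed so far in this window */
    bool started;    /* false until the first call */
};

/* Return 1 if a message may be printed at time now, 0 if it should be
 * dropped. A new window opens once interval seconds have elapsed. */
int demo_ratelimit_ok(struct demo_ratelimit *rl, long now)
{
    if (!rl->started || now - rl->begin >= rl->interval) {
        rl->begin = now;
        rl->printed = 0;
        rl->started = true;
    }
    if (rl->printed < rl->burst) {
        rl->printed++;
        return 1;
    }
    return 0;
}
```

With interval = 5 and burst = 3, the first three calls inside a window return 1, later calls in that window return 0, and the first call after the window expires returns 1 again. A caller would wrap its print statement in the same way the printk_ratelimit example does.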





Printing Device Numbers


Occasionally, when printing a message from a driver, you will want to print the device number associated with the hardware of interest. It is not particularly hard to print the major and minor numbers, but, in the interest of consistency, the kernel provides a couple of utility macros (defined in <linux/kdev_t.h>) for this purpose:

int print_dev_t(char *buffer, dev_t dev);
char *format_dev_t(char *buffer, dev_t dev);

Both macros encode the device number into the given buffer; the only difference is that print_dev_t returns the number of characters printed, while format_dev_t returns buffer; therefore, it can be used as a parameter to a printk call directly, although one must remember that printk doesn't flush until a trailing newline is provided. The buffer should be large enough to hold a device number; given that 64-bit device numbers are a distinct possibility in future kernel releases, the buffer should probably be at least 20 bytes long.



Implementation of the macros above:

#define print_dev_t(buffer, dev)					\
	sprintf((buffer), "%u:%u\n", MAJOR(dev), MINOR(dev))


#define format_dev_t(buffer, dev)					\
	({								\
		sprintf(buffer, "%u:%u", MAJOR(dev), MINOR(dev));	\
		buffer;							\
	})
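The same formatting can be tried in user space with the sketch below. The DEMO_ macros and demo_format_dev() are hypothetical names for these notes; the 20-bit minor field is an assumption that mirrors the kernel's internal dev_t layout, and snprintf is used instead of sprintf so the buffer length is respected.

```c
#include <stdio.h>

/* Assumption: 20 minor bits, as in the kernel's internal dev_t layout. */
#define DEMO_MINORBITS 20
#define DEMO_MINORMASK ((1U << DEMO_MINORBITS) - 1)
#define DEMO_MAJOR(dev) ((unsigned int)((dev) >> DEMO_MINORBITS))
#define DEMO_MINOR(dev) ((unsigned int)((dev) & DEMO_MINORMASK))
#define DEMO_MKDEV(ma, mi) \
    (((unsigned int)(ma) << DEMO_MINORBITS) | (unsigned int)(mi))

/* Write "major:minor" into buf (at most len bytes) and return buf,
 * so the call can be passed straight to a printf-style function. */
char *demo_format_dev(char *buf, size_t len, unsigned int dev)
{
    snprintf(buf, len, "%u:%u", DEMO_MAJOR(dev), DEMO_MINOR(dev));
    return buf;
}
```

For example, with a 20-byte buffer, demo_format_dev(buf, sizeof buf, DEMO_MKDEV(254, 3)) yields the string "254:3". Returning the buffer is what lets format_dev_t appear directly as a printk argument.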



Using the /proc Filesystem

                    The /proc filesystem is a special, software-created filesystem that is used by the kernel to export information to the world. Each file under /proc is tied to a kernel function that generates the file’s “contents” on the fly when the file is read. We have already seen some of these files in action; /proc/modules, for example, always returns a list of the currently loaded modules.
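From user space, a /proc file is read with ordinary file I/O. The helper below is a hypothetical sketch (demo_read_first_line is not a library function) that returns the first line of any file, which is enough to peek at entries such as /proc/modules.

```c
#include <stdio.h>
#include <string.h>

/* Copy the first line of path into buf (without the trailing newline).
 * Returns 0 on success, -1 if the file cannot be opened or is empty. */
int demo_read_first_line(const char *path, char *buf, size_t len)
{
    FILE *fp = fopen(path, "r");

    if (!fp)
        return -1;
    if (!fgets(buf, (int)len, fp)) {
        fclose(fp);
        return -1;
    }
    fclose(fp);
    buf[strcspn(buf, "\n")] = '\0';
    return 0;
}
```

Calling demo_read_first_line("/proc/modules", buf, sizeof buf) on a Linux system should fill buf with the entry for one loaded module; the content is generated by the kernel at the moment of the read.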


My study notes on /proc




Debugging by Watching


                 Sometimes minor problems can be tracked down by watching the behavior of an application in user space. Watching programs can also help in building confidence that a driver is working correctly. For example, we were able to feel confident about scull after looking at how its read implementation reacted to read requests for different amounts of data.

               There are various ways to watch a user-space program working. You can run a debugger on it to step through its functions, add print statements, or run the program under strace. Here we’ll discuss just the last technique, which is most interesting when the real goal is examining kernel code.

The strace command is a powerful tool that shows all the system calls issued by a user-space program. Not only does it show the calls, but it can also show the arguments to the calls and their return values in symbolic form. When a system call fails, both the symbolic value of the error (e.g., ENOMEM) and the corresponding string (Out of memory) are displayed. strace has many command-line options; the most useful of which are -t to display the time when each call is executed, -T to display the time spent in the call, -e to limit the types of calls traced, and -o to redirect the output to a file. By default, strace prints tracing information on stderr.
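The error information strace reports for a failed call is the same errno value a program can inspect itself. The sketch below uses a hypothetical helper, demo_open_error(), that provokes a failing open() and records the numeric error together with its strerror() text; running a program that calls it under strace should show the failing open with the matching symbolic error.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Try to open path read-only. On failure, write "errno=<n> (<text>)" into
 * buf and return the errno value; on success, close the file and return 0.
 * buf must have room for at least one byte. */
int demo_open_error(const char *path, char *buf, size_t len)
{
    int fd = open(path, O_RDONLY);

    if (fd < 0) {
        int err = errno;   /* save before any other call can change it */

        snprintf(buf, len, "errno=%d (%s)", err, strerror(err));
        return err;
    }
    close(fd);
    buf[0] = '\0';
    return 0;
}
```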



Here is a quick demo of strace in action:




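strace is a very satisfying command: it lets you follow, step by step, what happens after a command is launched. The strace run above traced what happens during cat /proc/proc_demo.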

Next, let's trace the most beautiful code of all: hello world!
#include <stdio.h>

int main()
{
	printf("Hello world!\n");

	return 0;
}

As the trace shows, execve() is called first to run the ./hello executable, then the libc.so.6 library is opened, and finally write() writes the "Hello world!" string to standard output.
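The final step of that sequence, the write() to standard output, can be issued directly without stdio. The function below is a hypothetical sketch (demo_say_hello is a name made up for these notes) that sends the same 13-byte string to a given file descriptor.

```c
#include <string.h>
#include <unistd.h>

/* Write the greeting straight to fd with write(2), bypassing stdio.
 * Returns the number of bytes written, or -1 on error. */
ssize_t demo_say_hello(int fd)
{
    static const char msg[] = "Hello world!\n";

    return write(fd, msg, strlen(msg));
}
```

Calling demo_say_hello(STDOUT_FILENO) from main() and tracing the program should show a write to file descriptor 1 carrying this string, much like the one printf produces for the hello world program.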








Debuggers and Related Tools


Many readers may be wondering why the kernel does not have any more advanced debugging features built into it. The answer, quite simply, is that Linus does not believe in interactive debuggers. He fears that they lead to poor fixes, those which patch up symptoms rather than addressing the real cause of problems. Thus, no built-in debuggers.

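Linus does not want a built-in debugger and does not trust interactive debuggers, so in the kernels LDD3 covers, single-stepping a kernel driver is off the table.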




That finally wraps up my notes on kernel debugging techniques.



“山河百战归民主” ("After a hundred battles, the rivers and mountains return to democracy"), Xu Beihong

