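D state detection
The core idea is to create a kernel monitoring thread (khungtaskd) that periodically loops over every process (task) in the D state (uninterruptible sleep) and checks how long it has been stuck there.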
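The loop described above can be modelled in user space. The sketch below is only an illustration of the idea, not the kernel's C implementation: the `Task` class and `check_hung_tasks` function are hypothetical names, and the rule "a D-state task whose context-switch count has not changed for `timeout` seconds is considered hung" is a simplified assumption about how the check works.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """Hypothetical stand-in for the per-task fields the monitor inspects."""
    pid: int
    comm: str
    state: str                     # "D" means uninterruptible sleep
    switch_count: int              # total context switches so far
    last_switch_count: int = 0     # value recorded at the previous scan
    last_switch_time: float = 0.0  # time at which that value was recorded


def check_hung_tasks(tasks, now, timeout=120.0):
    """Return the tasks that look hung at time `now` (in seconds)."""
    hung = []
    for t in tasks:
        if t.state != "D":
            continue  # only uninterruptible tasks are of interest
        if t.switch_count != t.last_switch_count:
            # The task has been scheduled since the last scan: restart its clock.
            t.last_switch_count = t.switch_count
            t.last_switch_time = now
            continue
        if now - t.last_switch_time >= timeout:
            hung.append(t)
    return hung
```

Calling `check_hung_tasks` repeatedly with an increasing `now` mimics the periodic scan: a task is reported only after it has stayed in the D state without being scheduled for the whole timeout.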
Kernel configuration: CONFIG_DETECT_HUNG_TASK
Kernel hacking --->
[*] Detect Hung Tasks
(120) Default timeout for hung task detection (in seconds) (NEW)
[ ] Panic (Reboot) On Hung Tasks (NEW)
Once a process has stayed in the D state for more than 120 seconds, the kernel prints:
INFO: task sync:16015 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
sync D c0512378 0 16015 1807 0x00000000
[<c0512378>] (__schedule+0x1d0/0x414) from [<c0512754>] (io_schedule+0x64/0x8c)
[<c0512754>] (io_schedule+0x64/0x8c) from [<c00759d4>] (sleep_on_page+0x8/0x10)
[<c00759d4>] (sleep_on_page+0x8/0x10) from [<c0511388>] (__wait_on_bit+0x78/0xb0)
[<c0511388>] (__wait_on_bit+0x78/0xb0) from [<c0075804>] (wait_on_page_bit+0xb4/0xbc)
[<c0075804>] (wait_on_page_bit+0xb4/0xbc) from [<c0075930>] (filemap_fdatawait_range+0xd4/0x130)
[<c0075930>] (filemap_fdatawait_range+0xd4/0x130) from [<c00759c4>] (filemap_fdatawait+0x38/0x40)
[<c00759c4>] (filemap_fdatawait+0x38/0x40) from [<c00c0a2c>] (sync_inodes_sb+0x108/0x13c)
[<c00c0a2c>] (sync_inodes_sb+0x108/0x13c) from [<c00a4090>] (iterate_supers+0xa4/0xec)
[<c00a4090>] (iterate_supers+0xa4/0xec) from [<c00c4694>] (sys_sync+0x34/0x9c)
[<c00c4694>] (sys_sync+0x34/0x9c) from [<c0012e40>] (ret_fast_syscall+0x0/0x30)
To turn the message off: echo 0 > /proc/sys/kernel/hung_task_timeout_secs (a value of 0 disables the hung task check altogether).
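When collecting these reports from a log, the first line of each report can be picked out with a regular expression. This is a minimal sketch that assumes the message format shown in the sample above; `parse_hung_reports` is a hypothetical helper name, and other kernel versions may word the line differently.

```python
import re

# Matches e.g. "INFO: task sync:16015 blocked for more than 120 seconds."
HUNG_RE = re.compile(
    r"INFO: task (?P<comm>.+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds\."
)


def parse_hung_reports(log_text):
    """Return a list of (comm, pid, seconds) tuples found in the log text."""
    reports = []
    for line in log_text.splitlines():
        m = HUNG_RE.search(line)
        if m:
            reports.append((m["comm"], int(m["pid"]), int(m["secs"])))
    return reports
```

The function takes the log as a string, so it can be fed the saved output of dmesg or the contents of a log file.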
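You can also check manually: use top or ps to look at process states, then run cat /proc/<pid>/status to confirm the state (State: D (disk sleep)) and cat /proc/<pid>/stack to see the kernel stack (reading the stack usually requires root).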
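The manual check can be scripted. The sketch below walks /proc, reads each status file, and prints the kernel stack of every process found in the D state. The function names `parse_status` and `find_d_state` are hypothetical, and the script assumes a Linux /proc layout; unreadable files are skipped rather than treated as errors.

```python
import os


def parse_status(text):
    """Extract (Name, single-letter State) from /proc/<pid>/status text."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key] = value.strip()
    name = fields.get("Name", "?")
    state = fields.get("State", "?")[:1]  # "D (disk sleep)" -> "D"
    return name, state


def find_d_state(proc_root="/proc"):
    """Yield (pid, name) for every process currently in the D state."""
    try:
        entries = os.listdir(proc_root)
    except OSError:
        return  # no /proc available on this system
    for entry in entries:
        if not entry.isdigit():
            continue  # not a process directory
        path = os.path.join(proc_root, entry, "status")
        try:
            with open(path, errors="replace") as f:
                name, state = parse_status(f.read())
        except OSError:
            continue  # the process exited or is not readable
        if state == "D":
            yield int(entry), name


if __name__ == "__main__":
    for pid, name in find_d_state():
        print(pid, name)
        try:
            with open(f"/proc/{pid}/stack") as f:
                print(f.read())
        except OSError as e:
            print(f"  (cannot read stack: {e})")
```

A process may leave the D state between the scan and the stack read, so an occasional empty or unrelated stack is expected.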
R state detection
Kernel hacking --->
-*- Kernel debugging
[*] Detect Hard and Soft Lockups
[ ] Panic (Reboot) On Soft Lockups
CONFIG_LOCKUP_DETECTOR=y
So far, a hang in the R state has not been reproduced.
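Even without a reproduction, the lockup detector's runtime settings can be inspected once the option is built in. The sketch below reads a few sysctl files under /proc/sys/kernel; the list of names is an assumption about what a given kernel exposes, so missing files are reported as None. `read_sysctls` and `soft_lockup_threshold` are hypothetical helper names. The soft lockup threshold is taken to be twice watchdog_thresh, as described in the kernel's sysctl documentation.

```python
from pathlib import Path

SYSCTL_DIR = Path("/proc/sys/kernel")
# Assumed set of lockup-detector sysctls; some may be absent on a given kernel.
NAMES = ("watchdog", "soft_watchdog", "nmi_watchdog",
         "watchdog_thresh", "softlockup_panic")


def read_sysctls(names=NAMES, base=SYSCTL_DIR):
    """Return {name: int value, or None if the file is missing or unreadable}."""
    values = {}
    for name in names:
        try:
            values[name] = int((Path(base) / name).read_text().strip())
        except (OSError, ValueError):
            values[name] = None
    return values


def soft_lockup_threshold(values):
    """Soft lockup threshold in seconds (2 * watchdog_thresh), or None."""
    thresh = values.get("watchdog_thresh")
    return None if thresh is None else 2 * thresh


if __name__ == "__main__":
    settings = read_sysctls()
    for name, value in settings.items():
        print(f"{name} = {value}")
    print("soft lockup threshold (s):", soft_lockup_threshold(settings))
```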
Extensions
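CONFIG_DEBUG_SPINLOCK=y detects problems such as using a spinlock before it has been initialized. Used together with the NMI watchdog, it can also uncover spinlock deadlocks.
CONFIG_DEBUG_MUTEXES=y detects and reports mutex errors.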
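To see which of the options mentioned in these notes are enabled on a running system, the kernel configuration can be parsed. This is a minimal sketch under two assumptions: the configuration is available either as /proc/config.gz (which requires CONFIG_IKCONFIG_PROC) or as /boot/config-<release>, and it uses the usual "CONFIG_X=y" / "# CONFIG_X is not set" line format. The function names are hypothetical.

```python
import gzip
import platform

WANTED = ("CONFIG_DETECT_HUNG_TASK", "CONFIG_LOCKUP_DETECTOR",
          "CONFIG_DEBUG_SPINLOCK", "CONFIG_DEBUG_MUTEXES")


def parse_kconfig(text):
    """Map option names to their values; options marked 'is not set' become 'n'."""
    options = {}
    suffix = " is not set"
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("CONFIG_") and "=" in line:
            key, _, value = line.partition("=")
            options[key] = value
        elif line.startswith("# CONFIG_") and line.endswith(suffix):
            options[line[2:-len(suffix)]] = "n"
    return options


def report(options, wanted=WANTED):
    """Return the value of each wanted option, or 'missing' if it is not listed."""
    return {name: options.get(name, "missing") for name in wanted}


def load_kconfig():
    """Return the config text from the first readable candidate, else None."""
    candidates = ["/proc/config.gz", f"/boot/config-{platform.release()}"]
    for path in candidates:
        try:
            if path.endswith(".gz"):
                with gzip.open(path, "rt") as f:
                    return f.read()
            with open(path) as f:
                return f.read()
        except OSError:
            continue  # try the next location
    return None


if __name__ == "__main__":
    text = load_kconfig()
    if text is None:
        print("kernel config not found")
    else:
        for name, value in report(parse_kconfig(text)).items():
            print(f"{name}: {value}")
```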
Browse the Linux kernel source online: Linux source code (v5.15.1) - Bootlin