gdb: How to find the owner of a deadlocked pthread_mutex_t

Latest recommended article published on 2025-06-20 21:06:57

三水问海

License: CC 4.0 BY-SA

Category: GDB. Tags: linux, server, gdb

Article link: https://blog.youkuaiyun.com/aspirestro/article/details/146954572

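GDB Debugging Tips: How to Find the Owner of a Deadlocked `pthread_mutex_t`

Deadlock is a common and hard-to-diagnose problem in multithreaded programs. When two or more threads each wait for a lock that another one holds, none of them can make progress. GDB lets us inspect thread and lock state directly, which makes it possible to locate such a deadlock quickly. This article shows how to use the GDB command p ((pthread_mutex_t*) &buffer_lock)->__data.__owner to find the thread that holds (owns) a locked mutex, and walks through a worked example.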

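1. What is a deadlock?

A deadlock occurs when several threads compete for resources and end up waiting on each other, so that none of them can continue. A typical scenario:

Thread A holds Lock1 and tries to acquire Lock2.
Thread B holds Lock2 and tries to acquire Lock1.

Each thread waits for the other to release its lock, so the program is stuck.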

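2. Analyzing a deadlock with GDB

When debugging a deadlock, we need to answer a few key questions:

Which locks are held?
Which thread holds each lock?
How do the threads depend on each other?

GDB offers several commands that help here. One especially useful one is p ((pthread_mutex_t*) &buffer_lock)->__data.__owner, which shows the owner of a given mutex.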

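3. Command breakdown

Command format

p ((pthread_mutex_t*) &buffer_lock)->__data.__owner

Part by part

p
Short for GDB's print command, which evaluates an expression and prints its value.
((pthread_mutex_t*) &buffer_lock)
Takes the address of buffer_lock and casts it to pthread_mutex_t*. If GDB already knows that buffer_lock is a pthread_mutex_t, the cast is redundant. It matters when the symbol comes from a library built without debug information, where GDB knows the address but not the type.
->__data.__owner
- __data is the internal struct inside glibc's pthread_mutex_t that holds the lock state.
- __owner is the field of __data that records which thread currently holds the lock. With glibc on Linux this is the kernel thread ID (the LWP number that GDB shows in info threads), not a pthread_t value. When the lock is not held, __owner is normally 0.

Command purpose

The command shows whether buffer_lock is currently held and, if so, the ID of the thread that holds it.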

4. Worked example

Example code

The following simple program can deadlock:

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

pthread_mutex_t lock1, lock2;

void* thread1_func(void* arg) {
    pthread_mutex_lock(&lock1);
    printf("Thread 1: Acquired lock1\n");
    sleep(1); // simulate work
    pthread_mutex_lock(&lock2); // may deadlock here
    printf("Thread 1: Acquired lock2\n");
    pthread_mutex_unlock(&lock2);
    pthread_mutex_unlock(&lock1);
    return NULL;
}

void* thread2_func(void* arg) {
    pthread_mutex_lock(&lock2);
    printf("Thread 2: Acquired lock2\n");
    sleep(1); // simulate work
    pthread_mutex_lock(&lock1); // may deadlock here
    printf("Thread 2: Acquired lock1\n");
    pthread_mutex_unlock(&lock1);
    pthread_mutex_unlock(&lock2);
    return NULL;
}

int main() {
    pthread_t t1, t2;
    pthread_mutex_init(&lock1, NULL);
    pthread_mutex_init(&lock2, NULL);

    pthread_create(&t1, NULL, thread1_func, NULL);
    pthread_create(&t2, NULL, thread2_func, NULL);

    pthread_join(t1, NULL);
    pthread_join(t2, NULL);

    pthread_mutex_destroy(&lock1);
    pthread_mutex_destroy(&lock2);
    return 0;
}

Running the program

Compile and run the code above:

gcc -g -o deadlock_example deadlock_example.c -lpthread
./deadlock_example

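The program will most likely hang: each thread takes its first lock, sleeps for a second, and then blocks on the lock held by the other thread.

Debugging with GDB

Start GDB and load the program: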
```
gdb ./deadlock_example
```

Set breakpoints and run the program (use continue to resume after each breakpoint is hit):

break thread1_func
break thread2_func
run

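When the program hangs, press Ctrl+C to interrupt it, then inspect the lock state:

thread apply all bt  # show the backtrace of every thread
p ((pthread_mutex_t*) &lock1)->__data.__owner  # show the owner of lock1
p ((pthread_mutex_t*) &lock2)->__data.__owner  # show the owner of lock2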

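Analyzing the result

Suppose the output is as follows (the values are hypothetical; __owner holds a kernel thread ID):

$1 = 12346  # thread ID of lock1's owner
$2 = 12347  # thread ID of lock2's owner

The info threads command maps these IDs to GDB threads: match each value against the LWP number in the Target Id column.

info threads

Combined with the backtraces, the picture becomes clear:

The thread running thread1_func holds lock1 and is trying to acquire lock2.
The thread running thread2_func holds lock2 and is trying to acquire lock1.

This is the classic deadlock scenario.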

5. A real debugging session

~# gdb ./hello
GNU gdb (Ubuntu 9.2-0ubuntu1~20.04.1) 9.2
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "aarch64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./hello...
(gdb) b main
Breakpoint 1 at 0x12c4: file hello.c, line 78.
(gdb) run
Starting program: /root/hello
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/aarch64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ff7a7a1c0 (LWP 1657)]

Thread 1 "hello" hit Breakpoint 1, main () at hello.c:78
78 pthread_create(&master, NULL, &master_thread, NULL);
(gdb) disassemble rt_print_flush_buffers
Dump of assembler code for function rt_print_flush_buffers:
0x0000007ff7fa2ea8 <+0>: stp x29, x30, [sp, #-32]!
0x0000007ff7fa2eac <+4>: mov x29, sp
0x0000007ff7fa2eb0 <+8>: str x19, [sp, #16]
0x0000007ff7fa2eb4 <+12>: adrp x19, 0x7ff7fca000 <memcpy@got.plt>
0x0000007ff7fa2eb8 <+16>: add x19, x19, #0xed0
0x0000007ff7fa2ebc <+20>: bl 0x7ff7f9e070 <cobalt_thread_relax@plt>
0x0000007ff7fa2ec0 <+24>: add x19, x19, #0x8
0x0000007ff7fa2ec4 <+28>: mov x0, x19
0x0000007ff7fa2ec8 <+32>: bl 0x7ff7f9e730 <pthread_mutex_lock@plt>
0x0000007ff7fa2ecc <+36>: bl 0x7ff7fa2ca0 <print_buffers>
0x0000007ff7fa2ed0 <+40>: mov x0, x19
0x0000007ff7fa2ed4 <+44>: ldr x19, [sp, #16]
0x0000007ff7fa2ed8 <+48>: ldp x29, x30, [sp], #32
0x0000007ff7fa2edc <+52>: b 0x7ff7f9e7c0 <pthread_mutex_unlock@plt>
End of assembler dump.
(gdb) break *0x0000007ff7fa2ecc
Breakpoint 2 at 0x7ff7fa2ecc
(gdb) break release_buffer
Breakpoint 3 at 0x7ff7fa301c
(gdb) c
Continuing.
[New Thread 0x7ff72791c0 (LWP 1658)]
[New Thread 0x7ff72581c0 (LWP 1659)]
[Switching to Thread 0x7ff72581c0 (LWP 1659)]

Thread 4 "slave" hit Breakpoint 2, 0x0000007ff7fa2ecc in rt_print_flush_buffers () from /usr/xenomai/lib/libcobalt.so.2
(gdb) p ((pthread_mutex_t*) &buffer_lock)->__data.__owner
$1 = 1659
(gdb) call pthread_cancel(0x7ff72581c0)
$2 = 0
(gdb) c
Continuing.

Thread 4 "slave" hit Breakpoint 3, 0x0000007ff7fa301c in release_buffer () from /usr/xenomai/lib/libcobalt.so.2
(gdb) p ((pthread_mutex_t*) &buffer_lock)->__data.__owner
$3 = 1659
(gdb) c
Continuing.
^C
Thread 1 "hello" received signal SIGINT, Interrupt.
[Switching to Thread 0x7ff7ff4010 (LWP 1654)]
0x0000007ff7bb6398 in sched_yield () at …/sysdeps/unix/syscall-template.S:78
78 …/sysdeps/unix/syscall-template.S: No such file or directory.
(gdb) info thread
Id Target Id Frame

1 Thread 0x7ff7ff4010 (LWP 1654) "hello" 0x0000007ff7bb6398 in sched_yield () at …/sysdeps/unix/syscall-template.S:78
2 Thread 0x7ff7a7a1c0 (LWP 1657) "cobalt_printf" __lll_lock_wait (futex=futex@entry=0x7ff7fcaed8 <buffer_lock>, private=0)
at lowlevellock.c:52
3 Thread 0x7ff72791c0 (LWP 1658) "master" 0x0000007ff7b6d314 in __GI__IO_default_xsputn (n=, data=,
f=) at genops.c:393
4 Thread 0x7ff72581c0 (LWP 1659) "slave" (Exiting) __lll_lock_wait (futex=futex@entry=0x7ff7fcaed8 <buffer_lock>, private=0)
at lowlevellock.c:52
(gdb) thread 4
[Switching to thread 4 (Thread 0x7ff72581c0 (LWP 1659))]
#0 __lll_lock_wait (futex=futex@entry=0x7ff7fcaed8 <buffer_lock>, private=0) at lowlevellock.c:52
52 lowlevellock.c: No such file or directory.
(gdb) bt
#0 __lll_lock_wait (futex=futex@entry=0x7ff7fcaed8 <buffer_lock>, private=0) at lowlevellock.c:52
#1 0x0000007ff7f43cd8 in __GI___pthread_mutex_lock (mutex=0x7ff7fcaed8 <buffer_lock>) at pthread_mutex_lock.c:80
#2 0x0000007ff7fa3038 in release_buffer () from /usr/xenomai/lib/libcobalt.so.2
#3 0x0000007ff7f413c0 in __nptl_deallocate_tsd () at pthread_create.c:301
#4 0x0000007ff7f4156c in start_thread (arg=0x7ff7fa7430 <cobalt_thread_trampoline>) at pthread_create.c:488
#5 0x0000007ff7bcc49c in thread_start () at …/sysdeps/unix/sysv/linux/aarch64/clone.S:78
(gdb) p ((pthread_mutex_t*) &buffer_lock)->__data.__owner
$4 = 1659
(gdb) p ((pthread_mutex_t*) &buffer_lock)->__data.__lock
$5 = 2
(gdb) p ((pthread_mutex_t*) &buffer_lock)->__data.__count
$6 = 0
(gdb)

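6. Caveats

Platform differences
The command depends on the implementation of pthread_mutex_t. Other platforms and thread libraries may use different field names or layouts; on some systems __data and __owner do not exist. Check the platform's headers (such as pthread.h) to confirm the field names.
Debug information
GDB needs type information for pthread_mutex_t to resolve __data.__owner. Compiling the program with the -g option normally provides it; without debug information, GDB cannot see into the structure.
Thread safety
While debugging, avoid modifying the lock state, since doing so can change the program's behavior.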