77. Kernel Synchronization - Part 2: Detecting and Resolving Deadlocks

1. Introduction to lockdep

Lockdep operates on lock classes, which are “logical” locks as opposed to “physical” lock instances. For example, the kernel’s struct file has a mutex and a spinlock, and lockdep treats each of them as a lock class. Even if there are thousands of struct file instances at runtime, lockdep tracks just those two classes rather than every individual lock instance. For more details on lockdep’s internal design, refer to the official kernel documentation: Runtime locking correctness validator.

Before using lockdep, ensure you’ve built and are running a debug kernel with lockdep enabled. You can minimally verify it as follows:

$ uname -r
6.1.25-lock-dbg
$ grep PROVE_LOCKING /boot/config-6.1.25-lock-dbg
CONFIG_PROVE_LOCKING=y

2. Example 1: Catching a Self Deadlock Bug with lockdep

2.1 The Original Code

The original code loops over the kernel task list and prints thread details. To obtain the thread name, it directly accesses the comm member of the task structure:

// ch6/foreach/thrd_showall/thrd_showall.c
static int showthrds(void)
{
    struct task_struct *g = NULL, *t = NULL; /* 'g' : process ptr; 't': thread ptr */
    [ ... ]
    do_each_thread(g, t) { /* 'g' : process ptr; 't': thread ptr */
        get_task_struct(t);     /* take a reference to the task struct */
        task_lock(t);
        [ ... ]
        if (!g->mm) {    // kernel thread
            snprintf(tmp, TMPMAX-1, " [%16s]", t->comm);
        } else {
            snprintf(tmp, TMPMAX-1, " %16s ", t->comm);
        }
        [ ... ]

2.2 The Improved but Buggy Code

A better way to get the thread name is by using the get_task_comm() helper macro. The refactored code is as follows:

// ch13/4_lockdep/buggy_thrdshow_eg/thrd_showall_buggy.c
[ ... ]
static int showthrds_buggy(void)
{
    struct task_struct *g, *t;               /* 'g' : process ptr; 't': thread ptr */
    char buf[BUFMAX], tmp[TMPMAX], tasknm[TASK_COMM_LEN];
    [ ... ]
#if LINUX_VERSION_CODE < KERNEL_VERSION(6, 6, 0)
    do_each_thread(g, t) {     /* 'g' : process ptr; 't': thread ptr */
#else
    for_each_process_thread(g, t) {   /* 'g' : process ptr; 't': thread ptr */
#endif
        get_task_struct(t);     /* take a reference to the task struct */
        task_lock(t);
        [ ... ]
        get_task_comm(tasknm, t);
        if (!g->mm)     // kernel thread
            snprintf_lkp(tmp, sizeof(tasknm)+3, " [%16s]", tasknm);
        else
            snprintf_lkp(tmp, sizeof(tasknm)+3, " %16s ", tasknm);
        [ ... ]

When this code is compiled and inserted into the kernel on a test system, it may cause the system to hang. Checking the kernel log shows that lockdep has caught a self-deadlock.

2.3 Root Cause Analysis

The root cause lies in the __get_task_comm() function, which is called by get_task_comm(). Its code is as follows:

// fs/exec.c
char *__get_task_comm(char *buf, size_t buf_size, struct task_struct *tsk)
{
        task_lock(tsk);
        /* Always NUL terminated and zero-padded */
        strscpy_pad(buf, tsk->comm, buf_size);
        task_unlock(tsk);
        return buf; 
}
EXPORT_SYMBOL_GPL(__get_task_comm);

The __get_task_comm() function tries to reacquire the same lock that is already held, causing a self-deadlock. The lock in question is the alloc_lock spinlock within the task structure, as shown by the following code:

// include/linux/sched/task.h
static inline void task_lock(struct task_struct *p)
{
    spin_lock(&p->alloc_lock);
}
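
The same "take a lock we already hold" mistake can be reproduced safely in user space. The following is a minimal sketch, not kernel code: it assumes a POSIX threads environment and swaps the kernel spinlock for a pthread mutex of type PTHREAD_MUTEX_ERRORCHECK, which reports the relock attempt with EDEADLK instead of hanging. The function name relock_errorcheck_mutex() is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/*
 * Lock an error-checking mutex twice from the same thread and return the
 * result of the second attempt. With PTHREAD_MUTEX_ERRORCHECK the second
 * lock call fails with EDEADLK rather than blocking forever.
 */
static int relock_errorcheck_mutex(void)
{
    pthread_mutexattr_t attr;
    pthread_mutex_t lock;
    int rc;

    rc = pthread_mutexattr_init(&attr);
    assert(rc == 0);
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    assert(rc == 0);
    rc = pthread_mutex_init(&lock, &attr);
    assert(rc == 0);
    pthread_mutexattr_destroy(&attr);

    rc = pthread_mutex_lock(&lock);   /* like task_lock(t) in the module */
    assert(rc == 0);
    rc = pthread_mutex_lock(&lock);   /* like task_lock() inside the helper */

    pthread_mutex_unlock(&lock);      /* release the one successful acquisition */
    pthread_mutex_destroy(&lock);
    return rc;
}
```

Calling relock_errorcheck_mutex() returns EDEADLK. A kernel spinlock has no such safety net: the second spin_lock() simply spins, which is why a validator like lockdep is needed to point out the bug.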

2.4 Interpreting lockdep’s Report

The lockdep report contains some puzzling notations. Consider the following lines:

insmod/3395 is trying to acquire lock:
ffff9307c12b5ae8 (&p->alloc_lock){+.+.}-{2:2}, at: __get_task_comm+0x28/0x60
but task is already holding lock:
ffff9307c12b5ae8 (&p->alloc_lock){+.+.}-{2:2}, at: showthrds_buggy+0x11a/0x5a6 [thrd_showall_buggy]

  • The leading 64-bit value 0xffff9307c12b5ae8 identifies the lock; lockdep prints the address of the lock instance here. The same value in both lines shows that it is the same lock.
  • The lock is named alloc_lock, and the code attempts to take it twice: once via task_lock() in our module code and once via get_task_comm().
  • {+.+.} is lockdep’s notation for the lock’s usage state with respect to hardirq and softirq contexts. For details, refer to the kernel documentation: lockdep-design.txt.
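
To make the {+.+.} notation less puzzling, here is a small user-space sketch that pulls the four usage characters out of a report field and maps each one to its meaning. The character meanings and the position order (hardirq, hardirq-read, softirq, softirq-read) follow the kernel's lockdep design document; the helper names parse_usage() and usage_char_meaning() are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* What each of the four positions in {....} refers to, left to right. */
static const char *const usage_position[4] = {
    "hardirq", "hardirq-read", "softirq", "softirq-read"
};

/* Meaning of a single usage character, per the lockdep design document. */
static const char *usage_char_meaning(char c)
{
    switch (c) {
    case '.': return "acquired while irqs disabled and not in irq context";
    case '-': return "acquired in irq context";
    case '+': return "acquired with irqs enabled";
    case '?': return "acquired in irq context with irqs enabled";
    default:  return "unknown";
    }
}

/*
 * Copy the four usage characters found between the first '{' and its '}'
 * in 'field' (for example "{+.+.}-{2:2}") into 'out'.
 * Returns 0 on success, -1 if the field does not have that shape.
 */
static int parse_usage(const char *field, char out[4])
{
    const char *open;

    assert(field != NULL && out != NULL);
    open = strchr(field, '{');
    if (open == NULL || strlen(open) < 6 || open[5] != '}')
        return -1;
    memcpy(out, open + 1, 4);
    return 0;
}
```

For the report above, parse_usage("{+.+.}-{2:2}", out) yields '+', '.', '+', '.', so alloc_lock was acquired with hardirqs enabled and with softirqs enabled, and it is not used as a read lock in either context.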

2.5 Fixing the Self Deadlock Bug

2.5.1 Refactoring the Code

One way to fix the issue is to release the lock before calling get_task_comm() and then take it again afterwards. The steps are as follows:
1. task_lock(t) acquires the alloc_lock spinlock.
2. task_unlock(t) releases the lock just before the call to get_task_comm().
3. get_task_comm() safely acquires and releases the lock internally.
4. task_lock(t) reacquires the spinlock.

However, this solution has drawbacks:
- It opens a race window between task_unlock() and the subsequent task_lock(), during which another context can modify the members this lock protects.
- The task_{un}lock() routines only protect some task structure members.

2.5.2 Using RCU

A better solution is to use RCU, the kernel’s lock-free synchronization mechanism for read-mostly data. Since the code is mostly read-only, RCU provides a highly optimized solution. The refactored code uses an RCU read-side critical section, eliminating the need for write protection via a spinlock. You can find the patch in the relevant GitHub repo: using_rcu.patch.

The following mermaid flowchart shows the process of detecting and fixing the self deadlock bug:

graph TD;
    A[Original Code] --> B[Refactored Buggy Code];
    B --> C[System Hangs];
    C --> D[Check Kernel Log];
    D --> E[lockdep Detects Self-Deadlock];
    E --> F[Analyze Root Cause];
    F --> G[Refactor Code];
    F --> H[Use RCU];

3. Example 2: Catching an AB-BA (Circular) Deadlock with lockdep

3.1 The Concept of Lock Ordering

We have a demo kernel module that creates two kernel threads working with two spinlocks, lockA and lockB. When using multiple locks in a process context, lock ordering is crucial. For example, we can define a rule to first take lockA and then lockB.

The wrong way to acquire locks is as follows:

| kthread 0 on CPU #0 | kthread 1 on CPU #1 |
| --- | --- |
| Take lockA | Take lockB |
| (Try and) take lockB | (Try and) take lockA |
| <... spins forever : DEADLOCK ...> | <... spins forever : DEADLOCK ...> |

This causes the classic AB-BA deadlock.
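
The interleaving in the table can be replayed in user space without actually hanging anything. The sketch below is an illustration under stated assumptions, not the demo module's code: it uses POSIX threads, forces the bad interleaving with barriers, and replaces the second blocking lock with pthread_mutex_trylock() so that each thread merely observes that its second lock is unavailable. All names (run_abba_demo(), struct worker, and so on) are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>

static pthread_mutex_t lock_a = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t lock_b = PTHREAD_MUTEX_INITIALIZER;
static pthread_barrier_t both_hold_first;   /* both threads own their 1st lock */
static pthread_barrier_t both_tried_second; /* both threads tried their 2nd lock */

struct worker {
    pthread_mutex_t *first;   /* lock taken first by this thread */
    pthread_mutex_t *second;  /* lock it then tries to take */
    int second_rc;            /* result of the trylock on 'second' */
};

static void *worker_fn(void *arg)
{
    struct worker *w = arg;

    pthread_mutex_lock(w->first);
    pthread_barrier_wait(&both_hold_first);
    /* A blocking lock here would wait forever; trylock just reports EBUSY. */
    w->second_rc = pthread_mutex_trylock(w->second);
    pthread_barrier_wait(&both_tried_second);
    if (w->second_rc == 0)
        pthread_mutex_unlock(w->second);
    pthread_mutex_unlock(w->first);
    return NULL;
}

/* Returns how many of the two threads found their second lock busy. */
static int run_abba_demo(void)
{
    struct worker w0 = { &lock_a, &lock_b, 0 };  /* A then B */
    struct worker w1 = { &lock_b, &lock_a, 0 };  /* B then A: wrong order */
    pthread_t t0, t1;
    int rc;

    rc = pthread_barrier_init(&both_hold_first, NULL, 2);
    assert(rc == 0);
    rc = pthread_barrier_init(&both_tried_second, NULL, 2);
    assert(rc == 0);
    rc = pthread_create(&t0, NULL, worker_fn, &w0);
    assert(rc == 0);
    rc = pthread_create(&t1, NULL, worker_fn, &w1);
    assert(rc == 0);
    pthread_join(t0, NULL);
    pthread_join(t1, NULL);
    pthread_barrier_destroy(&both_hold_first);
    pthread_barrier_destroy(&both_tried_second);

    return (w0.second_rc == EBUSY) + (w1.second_rc == EBUSY);
}
```

run_abba_demo() returns 2: each thread holds the lock the other one needs. With blocking locks (or kernel spinlocks) in place of the trylock calls, neither thread could ever make progress.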

3.2 Relevant Code

The relevant code for the kernel thread worker routine is as follows:

// ch13/4_lockdep/deadlock_eg_AB-BA/deadlock_eg_AB-BA.c
[ ... ]
/* Our kernel thread worker routine */
static int thrd_work(void *arg)
{
    [ ... ]
    if (thrd == 0) {                       /* our kthread #0 runs on CPU 0 */
        pr_info(" Thread #%ld: locking: we do:"
            " lockA --> lockB\n", thrd);
        for (i = 0; i < THRD0_ITERS; i ++) {
            /* In this thread, perform the locking per the lock ordering 'rule';
             * first take lockA, then lockB */
            pr_info(" iteration #%d on cpu #%ld\n", i, thrd);
            spin_lock(&lockA);
            DELAY_LOOP('A', 3);
            spin_lock(&lockB);
            DELAY_LOOP('B', 2);
            spin_unlock(&lockB);
            spin_unlock(&lockA);
        }
    } else if (thrd == 1) {            /* our kthread #1 runs on CPU 1 */
        for (i = 0; i < THRD1_ITERS; i ++) {
            /* In this thread, if the parameter lock_ooo is 1, *violate* the
             * lock ordering 'rule'; first (attempt to) take lockB, then lockA */
            pr_info(" iteration #%d on cpu #%ld\n", i, thrd);
            if (lock_ooo == 1) {        // violate the rule, naughty boy!
                pr_info(" Thread #%ld: locking: we do: lockB --> lockA\n",thrd);
                spin_lock(&lockB);
                DELAY_LOOP('B', 2);
                spin_lock(&lockA);
                DELAY_LOOP('A', 3);
                spin_unlock(&lockA);
                spin_unlock(&lockB);
            } else if (lock_ooo == 0) { // follow the rule, good boy!
                pr_info(" Thread #%ld: locking: we do: lockA --> lockB\n",thrd);
                spin_lock(&lockA);
                DELAY_LOOP('B', 2);
                spin_lock(&lockB);
                DELAY_LOOP('A', 3);
                spin_unlock(&lockB);
                spin_unlock(&lockA);
            }
    [ ... ]

3.3 Runtime Testing

3.3.1 Good Case: No Deadlock

When the lock_ooo kernel module parameter is set to 0 (the default), the program obeys the lock ordering rule, and lockdep detects no problems:

$ ./run 
Usage: run 0|1
 0 : run normally, take locks in the right (documented) order
 1 : run abnormally, take locks in the WRONG order, causing an AB-BA deadlock
$ ./run 0
[30125.466260] deadlock_eg_AB_BA: inserted (param: lock_ooo=0)
[30125.466593] thrd_work():167: *** thread PID 17048 on cpu 0 now ***
[30125.466622]  Thread #0: locking: we do: lockA --> lockB
[30125.466652]  iteration #0 on cpu #0
[30125.466841] thrd_work():167: *** thread PID 17049 on cpu 1 now ***
[30125.466874]  iteration #0 on cpu #1
[30125.466878]  Thread #1: locking: we do: lockA --> lockB
[30125.466950] BBAAA
[30125.467137] deadlock_eg_AB_BA: Our kernel thread #1 exiting now...
[30125.467138] AAABB
[30125.467287] deadlock_eg_AB_BA: Our kernel thread #0 exiting now...

3.3.2 Buggy Case: Circular Deadlock

When the lock_ooo parameter is set to 1, kthread 1 violates the lock ordering rule and the system may lock up. Checking the kernel log shows that lockdep detects and reports the AB-BA circular deadlock.
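
To get a feel for how an ordering violation can be flagged at all, consider this toy lock-order validator. It is a deliberately simplified sketch and not how lockdep is implemented (the real validator also tracks interrupt contexts, lock chains, and much more). It only records, per lock class, which class was acquired while another was held, and refuses an acquisition that would close a cycle. The names validator_acquire(), validator_reset(), and MAX_CLASSES are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_CLASSES 8

/* seen_order[x][y] is true once class y was acquired while class x was held. */
static bool seen_order[MAX_CLASSES][MAX_CLASSES];

static void validator_reset(void)
{
    memset(seen_order, 0, sizeof(seen_order));
}

/* Depth-first search: is there a recorded chain from -> ... -> to? */
static bool reachable(int from, int to, bool visited[MAX_CLASSES])
{
    if (from == to)
        return true;
    visited[from] = true;
    for (int next = 0; next < MAX_CLASSES; next++) {
        if (seen_order[from][next] && !visited[next] &&
            reachable(next, to, visited))
            return true;
    }
    return false;
}

/*
 * Note that a context holding class 'held' now acquires class 'wanted'.
 * Returns false if this ordering contradicts what was recorded earlier
 * (or if held == wanted, the self-deadlock case); true otherwise.
 */
static bool validator_acquire(int held, int wanted)
{
    bool visited[MAX_CLASSES] = { false };

    assert(held >= 0 && held < MAX_CLASSES);
    assert(wanted >= 0 && wanted < MAX_CLASSES);
    if (reachable(wanted, held, visited))
        return false;
    seen_order[held][wanted] = true;
    return true;
}
```

With class 0 standing for lockA and class 1 for lockB, validator_acquire(0, 1) succeeds and a later validator_acquire(1, 0) fails, even if the two calls come from code paths that never ran at the same time. That is the key benefit of order-based validation: the inconsistent ordering is reported without the deadlock having to occur.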

3.4 The .cold Compiler Attribute

Some function names in the output, like thrd_work, carry a .cold suffix. The compiler adds this suffix when it splits off code it considers unlikely to be executed. This cold code is placed in a separate linker section to improve the code locality of the non-cold sections, which is an optimization technique.

3.5 Analyzing the Circular Deadlock

When the lock_ooo parameter is set to 1, kthread 1 violates the lock ordering rule. This leads to a situation where kthread 0 holds lockA and tries to acquire lockB, while kthread 1 holds lockB and tries to acquire lockA. As a result, both threads end up spinning indefinitely, causing a deadlock.

The following mermaid flowchart illustrates the process of a circular deadlock:

graph TD;
    A[kthread 0 on CPU #0] --> B[Take lockA];
    C[kthread 1 on CPU #1] --> D[Take lockB];
    B --> E[Perform Work];
    D --> F[Perform Work];
    E --> G[Try to take lockB];
    F --> H[Try to take lockA];
    G --> I[Deadlock];
    H --> I;

3.6 Preventing Circular Deadlocks

To prevent circular deadlocks, it’s essential to always follow a consistent lock ordering rule. In our example, we can ensure that both kthread 0 and kthread 1 always acquire the locks in the same order, either lockA then lockB or lockB then lockA.

The following table summarizes the correct and incorrect lock acquisition orders:

| Lock Acquisition Order | Result |
| --- | --- |
| kthread 0: lockA -> lockB; kthread 1: lockA -> lockB | No Deadlock |
| kthread 0: lockA -> lockB; kthread 1: lockB -> lockA | Circular Deadlock |
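
One way to make a consistent order hard to get wrong is to let a helper pick it. The sketch below is a user-space illustration with POSIX mutexes and is not taken from the demo module: it orders the two locks by address, a commonly used convention when no natural ordering exists, so callers get the same acquisition order no matter how they pass the arguments. The names lock_pair(), unlock_pair(), first_of_pair(), demo_lock_a, and demo_lock_b are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

static pthread_mutex_t demo_lock_a = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t demo_lock_b = PTHREAD_MUTEX_INITIALIZER;

/* The lock with the lower address is always taken first. */
static pthread_mutex_t *first_of_pair(pthread_mutex_t *x, pthread_mutex_t *y)
{
    return ((uintptr_t)x < (uintptr_t)y) ? x : y;
}

/* Acquire both locks in address order, regardless of argument order. */
static void lock_pair(pthread_mutex_t *x, pthread_mutex_t *y)
{
    pthread_mutex_t *first = first_of_pair(x, y);
    pthread_mutex_t *second = (first == x) ? y : x;

    assert(x != y);   /* passing the same lock twice would self-deadlock */
    pthread_mutex_lock(first);
    pthread_mutex_lock(second);
}

/* Release both locks; the release order does not affect deadlock safety. */
static void unlock_pair(pthread_mutex_t *x, pthread_mutex_t *y)
{
    pthread_mutex_unlock(x);
    pthread_mutex_unlock(y);
}
```

A thread calling lock_pair(&demo_lock_a, &demo_lock_b) and another calling lock_pair(&demo_lock_b, &demo_lock_a) both end up taking the locks in the same order, so the AB-BA pattern cannot arise between them.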

4. Conclusion

In this blog, we’ve explored how to use lockdep to detect and resolve different types of deadlocks in a Linux kernel environment. We’ve covered two main examples: self deadlocks and circular (AB-BA) deadlocks.

4.1 Key Takeaways

  • lockdep Basics: lockdep operates on lock classes and can help detect various deadlock issues by analyzing lock acquisition and release patterns.
  • Self Deadlock: A self deadlock occurs when a thread tries to reacquire a lock that it already holds. We can fix this issue by either refactoring the code to unlock the lock before calling a function that tries to acquire it again or by using RCU lock-free synchronization.
  • Circular Deadlock: Circular deadlocks happen when multiple threads acquire locks in a circular and inconsistent order. To prevent this, we must follow a consistent lock ordering rule.

4.2 Best Practices

  • Lock Ordering: Always define and follow a clear lock ordering rule when using multiple locks in a process context.
  • Use RCU for Read-Mostly Operations: For read-mostly code, consider using RCU lock-free synchronization to avoid the overhead of traditional locks and potential deadlock issues.
  • Analyze lockdep Reports: When lockdep detects a deadlock, carefully analyze its reports to understand the root cause and take appropriate corrective actions.

By following these best practices and using tools like lockdep, we can write more robust and reliable kernel code with fewer deadlock issues.

The following list summarizes the steps to handle deadlocks in a kernel environment:
1. Enable lockdep in a debug kernel.
2. When a system hangs or shows abnormal behavior, check the kernel log for lockdep reports.
3. Analyze the lockdep reports to identify the type of deadlock (self or circular).
4. For self deadlocks, refactor the code or use RCU.
5. For circular deadlocks, ensure consistent lock ordering.

In summary, understanding and using lockdep effectively can significantly improve the stability and correctness of kernel code.
