Continuing from the previous post, 第5章:并发与竞态条件-0:并发与竞态条件, let's move on.
Concurrency and Its Management
In a modern Linux system, there are numerous sources of concurrency and, therefore, possible race conditions. Multiple user-space processes are running, and they can access your code in surprising combinations of ways. SMP systems can be executing your code simultaneously on different processors. Kernel code is preemptible; your driver’s code can lose the processor at any time, and the process that replaces it could also be running in your driver. Device interrupts are asynchronous events that can cause concurrent execution of your code. The kernel also provides various mechanisms for delayed code execution, such as workqueues, tasklets, and timers, which can cause your code to run at any time in ways unrelated to what the current process is doing. In the modern, hot-pluggable world, your device could simply disappear while you are in the middle of working with it.
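As a rough illustration of the deferred-execution point, here is a minimal sketch, with hypothetical names not taken from the text, of a work item that touches driver state. Because the handler runs later in a kernel worker thread, it is just another concurrent path into the driver and needs the same locking discipline as everything else.

```c
#include <linux/workqueue.h>
#include <linux/mutex.h>
#include <linux/kernel.h>

struct demo_dev {
	struct mutex lock;        /* protects 'events' */
	int events;
	struct work_struct work;  /* deferred work item */
};

static void demo_work_fn(struct work_struct *work)
{
	struct demo_dev *dev = container_of(work, struct demo_dev, work);

	/* Runs later, in a kernel worker thread, possibly concurrently
	 * with the driver's process-context code on another CPU. */
	mutex_lock(&dev->lock);
	dev->events++;
	mutex_unlock(&dev->lock);
}

static void demo_init(struct demo_dev *dev)
{
	mutex_init(&dev->lock);
	INIT_WORK(&dev->work, demo_work_fn);
}

/* Called from process context: the handler above may then run at any time. */
static void demo_kick(struct demo_dev *dev)
{
	schedule_work(&dev->work);
}
```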
Avoidance of race conditions can be an intimidating task. In a world where anything can happen at any time, how does a driver programmer avoid the creation of absolute chaos? As it turns out, most race conditions can be avoided through some thought, the kernel’s concurrency control primitives, and the application of a few basic principles. We’ll start with the principles first, then get into the specifics of how to apply them.
Race conditions come about as a result of shared access to resources. When two threads of execution* have a reason to work with the same data structures (or hardware resources), the potential for mixups always exists. So the first rule of thumb to keep in mind as you design your driver is to avoid shared resources whenever possible. If there is no concurrent access, there can be no race conditions. So carefully written kernel code should have a minimum of sharing. The most obvious application of this idea is to avoid the use of global variables. If you put a resource in a place where more than one thread of execution can find it, there should be a strong reason for doing so.
The fact of the matter is, however, that such sharing is often required. Hardware resources are, by their nature, shared, and software resources also must often be available to more than one thread. Bear in mind as well that global variables are far from the only way to share data; any time your code passes a pointer to some other part of the kernel, it is potentially creating a new sharing situation. Sharing is a fact of life.
Here is the hard rule of resource sharing: any time that a hardware or software resource is shared beyond a single thread of execution, and the possibility exists that one thread could encounter an inconsistent view of that resource, you must explicitly manage access to that resource. In the scull example above, process B’s view of the situation is inconsistent; unaware that process A has already allocated memory for the (shared) device, it performs its own allocation and overwrites A’s work. In this case, we must control access to the scull data structure. We need to arrange things so that the code either sees memory that has been allocated or knows that no memory has been or will be allocated by anybody else. The usual technique for access management is called locking or mutual exclusion—making sure that only one thread of execution can manipulate a shared resource at any time. Much of the rest of this chapter will be devoted to locking.
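To make the idea concrete, here is a minimal sketch of the usual cure, assuming a simplified scull_dev with a mutex (the field names are illustrative; the real scull in LDD3 uses a semaphore for this job). The point is that the "is memory already allocated?" test and the allocation itself happen as one unit with respect to other threads.

```c
#include <linux/mutex.h>
#include <linux/slab.h>
#include <linux/errno.h>

struct scull_dev {
	void *data;            /* storage for the (shared) device */
	struct mutex lock;     /* serializes access to 'data'     */
};

static int scull_fill(struct scull_dev *dev, size_t size)
{
	int ret = 0;

	if (mutex_lock_interruptible(&dev->lock))
		return -ERESTARTSYS;

	/* Check and allocate under the lock, so a second thread cannot
	 * slip in between the test and the assignment and overwrite
	 * another thread's allocation. */
	if (!dev->data) {
		dev->data = kzalloc(size, GFP_KERNEL);
		if (!dev->data)
			ret = -ENOMEM;
	}

	mutex_unlock(&dev->lock);
	return ret;
}
```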
First, however, we must briefly consider one other important rule. When kernel code creates an object that will be shared with any other part of the kernel, that object must continue to exist (and function properly) until it is known that no outside references to it exist. The instant that scull makes its devices available, it must be prepared to handle requests on those devices. And scull must continue to be able to handle requests on its devices until it knows that no reference (such as open user-space files) to those devices exists. Two requirements come out of this rule: no object can be made available to the kernel until it is in a state where it can function properly, and references to such objects must be tracked. In most cases, you’ll find that the kernel handles reference counting for you, but there are always exceptions.
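One place where the kernel does much of this bookkeeping for you is the kref API. The sketch below uses a hypothetical demo_dev structure (not from the text) to show the usual pattern: take a reference for every outside user, and free the object only when the last reference is dropped.

```c
#include <linux/kref.h>
#include <linux/slab.h>
#include <linux/kernel.h>

struct demo_dev {
	struct kref refcount;   /* tracks outside references */
	/* ... device state ... */
};

static void demo_release(struct kref *kref)
{
	struct demo_dev *dev = container_of(kref, struct demo_dev, refcount);

	kfree(dev);             /* last reference is gone; safe to free */
}

static struct demo_dev *demo_create(void)
{
	struct demo_dev *dev = kzalloc(sizeof(*dev), GFP_KERNEL);

	if (dev)
		kref_init(&dev->refcount);   /* count starts at 1 */
	return dev;
}

/* Every code path that hands the object to someone else takes a reference;
 * every path that is finished with it puts one back. */
static void demo_get(struct demo_dev *dev)
{
	kref_get(&dev->refcount);
}

static void demo_put(struct demo_dev *dev)
{
	kref_put(&dev->refcount, demo_release);
}
```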
Following the above rules requires planning and careful attention to detail. It is easy to be surprised by concurrent access to resources you hadn’t realized were shared. With some effort, however, most race conditions can be headed off before they bite you—or your users.
Supplementary notes:

Specific impact of each concurrency source
Different sources of concurrency affect a driver in different ways and call for different handling (see the sketch after this list):
- Kernel preemption: process-context code can be preempted by another process at any time, so the locks you take must keep protecting the shared resource across that preemption.
- Interrupts: an interrupt handler can cut in on the driver's process-context code, so data shared with interrupt handlers needs an interrupt-safe primitive such as a spinlock.
- SMP (multiple CPUs): the same driver code may be running on several processors at once, so every operation on shared resources must be synchronized.
- Multiple processes/threads: user space reaches the driver through system calls, so shared data (such as the scull_dev structure) must be protected with locks.
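As a rough illustration of the interrupt case, this sketch (hypothetical, simplified names) takes the lock with spin_lock_irqsave() in process context, so the interrupt handler, which takes the same lock, cannot interrupt that code on the local CPU and spin forever on a lock it can never get:

```c
#include <linux/spinlock.h>
#include <linux/interrupt.h>

struct demo_dev {
	spinlock_t lock;        /* protects 'pending'; spin_lock_init() once at setup */
	int pending;
};

/* Interrupt context: may run on another CPU, or interrupt the code below. */
static irqreturn_t demo_irq(int irq, void *data)
{
	struct demo_dev *dev = data;

	spin_lock(&dev->lock);
	dev->pending++;
	spin_unlock(&dev->lock);
	return IRQ_HANDLED;
}

/* Process context: disable local interrupts while holding the lock so the
 * handler above cannot preempt us while we hold it. */
static int demo_consume(struct demo_dev *dev)
{
	unsigned long flags;
	int n;

	spin_lock_irqsave(&dev->lock, flags);
	n = dev->pending;
	dev->pending = 0;
	spin_unlock_irqrestore(&dev->lock, flags);
	return n;
}
```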
Putting the "minimal sharing" principle into practice
Concrete ways to reduce sharing include (a sketch of the last item follows this list):
- Use local variables where possible; they are visible only to the current execution path, so there is nothing to share.
- Read-only data needs no synchronization; any number of threads can read it concurrently without creating an inconsistent view.
- Keep per-open-file data in file->private_data rather than in a global variable, so each open file works with its own copy.
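A minimal sketch of the file->private_data pattern, following the usual convention of embedding a struct cdev in the per-device structure (names simplified for illustration):

```c
#include <linux/fs.h>
#include <linux/cdev.h>
#include <linux/kernel.h>

struct scull_dev {
	struct cdev cdev;       /* char device structure, embedded */
	/* ... per-device state ... */
};

static int scull_open(struct inode *inode, struct file *filp)
{
	/* Recover the per-device structure from the embedded cdev and
	 * stash it in the per-open-file pointer; later file operations
	 * read it back instead of consulting a global variable. */
	struct scull_dev *dev = container_of(inode->i_cdev,
					     struct scull_dev, cdev);

	filp->private_data = dev;
	return 0;
}

static ssize_t scull_read(struct file *filp, char __user *buf,
			  size_t count, loff_t *f_pos)
{
	struct scull_dev *dev = filp->private_data;

	/* ... operate on 'dev' only; no globals involved ... */
	return 0;
}
```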
Typical reference-counting scenario
The driver needs to track how many references to a device exist (an open count, for example) and release its resources only after the last reference has gone away:

```c
// Add a reference count to the device structure
struct scull_dev {
	// ... other fields ...
	atomic_t refcount;   // atomic counter recording the number of references
};

// Take a reference when the device is opened
atomic_inc(&dev->refcount);

// Drop a reference when the device is closed; free it when the count hits 0
if (atomic_dec_and_test(&dev->refcount)) {
	kfree(dev);          // last reference is gone; release the device
}
```
Choosing a synchronization mechanism
Each of the kernel's synchronization primitives has its place; when choosing one, consider (a small reader/writer-lock sketch follows this list):
- Whether readers and writers should be treated differently: for read-mostly data, a reader/writer lock lets many readers proceed in parallel.
- How long the lock is held and how it is contended: spinlocks are efficient for short, non-sleeping critical sections, while semaphores or mutexes avoid burning CPU when the wait may be long.
- Whether sleeping is allowed: process context may use semaphores or mutexes, but interrupt context must use spinlocks.
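For the read-mostly case, a rough sketch using a reader/writer semaphore (the structure and field names are illustrative, not from the text):

```c
#include <linux/rwsem.h>

struct demo_config {
	struct rw_semaphore rwsem;  /* init once with init_rwsem(&cfg->rwsem) */
	int value;
};

static int demo_config_read(struct demo_config *cfg)
{
	int v;

	down_read(&cfg->rwsem);     /* readers do not exclude each other */
	v = cfg->value;
	up_read(&cfg->rwsem);
	return v;
}

static void demo_config_write(struct demo_config *cfg, int v)
{
	down_write(&cfg->rwsem);    /* a writer excludes readers and writers */
	cfg->value = v;
	up_write(&cfg->rwsem);
}
```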