Chapter 5: Concurrency and Race Conditions, Part 6: Completions

Continuing from the previous post, Chapter 5: Concurrency and Race Conditions, Part 5: Reader/Writer Semaphores, let's go ahead.

Completions

A common pattern in kernel programming involves initiating some activity outside of the current thread, then waiting for that activity to complete. This activity can be the creation of a new kernel thread or user-space process, a request to an existing process, or some sort of hardware-based action. In such cases, it can be tempting to use a semaphore for synchronization of the two tasks, with code such as:

struct semaphore sem;
init_MUTEX_LOCKED(&sem);
start_external_task(&sem);
down(&sem);

The external task can then call up(&sem) when its work is done.

As it turns out, semaphores are not the best tool to use in this situation. In normal use, code attempting to lock a semaphore finds that semaphore available almost all the time; if there is significant contention for the semaphore, performance suffers and the locking scheme needs to be reviewed. So semaphores have been heavily optimized for the “available” case. When used to communicate task completion in the way shown above, however, the thread calling down will almost always have to wait; performance will suffer accordingly. Semaphores can also be subject to a (difficult) race condition when used in this way if they are declared as automatic variables (that is, allocated on the stack). In some cases, the semaphore could vanish before the process calling up is finished with it.

These concerns inspired the addition of the “completion” interface in the 2.4.7 kernel. Completions are a lightweight mechanism with one task: allowing one thread to tell another that the job is done. To use completions, your code must include <linux/completion.h>. A completion can be created and initialized in one of two ways:

1. Static creation:

DECLARE_COMPLETION(my_completion);

2. Dynamic creation, if the completion must be created and initialized at run time:

struct completion my_completion;
/* ... */
init_completion(&my_completion);

Waiting for the completion is a simple matter of calling:

void wait_for_completion(struct completion *c);

Note that this function performs an uninterruptible wait. If your code calls wait_for_completion and nobody ever completes the task, the result will be an unkillable process (one stuck in the uninterruptible D state).

On the other side, the actual completion event may be signalled by calling one of the following:

void complete(struct completion *c);
void complete_all(struct completion *c);

The two functions behave differently if more than one thread is waiting for the same completion event. complete wakes up only one of the waiting threads, while complete_all allows all of them to proceed. In most cases, there is only one waiter, and the two functions produce an identical result.
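The difference between the two wake-up calls can be tried out in user space. The following is a minimal sketch, not kernel code: it assumes a pthread-based stand-in for struct completion (a mutex, a condition variable, and a done counter), and the helper names struct demo, waiter, and woken_by_one_signal are hypothetical. It starts three waiters, issues a single signal, and reports how many of them woke up.

```c
#include <assert.h>
#include <limits.h>
#include <pthread.h>
#include <stdatomic.h>

#define NWAITERS 3

/* Hypothetical userspace stand-in for the kernel's struct completion. */
struct completion {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    unsigned int done;          /* pending wake-ups; UINT_MAX means "all" */
};

static void init_completion(struct completion *c)
{
    pthread_mutex_init(&c->lock, NULL);
    pthread_cond_init(&c->cond, NULL);
    c->done = 0;
}

static void wait_for_completion(struct completion *c)
{
    pthread_mutex_lock(&c->lock);
    while (c->done == 0)        /* loop guards against spurious wake-ups */
        pthread_cond_wait(&c->cond, &c->lock);
    if (c->done != UINT_MAX)
        c->done--;              /* consume exactly one wake-up */
    pthread_mutex_unlock(&c->lock);
}

static void complete(struct completion *c)
{
    pthread_mutex_lock(&c->lock);
    if (c->done != UINT_MAX)
        c->done++;              /* allow one waiter through */
    pthread_cond_signal(&c->cond);
    pthread_mutex_unlock(&c->lock);
}

static void complete_all(struct completion *c)
{
    pthread_mutex_lock(&c->lock);
    c->done = UINT_MAX;         /* allow every waiter through */
    pthread_cond_broadcast(&c->cond);
    pthread_mutex_unlock(&c->lock);
}

struct demo {
    struct completion comp;     /* the event the waiters sleep on */
    struct completion ack;      /* each woken waiter reports back here */
    atomic_int woken;
};

static void *waiter(void *arg)
{
    struct demo *d = arg;

    wait_for_completion(&d->comp);
    atomic_fetch_add(&d->woken, 1);
    complete(&d->ack);
    return NULL;
}

/* Start NWAITERS waiters, signal once, return how many that signal woke. */
static int woken_by_one_signal(int use_complete_all)
{
    struct demo d;
    pthread_t tid[NWAITERS];
    int acks = use_complete_all ? NWAITERS : 1;
    int i, rc, count;

    init_completion(&d.comp);
    init_completion(&d.ack);
    atomic_init(&d.woken, 0);
    for (i = 0; i < NWAITERS; i++) {
        rc = pthread_create(&tid[i], NULL, waiter, &d);
        assert(rc == 0);
    }
    (void)rc;

    if (use_complete_all)
        complete_all(&d.comp);
    else
        complete(&d.comp);

    for (i = 0; i < acks; i++)  /* wait until the expected waiters report */
        wait_for_completion(&d.ack);
    count = atomic_load(&d.woken);

    complete_all(&d.comp);      /* release any waiter still blocked */
    for (i = 0; i < NWAITERS; i++)
        pthread_join(tid[i], NULL);
    return count;
}
```

Calling woken_by_one_signal(0) exercises complete and woken_by_one_signal(1) exercises complete_all. The counter-based design means a signal issued before any waiter has gone to sleep is not lost, which is the same property that makes completions safe for the "start a task, then wait" pattern.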

A completion is normally a one-shot device; it is used once then discarded. It is possible, however, to reuse completion structures if proper care is taken. If complete_all is not used, a completion structure can be reused without any problems as long as there is no ambiguity about what event is being signalled. If you use complete_all, however, you must reinitialize the completion structure before reusing it. The macro:

INIT_COMPLETION(struct completion c);

can be used to quickly perform this reinitialization.
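Why reinitialization matters after complete_all can be seen with a small single-threaded model of the completion's internal counter. This is only a sketch under stated assumptions: struct completion_model and the model_* functions are hypothetical names, there is no locking or sleeping, and the "saturate the counter" behaviour of complete_all is assumed to follow recent kernels (where, to the best of this sketch's knowledge, the reinitialization is spelled reinit_completion(&c) rather than INIT_COMPLETION).

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical model: only the "done" counter, no wait queue or locking. */
struct completion_model {
    unsigned int done;
};

static void model_complete(struct completion_model *m)
{
    if (m->done != UINT_MAX)
        m->done++;              /* one more waiter may pass */
}

static void model_complete_all(struct completion_model *m)
{
    m->done = UINT_MAX;         /* every waiter may pass, now and later */
}

/* Non-blocking wait: true if a waiter would get through right now. */
static bool model_try_wait(struct completion_model *m)
{
    if (m->done == 0)
        return false;
    if (m->done != UINT_MAX)
        m->done--;              /* a saturated counter is never consumed */
    return true;
}

static void model_reinit(struct completion_model *m)
{
    m->done = 0;
}

/*
 * Signal a first event, let one waiter consume it, then try to wait for a
 * second event that has NOT been signalled. Returns true if that second
 * wait would (wrongly) pass.
 */
static bool second_wait_passes(bool used_complete_all, bool reinit_before_reuse)
{
    struct completion_model m = { 0 };
    bool first;

    if (used_complete_all)
        model_complete_all(&m);
    else
        model_complete(&m);

    first = model_try_wait(&m); /* the first event is always seen */
    assert(first);
    (void)first;

    if (reinit_before_reuse)
        model_reinit(&m);
    return model_try_wait(&m);
}
```

With complete, the single wake-up is consumed and the structure is ready for reuse. With complete_all, the stale state lets a later waiter through until the structure is reinitialized.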

As an example of how completions may be used, consider the complete module, which is included in the example source. This module defines a device with simple semantics: any process trying to read from the device will wait (using wait_for_completion) until some other process writes to the device. The code which implements this behavior is:

DECLARE_COMPLETION(comp); /* statically declared completion */

ssize_t complete_read(struct file *filp, char __user *buf, size_t count, loff_t *pos)
{
    printk(KERN_DEBUG "process %i (%s) going to sleep\n", current->pid, current->comm);
    wait_for_completion(&comp); /* sleep until a writer signals the completion */
    printk(KERN_DEBUG "awoken process %i (%s)\n", current->pid, current->comm);
    return 0; /* EOF */
}

ssize_t complete_write(struct file *filp, const char __user *buf, size_t count, loff_t *pos)
{
    printk(KERN_DEBUG "process %i (%s) awakening the readers...\n", current->pid, current->comm);
    complete(&comp); /* signal the completion, waking one reader */
    return count; /* report every byte as written, to avoid a retry */
}

It is possible to have multiple processes “reading” from this device at the same time. Each write to the device will cause exactly one read operation to complete, but there is no way to know which one it will be.

A typical use of the completion mechanism is with kernel thread termination at module exit time. In the prototypical case, some of the driver's internal work is performed by a kernel thread in a while (1) loop. When the module is ready to be cleaned up, the exit function tells the thread to exit and then waits for completion. To this end, the kernel includes a specific function to be used by the thread. When the thread calls it, the function first signals the completion event (notifying the waiter) and then exits the thread with retval:

void complete_and_exit(struct completion *c, long retval);
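The shutdown handshake described above can be sketched in user space. This is an analogue under stated assumptions, not driver code: the completion is a simplified one-shot pthread stand-in, and struct worker, worker_fn, and stop_worker_and_wait are hypothetical names. Signalling the completion as the thread's last act plays the role of complete_and_exit (which appears to have been renamed kthread_complete_and_exit in recent kernels).

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical one-shot userspace stand-in for struct completion. */
struct completion {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    bool done;
};

static void init_completion(struct completion *c)
{
    pthread_mutex_init(&c->lock, NULL);
    pthread_cond_init(&c->cond, NULL);
    c->done = false;
}

static void wait_for_completion(struct completion *c)
{
    pthread_mutex_lock(&c->lock);
    while (!c->done)            /* loop guards against spurious wake-ups */
        pthread_cond_wait(&c->cond, &c->lock);
    pthread_mutex_unlock(&c->lock);
}

static void complete(struct completion *c)
{
    pthread_mutex_lock(&c->lock);
    c->done = true;
    pthread_cond_signal(&c->cond);
    pthread_mutex_unlock(&c->lock);
}

struct worker {
    struct completion exited;   /* signalled when the worker is finished */
    atomic_bool stop;           /* set by the "module exit" path */
    atomic_bool finished;       /* set by the worker after its loop ends */
};

static void *worker_fn(void *arg)
{
    struct worker *w = arg;

    while (!atomic_load(&w->stop))
        sched_yield();          /* stand-in for the driver's real work */

    atomic_store(&w->finished, true);
    complete(&w->exited);       /* last act: tell the waiter we are done */
    return NULL;
}

/* Analogue of a module exit function: request a stop, then wait for it. */
static bool stop_worker_and_wait(void)
{
    struct worker w;
    pthread_t tid;
    bool finished;
    int rc;

    init_completion(&w.exited);
    atomic_init(&w.stop, false);
    atomic_init(&w.finished, false);
    rc = pthread_create(&tid, NULL, worker_fn, &w);
    assert(rc == 0);
    (void)rc;

    atomic_store(&w.stop, true);        /* tell the thread to exit */
    wait_for_completion(&w.exited);     /* block until it says it has */
    finished = atomic_load(&w.finished);

    pthread_join(tid, NULL);    /* w lives on the stack: outlive the thread */
    return finished;
}
```

The pthread_join before returning matters because w is an automatic variable; without it the worker could still be touching the structure after the function's stack frame is gone, which is the same lifetime hazard the text describes for semaphores.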

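Additional notes:

Key differences between completions and semaphores
- Completions: designed specifically for "wait until a task is done"; lightweight, not tuned for the uncontended case, and free of the lifetime race that affects semaphores declared on the stack.
- Semaphores: a general-purpose mutual exclusion and synchronization tool, optimized for the case where the semaphore can be acquired immediately; a poor fit for pure notification.

Alternatives to the uninterruptible wait

If the wait must be interruptible, use wait_for_completion_interruptible (which returns -ERESTARTSYS if a signal interrupts it) or wait_for_completion_killable (which can be interrupted only by fatal signals).

Wake-up order with multiple waiters

Which waiter complete wakes depends on the order in which the waiters were queued. Current kernels wake them in the order they started waiting (first in, first out), but which thread reaches the queue first still depends on scheduling, so callers cannot predict it. If a strict order is required, combine the completion with another mechanism, such as an explicit queue.

Performance of completions

A completion is a simpler object than a semaphore: little more than a "done" counter paired with a wait queue. For the case of one notifier and one waiter, where the waiter almost always has to sleep, it can be expected to perform better than a semaphore.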