Spurious Thread Wakeups When Using Condition Variables

Article link: https://blog.youkuaiyun.com/qq_40843865/article/details/96099921

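This article explains the concept of "spurious wakeup" in multithreaded programming and how it arises on Linux and in Java, and shows why a wait on a condition variable must sit inside a while loop rather than an if statement.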


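While studying the muduo network library I came across the following code and wondered why it waits in a while loop, which led me to an initial guess about the reason: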

T take()
  {
    MutexLockGuard lock(mutex_);
    
    //always use a while-loop, due to spurious wakeup
    while (queue_.empty()) {
      notEmpty_.wait();
    }    
    
    assert(!queue_.empty());
    T front(queue_.front());
    queue_.pop_front();
    return front;
  }

After thinking it over more carefully, I realized that guess was wrong. The correct explanation is given below.

About spurious wakeups:

A typical use of a condition variable looks like this:

// wait side
pthread_mutex_lock(mtx);
while(deque.empty())
    pthread_cond_wait(...);
deque.pop_front();
pthread_mutex_unlock(mtx);
 
// signal side
pthread_mutex_lock(mtx);
deque.push_back(x);
pthread_cond_signal(...);
pthread_mutex_unlock(mtx);

On the wait side, the wait on the condition variable must be wrapped in a while loop, not an if statement. The reason is spurious wakeups.
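The same wait/signal pattern can be sketched in standard C++ with std::mutex and std::condition_variable. This is only a minimal illustration, not muduo's actual code: the class name SimpleBlockingQueue and the helper run_demo are hypothetical names made up for this example.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical minimal blocking queue, modeled on the take() shown earlier.
template <typename T>
class SimpleBlockingQueue {
 public:
  // Signal side: push under the lock, then wake one waiter.
  void put(T value) {
    std::lock_guard<std::mutex> lock(mutex_);
    queue_.push_back(std::move(value));
    notEmpty_.notify_one();
  }

  // Wait side: re-check the condition after every wakeup, because
  // wait() may return even though nobody notified this thread.
  T take() {
    std::unique_lock<std::mutex> lock(mutex_);
    while (queue_.empty()) {
      notEmpty_.wait(lock);
    }
    assert(!queue_.empty());
    T front(std::move(queue_.front()));
    queue_.pop_front();
    return front;
  }

 private:
  std::mutex mutex_;
  std::condition_variable notEmpty_;
  std::deque<T> queue_;
};

// Hypothetical demo: a consumer thread sums the values 1..count
// that the calling thread pushes into the queue.
int run_demo(int count) {
  SimpleBlockingQueue<int> q;
  int sum = 0;
  std::thread consumer([&] {
    for (int i = 0; i < count; ++i) sum += q.take();
  });
  for (int i = 1; i <= count; ++i) q.put(i);
  consumer.join();  // join() makes the consumer's writes to sum visible here
  return sum;
}
```

With an if in place of the while, take() could reach pop_front() on an empty queue after a wakeup that did not correspond to a new element.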

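Spurious wakeups are easily misunderstood as follows: if there are several consumers, they may all be blocked at the same place. When the producer signals "not empty", the deque is immediately emptied by the first consumer to wake up, so the consumers that wake up after it have effectively been woken spuriously.
This situation can largely be avoided by using signal instead of broadcast. signal is meant to wake a single waiting thread (POSIX guarantees that at least one is unblocked). Which thread is chosen depends on the scheduling policy: typically the waiter with the highest priority and, among threads of equal priority, the one that has waited longest.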
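The scenario above, where several consumers are woken but the queue is already empty again for most of them, can be reproduced with a small sketch. It assumes a broadcast (notify_all) on every push. The names HerdResult and run_consumers are hypothetical, and the number of "empty" wakeups depends on scheduling and varies from run to run.

```cpp
#include <condition_variable>
#include <deque>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical result type for this illustration.
struct HerdResult {
  int consumed;       // items actually taken from the queue
  int empty_wakeups;  // wait() returned but there was nothing to take
};

// Starts num_consumers threads, pushes num_items items with a broadcast
// after each push, and reports how the consumers fared.
HerdResult run_consumers(int num_consumers, int num_items) {
  std::mutex mtx;
  std::condition_variable not_empty;
  std::deque<int> items;
  bool done = false;
  int consumed = 0;
  int empty_wakeups = 0;

  auto consumer = [&] {
    std::unique_lock<std::mutex> lock(mtx);
    for (;;) {
      // The while loop sends a thread back to sleep when another
      // consumer has already emptied the queue.
      while (items.empty() && !done) {
        not_empty.wait(lock);
        if (items.empty() && !done) ++empty_wakeups;
      }
      if (items.empty()) return;  // producer finished and queue drained
      items.pop_front();
      ++consumed;
    }
  };

  std::vector<std::thread> threads;
  for (int i = 0; i < num_consumers; ++i) threads.emplace_back(consumer);

  for (int i = 0; i < num_items; ++i) {
    std::lock_guard<std::mutex> lock(mtx);
    items.push_back(i);
    not_empty.notify_all();  // broadcast: every waiting consumer wakes up
  }
  {
    std::lock_guard<std::mutex> lock(mtx);
    done = true;
    not_empty.notify_all();
  }
  for (auto& t : threads) t.join();
  return {consumed, empty_wakeups};
}
```

Whatever the interleaving, every item is consumed exactly once, because a consumer only pops after it has re-checked the queue while holding the mutex.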

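The situation above resembles the thundering herd problem:
A thundering herd occurs when an event fires on a file descriptor and every thread or process waiting on that descriptor is woken up. It is usually caused by accept() on a socket (you could also have a group of threads or processes block in read on the same fd, but there is little point in writing code that way). Many processes block in accept() on the same server socket. As soon as a client connects, accept() returns in all of them, but only one process actually gets the connection. That is the thundering herd. In practice, current Linux kernels no longer show this behavior for accept(): only one process is woken (as of the Linux 2.6 kernel).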

The correct explanation of spurious wakeups:

Wikipedia gives a general description of spurious wakeups:

According to David R. Butenhof's Programming with POSIX Threads (ISBN 0-201-63392-2):

"This means that when you wait on a condition variable, the wait may (occasionally) return when no thread specifically broadcast or signaled that condition variable. Spurious wakeups may sound strange, but on some multiprocessor systems, making condition wakeup completely predictable might substantially slow all condition variable operations. The race conditions that cause spurious wakeups should be considered rare."

In other words, wait may occasionally return even though no thread has called broadcast or signal on the condition variable.

The Linux manual page for pthread_cond_signal says:

On a multi-processor, it may be impossible for an implementation of pthread_cond_signal() to avoid the unblocking of more than one thread blocked on a condition variable.

The effect is that more than one thread can return from its call to pthread_cond_wait() or pthread_cond_timedwait() as a result of one call to pthread_cond_signal(). This effect is called "spurious wakeup". Note that the situation is self-correcting in that the number of threads that are so awakened is finite; for example, the next thread to call pthread_cond_wait() after the sequence of events above blocks.

While this problem could be resolved, the loss of efficiency for a fringe condition that occurs only rarely is unacceptable, especially given that one has to check the predicate associated with a condition variable anyway. Correcting this problem would unnecessarily reduce the degree of concurrency in this basic building block for all higher-level synchronization operations.

In short, on a multi-core system a single call to pthread_cond_signal() may unblock more than one thread waiting on the condition variable. The problem could be fixed inside pthread_cond_wait(), but doing so would slow down every condition variable operation for the sake of a rare edge case, so implementations deliberately leave it unfixed.

So the standard fix is to change the condition check from if to while.
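In standard C++ the loop does not have to be written by hand: std::condition_variable::wait has an overload that takes a predicate and behaves like `while (!pred()) wait(lock);`. The sketch below shows that overload. The function name wait_for_value and the variables ready and payload are hypothetical, chosen only for illustration.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical helper: blocks until a producer thread publishes a value,
// then returns that value.
int wait_for_value() {
  std::mutex mtx;
  std::condition_variable cv;
  bool ready = false;  // the condition that is re-checked after each wakeup
  int payload = 0;

  std::thread producer([&] {
    std::lock_guard<std::mutex> lock(mtx);
    payload = 42;
    ready = true;
    cv.notify_one();
  });

  std::unique_lock<std::mutex> lock(mtx);
  // Equivalent to: while (!ready) cv.wait(lock);
  // A spurious wakeup re-evaluates the predicate and goes back to waiting.
  cv.wait(lock, [&] { return ready; });
  int result = payload;
  lock.unlock();

  producer.join();
  return result;
}
```

The predicate form makes it harder to write the if version by accident, since the condition and the wait are passed together in a single call.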

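Spurious wakeups in Java:
A spurious wakeup is an observable behavior: on a multiprocessor system, a thread that has called wait may resume execution without having been woken by notify. Take a Java program running on the HotSpot VM on Linux as an example. When the JVM executes wait, it essentially calls the underlying pthread_cond_wait/pthread_cond_timedwait functions, suspending the thread on a condition variable so that threads can synchronize with each other. To avoid slowing down condition variable operations, those underlying wait functions were designed without any guarantee that every wakeup is triggered by notify. That responsibility is left to the application: the caller has to write a loop that checks whether the condition for continuing really holds. Such a loop also protects the program against abnormal wakeups caused by flaws in its own design.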

References:
https://blog.youkuaiyun.com/Tornado1102/article/details/76158390