C++ Concurrency: Condition Variables (Lost Signals and Spurious Wakeups)


natsusao


Tags: c++ linux

Article link: https://blog.youkuaiyun.com/weixin_42445065/article/details/136657401


Condition variables

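A condition variable is a common tool in C++ concurrent programming. Its main job is to block a thread until the right moment to wake it. When it is only used for single wakeups, a semaphore can fill the same role.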

Why it exists

A simple example

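Start with the simplest possible requirement. In a classic producer-consumer model, we need to write the consumer.
The crudest version of the consumer looks like this: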

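while (true)
{
	std::unique_lock<std::mutex> lock(m_mutex);  // unique_lock (not lock_guard) so we can unlock() early
	if (tasks.empty()) continue;  // nothing to do: loop straight back (busy polling)

	Task task = tasks.front();  // take the task, assuming tasks is a std::queue<Task>
	tasks.pop();
	lock.unlock();  // don't hold the lock while running the task
	task.call();
}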

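This is a plain infinite loop. It keeps polling the task queue, takes tasks out, and runs them.
If the producer is running at full load, this style of consumer is acceptable. If the producer is idle most of the time, though, the loop spins a CPU core doing nothing, and idle CPU usage is terrible.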

How can this be improved? One approach is to sleep at regular intervals.

while (true)
{
	std::unique_lock<std::mutex> lock(m_mutex);  // lock_guard has no unlock(), so use unique_lock
	if (tasks.empty())
	{
		lock.unlock();  // never sleep while holding the lock
		std::this_thread::sleep_for(std::chrono::microseconds(10));
		continue;
	}

	Task task = tasks.front();  // front + pop
	tasks.pop();
	lock.unlock();
	task.call();
}

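Whenever there is no task, the consumer sleeps for a while and then tries again.
This approach isn't really a big problem. Some low-level code at the author's company is written exactly this way: simple, reliable, and fast. As for its cost, it is tolerable in an era when computing resources are plentiful.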

Choosing the sleep interval is a real problem, though. If it is too long, message-processing latency goes up. If it is too short, idle CPU usage is still ugly.

Some people find this inelegant. Is there a better way? That is where condition variables come in.

Introducing and using condition variables

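What is the classic way to improve on polling? An event mechanism, of course.
A condition variable can also be thought of as a kind of event. Its three main APIs are: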

std::condition_variable::wait();
std::condition_variable::notify_one();
std::condition_variable::notify_all();

The names pretty much say what they do.

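wait blocks the current thread until a notification arrives. It takes a std::unique_lock, releases it while blocked, and re-acquires it before returning.
notify_one sends an event that wakes one of the threads blocked in wait, if there are any.
notify_all wakes every thread blocked in wait.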
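The following is a minimal sketch of wait and notify_one working together: one thread hands a single value to another. The function name handOffValue and the variables ready and value are made up for this illustration. The waiter already uses the "check the condition in a loop" form, and the rest of the article explains why that loop is needed.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>

// One-shot hand-off between two threads (hypothetical helper for illustration).
// The mutex protects both the condition `ready` and the data `value`.
int handOffValue(int payload) {
    std::mutex m;
    std::condition_variable cv;
    bool ready = false;
    int value = 0;
    int received = 0;

    std::thread waiter([&] {
        std::unique_lock<std::mutex> lock(m);
        while (!ready)        // check the condition before (and after) every wait
            cv.wait(lock);    // releases m while blocked, re-locks it on wakeup
        received = value;
    });

    {
        std::lock_guard<std::mutex> lock(m);
        value = payload;
        ready = true;         // change the condition under the same mutex
    }
    cv.notify_one();          // wake the single waiting thread

    waiter.join();
    return received;
}
```

Calling handOffValue(42) returns 42, whichever thread happens to run first.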

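With this pair of operations, we can turn the consumer's active sleeping into passive waiting. The code below is a simplified producer-consumer model: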

#include <cstdio>
#include <thread>
#include <chrono>
#include <mutex>
#include <condition_variable>

std::condition_variable cv;
std::mutex mutex;

bool hasProduct = false;

int consumerThread()
{
	while (true)
	{
		std::unique_lock<std::mutex> lock(mutex);
		printf("fetched locker, ready wait.\n");
		cv.wait(lock);  // bare wait: the condition is never checked first (see below)

		if (!hasProduct) throw "wrong product consume!";
		hasProduct = false;

		printf("wait finished.\n");
	}

	return 0;
}

int producerThread()
{
	std::unique_lock<std::mutex> lock(mutex);

	printf("produce .\n");
	hasProduct = true;
	cv.notify_one();
	
	// ...some other jobs
	
	return 0;
}

int main()
{
	std::thread t1(producerThread);
	std::thread t2(consumerThread);

	t1.join();
	t2.join();

	return 0;
}

The lost-signal problem

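This looks like it solves the CPU usage problem. But look more closely: is anything wrong with it?
Yes. producerThread and consumerThread run concurrently. If producerThread calls notify_one before consumerThread reaches wait, consumerThread misses that notification for good, because a condition variable keeps no record of past notifications. The likely result is that the consumer thread waits forever.
In general, then, correct wait code must check whether the condition already holds before waiting, so that no event is missed. This is also why a condition variable always works together with a lock. The check and the wait must happen under the same mutex the producer uses to change the condition; otherwise the notification could still slip in between the check and the wait.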

#include <cstdio>
#include <thread>
#include <chrono>
#include <mutex>
#include <condition_variable>

std::condition_variable cv;
std::mutex mutex;

bool hasProduct = false;

int consumerThread()
{
	while (true)
	{
		std::unique_lock<std::mutex> lock(mutex);
		printf("fetched locker, ready wait.\n");
		// Note this: wait only if the condition is not already true
		if(!hasProduct)
			cv.wait(lock);

		if (!hasProduct) throw "wrong product consume!";
		hasProduct = false;

		printf("wait finished.\n");
	}

	return 0;
}

int producerThread()
{
	std::unique_lock<std::mutex> lock(mutex);

	printf("produce .\n");
	hasProduct = true;
	cv.notify_one();
	
	// ...some other jobs
	
	return 0;
}

int main()
{
	std::thread t1(producerThread);
	std::thread t2(consumerThread);

	t1.join();
	t2.join();

	return 0;
}
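The lost signal can be reproduced deterministically in a single thread. The sketch below notifies first and waits afterwards. The function names are hypothetical, and wait_for with a 50 ms timeout stands in for wait only so that the bare version cannot hang. The bare wait ignores the flag and times out, barring a rare spurious wakeup. The checked version, like the fixed consumer above, never blocks at all.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>

// Notify first, wait afterwards, without looking at the condition.
// Returns true if the product was there all along yet the wait still timed out.
bool bareWaitMissesEarlyNotify() {
    std::mutex m;
    std::condition_variable cv;
    bool hasProduct = false;
    {
        std::lock_guard<std::mutex> lock(m);
        hasProduct = true;           // the product already exists...
    }
    cv.notify_one();                 // ...and was notified, but nobody was waiting
    std::unique_lock<std::mutex> lock(m);
    bool timedOut = cv.wait_for(lock, std::chrono::milliseconds(50)) == std::cv_status::timeout;
    return timedOut && hasProduct;
}

// Same sequence, but check the condition before waiting.
bool checkedWaitSeesEarlyNotify() {
    std::mutex m;
    std::condition_variable cv;
    bool hasProduct = false;
    {
        std::lock_guard<std::mutex> lock(m);
        hasProduct = true;
    }
    cv.notify_one();
    std::unique_lock<std::mutex> lock(m);
    if (!hasProduct)                 // already true, so we never block
        cv.wait(lock);
    return hasProduct;
}
```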

The spurious wakeup problem

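Is there still a problem with the code above?
Yes, there is one more pitfall: the "spurious wakeup".
The exact meaning of a spurious wakeup is fairly involved; search for it if you are interested. In short, it is system-level behavior you cannot control. A thread blocked in wait may return even though nobody notified it, and a single notify_one may end up waking more than one waiting thread even though you only meant to wake one.
Even knowing this, some people still can't see how it could produce a wrong result. wait takes a lock as its argument, so it looks perfectly safe.
To explain the problem, we first need to know that the call: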

wait(lock);

actually consists of three steps (illustrative code):

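release(lock);  // 1. unlock the mutex
wait();         // 2. block until woken (steps 1 and 2 happen atomically)
fetch(lock);    // 3. re-lock the mutex before returning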

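While blocked, wait first releases the lock. That is natural: a sleeping thread does nothing, and if it kept the lock, the producer could never take that same lock to change the condition and notify. When the thread is woken, it tries to get the lock back and then continues.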
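The sketch below makes the release-and-reacquire behaviour observable. The function name is hypothetical. The main thread locks the mutex before starting a notifier thread, so the notifier can only get the mutex once the main thread is blocked inside wait. If wait did not release the lock, this would deadlock instead of returning.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>

// Returns true once the notifier managed to lock the mutex during the wait
// and the waiter owned the lock again when wait() returned.
bool waitReleasesAndReacquires() {
    std::mutex m;
    std::condition_variable cv;
    bool ready = false;

    std::unique_lock<std::mutex> lock(m);       // main thread holds m...
    std::thread notifier([&] {
        std::lock_guard<std::mutex> guard(m);   // ...so this succeeds only inside wait()
        ready = true;
        cv.notify_one();
    });

    while (!ready)
        cv.wait(lock);                          // release m, block, re-acquire m
    bool ownsAfterWait = lock.owns_lock();      // true: the lock was taken back
    lock.unlock();

    notifier.join();
    return ready && ownsAfterWait;
}
```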

Next, change main as follows. The main change is adding t3 as a second consumer thread:

int main()
{
	std::thread t1(producerThread);
	std::thread t2(consumerThread);
	std::thread t3(consumerThread);

	t1.join();
	t2.join();
	t3.join();

	return 0;
}

Now the following sequence becomes possible:

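t2 and t3 are both waiting, with nothing produced yet;
t1 produces;
t2 and t3 are both woken (one by the notification, the other spuriously);
t2 grabs the lock first and continues;
t3 waits for the lock;
t2 consumes the product and releases the lock;
t3 gets the lock, continues, tries to consume a product that no longer exists, and fails.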

The timeline:

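t1 (producer)    t2 (consumer)      t3 (consumer)
produce          wait               wait
notify_one       woken              woken
release lock     re-acquire lock    wait for lock
next round       consume product    wait for lock
next round       release lock       re-acquire lock
next round       next round         consume -> error!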

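The fix is simple: after t3 wakes up and gets the lock, it checks the condition again. Changing the if to a while is enough. The same loop also covers the case where another consumer takes the product between the notify and the woken thread re-acquiring the lock.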

#include <cstdio>
#include <iostream>
#include <thread>
#include <chrono>
#include <mutex>
#include <condition_variable>

std::condition_variable cv;
std::mutex mutex;

bool hasProduct = false;

int consumerThread()
{
	while (true)
	{
		std::unique_lock<std::mutex> lock(mutex);
		std::cout << "[" << std::this_thread::get_id() << "] fetched locker, ready wait.\n";  // thread::id is not an int, so print it with operator<<
		
		// Note the change here: while instead of if
		while(!hasProduct)
			cv.wait(lock);

		if (!hasProduct) throw "wrong product consume!";
		hasProduct = false;

		std::cout << "[" << std::this_thread::get_id() << "] wait finished.\n";
	}

	return 0;
}

int producerThread()
{
	while (true)
	{
		std::this_thread::sleep_for(std::chrono::milliseconds(100));
		std::unique_lock<std::mutex> lock(mutex);

		printf("produce .\n");
		hasProduct = true;
		cv.notify_one();
	}

	return 0;
}

int main()
{
	std::thread t1(producerThread);
	std::thread t2(consumerThread);
	std::thread t3(consumerThread);
	std::thread t4(consumerThread);
	std::thread t5(consumerThread);

	t1.join();
	t2.join();
	t3.join();
	t4.join();  // every started thread must be joined (or detached)
	t5.join();
	return 0;
}
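The program above runs forever, so the following sketch is a bounded variant for checking the while-loop pattern. The function name runProducerConsumers, the available counter (which replaces the single hasProduct flag), and the done shutdown flag are all assumptions added for this illustration. Several consumers share the products, and each one re-checks the condition after every wakeup, so spurious or "stolen" wakeups simply loop back into wait.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Produces `total` items for `consumers` threads and returns how many were consumed.
int runProducerConsumers(int total, int consumers) {
    std::mutex m;
    std::condition_variable cv;
    int available = 0;   // produced but not yet consumed
    bool done = false;   // producer has finished
    int consumed = 0;

    auto consumer = [&] {
        std::unique_lock<std::mutex> lock(m);
        while (true) {
            while (available == 0 && !done)  // re-check after every wakeup
                cv.wait(lock);
            if (available == 0) return;      // finished and nothing left
            --available;
            ++consumed;
        }
    };

    std::vector<std::thread> pool;
    for (int i = 0; i < consumers; ++i) pool.emplace_back(consumer);

    for (int i = 0; i < total; ++i) {
        {
            std::lock_guard<std::mutex> lock(m);
            ++available;
        }
        cv.notify_one();
    }
    {
        std::lock_guard<std::mutex> lock(m);
        done = true;
    }
    cv.notify_all();  // wake everyone so they can observe `done`

    for (auto& t : pool) t.join();
    return consumed;
}
```

runProducerConsumers(1000, 4) returns 1000. Every item is consumed exactly once, and no consumer ever sees an empty queue after waking.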

A more modern style

C++11 added lambdas, which make the intent of wait clearer:

while(!hasProduct)
	cv.wait(lock);

becomes:

cv.wait(lock, [](){ return hasProduct; });

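Given the lambda (the predicate), wait runs the loop itself. It re-checks the condition after every wakeup and returns only once the condition holds, which makes it equivalent to the while loop above.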
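The predicate form also exists with a timeout: wait_for(lock, duration, pred) runs the same loop but gives up after the duration and returns the final value of the predicate. In the sketch below, the function name waitForFlag and its setFlag switch are hypothetical. setFlag simulates a producer that either produces once or never produces.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Returns true if hasProduct became true within `timeout`.
bool waitForFlag(bool setFlag, std::chrono::milliseconds timeout) {
    std::mutex m;
    std::condition_variable cv;
    bool hasProduct = false;

    std::thread producer([&] {
        if (!setFlag) return;  // a producer that never produces
        {
            std::lock_guard<std::mutex> lock(m);
            hasProduct = true;
        }
        cv.notify_one();
    });

    bool ok;
    {
        std::unique_lock<std::mutex> lock(m);
        // Loops like `while (!pred()) wait(...)`, but stops at the deadline.
        ok = cv.wait_for(lock, timeout, [&] { return hasProduct; });
    }
    producer.join();
    return ok;
}
```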

Condition variables and semaphores

What they have in common

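Very often we use a condition variable only for single wakeups, as in the producer-consumer example above. In that case a condition variable and a semaphore really differ very little. The pattern is roughly a binary semaphore, which C++20 provides as std::binary_semaphore (std::counting_semaphore<1>). This overlap is where much of the confusion about the difference between them comes from. One real difference is state. A semaphore stores its count, so a release that happens before anyone acquires is not lost. A condition variable remembers nothing, which is exactly why the lost-signal problem above exists.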

Differences

Some requirements fit a semaphore better than a condition variable, and others are the reverse.

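To wake many threads at once, use a condition variable. A classic scenario is a fixed number of threads that must all gather and wait, then set off together. This usage is sometimes called a barrier, and C++20 also ships std::barrier and std::latch for it. notify_all is clearly the best fit here;
To limit contention for a fixed number of resources, use a semaphore. A condition variable can do this too, but less directly. A semaphore is the most direct tool.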
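The gather-then-go case can be sketched with a condition variable and notify_all. In this "start gate", the function name startGate and the counters arrived, go, and started are hypothetical. Each worker checks in and blocks. The coordinator waits until all n workers have arrived, then releases them with one notify_all. A worker counts itself as started only if everyone had arrived before it passed the gate.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Returns how many of the n workers passed the gate after all n had arrived.
int startGate(int n) {
    std::mutex m;
    std::condition_variable cv;
    int arrived = 0;
    bool go = false;
    int started = 0;

    std::vector<std::thread> workers;
    for (int i = 0; i < n; ++i) {
        workers.emplace_back([&] {
            std::unique_lock<std::mutex> lock(m);
            ++arrived;
            cv.notify_all();                    // tell the coordinator someone arrived
            cv.wait(lock, [&] { return go; });  // block until the gate opens
            if (arrived == n) ++started;        // passed only after everyone checked in
        });
    }

    {
        std::unique_lock<std::mutex> lock(m);
        cv.wait(lock, [&] { return arrived == n; });  // wait for all check-ins
        go = true;
    }
    cv.notify_all();  // release every waiting worker at once

    for (auto& t : workers) t.join();
    return started;
}
```

One condition variable serves both conditions here. That only works because every notification is notify_all and every waiter re-checks its own predicate.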

In fact, a condition variable plus a mutex-protected counter can implement a semaphore

This is getting long, so here is an implementation to get a feel for it:

#include <iostream>
#include <chrono>
#include <thread>
#include <mutex>
#include <condition_variable>

class Semaphore {
public:
    Semaphore(int count = 0) : count_(count) {}

    void acquire() {
        std::unique_lock<std::mutex> lock(mutex_);
        condition_.wait(lock, [this]() {return !!count_; });
        count_--;
    }

    void release() {
        std::unique_lock<std::mutex> lock(mutex_);
        count_++;
        condition_.notify_one();
    }

private:
    std::mutex mutex_;
    std::condition_variable condition_;
    int count_;
};

Semaphore semaphore(2);

// thread function
void accessResource(int thread_num) {
    std::cout << "Thread " << thread_num << " is waiting resource..." << std::endl;

    // acquire resource
    semaphore.acquire();

    std::cout << "Thread " << thread_num << " has accessed resource..." << std::endl;
    std::this_thread::sleep_for(std::chrono::seconds(2));  // some ops take times

    // release resource
    semaphore.release();
    std::cout << "Thread " << thread_num << " has finished accession, release resource." << std::endl;
}

int main() {
    // multi-threads
    std::thread threads[5];
    for (int i = 0; i < 5; i++) {
        threads[i] = std::thread(accessResource, i);
    }

    for (int i = 0; i < 5; i++) {
        threads[i].join();
    }

    return 0;
}
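Printed output alone cannot confirm that at most two threads hold the resource at once. The sketch below measures this instead. It repeats the same design as the Semaphore class above so that the block is self-contained, and it adds a hypothetical helper, peakConcurrency, that records the largest number of workers inside the guarded section at the same time.

```cpp
#include <atomic>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Same design as the Semaphore class above: a mutex-protected counter plus a CV.
class Semaphore {
public:
    explicit Semaphore(int count = 0) : count_(count) {}
    void acquire() {
        std::unique_lock<std::mutex> lock(mutex_);
        condition_.wait(lock, [this] { return count_ > 0; });
        --count_;
    }
    void release() {
        std::lock_guard<std::mutex> lock(mutex_);
        ++count_;
        condition_.notify_one();
    }
private:
    std::mutex mutex_;
    std::condition_variable condition_;
    int count_;
};

// Runs `threads` workers through a Semaphore with `permits` slots and returns
// the largest number of workers observed inside the guarded section at once.
int peakConcurrency(int permits, int threads) {
    Semaphore sem(permits);
    std::atomic<int> inside{0};
    std::atomic<int> peak{0};

    std::vector<std::thread> pool;
    for (int i = 0; i < threads; ++i) {
        pool.emplace_back([&] {
            sem.acquire();
            int now = ++inside;
            // Raise `peak` to `now` if larger (lock-free running maximum).
            int prev = peak.load();
            while (now > prev && !peak.compare_exchange_weak(prev, now)) {
            }
            std::this_thread::sleep_for(std::chrono::milliseconds(20));  // simulated work
            --inside;
            sem.release();
        });
    }
    for (auto& t : pool) t.join();
    return peak.load();
}
```

With permits = 2, the result never exceeds 2. With permits = 1, it is always exactly 1.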