Troubleshooting notes: pthread_cond_destroy blocks forever

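This article looks at a problem encountered while destroying a condition variable, in particular the undefined behavior that results when other threads are still waiting on it. Reading the library source shows that the root cause was a condition variable being used before it was initialized, and the article describes the fix.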


Problem

In vendor code, a thread blocked while destroying a condition variable as that thread was exiting.


Reference manual

According to the man page, destroying a cond that other threads are still waiting on results in undefined behavior:

pthread_cond_destroy()
It  shall be safe to destroy an initialized condition variable upon which no threads are currently blocked. Attempting to destroy a condition variable upon which other threads are currently blocked results in undefined behavior.

So, before destroying it, call pthread_cond_broadcast(&pEvent->cond); first to wake all waiting threads:

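int32_t osEventDestroy(osEvent *pEvent)
{
    /* Wake every thread waiting on the condition variable before destroying it. */
    pthread_mutex_lock(&pEvent->mutex);
    pthread_cond_broadcast(&pEvent->cond);
    pthread_mutex_unlock(&pEvent->mutex);

    pthread_cond_destroy(&pEvent->cond);
    pthread_mutex_destroy(&pEvent->mutex);

    /* The rest of the vendor function is not shown in the original. */
    return 0; /* assumed success code */
}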

After retesting, the problem still occurred intermittently.
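
A broadcast only helps if the waiters actually leave pthread_cond_wait and do not wait again. The following is a minimal sketch of a more conservative teardown, not the vendor's code: the type demo_event, its stopping flag, and the functions demo_waiter, demo_teardown and demo_run are all hypothetical names introduced for illustration. It assumes the destroying thread is able to join the waiter.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical event type for illustration; not the vendor's osEvent. */
typedef struct {
    pthread_mutex_t mutex;
    pthread_cond_t  cond;
    bool            signaled;  /* the condition the waiter is waiting for */
    bool            stopping;  /* set by teardown so waiters stop waiting */
} demo_event;

static void *demo_waiter(void *arg)
{
    demo_event *ev = arg;

    pthread_mutex_lock(&ev->mutex);
    /* The loop handles spurious wakeups; 'stopping' lets teardown end it. */
    while (!ev->signaled && !ev->stopping)
        pthread_cond_wait(&ev->cond, &ev->mutex);
    assert(ev->signaled || ev->stopping);
    pthread_mutex_unlock(&ev->mutex);
    return NULL;
}

/* Returns 0 on success, or the first non-zero pthread error code. */
static int demo_teardown(demo_event *ev, pthread_t waiter)
{
    int rc;

    /* Change the predicate and broadcast while holding the mutex. */
    pthread_mutex_lock(&ev->mutex);
    ev->stopping = true;
    pthread_cond_broadcast(&ev->cond);
    pthread_mutex_unlock(&ev->mutex);

    /* Once the waiter has exited, nothing can still be using the cond. */
    rc = pthread_join(waiter, NULL);
    if (rc != 0)
        return rc;
    rc = pthread_cond_destroy(&ev->cond);
    if (rc != 0)
        return rc;
    return pthread_mutex_destroy(&ev->mutex);
}

/* Small driver: initialize, start one waiter, then tear everything down. */
static int demo_run(void)
{
    demo_event ev = { .signaled = false, .stopping = false };
    pthread_t tid;
    int rc;

    if ((rc = pthread_mutex_init(&ev.mutex, NULL)) != 0)
        return rc;
    if ((rc = pthread_cond_init(&ev.cond, NULL)) != 0) {
        pthread_mutex_destroy(&ev.mutex);
        return rc;
    }
    if ((rc = pthread_create(&tid, NULL, demo_waiter, &ev)) != 0) {
        pthread_cond_destroy(&ev.cond);
        pthread_mutex_destroy(&ev.mutex);
        return rc;
    }
    return demo_teardown(&ev, tid);
}
```

Calling demo_run() from main exercises the whole sequence. Joining before destroying is a design choice here: it is stricter than the manual requires, but it removes any doubt about whether a waiter is still inside pthread_cond_wait.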

Reading the source

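According to the comments in the source, __pthread_cond_destroy assumes that any threads still accessing the condvar have already been woken, and then waits for them to confirm this by decrementing __wrefs: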


/* See __pthread_cond_wait for a high-level description of the algorithm.

   A correct program must make sure that no waiters are blocked on the condvar
   when it is destroyed, and that there are no concurrent signals or
   broadcasts.  To wake waiters reliably, the program must signal or
   broadcast while holding the mutex or after having held the mutex.  It must
   also ensure that no signal or broadcast are still pending to unblock
   waiters; IOW, because waiters can wake up spuriously, the program must
   effectively ensure that destruction happens after the execution of those
   signal or broadcast calls.
   Thus, we can assume that all waiters that are still accessing the condvar
   have been woken.  We wait until they have confirmed to have woken up by
   decrementing __wrefs.  */
int
__pthread_cond_destroy (pthread_cond_t *cond)
{
  LIBC_PROBE (cond_destroy, 1, cond);

  /* Set the wake request flag.  We could also spin, but destruction that is
     concurrent with still-active waiters is probably neither common nor
     performance critical.  Acquire MO to synchronize with waiters confirming
     that they finished.  */
  unsigned int wrefs = atomic_fetch_or_acquire (&cond->__data.__wrefs, 4);
  int private = __condvar_get_private (wrefs);
  while (wrefs >> 3 != 0)
    {
      futex_wait_simple (&cond->__data.__wrefs, wrefs, private);
      /* See above.  */
      wrefs = atomic_load_acquire (&cond->__data.__wrefs);
    }
  /* The memory the condvar occupies can now be reused.  */
  return 0;
}

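Printing __wrefs just before the destroy call showed -8, which made no sense at first. With such a value, wrefs >> 3 is non-zero, so the loop above keeps waiting. Forcing the field to zero before destroying made the problem go away.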

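int32_t osEventDestroy(osEvent *pEvent)
{
    pthread_mutex_lock(&pEvent->mutex);
    pthread_cond_broadcast(&pEvent->cond);
    pthread_mutex_unlock(&pEvent->mutex);

    /* Diagnostic workaround only: __data.__wrefs is a glibc-internal field,
       and resetting it hides the symptom rather than fixing the cause. */
    if (0 != pEvent->cond.__data.__wrefs)
    {
        OSLAYER_ERR("%p %s cond error with refs %d\n", pEvent, __func__,
                    (int)pEvent->cond.__data.__wrefs);
        pEvent->cond.__data.__wrefs = 0;
    }
    pthread_cond_destroy(&pEvent->cond);
    pthread_mutex_destroy(&pEvent->mutex);

    /* The rest of the vendor function is not shown in the original. */
    return 0; /* assumed success code */
}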

Tracking down the cause

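A review of the code showed no other threads using this variable, so why was the value wrong?
It turned out that the code using the condition variable started the waiting thread, which calls pthread_cond_wait, before it initialized cond.
In other words, the condition variable passed to pthread_cond_wait had not been initialized yet; pthread_cond_init was only called after pthread_cond_wait had already been entered.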
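
This ordering is consistent with the observed value of -8. The sketch below replays it with plain integer arithmetic. It is a simplified model, not glibc code, and it rests on assumptions: that a waiter adds 8 to __wrefs when it enters pthread_cond_wait and subtracts 8 when it leaves, that a default pthread_cond_init resets the field to 0, and that the uninitialized storage happened to be zeroed. All model_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified model of the __wrefs bookkeeping; NOT glibc code.
   Assumption: the low 3 bits are flags and the rest is a reference count,
   so one waiter reference corresponds to a value of 8. */
typedef struct {
    uint32_t wrefs;
} model_cond;

static void model_wait_enter(model_cond *c) { c->wrefs += 8; }
static void model_wait_exit(model_cond *c)  { c->wrefs -= 8; }
static void model_init(model_cond *c)       { c->wrefs = 0; }

/* Same test as the while loop in __pthread_cond_destroy. */
static int model_destroy_would_block(const model_cond *c)
{
    return (c->wrefs >> 3) != 0;
}

/* Replays the buggy order: wait begins, then init runs, then wait returns. */
static uint32_t model_buggy_order(void)
{
    model_cond c = { 0 };       /* assumed: zeroed, uninitialized storage */

    model_wait_enter(&c);       /* the waiter takes a reference */
    assert(c.wrefs == 8);
    model_init(&c);             /* init wipes out that reference */
    assert(c.wrefs == 0);
    model_wait_exit(&c);        /* the waiter drops a reference it no longer has */
    return c.wrefs;             /* wrapped around below zero */
}
```

Under these assumptions the field ends up as 0xFFFFFFF8, which prints as -8 with %d, and model_destroy_would_block reports that the destroy loop would wait. No real waiter is left to decrement the count, so nothing would ever wake it.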

After reordering the code so that initialization comes first, the problem disappeared.
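
The vendor's corrected code is not shown, so the following is only a sketch of the intended ordering under assumed names: demo_ctx, wait_until_ready, demo_ctx_start and demo_ctx_finish are hypothetical. The point is that the mutex and condition variable are initialized before pthread_create can run any code that waits on them.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical context type for illustration only. */
typedef struct {
    pthread_mutex_t mutex;
    pthread_cond_t  cond;
    bool            ready;   /* predicate protected by mutex */
    pthread_t       waiter;
} demo_ctx;

static void *wait_until_ready(void *arg)
{
    demo_ctx *ctx = arg;

    pthread_mutex_lock(&ctx->mutex);
    while (!ctx->ready)                       /* guards against spurious wakeups */
        pthread_cond_wait(&ctx->cond, &ctx->mutex);
    assert(ctx->ready);
    pthread_mutex_unlock(&ctx->mutex);
    return NULL;
}

/* Initialize everything first; start the waiting thread last. */
static int demo_ctx_start(demo_ctx *ctx)
{
    int rc;

    ctx->ready = false;
    if ((rc = pthread_mutex_init(&ctx->mutex, NULL)) != 0)
        return rc;
    if ((rc = pthread_cond_init(&ctx->cond, NULL)) != 0) {
        pthread_mutex_destroy(&ctx->mutex);
        return rc;
    }
    rc = pthread_create(&ctx->waiter, NULL, wait_until_ready, ctx);
    if (rc != 0) {
        pthread_cond_destroy(&ctx->cond);
        pthread_mutex_destroy(&ctx->mutex);
    }
    return rc;
}

/* Signal the waiter, wait for it to exit, then destroy in reverse order. */
static int demo_ctx_finish(demo_ctx *ctx)
{
    int rc;

    pthread_mutex_lock(&ctx->mutex);
    ctx->ready = true;
    pthread_cond_signal(&ctx->cond);
    pthread_mutex_unlock(&ctx->mutex);

    rc = pthread_join(ctx->waiter, NULL);
    if (rc != 0)
        return rc;
    rc = pthread_cond_destroy(&ctx->cond);
    if (rc != 0)
        return rc;
    return pthread_mutex_destroy(&ctx->mutex);
}
```

A caller would invoke demo_ctx_start(&ctx) once, do its work, and then call demo_ctx_finish(&ctx). Keeping initialization and thread creation in one function makes it harder for later changes to reintroduce the wrong order.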

Conclusion:

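Using an uninitialized condition variable does not make the functions report an error, but it can cause abnormal behavior later, such as the hang in pthread_cond_destroy seen here.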
