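Beware of the Differences Between pthread_cond_signal and SetEvent
Please credit the source and the author's contact information when reposting.
Source: http://www.limodev.cn/blog
Author contact: Li Xianjing (李先静) <xianjimli@gmail.com>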
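Today I helped a colleague track down a multithreading bug in which one thread hung in g_cond_wait and never woke up. Reading the code revealed nothing wrong: every g_cond_wait was strictly paired with a g_cond_signal. After two hours of digging, the log finally showed that the two calls happened in an unexpected order: one thread called g_cond_signal first, and only afterwards did the other thread call g_cond_wait.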
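g_cond_signal is a glib wrapper. On Linux it is implemented on top of pthread_cond_signal, and on Win32 on top of SetEvent. On Win32, the order in which two threads call SetEvent and WaitForSingleObject does not matter. Strange: could the order of the two calls really matter on Linux?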
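I read the pthread source, and sure enough it does: when pthread_cond_signal finds that no other thread is waiting, it simply returns without recording anything (see the check on cond->cn_waiters in pth_cond_notify, which was highlighted in red in the original post).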
int pthread_cond_signal(pthread_cond_t *cond)
{
    if (cond == NULL)
        return pth_error(EINVAL, EINVAL);
    if (*cond == PTHREAD_COND_INITIALIZER)
        if (pthread_cond_init(cond, NULL) != OK)
            return errno;
    if (!pth_cond_notify((pth_cond_t *)(*cond), FALSE))
        return errno;
    return OK;
}

int pth_cond_notify(pth_cond_t *cond, int broadcast)
{
    /* consistency checks */
    if (cond == NULL)
        return pth_error(FALSE, EINVAL);
    if (!(cond->cn_state & PTH_COND_INITIALIZED))
        return pth_error(FALSE, EDEADLK);

    /* do something only if there is at least one waiter (POSIX semantics) */
    if (cond->cn_waiters > 0) {
        /* signal the condition */
        cond->cn_state |= PTH_COND_SIGNALED;
        if (broadcast)
            cond->cn_state |= PTH_COND_BROADCAST;
        else
            cond->cn_state &= ~(PTH_COND_BROADCAST);
        cond->cn_state &= ~(PTH_COND_HANDLED);
        /* and give other threads a chance to awake */
        pth_yield(NULL);
    }

    /* return to caller */
    return TRUE;
}
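The lost signal is easy to reproduce with plain POSIX threads. The sketch below is not from the original investigation: signal_then_wait is a hypothetical helper name, and for brevity it signals and waits from a single thread, which is enough to show that a signal sent before anyone waits leaves nothing behind. It assumes a platform that provides clock_gettime and pthread_cond_timedwait.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical helper (not part of any library): signal a condition variable
 * that nobody is waiting on, then wait on it for at most timeout_ms
 * milliseconds. Returns the result of pthread_cond_timedwait. */
int signal_then_wait(int timeout_ms)
{
    pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
    pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
    struct timespec deadline;
    int rc;

    assert(timeout_ms >= 0);

    /* No thread is waiting yet, so this signal is simply dropped. */
    pthread_cond_signal(&cond);

    /* Build an absolute deadline timeout_ms from now. */
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_nsec += (long)(timeout_ms % 1000) * 1000000L;
    deadline.tv_sec += timeout_ms / 1000 + deadline.tv_nsec / 1000000000L;
    deadline.tv_nsec %= 1000000000L;

    /* The earlier signal is not remembered, so this wait can only end
     * by timing out (or, in theory, by a spurious wakeup). */
    pthread_mutex_lock(&mutex);
    rc = pthread_cond_timedwait(&cond, &mutex, &deadline);
    pthread_mutex_unlock(&mutex);

    pthread_cond_destroy(&cond);
    pthread_mutex_destroy(&mutex);
    return rc;
}
```

Calling signal_then_wait(50) from main should normally return ETIMEDOUT rather than 0. POSIX permits spurious wakeups, so a return value of 0 is theoretically possible, but it would not mean that the earlier signal was delivered.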
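After getting home that evening, I also read the ReactOS implementation of SetEvent. The result was as expected: even when no thread is waiting on the event, it still sets SignalState (see the IsListEmpty branch, which was highlighted in red in the original post).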
LONG STDCALL KeSetEvent(PKEVENT Event,
                        KPRIORITY Increment,
                        BOOLEAN Wait)
{
    KIRQL OldIrql;
    LONG PreviousState;
    PKWAIT_BLOCK WaitBlock;

    DPRINT("KeSetEvent(Event %x, Wait %x)\n", Event, Wait);

    /* Lock the Dispatcher Database */
    OldIrql = KeAcquireDispatcherDatabaseLock();

    /* Save the Previous State */
    PreviousState = Event->Header.SignalState;

    /* Check if we have stuff in the Wait Queue */
    if (IsListEmpty(&Event->Header.WaitListHead)) {
        /* Set the Event to Signaled */
        DPRINT("Empty Wait Queue, Signal the Event\n");
        Event->Header.SignalState = 1;
    } else {
        /* Get the Wait Block */
        WaitBlock = CONTAINING_RECORD(Event->Header.WaitListHead.Flink,
                                      KWAIT_BLOCK,
                                      WaitListEntry);

        /* Check the type of event */
        if (Event->Header.Type == NotificationEvent || WaitBlock->WaitType == WaitAll) {
            if (PreviousState == 0) {
                /* We must do a full wait satisfaction */
                DPRINT("Notification Event or WaitAll, Wait on the Event and Signal\n");
                Event->Header.SignalState = 1;
                KiWaitTest(&Event->Header, Increment);
            }
        } else {
            /* We can satisfy wait simply by waking the thread, since our signal state is 0 now */
            DPRINT("WaitAny or Sync Event, just unwait the thread\n");
            KiAbortWaitThread(WaitBlock->Thread, WaitBlock->WaitKey, Increment);
        }
    }

    /* Check what wait state was requested */
    if (Wait == FALSE) {
        /* Wait not requested, release Dispatcher Database and return */
        KeReleaseDispatcherDatabaseLock(OldIrql);
    } else {
        /* Return Locked and with a Wait */
        KTHREAD *Thread = KeGetCurrentThread();
        Thread->WaitNext = TRUE;
        Thread->WaitIrql = OldIrql;
    }

    /* Return the previous State */
    DPRINT("Done: %d\n", PreviousState);
    return PreviousState;
}
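KeWaitForSingleObject, in turn, completes the wait successfully as soon as it finds SignalState greater than 0 (see the SignalState > 0 check, which was highlighted in red in the original post).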
NTSTATUS STDCALL KeWaitForSingleObject(PVOID Object,
                                       KWAIT_REASON WaitReason,
                                       KPROCESSOR_MODE WaitMode,
                                       BOOLEAN Alertable,
                                       PLARGE_INTEGER Timeout)
{
    ...
    if (CurrentObject->Header.SignalState > 0)
    {
        /* Another satisfied object */
        KiSatisfyNonMutantWait(CurrentObject, CurrentThread);
        WaitStatus = STATUS_WAIT_0;
        goto DontWait;
    }
    ...
}
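The sticky SignalState behavior can be approximated on top of pthreads by pairing the condition variable with a flag protected by the same mutex. The following is a minimal sketch under that assumption, not glib's or ReactOS's actual code; the sticky_event_* names are hypothetical, and the wait resets the flag in the style of an auto-reset event.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical event type: the flag plays the role of SignalState. */
typedef struct {
    pthread_mutex_t mutex;
    pthread_cond_t  cond;
    bool            signaled;
} sticky_event_t;

void sticky_event_init(sticky_event_t *ev)
{
    assert(ev != NULL);
    pthread_mutex_init(&ev->mutex, NULL);
    pthread_cond_init(&ev->cond, NULL);
    ev->signaled = false;
}

/* Record the signal even if nobody is waiting, then wake one waiter. */
void sticky_event_set(sticky_event_t *ev)
{
    pthread_mutex_lock(&ev->mutex);
    ev->signaled = true;
    pthread_cond_signal(&ev->cond);
    pthread_mutex_unlock(&ev->mutex);
}

/* Return at once if the event was already set; otherwise block until
 * it is. The loop also guards against spurious wakeups. */
void sticky_event_wait(sticky_event_t *ev)
{
    pthread_mutex_lock(&ev->mutex);
    while (!ev->signaled)
        pthread_cond_wait(&ev->cond, &ev->mutex);
    ev->signaled = false;   /* consume the signal (auto-reset style) */
    pthread_mutex_unlock(&ev->mutex);
}

void sticky_event_destroy(sticky_event_t *ev)
{
    pthread_cond_destroy(&ev->cond);
    pthread_mutex_destroy(&ev->mutex);
}
```

With this wrapper, calling sticky_event_set before sticky_event_wait no longer hangs, because the waiter checks the flag before it ever blocks on the condition variable.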
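So the g_cond_signal/g_cond_wait pair wrapped by glib does not behave identically on Win32 and Linux. Even if you skip glib and write your own wrapper or call the native APIs directly, watch out for this subtle trap.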
