Linux read deadlock: tracking down a deadlock on Linux with gdb, and advice on using pthread_cancel

The program hung, and I suspected a deadlock.

Attach gdb to the running process with

gdb attach pid

then, inside gdb, run

info thread

which produced the following:

(gdb) info thread

Id Target Id Frame

4 Thread 0x7f466b8e1700 (LWP 10945) "jtnvragentserve" 0x00007f466cbd5b13 in epoll_wait () at ../sysdeps/unix/syscall-template.S:81

3 Thread 0x7f466c0e2700 (LWP 11009) "jtnvragentserve" 0x00007f466cbccda3 in select () at ../sysdeps/unix/syscall-template.S:81

2 Thread 0x7f4653fff700 (LWP 9171) "jtnvragentserve" __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135

* 1 Thread 0x7f466e052780 (LWP 10942) "jtnvragentserve" __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135

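Threads 2 (LWP 9171) and 1 (LWP 10942) are both blocked in __lll_lock_wait, that is, waiting to acquire a lock, which strongly suggests a deadlock.

Next, print the stack of every thread with

thread apply all bt

which gives: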

(gdb) thread apply all bt

Thread 4 (Thread 0x7f466b8e1700 (LWP 10945)):

#0 0x00007f466cbd5b13 in epoll_wait () at ../sysdeps/unix/syscall-template.S:81

#1 0x00007f466c38ed88 in ?? ()

#2 0x00007f465c000f80 in ?? ()

#3 0xffffffff64001c50 in ?? ()

#4 0x000000007fffffff in ?? ()

#5 0x0000000000000000 in ?? ()

Thread 3 (Thread 0x7f466c0e2700 (LWP 11009)):

#0 0x00007f466cbccda3 in select () at ../sysdeps/unix/syscall-template.S:81

#1 0x0000000000455373 in _eXosip_read_message (excontext=0x144f770, max_message_nb=1, sec_max=1, usec_max=0) at udp.c:1580

#2 0x00000000004415a4 in eXosip_execute (excontext=0x144f770) at eXconf.c:791

#3 0x000000000044254e in _eXosip_thread (arg=0x144f770) at eXconf.c:1090

#4 0x00007f466d9b8182 in start_thread (arg=0x7f466c0e2700) at pthread_create.c:312

#5 0x00007f466cbd547d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Thread 2 (Thread 0x7f4653fff700 (LWP 9171)):

#0 __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135

#1 0x00007f466d9ba657 in _L_lock_909 () from /lib/x86_64-linux-gnu/libpthread.so.0

#2 0x00007f466d9ba480 in __GI___pthread_mutex_lock (mutex=0x1e34140) at ../nptl/pthread_mutex_lock.c:79

#3 0x00000000004d95be in jthread::JMutex::Lock() ()

#4 0x00000000004e9013 in jrtplib::RTPUDPv4Transmitter::WaitForIncomingData(jrtplib::RTPTime const&, bool*) ()

#5 0x000000000050789c in jrtplib::RTPPollThread::Thread() ()

#6 0x00000000004d9b67 in jthread::JThread::TheThread(void*) ()

#7 0x00007f466d9b8182 in start_thread (arg=0x7f4653fff700) at pthread_create.c:312

#8 0x00007f466cbd547d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Thread 1 (Thread 0x7f466e052780 (LWP 10942)):

#0 __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135

#1 0x00007f466d9ba657 in _L_lock_909 () from /lib/x86_64-linux-gnu/libpthread.so.0

#2 0x00007f466d9ba480 in __GI___pthread_mutex_lock (mutex=0x1e34140) at ../nptl/pthread_mutex_lock.c:79

#3 0x00000000004d95be in jthread::JMutex::Lock() ()

#4 0x00000000004e9263 in jrtplib::RTPUDPv4Transmitter::AbortWait() ()

#5 0x0000000000507645 in jrtplib::RTPPollThread::Stop() ()

#6 0x000000000050748b in jrtplib::RTPPollThread::~RTPPollThread() ()

#7 0x0000000000507514 in jrtplib::RTPPollThread::~RTPPollThread() ()

#8 0x00000000004e051b in void jrtplib::RTPDelete<jrtplib::RTPPollThread>(jrtplib::RTPPollThread*, jrtplib::RTPMemoryManager*) ()

#9 0x00000000004db950 in jrtplib::RTPSession::BYEDestroy(jrtplib::RTPTime const&, void const*, unsigned long) ()

#10 0x00000000004213f1 in JtGb28181NvrAgent::HandleSdpReq_CloseVideo (this=0x1442060, CallID=...) at ../JtGb28181NvrAgent.cpp:4600

#11 0x0000000000416497 in JtGb28181NvrAgent::Thread (this=0x1442060) at ../JtGb28181NvrAgent.cpp:2669

#12 0x0000000000424675 in JtGb28181NvrAgent::StartWork (this=0x1442060, Config=0x0, RunMode=0) at ../JtGb28181NvrAgent.cpp:5333

#13 0x000000000040d22c in main (argc=1, argv=0x7fff33f8b738) at ../main.cpp:174
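
Both blocked threads above wait on the same mutex (mutex=0x1e34140 in frame #2 of each). On glibc, a mutex records the LWP id of its owner in an internal field, `__data.__owner`, so a gdb expression such as `print ((pthread_mutex_t *) 0x1e34140)->__data.__owner` may reveal which thread holds the lock, assuming type information for `pthread_mutex_t` is available. If the reported LWP is not in the `info thread` list, the owner has already exited. The sketch below reads the same field from C. It relies on a glibc implementation detail, not a POSIX API, and the helper names are made up for illustration.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: LWP id that glibc recorded as the mutex owner.
 * __data.__owner is glibc-internal and may differ on other C libraries. */
int mutex_owner_lwp(pthread_mutex_t *m)
{
    return m->__data.__owner;
}

/* LWP id of the calling thread, the number gdb prints as "LWP nnnn". */
int current_lwp(void)
{
    return (int)syscall(SYS_gettid);
}

/* Returns 1 if the owner recorded while the lock is held equals the
 * LWP id of the thread that took it. */
int owner_matches_locker(void)
{
    pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
    int owner;

    pthread_mutex_lock(&m);
    owner = mutex_owner_lwp(&m);
    pthread_mutex_unlock(&m);
    return owner == current_lwp();
}
```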

The backtraces show that LWP 9171 and LWP 10942 are both stuck inside jrtplib. The relevant code is:

int RTPUDPv4Transmitter::AbortWait()
{
    if (!init)
        return ERR_RTP_UDPV4TRANS_NOTINIT;

    MAINMUTEX_LOCK
    if (!created)
    {
        MAINMUTEX_UNLOCK
        return ERR_RTP_UDPV4TRANS_NOTCREATED;
    }
    if (!waitingfordata)
    {
        MAINMUTEX_UNLOCK
        return ERR_RTP_UDPV4TRANS_NOTWAITING;
    }

    AbortWaitInternal();

    MAINMUTEX_UNLOCK
    return 0;
}

void RTPUDPv4Transmitter::AbortWaitInternal()
{
#if (defined(WIN32) || defined(_WIN32_WCE))
    send(abortdesc[1],"*",1,0);
#else
    if (write(abortdesc[1],"*",1))
    {
        // To get rid of __wur related compiler warnings
    }
#endif // WIN32
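}

MAINMUTEX_LOCK acquires the mutex and MAINMUTEX_UNLOCK releases it, so why was the lock never released? It turned out that a thread was being terminated with pthread_cancel. A cancelled thread can exit at any cancellation point; here it was terminated at the write call inside AbortWaitInternal, so MAINMUTEX_UNLOCK was never called. The mutex stayed locked by a thread that no longer existed, and every later attempt to lock it blocked forever (both blocked threads in the backtrace are waiting on the same mutex=0x1e34140).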
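
The failure mode can be reproduced in a few lines of plain pthreads code. This is a minimal sketch, not the jrtplib code: the names `main_mutex`, `idle_pipe` and `cancel_while_locked` are invented for illustration, and a blocking read() stands in for the write() of the original so that the cancellation reliably lands while the lock is held.

```c
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <unistd.h>

static pthread_mutex_t main_mutex = PTHREAD_MUTEX_INITIALIZER;
static sem_t lock_taken;   /* posted once the worker holds main_mutex */
static int idle_pipe[2];   /* never written to, so read() blocks */

static void *worker(void *arg)
{
    char byte;
    (void)arg;

    pthread_mutex_lock(&main_mutex);
    sem_post(&lock_taken);

    /* read() is a cancellation point: a cancelled thread ends here. */
    if (read(idle_pipe[0], &byte, 1) < 0) {
        /* result is irrelevant to the demo */
    }

    pthread_mutex_unlock(&main_mutex);  /* skipped when cancelled above */
    return NULL;
}

/* Cancels the worker while it holds the lock and returns what
 * pthread_mutex_trylock() reports afterwards (-1 on setup failure). */
int cancel_while_locked(void)
{
    pthread_t tid;
    int rc;

    if (pipe(idle_pipe) != 0 || sem_init(&lock_taken, 0, 0) != 0)
        return -1;
    if (pthread_create(&tid, NULL, worker, NULL) != 0)
        return -1;

    while (sem_wait(&lock_taken) == -1 && errno == EINTR)
        ;                        /* retry if interrupted by a signal */
    pthread_cancel(tid);
    pthread_join(tid, NULL);     /* the worker no longer exists */

    rc = pthread_mutex_trylock(&main_mutex);  /* EBUSY: still locked */
    close(idle_pipe[0]);
    close(idle_pipe[1]);
    sem_destroy(&lock_taken);
    return rc;
}
```

Calling `cancel_while_locked()` returns EBUSY: the mutex is still held although its owner has been joined. A thread that called pthread_mutex_lock instead of trylock at that point would block in __lll_lock_wait, as in the backtrace.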

I changed the code to address this, and the problem was resolved.
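
The post does not show the actual change. One common way to make a critical section safe against cancellation, sketched here purely as an assumption about what such a fix could look like, is to register a cleanup handler with pthread_cleanup_push so the mutex is released even if the thread is cancelled inside it. All names below are illustrative.

```c
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <unistd.h>

static pthread_mutex_t main_mutex = PTHREAD_MUTEX_INITIALIZER;
static sem_t lock_taken;   /* posted once the worker holds main_mutex */
static int idle_pipe[2];   /* never written to, so read() blocks */

/* Cleanup handler: runs on cancellation or on pthread_cleanup_pop(1). */
static void unlock_mutex(void *arg)
{
    pthread_mutex_unlock((pthread_mutex_t *)arg);
}

static void *worker(void *arg)
{
    char byte;
    (void)arg;

    pthread_mutex_lock(&main_mutex);
    pthread_cleanup_push(unlock_mutex, &main_mutex);
    sem_post(&lock_taken);

    /* Cancellation point; if cancelled here, unlock_mutex() still runs. */
    if (read(idle_pipe[0], &byte, 1) < 0) {
        /* result is irrelevant to the demo */
    }

    pthread_cleanup_pop(1);  /* normal path: pop and run the handler */
    return NULL;
}

/* Cancels the worker inside its critical section and returns the result
 * of pthread_mutex_trylock() afterwards (-1 on setup failure). */
int cancel_with_cleanup(void)
{
    pthread_t tid;
    int rc;

    if (pipe(idle_pipe) != 0 || sem_init(&lock_taken, 0, 0) != 0)
        return -1;
    if (pthread_create(&tid, NULL, worker, NULL) != 0)
        return -1;

    while (sem_wait(&lock_taken) == -1 && errno == EINTR)
        ;                        /* retry if interrupted by a signal */
    pthread_cancel(tid);
    pthread_join(tid, NULL);

    rc = pthread_mutex_trylock(&main_mutex);  /* 0: handler released it */
    if (rc == 0)
        pthread_mutex_unlock(&main_mutex);
    close(idle_pipe[0]);
    close(idle_pipe[1]);
    sem_destroy(&lock_taken);
    return rc;
}
```

An alternative with the same goal is to wrap the critical section in pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, ...) and restore the previous state afterwards, so that a pending cancellation is only acted on once the lock has been released. Either way, this only helps for code you control; a third-party library that takes locks internally still has to be cancellation-safe itself.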

The lesson from this bug: unless you know every detail of your code and of everything it calls, do not casually use pthread_cancel to terminate a thread!
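
A frequently used alternative to pthread_cancel is cooperative shutdown: set a stop flag, wake the thread, and join it, so that the thread always leaves its critical sections normally. The sketch below assumes a worker that sleeps in poll() on a wake-up pipe, similar in spirit to the abortdesc pipe in the code above. The names `stop_requested`, `wake_pipe` and `stop_cooperatively` are hypothetical.

```c
#include <poll.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <unistd.h>

static pthread_mutex_t main_mutex = PTHREAD_MUTEX_INITIALIZER;
static atomic_bool stop_requested;
static int wake_pipe[2];   /* a byte written here wakes the worker */

static void *poll_loop(void *arg)
{
    (void)arg;
    while (!atomic_load(&stop_requested)) {
        struct pollfd pfd = { .fd = wake_pipe[0], .events = POLLIN };
        poll(&pfd, 1, -1);               /* sleep until woken */

        pthread_mutex_lock(&main_mutex);
        /* ... work done under the lock would go here ... */
        pthread_mutex_unlock(&main_mutex);
    }
    return NULL;                         /* exits with no lock held */
}

/* Starts the worker, asks it to stop, joins it, and returns the result
 * of pthread_mutex_trylock() afterwards (-1 on setup failure). */
int stop_cooperatively(void)
{
    pthread_t tid;
    int rc;

    if (pipe(wake_pipe) != 0)
        return -1;
    atomic_store(&stop_requested, false);
    if (pthread_create(&tid, NULL, poll_loop, NULL) != 0)
        return -1;

    atomic_store(&stop_requested, true); /* request the stop first */
    if (write(wake_pipe[1], "*", 1) != 1)
        return -1;                       /* could not wake the worker */
    pthread_join(tid, NULL);             /* worker returned on its own */

    rc = pthread_mutex_trylock(&main_mutex);  /* 0: the mutex is free */
    if (rc == 0)
        pthread_mutex_unlock(&main_mutex);
    close(wake_pipe[0]);
    close(wake_pipe[1]);
    return rc;
}
```

Because the worker only checks the flag between critical sections, it can never disappear while holding `main_mutex`, which is the property pthread_cancel does not give you by default.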
