Attaching gdb to a running release program to debug a deadlock
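Background:
Our device ran into a deadlock during long-term operation, and the problem was hard to reproduce on purpose. To be able to diagnose it the moment it happens, I went through the gdb documentation and related material online, and ended up with a procedure for attaching gdb to a release program while it is running.
Compiling the program with debug information
Compiling with debug information is essentially a debug build: add -g3 to the compiler options. The optimization level can stay as it is (the example below keeps -O3).
For example:
CFLAGS = -Wall -O3 -g3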
CC = gcc
COMPILE = $(CC) $(CFLAGS) -c "$<" -o "$@"
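Pasting the whole Makefile is not practical, so only this fragment is shown; the point is the -g3 added to the compiler options.
Note: if the program needs to be stripped, strip it after the debug information has been split out and before the debug link is added.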
Splitting the debug information out of the release program
Command: objcopy --only-keep-debug <program to split> <output debug file>
objcopy --only-keep-debug app_start.bin app_start.dbg
Adding the debug link to the release program
Command: objcopy --add-gnu-debuglink=<split-out debug file> <program>
objcopy --add-gnu-debuglink=app_start.dbg app_start.bin
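The split, strip and link steps have to run in that order, so it can help to keep them in one place. The following is a minimal sketch rather than the build used for this post: OBJCOPY, STRIP, RUN and split_debug are illustrative names, and strip --strip-debug is an assumption, since the right strip options depend on the project.

```shell
# Sketch: split out the debug info, strip, then add the debug link.
# OBJCOPY, STRIP and RUN are illustrative names, not part of the original build.
OBJCOPY=${OBJCOPY:-objcopy}   # for a cross build, point this at the toolchain's objcopy
STRIP=${STRIP:-strip}         # likewise for strip
RUN=${RUN:-}                  # set RUN=echo to print the commands instead of running them

split_debug() {
    bin=$1
    dbg=${2:-${bin%.*}.dbg}   # default name: app_start.bin -> app_start.dbg
    $RUN "$OBJCOPY" --only-keep-debug "$bin" "$dbg" &&
    $RUN "$STRIP" --strip-debug "$bin" &&
    $RUN "$OBJCOPY" --add-gnu-debuglink="$dbg" "$bin"
}
```

Calling split_debug app_start.bin then produces app_start.dbg next to the stripped binary. Running it once with RUN=echo is a cheap way to check the exact commands before touching a real image.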
Inspecting the debug link in the release program
Command: objdump -s -j .gnu_debuglink <program>
objdump -s -j .gnu_debuglink app_start.bin
app_start.bin: file format elf32-littlearm
Contents of section .gnu_debuglink:
0000 6e76732e 64626700 e72cfdfd app_start.dbg..,..
Inspecting the debug sections in each file
Command: objdump -h <file>
objdump -h app_start.bin | grep debug
22 .gnu_debuglink 0000000c 00000000 00000000 0022f691 2**0
objdump -h app_start.dbg | grep debug
22 .debug_aranges 00009f20 00000000 00000000 000001b8 2**3
23 .debug_info 00338ff9 00000000 00000000 0000a0d8 2**0
24 .debug_abbrev 0001bbcb 00000000 00000000 003430d1 2**0
25 .debug_line 000c3097 00000000 00000000 0035ec9c 2**0
26 .debug_frame 0002f950 00000000 00000000 00421d34 2**2
27 .debug_str 0009ac9b 00000000 00000000 00451684 2**0
28 .debug_loc 0017505f 00000000 00000000 004ec31f 2**0
29 .debug_ranges 00022368 00000000 00000000 0066137e 2**0
30 .debug_macro 00059999 00000000 00000000 006836e6 2**0
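The listing above shows the expected result: the release binary carries only .gnu_debuglink, while the .dbg file holds the .debug_* sections. To check this from a script instead of by eye, the section table can be filtered. This is a small sketch; section_size is a hypothetical helper that reads objdump -h output on standard input.

```shell
# section_size NAME: read `objdump -h` output on stdin and print the size
# column (hex) of section NAME; the exit status is non-zero if it is absent.
section_size() {
    awk -v name="$1" '
        $2 == name { print $3; found = 1 }
        END { exit !found }
    '
}
```

For example, objdump -h app_start.bin | section_size .gnu_debuglink prints the size of the link section, and the same pipeline with .debug_info should fail on a binary that has been stripped.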
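Usage
When the device deadlocks, copy app_start.dbg into the same directory as the executable and look up the program's process ID with ps. Disable the watchdog with the appropriate command first, so that the device does not reset while the process is stopped under gdb.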
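gdb resolves the name stored in .gnu_debuglink relative to where the executable lives, which is why the same directory works. As a rough sketch of the file-name lookups only (the build-ID lookup is left out), the helper below lists the candidate paths. debuglink_candidates is a hypothetical name, /opt/app in the usage line is a made-up install path, and /usr/lib/debug is only a common default for gdb's global debug directory; the real value depends on how gdb was built.

```shell
# debuglink_candidates EXE LINK [GLOBAL_DIR]: print the paths where gdb would
# look for the debug file named LINK, given the absolute path of EXE.
debuglink_candidates() {
    exe=$1
    link=$2
    global=${3:-/usr/lib/debug}   # assumption: common default, build dependent
    dir=$(dirname "$exe")
    printf '%s\n' "$dir/$link" "$dir/.debug/$link" "$global$dir/$link"
}
```

For instance, debuglink_candidates /opt/app/app_start.bin app_start.dbg lists the same-directory path first, followed by the .debug subdirectory and the global debug directory.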
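Attaching gdb with the debug information
Command: ./gdb -p <PID>
Run info threads to see what every thread is currently doing and look for the threads stopped in __lll_lock_wait. Then switch to each of those threads and inspect its backtrace to work out the deadlock relationship.
For example: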
(gdb) info threads
[New Thread 0x68a134a0 (LWP 25308)]
Id Target Id Frame
138 Thread 0x68a134a0 (LWP 25308) "app_start.bin" 0xb1f220ec in clone () from /lib/libc.so.0
…………
34 Thread 0x63f634a0 (LWP 6718) "app_start.bin" 0xb6fb7448 in __lll_lock_wait () from /lib/libpthread.so.0
33 Thread 0x637634a0 (LWP 6719) "app_start.bin" 0xb6fb7448 in __lll_lock_wait () from /lib/libpthread.so.0
32 Thread 0x62f634a0 (LWP 6720) "app_start.bin" 0xb6fb7448 in __lll_lock_wait () from /lib/libpthread.so.0
31 Thread 0x627634a0 (LWP 6721) "app_start.bin" 0xb6fb7448 in __lll_lock_wait () from /lib/libpthread.so.0
…………
* 7 Thread 0xa26b84a0 (LWP 22539) "ADEC_SendAoProc" 0xb1edfffc in nanosleep () from /lib/libc.so.0
…………
1 Thread 0xb6fdb000 (LWP 1547) "app_start.bin" 0xb1edfffc in nanosleep () from /lib/libc.so.0
(gdb) thread 31
[Switching to thread 31 (Thread 0x627634a0 (LWP 6721))]
#0 0xb6fb7448 in __lll_lock_wait () from /lib/libpthread.so.0
(gdb) where
#0 0xb6fb7448 in __lll_lock_wait () from /lib/libpthread.so.0
#1 0xb6fc0b04 in pthread_mutex_lock () from /lib/libpthread.so.0
#2 0x001d4a9c in TD_VCA_Ctrl_Lock () at src/vca/vca_alarm.c:3492
#3 0x0014a510 in TD_Scene_Suspend (_iLock=1, _iCtrlFlag=0, _iStatus=0) at src/dvr/Scene.c:554
#4 TD_Scene_SetSuspendvca (_iChn=_iChn@entry=0, _iStatus=_iStatus@entry=0, _iSock=<optimized out>)
at src/dvr/Scene.c:1857
#5 0x000b12ec in td_cmd_setframerate (chn=chn@entry=0, framerate=framerate@entry=0x62755358 "15")
at src/dvr/CmdExecute.c:28828
#6 0x000b3430 in td_cmd_setInputNorm (mode=1) at src/dvr/CmdExecute.c:27843
#7 0x000bb998 in NetParaSet (_u32IP=<optimized out>, _u32IP@entry=101,
_cSetMsg=_cSetMsg@entry=0x62760260 "PARASET\tVIDEOMODE\t1", _cPara=_cPara@entry=0x62760800 "",
_SendMsg=<optimized out>) at src/dvr/CmdExecute.c:22373
#8 0x00050050 in cbkNetMsgEvent (_u8ErrorType=<optimized out>, _u32ErrorMsg=101, _cMsg=<optimized out>,
_u32IpAddress=673783306, _u16Port=50501) at src/dvr/application.c:835
#9 0x0012b658 in cbkRecvData (
_cBuf=_cBuf@entry=0x2581400 "10.30.41.41\tIP\tPARASET\tVIDEOMODE\t1\n\n\n\361\365\352\365\071\005\024",
_iBufLength=_iBufLength@entry=37, _u32IpAddress=673783306, _u16Port=<optimized out>, _socket=101)
at src/dvr/NetServer.c:988
#10 0x00137ffc in TcpServerRecvCmd (
_cBuf=0x2581400 "10.30.41.41\tIP\tPARASET\tVIDEOMODE\t1\n\n\n\361\365\352\365\071\005\024", _iBufLength=37,
_pClient=0x2580e98) at src/dvr/NetServer.c:3936
#11 0x00163c24 in TcpServer_ParseStreamData (_pClient=0x2580e98, _cRecvBuffer=<optimized out>,
_u32RecvLength=<optimized out>) at src/dvr/TcpServer.c:2479
#12 0x00164308 in PthreadTcpClientRecv (_tsThis=0x2580e98) at src/dvr/TcpServer.c:2029
#13 0xb6fbeb3c in start_thread () from /lib/libpthread.so.0
#14 0xb1f2210c in clone () from /lib/libc.so.0
#15 0xb1f2210c in clone () from /lib/libc.so.0
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) thread 32
[Switching to thread 32 (Thread 0x62f634a0 (LWP 6720))]
#0 0xb6fb7448 in __lll_lock_wait () from /lib/libpthread.so.0
(gdb) where
#0 0xb6fb7448 in __lll_lock_wait () from /lib/libpthread.so.0
#1 0xb6fc0b04 in pthread_mutex_lock () from /lib/libpthread.so.0
#2 0x0008ea2c in td_cmd_setvenctype (_pcChn=_pcChn@entry=0x2231d8 "0", _pcType=<optimized out>,
_cIp=_cIp@entry=0x62f55670 "10.30.41.40") at src/dvr/CmdExecute.c:35083
#3 0x000c0d68 in NetParaSet (_u32IP=<optimized out>, _u32IP@entry=100,
_cSetMsg=_cSetMsg@entry=0x62f60260 "PARASET\tVENCTYPE\t0\t1", _cPara=_cPara@entry=0x62f60800 "",
_SendMsg=0x130e48 <NetServer_SendStringToAllClient>) at src/dvr/CmdExecute.c:23497
#4 0x00050050 in cbkNetMsgEvent (_u8ErrorType=<optimized out>, _u32ErrorMsg=100, _cMsg=<optimized out>,
_u32IpAddress=673783306, _u16Port=50500) at src/dvr/application.c:835
#5 0x0012b658 in cbkRecvData (
_cBuf=_cBuf@entry=0x2582678 "10.30.41.41\tIP\tPARASET\tVENCTYPE\t0\t1\n\n\n\361\365\352\365\261\002\024",
_iBufLength=_iBufLength@entry=38, _u32IpAddress=673783306, _u16Port=<optimized out>, _socket=100)
at src/dvr/NetServer.c:988
#6 0x00137ffc in TcpServerRecvCmd (
_cBuf=0x2582678 "10.30.41.41\tIP\tPARASET\tVENCTYPE\t0\t1\n\n\n\361\365\352\365\261\002\024", _iBufLength=38,
_pClient=0x2582378) at src/dvr/NetServer.c:3936
#7 0x00163c24 in TcpServer_ParseStreamData (_pClient=0x2582378, _cRecvBuffer=<optimized out>,
_u32RecvLength=<optimized out>) at src/dvr/TcpServer.c:2479
#8 0x00164308 in PthreadTcpClientRecv (_tsThis=0x2582378) at src/dvr/TcpServer.c:2029
#9 0xb6fbeb3c in start_thread () from /lib/libpthread.so.0
#10 0xb1f2210c in clone () from /lib/libc.so.0
#11 0xb1f2210c in clone () from /lib/libc.so.0
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
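With well over a hundred threads, capturing everything in one pass and filtering afterwards can be easier than paging through the session by hand. The sketch below assumes gdb sits in the current directory as in the command above; GDB, dump_threads and list_lock_waiters are illustrative names.

```shell
GDB=${GDB:-./gdb}   # the post runs gdb from the current directory

# dump_threads PID: attach, print the thread list and every backtrace,
# then exit (gdb detaches from the process when it quits).
dump_threads() {
    "$GDB" -p "$1" -batch -ex "info threads" -ex "thread apply all bt"
}

# list_lock_waiters: read that output on stdin and print the gdb thread Ids
# whose innermost frame is __lll_lock_wait. Only `info threads` rows are
# matched, so the "#0 ..." backtrace lines are ignored.
list_lock_waiters() {
    awk '/^[* ]*[0-9]+ +Thread .* in __lll_lock_wait \(\)/ {
        id = ($1 == "*") ? $2 : $1   # the current thread is prefixed with "*"
        print id
    }'
}
```

A typical use would be dump_threads <PID> > threads.txt followed by list_lock_waiters < threads.txt; the saved file also keeps the full backtraces for later reading.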
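Reading these backtraces against the source code is then enough to locate the deadlock and its cause. Here thread 31 is blocked acquiring a lock in TD_VCA_Ctrl_Lock (src/vca/vca_alarm.c:3492) and thread 32 in td_cmd_setvenctype (src/dvr/CmdExecute.c:35083), so those two call paths are the place to start.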
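To go from "which threads are blocked" to "blocked on whom", one common approach with NPTL-style pthread libraries is to select the frame that calls pthread_mutex_lock, print the mutex variable used there, and read its __data.__owner field, which holds the LWP of the thread that owns the lock. Whether that field is available depends on the C library on the target, so treat it as an assumption. The hypothetical helper below maps such an LWP back to a gdb thread Id using info threads output.

```shell
# thread_for_lwp LWP: read `info threads` output on stdin and print the gdb
# thread Id of the row containing "(LWP <LWP>)"; non-zero exit if none matches.
thread_for_lwp() {
    awk -v lwp="$1" '
        /^[* ]*[0-9]+ +Thread / {
            for (i = 1; i < NF; i++) {
                if ($i == "(LWP" && $(i + 1) == (lwp ")")) {
                    id = ($1 == "*") ? $2 : $1   # skip the "*" marker
                    print id
                    found = 1
                    exit
                }
            }
        }
        END { exit !found }
    '
}
```

If the owner reported for one blocked thread's mutex turns out to be another blocked thread, and vice versa, the two backtraces together describe the lock-order cycle.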
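Postscript: my previous blog post was a year ago. A lot has happened in life and at work over the past year; I have learned a great deal and grown a lot along the way. From here on I intend to keep the blog updated with more technical posts, covering both fundamentals that are easy to get wrong and common tools and techniques for debugging programs.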