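The principle behind Auto Dump is simple. For unmanaged code, when a program crashes, the OS reads the Debugger value under the registry key HKLM\Software\Microsoft\Windows NT\CurrentVersion\AeDebug (if the OS is 64-bit and the debugger is 32-bit, HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Microsoft\Windows NT\CurrentVersion\AeDebug must be set as well), substitutes the placeholders in it (for example, "%ld" stands for the process ID), and then runs the resulting command line. Setting the Auto value under AeDebug to 1 makes the debugger launch automatically.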
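The AeDebug settings described above can be captured in a .reg file. The following is a minimal sketch, not a definitive tool: the helper names (reg_string, aedebug_reg_text) and the sample debugger path are illustrative assumptions, and it assumes both Debugger and Auto are stored as string values.

```python
# Hypothetical helpers: render a .reg file for the AeDebug settings.
AEDEBUG_KEYS = [
    r"HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\AeDebug",
    # Also needed when a 32-bit debugger runs on a 64-bit OS.
    r"HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Microsoft\Windows NT\CurrentVersion\AeDebug",
]


def reg_string(value: str) -> str:
    """Quote a string for a .reg file: escape backslashes, then double quotes."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def aedebug_reg_text(debugger_cmd: str, include_wow64: bool = True) -> str:
    """Build .reg text that sets Debugger and Auto under the AeDebug key(s)."""
    keys = AEDEBUG_KEYS if include_wow64 else AEDEBUG_KEYS[:1]
    lines = ["Windows Registry Editor Version 5.00", ""]
    for key in keys:
        lines += [
            f"[{key}]",
            f'"Debugger"={reg_string(debugger_cmd)}',
            '"Auto"="1"',  # assumption: Auto is a string value
            "",
        ]
    return "\n".join(lines)


if __name__ == "__main__":
    # The debugger path here is only an example.
    print(aedebug_reg_text('"c:/debuggers/x86/cdb.exe" -p %ld'))
```

Generating a file rather than writing to the registry directly keeps the change reviewable before it is imported on the target machine.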
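For managed code, the corresponding debugger settings live under HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\.NETFramework (for a 32-bit debugger on a 64-bit OS, also set HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Microsoft\.NETFramework). CLR programs support several crash-handling modes, selected through the DbgJITDebugLaunchSetting value; for the meaning of each value, see Enabling JIT-attach Debugging.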
Similarly, a simple change to the Debugger value also makes a debugger attach automatically when a process crashes.
Some useful debugger command lines:
"c:/debuggers/x86/cdb.exe" -pv -p %ld -c ".dump /u /ma c:/crash_dumps/crash.dmp;.kill;qd"
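To make the pieces of that command line easier to see, the sketch below mimics the substitution step in Python: it replaces "%ld" with a process ID and splits out the commands passed through -c. The helper names and the PID 1234 are illustrative assumptions; the real substitution is done by the OS, not by a script like this.

```python
from typing import List

# The Debugger value from the example above; "%ld" stands for the process ID.
DEBUGGER_TEMPLATE = (
    '"c:/debuggers/x86/cdb.exe" -pv -p %ld '
    '-c ".dump /u /ma c:/crash_dumps/crash.dmp;.kill;qd"'
)


def expand_debugger_command(template: str, pid: int) -> str:
    """Replace the first "%ld" placeholder with the process ID."""
    return template.replace("%ld", str(pid), 1)


def initial_commands(command_line: str) -> List[str]:
    """Return the ';'-separated debugger commands given after -c."""
    marker = '-c "'
    start = command_line.index(marker) + len(marker)
    end = command_line.index('"', start)  # closing quote of the -c argument
    return command_line[start:end].split(";")


if __name__ == "__main__":
    cmd = expand_debugger_command(DEBUGGER_TEMPLATE, 1234)  # 1234 is a made-up PID
    print(cmd)
    for step in initial_commands(cmd):
        print("  ", step)
```

Here -pv attaches noninvasively, -p selects the target process, and -c runs the quoted commands once attached: .dump /u /ma writes a full minidump with the date, time, and PID appended to the file name, .kill ends the crashed process, and qd quits and detaches.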
Capturing dumps only for a specific program
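For unmanaged code on Vista / Server 2008 and later, set Auto to 0; for managed code on CLR version 4 and later, set DbgJITDebugLaunchSetting to 0. Then create a DWORD value named after the program (for example, "test.exe") under HKLM\Software\Microsoft\Windows\Windows Error Reporting\DebugApplications and set it to 1, so that the setting applies to that specific program only.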
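The per-program setup above can be sketched as a small .reg generator. This is only an illustration under stated assumptions: per_app_reg_text is a hypothetical helper, Auto is assumed to be a string value, and DbgJITDebugLaunchSetting is assumed to be a DWORD.

```python
def per_app_reg_text(exe_name: str, managed: bool = False) -> str:
    """Build .reg text that limits automatic debugging to one executable."""
    if '"' in exe_name or "\\" in exe_name:
        raise ValueError("exe_name must be a plain file name such as test.exe")
    lines = ["Windows Registry Editor Version 5.00", ""]
    if managed:
        # Managed code (CLR 4 and later): set DbgJITDebugLaunchSetting to 0.
        lines += [
            r"[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\.NETFramework]",
            '"DbgJITDebugLaunchSetting"=dword:00000000',
            "",
        ]
    else:
        # Unmanaged code (Vista / Server 2008 and later): set Auto to 0.
        lines += [
            r"[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\AeDebug]",
            '"Auto"="0"',
            "",
        ]
    # Opt the named program in with a DWORD value of 1.
    lines += [
        r"[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\Windows Error Reporting\DebugApplications]",
        f'"{exe_name}"=dword:00000001',
        "",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(per_app_reg_text("test.exe"))
```

Calling per_app_reg_text("test.exe", managed=True) produces the managed-code variant instead.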
Auto Dump is not guaranteed to capture the crash scene
Because of a timing issue, by the time the debugger attaches to the process, the threads may already have moved on. In the words of a developer on the CLR team:
...unfortunately, there is an inherent race condition between the crashing thread, that then lands in the OS, which then launches Watson, which hurries to suspend the process, and the many other threads in the process that continue running during this time – by the time the dump is collected, whichever other thread was somehow involved in the crash is now in a different place...