Debugging Memory Errors in C/C++

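This article presents several effective methods for debugging memory errors in C/C++ programs, including marking freed memory with a special pattern and inserting empty zones between blocks to avoid spatial dependencies. It also discusses how to use hardware watchpoints to pin down the exact address being overwritten and so find the source of the error.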


http://scottmcpeak.com/memory-errors/

This page describes a few key techniques I've learned about how to debug programs that are suspected of containing memory errors. Principally, this includes using memory after it has been freed, and writing beyond the end of an array. Memory leaks are considered briefly at the end.

It's of course rather presumptuous to even write these up, since so much has already been written. I'm not intending to write the be-all and end-all article, just to write up a few of the techniques I use, since I recently had the opportunity to help a friend debug such an error. There are also some links at the end to other resources.

Note that I'm only interested here in memory errors that trash part of the heap. Overwriting the stack may be a cracker's favorite technique, but when it happens in front of the programmer it's usually very easy to track down.

Why are memory errors hard to debug?

The first thing to understand about memory errors is why they're different from other bugs. I claim the main reason they are harder to debug is that they are fragile. By fragile, I mean the bug will often only show up under certain conditions, and that attempts to isolate the bug by changing the program or its input often mask its effects. Since the programmer is then forced to find the needle in the haystack, and cannot use techniques to cut down on the size of the haystack, locating the cause of the problem is very difficult.

Consequently, the first priority when tracking down suspected memory errors is to make the bug more robust. There is a bug in your code, but you need to do something so that the bug's effects cannot be masked by other actions of the program.

Making the bug more robust

I know of two main techniques for reducing the fragility of a memory bug:

  • Don't re-use memory.
  • Put empty space between memory blocks.

Why do these techniques help? First, by not re-using memory, we can eliminate temporal dependencies between the bug and the surrounding program. That is, if memory is not re-used, then it no longer matters in what order the relevant blocks are allocated and deallocated.

Second, by putting empty space between blocks, overwriting (or underwriting) past the end of one block won't corrupt another. Thus, we break spatial dependencies involving the bug. The space between the blocks should be filled with a known value, and the space should be periodically checked (at least when free is called on that block) to see if the known value has been changed.

With temporal and spatial dependencies reduced, it's less likely that a change to the program or its input will disturb the evidence of the bug's presence.

Of course, your machine must have enough spare memory to run the experiment. But, by making the bug more robust, we can now cut down on the input size! Thus in the end using more space in the short term can lead to using less space in the final, minimized input test case.

The above two techniques are easily implemented in any debug heap implementation. I've modified Doug Lea's malloc to implement the features; my modified version is here: malloc.c, ckheap.h. To compile with the debug features described, set the preprocessor variables DEBUG and DEBUG_HEAP. But of course you can use any implementation, and the debug versions can simply be wrappers around the real malloc.
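As a rough illustration of the wrapper approach, here is a minimal sketch of guard zones around each block. It is not the code from the modified malloc.c: the names dbg_malloc, dbg_check, dbg_free, ZONE_SIZE and the dbg_header layout are all hypothetical, and the 0xAA fill is simply the value that appears in the example later on this page. For brevity the sketch hands memory back to the real free, so it shows only the spatial half of the idea.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* All names here are hypothetical, chosen for illustration only. */
#define ZONE_SIZE 128   /* padding bytes on each side of a block */
#define ZONE_FILL 0xAA  /* known value written into the padding */

/* Stored at the start of each underlying allocation.  The max_align_t
 * member pads the header so the user pointer stays suitably aligned. */
typedef union {
    size_t size;        /* number of bytes the caller asked for */
    max_align_t align;
} dbg_header;

/* Layout: [header][left zone][user bytes][right zone] */
void *dbg_malloc(size_t size)
{
    size_t overhead = sizeof(dbg_header) + 2 * ZONE_SIZE;
    if (size > (size_t)-1 - overhead)
        return NULL;                      /* request would overflow */
    unsigned char *raw = malloc(overhead + size);
    if (raw == NULL)
        return NULL;
    ((dbg_header *)raw)->size = size;
    unsigned char *user = raw + sizeof(dbg_header) + ZONE_SIZE;
    memset(user - ZONE_SIZE, ZONE_FILL, ZONE_SIZE);
    memset(user + size, ZONE_FILL, ZONE_SIZE);
    return user;
}

/* Returns how many guard bytes around block p no longer hold ZONE_FILL. */
size_t dbg_check(const void *p)
{
    assert(p != NULL);
    const unsigned char *user = p;
    const unsigned char *left = user - ZONE_SIZE;
    const dbg_header *h = (const dbg_header *)(left - sizeof(dbg_header));
    const unsigned char *right = user + h->size;
    size_t trashed = 0;
    for (size_t i = 0; i < ZONE_SIZE; i++) {
        if (left[i] != ZONE_FILL)
            trashed++;
        if (right[i] != ZONE_FILL)
            trashed++;
    }
    return trashed;
}

/* Checks the zones, aborts if they were trashed, then releases the block. */
void dbg_free(void *p)
{
    if (p == NULL)
        return;
    size_t trashed = dbg_check(p);
    if (trashed != 0) {
        fprintf(stderr, "trashed %zu bytes\n", trashed);
        abort();
    }
    free((unsigned char *)p - ZONE_SIZE - sizeof(dbg_header));
}
```

A program under test would call dbg_malloc and dbg_free in place of malloc and free, and could also call dbg_check at other points to narrow down when the damage happens.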

Using hardware watchpoints

Intel-compatible x86 processors include debug registers capable of watching up to four addresses. Whenever a read or write to any of the watched addresses happens, the program traps, and the debugger gets control. The debug registers offer a powerful way to find out what line of code is overwriting a given byte, once you know which byte is being overwritten.

In gdb, the notation for using hardware watchpoints is a little odd, because gdb likes to think of its input as a C expression. If you want to stop when the value stored at address 0xABCDEF is changed, then at the gdb prompt type

  (gdb) watch *((int*)0xABCDEF)

One difficulty is that you can't begin watching an address until the memory it refers to has been mapped (requested from the operating system for use by the program). The usual solution is to step through the program at a rather coarse granularity (skipping over most function calls) until you find a point in time where the address is mapped but has not yet been trashed. Add the watchpoint, then let the program run until the address is accessed.

An example

Suppose I have a program with a suspected memory error. I compile it with the debug malloc.c, and when I run it I see:

  $ ./tmalloc
  trashed 1 bytes
  tmalloc: malloc.c:1591: checkZones: Assertion `!"right allocated zone trashed"' failed.
  Aborted

I first run the program in the debugger to find the offending address:

  (gdb) run
  Starting program: /home/scott/wrk/cplr/smbase/tmalloc
  trashed 1 bytes
  tmalloc: malloc.c:1591: checkZones: Assertion `!"right allocated zone trashed"' failed.

  Program received signal SIGABRT, Aborted.
  0x400539f1 in __kill () from /lib/libc.so.6
  (gdb) up
  #1  0x400536d4 in raise (sig=6) at ../sysdeps/posix/raise.c:27
  27      ../sysdeps/posix/raise.c: No such file or directory.
  (gdb) up
  #2  0x40054e31 in abort () at ../sysdeps/generic/abort.c:88
  88      ../sysdeps/generic/abort.c: No such file or directory.
  (gdb) up
  #3  0x4004dfd2 in __assert_fail () at assert.c:60
  60      assert.c: No such file or directory.
  (gdb) up
  #4  0x8048d55 in checkZones (p=0x8050838 "\016\001", bytes=270)
      at malloc.c:1591
  (gdb) print p[bytes-1-i]
  $1 = 7 '\a'                 <----- trashed! should be 0xAA
  (gdb) print p+bytes-1-i
  $2 = (unsigned char *) 0x80508c6 "\a", '\252' <repeats 127 times>
  (gdb)                  ^^^^^^^^^
                         this is the trashed address

Now I restart the program and attempt to set a hardware watchpoint:

  (gdb) break main
  Breakpoint 1 at 0x8048b91: file tmalloc.c, line 81.
  (gdb) run
  The program being debugged has been started already.
  Start it from the beginning? (y or n) y

  Starting program: /home/scott/wrk/cplr/smbase/tmalloc

  Breakpoint 1, main () at tmalloc.c:81
  (gdb) watch *((int*)0x80508c6)
  Cannot access memory at address 0x80508c6
  (gdb)

Ok, the memory isn't mapped yet. Single-stepping through main a few times, I find a place where I can insert the watchpoint but the memory in question hasn't yet been trashed. When I then continue the program, the debugger next stops at the bug.

  (gdb) watch *((int*)0x80508c6)
  Hardware watchpoint 3: *(int *) 134547654
  (gdb) c
  Continuing.
  Hardware watchpoint 3: *(int *) 134547654

  Old value = -1431655766
  New value = -1431655929
  offEnd () at tmalloc.c:33
  (gdb) print /x -1431655766
  $1 = 0xaaaaaaaa              <--- what it should be
  (gdb) print /x -1431655929
  $2 = 0xaaaaaa07              <--- what it became after trashing
  (gdb) list
  28
  29      void offEnd()
  30      {
  31        char *p = malloc(10);
  32        p[10] = 7;    // oops       <--- the bug
  33        free(p);
  34      }
  35
  36      void offEndCheck()
  37      {
  (gdb)

In this small program the bug would have been obvious upon inspection, but the technique of course generalizes to cases that are much more complicated.

Dangling references

As mentioned above, a debug heap shouldn't re-use memory. Going one step further, my debug malloc.c overwrites free()'d memory with another known pattern (but does not actually free it). Then, if the program continues to use the memory the mistake will become clear, especially if it tries to interpret the values it finds as pointers (they'll segfault). Double-deallocation is also easy to identify with this scheme.
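The scheme can be sketched in a few lines of C. This is only an illustration under assumptions, not the code in the modified malloc.c: q_malloc, q_free, the q_header layout, the two magic numbers and the 0xDD fill value are all hypothetical choices. The freed block is overwritten and marked dead but never handed back to the real free, so it is never re-used.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* All names and constants here are hypothetical. */
#define FREED_FILL 0xDD          /* pattern written over freed blocks */
#define MAGIC_LIVE 0x4C495645u   /* header tag for a block in use */
#define MAGIC_DEAD 0x44454144u   /* header tag for a freed block */

/* Header placed just before the user bytes.  The max_align_t member
 * pads the header so the user pointer stays suitably aligned. */
typedef union {
    struct {
        size_t size;      /* number of bytes the caller asked for */
        unsigned magic;   /* MAGIC_LIVE or MAGIC_DEAD */
    } info;
    max_align_t align;
} q_header;

void *q_malloc(size_t size)
{
    if (size > (size_t)-1 - sizeof(q_header))
        return NULL;                      /* request would overflow */
    q_header *h = malloc(sizeof(q_header) + size);
    if (h == NULL)
        return NULL;
    h->info.size = size;
    h->info.magic = MAGIC_LIVE;
    return h + 1;                         /* user bytes follow the header */
}

/* Returns 0 on success, or -1 if p was already freed (double free). */
int q_free(void *p)
{
    if (p == NULL)
        return 0;
    q_header *h = (q_header *)p - 1;
    /* Any other tag means p did not come from q_malloc. */
    assert(h->info.magic == MAGIC_LIVE || h->info.magic == MAGIC_DEAD);
    if (h->info.magic == MAGIC_DEAD)
        return -1;
    memset(p, FREED_FILL, h->info.size);  /* make stale reads obvious */
    h->info.magic = MAGIC_DEAD;
    /* Deliberately no call to free(h): the memory is never re-used. */
    return 0;
}
```

Since nothing is ever released, memory use only grows; that is the price of removing the temporal dependencies, and it is acceptable only for a debugging run.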

Memory leaks

I usually debug memory leaks by printing statistics about calls to malloc and free before and after certain sections of code. If there are more calls to malloc, but the code isn't supposed to be creating long-lived data, then that points to a potential problem. This doesn't easily generalize to long-running programs, but if the program can be broken into units and the leak properties of each unit checked in isolation, most leaks can be found relatively easily.
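A minimal sketch of that kind of bookkeeping, assuming the code under test can be made to call wrapper functions: counted_malloc, counted_free and outstanding_blocks are hypothetical names, not the statistics interface of the modified malloc.c, and the counters are not thread-safe.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical counters for successful allocations and frees. */
static unsigned long n_malloc = 0;
static unsigned long n_free = 0;

void *counted_malloc(size_t size)
{
    void *p = malloc(size);
    if (p != NULL)
        n_malloc++;       /* count only allocations that succeeded */
    return p;
}

void counted_free(void *p)
{
    if (p != NULL) {
        n_free++;         /* free(NULL) is a no-op, so do not count it */
        free(p);
    }
}

/* Number of blocks allocated through the wrappers and not yet freed. */
long outstanding_blocks(void)
{
    /* More frees than mallocs means a pointer from elsewhere was freed. */
    assert(n_free <= n_malloc);
    return (long)(n_malloc - n_free);
}
```

To check one unit of the program, record outstanding_blocks() before it runs and compare with the value afterwards; a difference that persists when the unit is not supposed to create long-lived data suggests a leak in that unit.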

Conclusion

The C and C++ languages are much-maligned for lack of memory safety, but too often this is seen as a greater problem than it really is (setting security issues aside for the moment). Debugging memory errors requires a different approach than debugging other kinds of errors, but with a little practice they can actually be easier and faster to find, simply because the same techniques (and tools!) can be used over and over.

Some links

I'm not the first or last to write about methods for debugging memory errors. Here are some links to other people who also aren't the first or last either (actually only the first link really matches this description..).

  • Debugging Tools for Dynamic Storage Allocation and Memory Management: Ben Zorn's long list of tools people have written to help debug memory errors.
  • Doug Lea's malloc: Doug Lea's implementation of malloc.
  • malloc.c: My modified version of Doug Lea's malloc, version 2.7.0. I've added:
    • -DDEBUG_HEAP: don't re-use memory, put empty zones on both sides of allocated space, overwrite deallocated space
    • Statistics to track the number of calls to malloc and free.
    • A heap walker interface.
    • -DTRACE_MALLOC_CALLS: print a message to stderr on every malloc and free
  • The above malloc.c also needs the header ckheap.h. That's an oversight I plan to correct, but in the meantime this should be enough to compile malloc.c.
  • gdb: The GNU debugger. The de-facto standard on Linux, for better or worse.
  • Rational: The makers of Purify, one of the best-known tools for finding memory errors. Purify doesn't require recompiling the program, which certainly has its advantages, but as such it is limited in the ways it can make memory bugs more robust. I think sometimes people reach for a heavyweight solution like Purify when a simple debug heap would be faster and easier.
  • CCured: I'd be remiss if I didn't mention CCured, a research project I've done quite a bit of work on. CCured instruments the entire program so it can catch a wide variety of bugs, in a way that is sound: if CCured does not report a problem, then no problem occurred during that run of the program. I can't recommend it as the first solution to reach for during debugging, since it takes a fair bit of time and effort to get a program working under CCured. But in the long run, if you can use CCured, it provides a level of assurance well beyond that of any other current technique.

