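1. Background
While designing a custom class that imitates the standard library type string, I had to allocate memory inside the overloaded standard input operator ">>".
To keep this efficient, I allocate memory on demand. I start with a block of N+1 bytes (the extra byte holds the terminator "\0") and store its address in a char pointer, then copy the characters coming from the input stream (istream) into that block one at a time. When the number of characters read exceeds the space allocated so far, I reallocate: the new block is N bytes larger than the old one, and the saved contents of the old character array are copied into it. Reading from the input stream can then continue.
The code compiles, but at run time the debug runtime reports "Buffer is too small" (shown in a screenshot in the original post), and execution stops at the call "strcpy_s(temp, len + 1, str);" inside the reallocation branch.
The code written from this strategy is as follows: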
std::istream& operator>>(std::istream& is, String& s) {
    unsigned int is_count = 0;
    char* str = new char[11];
    //memset(str, 0, 11 * sizeof(char));
    size_t len = 10;
    char buf = is.rdbuf()->sgetc();
    while (buf != ' ' && buf != '\n') {
        ++is_count;
        if (is_count > len) {
            char* temp = new char[len + 1];
            //memset(temp, 0, (len + 1) * sizeof(char));
            strcpy_s(temp, len + 1, str);   // execution stops here: "Buffer is too small"
            delete[] str;
            len = 10 * (len / 10 + 1);
            str = new char[len + 1];
            //memset(str, 0, (len + 1) * sizeof(char));
            strcpy_s(str, len - 10 + 1, temp);
            delete[] temp;
        }
        str[is_count - 1] = buf;
        buf = is.rdbuf()->snextc();
    }
    String ret(str);
    s = ret;
    delete[] str;
    return is;
}
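2. Tracing and analysis
Running the test code below, it is tempting to take for granted that the output must be 10. On my debug build the result was 24.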
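char* p;
p = new char[10];
cout << strlen(p) << endl;   // the array is uninitialized, so the result is unpredictable
delete[] p;                  // memory from new[] must be released with delete[]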
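That raises a big question mark: why 24? And what happens with character arrays of other sizes?
The following program examines this further by printing the strlen() result for each allocation size: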
for (int i = 0; i < 40; i += 1) {
    char* p = new char[i];
    cout << "alloc " << i << " bytes, strlen= " << strlen(p) << endl;
    delete[] p;
}
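Clearly the value returned by strlen() is not the length of the array we allocated with new. Is it strlen() or new that behaves unexpectedly? The library's strlen is written in assembly language, so it is easier to start with new.
In the Visual C++ debug runtime's allocation routine behind new, the code that decides the size of the memory block is shown below. The parameter nSize is the number we write between the brackets of "new char[]".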
blockSize = sizeof(_CrtMemBlockHeader) + nSize + nNoMansLandSize;
……
pHead = (_CrtMemBlockHeader *)_heap_alloc_base(blockSize);
……
/* fill in gap before and after real block */
memset((void *)pHead->gap, _bNoMansLandFill, nNoMansLandSize);
memset((void *)(pbData(pHead) + nSize), _bNoMansLandFill, nNoMansLandSize);
/* fill data with silly value (but non-zero) */
memset((void *)pbData(pHead), _bCleanLandFill, nSize);
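Note the last memset: in a debug build the user's bytes are filled with a non-zero value, so a freshly allocated array contains no terminator, and strlen() keeps reading past the end of the array until it happens to meet a zero byte. According to the MSDN documentation quoted below, the allocated block also carries debug information, which is what _CrtMemBlockHeader and nNoMansLandSize are for. The layout of _CrtMemBlockHeader is given in the same documentation.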
http://msdn.microsoft.com/en-us/library/aa270812(VS.60).aspx
Memory Management and the Debug Heap
“The debug versions of the heap functions call the standard or base versions used in release builds. When you request a memory block, the debug heap manager allocates from the base heap a slightly larger block of memory than requested and returns a pointer to your portion of that block. For example, suppose your application contains the call: malloc(10). In a release build, malloc would call the base heap allocation routine requesting an allocation of 10 bytes. In a debug build, however, malloc would call _malloc_dbg, which would then call the base heap allocation routine requesting an allocation of 10 bytes plus approximately 36 bytes of additional memory. All the resulting memory blocks in the debug heap are connected in a single linked list, ordered according to when they were allocated:
The additional memory allocated by the debug heap routines is used for bookkeeping information, for pointers that link debug memory blocks together, and for small buffers on either side of your data to catch overwrites of the allocated region.”
Currently, the block header structure used to store the debug heap’s bookkeeping information is declared as follows in the DBGINT.H header file:
typedef struct _CrtMemBlockHeader
{
    // Pointer to the block allocated just before this one:
    struct _CrtMemBlockHeader *pBlockHeaderNext;
    // Pointer to the block allocated just after this one:
    struct _CrtMemBlockHeader *pBlockHeaderPrev;
    char *szFileName;    // File name
    int nLine;           // Line number
    size_t nDataSize;    // Size of user block
    int nBlockUse;       // Type of block
    long lRequest;       // Allocation number
    // Buffer just before (lower than) the user's memory:
    unsigned char gap[nNoMansLandSize];
} _CrtMemBlockHeader;
/* In an actual memory block in the debug heap,
 * this structure is followed by:
 *    unsigned char data[nDataSize];
 *    unsigned char anotherGap[nNoMansLandSize];
 */
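To make the size formula concrete, here is a rough arithmetic sketch. MockBlockHeader is a hypothetical stand-in that only mirrors the fields quoted above; it is not the real CRT type. The gap width of 4 bytes is an assumption, and the names kNoMansLandSize and debug_block_size are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Assumed width of each guard gap; treat this value as an assumption.
const std::size_t kNoMansLandSize = 4;

// Hypothetical mirror of the header fields quoted above (not the CRT's type).
struct MockBlockHeader {
    MockBlockHeader* next;
    MockBlockHeader* prev;
    const char*      file_name;
    int              line;
    std::size_t      data_size;
    int              block_use;
    long             request;
    unsigned char    gap[kNoMansLandSize];  // guard bytes before the user data
};

// Mirrors: blockSize = sizeof(header) + nSize + nNoMansLandSize
std::size_t debug_block_size(std::size_t n_size) {
    const std::size_t overhead = sizeof(MockBlockHeader) + kNoMansLandSize;
    assert(n_size <= static_cast<std::size_t>(-1) - overhead);  // no wrap-around
    return overhead + n_size;
}
```

On a 32-bit build where pointers, int, long, and size_t are all 4 bytes, the mock header is 32 bytes, so debug_block_size(10) is 10 + 36, which agrees with the "approximately 36 bytes" in the quoted documentation. On other platforms the overhead differs.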
3. Finding a fix
Fortunately, it is still possible to find code that runs "normally", in the way we expect:
char* p;
p = new char[6];
const char* q = "aaaaa";     // string literals are const
strcpy_s(p, 6, q);           // copies 5 characters plus the terminator
cout << strlen(p) << endl;
delete[] p;
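Now p points to memory holding 5 characters and strlen() prints 5. If the string is changed to some other length that still fits in the buffer, the output changes to match.
This suggests a reasonable hypothesis: newly allocated character memory only works "normally" with strlen() after it has been assigned or initialized, that is, once it actually contains a terminating "\0".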
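The hypothesis is easier to see with a plain C++ equivalent of strlen(). This is only an illustrative sketch (my_strlen is a made-up name, and the real library routine is optimized differently), but the contract is the same: count characters until the first "\0".

```cpp
#include <cassert>
#include <cstddef>

// Illustrative equivalent of strlen(): counts the characters that precede
// the first '\0'. It knows nothing about how large the buffer really is.
std::size_t my_strlen(const char* s) {
    assert(s != nullptr);
    std::size_t n = 0;
    while (s[n] != '\0') {
        ++n;
    }
    return n;
}
```

If the buffer holds no "\0", the loop simply walks off the end of the array, which is why the result for an uninitialized buffer has nothing to do with the size passed to new.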
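That raises a new question: how do we correctly initialize a character buffer allocated with new? Writing p = 0 is clearly wrong, because it makes the pointer p itself null. What we want is for every byte that p points to to hold the "empty character". Fortunately, the function "memset" does just that.
Running the test code below gives the initialization we want: the length of the string p points to becomes 0. The call "memset(p, 0, 6 * sizeof(char));" assigns a value to every byte of the memory p points to, and the "0" in that call is the same as the character "\0". Since strlen() stops at the first "\0", a result of 0 is no surprise.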
char* p;
p = new char[6];
cout << strlen(p) << endl;   // uninitialized: the result is unpredictable
memset(p, 0, 6 * sizeof(char));
cout << strlen(p) << endl;   // prints 0
delete[] p;
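As a side note, standard C++ can zero-fill the array at allocation time: writing empty parentheses after the array type value-initializes every element. The helper name new_zeroed below is hypothetical; this is a sketch of an alternative to the memset call, not what the original code does.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: returns a buffer of n chars, all set to '\0'.
// The caller owns the buffer and must release it with delete[].
char* new_zeroed(std::size_t n) {
    assert(n > 0);          // keep at least one byte for the terminator
    return new char[n]();   // the trailing () zero-initializes each element
}
```

Calling strlen() on the returned buffer gives 0, just as it does after the memset call above.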
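4. Final solution
Uncommenting the memset calls is enough. Every buffer is now zero-filled, so str always ends in "\0" and strcpy_s() can find the end of the source string.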
std::istream& operator>>(std::istream& is, String& s) {
    unsigned int is_count = 0;
    char* str = new char[11];
    memset(str, 0, 11 * sizeof(char));               // zero-fill: str starts as an empty string
    size_t len = 10;
    char buf = is.rdbuf()->sgetc();
    while (buf != ' ' && buf != '\n') {
        ++is_count;
        if (is_count > len) {
            // Back up the current contents before growing the buffer.
            char* temp = new char[len + 1];
            memset(temp, 0, (len + 1) * sizeof(char));
            strcpy_s(temp, len + 1, str);
            delete[] str;
            len = 10 * (len / 10 + 1);               // grow by 10 characters
            str = new char[len + 1];
            memset(str, 0, (len + 1) * sizeof(char));
            strcpy_s(str, len - 10 + 1, temp);
            delete[] temp;
        }
        str[is_count - 1] = buf;
        buf = is.rdbuf()->snextc();
    }
    String ret(str);
    s = ret;
    delete[] str;
    return is;
}
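For comparison, here is a sketch of the same grow-by-chunk idea that does not depend on zero-filled buffers: it tracks the character count itself, copies with memcpy, and writes the terminator once at the end. This is not the code from this article. The names read_word, read_word_from, and chunk are hypothetical, it returns a raw buffer instead of a String, and it also stops at end of input, which is my own addition.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <istream>
#include <sstream>
#include <string>

// Reads characters up to (not including) the next space, newline, or end of
// input. Returns a new[]-allocated, null-terminated buffer that the caller
// must release with delete[]. Stream state flags are left untouched.
char* read_word(std::istream& is, std::size_t chunk = 10) {
    typedef std::istream::traits_type traits;
    assert(chunk > 0);
    std::size_t capacity = chunk;   // usable characters, excluding the terminator
    std::size_t count = 0;          // characters stored so far
    char* buf = new char[capacity + 1];

    traits::int_type c = is.rdbuf()->sgetc();
    while (!traits::eq_int_type(c, traits::eof()) && c != ' ' && c != '\n') {
        if (count == capacity) {
            // Grow by one chunk; copy exactly count bytes, no terminator needed.
            char* bigger = new char[capacity + chunk + 1];
            std::memcpy(bigger, buf, count);
            delete[] buf;
            buf = bigger;
            capacity += chunk;
        }
        buf[count++] = traits::to_char_type(c);
        c = is.rdbuf()->snextc();
    }
    buf[count] = '\0';              // terminate once, at the known length
    return buf;
}

// Small wrapper for trying read_word on an in-memory string.
std::string read_word_from(const std::string& text) {
    std::istringstream in(text);
    char* word = read_word(in);
    std::string result(word);
    delete[] word;
    return result;
}
```

Because the length is known at every step, nothing ever calls a string function on a buffer that lacks a terminator, and each reallocation needs one copy instead of two.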