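Last time we found that valarray is slower than a plain C loop. That comparison was made in release mode. If you compile and run in debug mode, you may find that valarray slows down by a further factor of 10 or more. Why?
Digging into the valarray implementation turns up no debugging code and no dependency on anything else. Is the slowdown caused simply by using a class? If so, wouldn't that be a serious problem?
Set a breakpoint in debug mode and look at the disassembly: every inline function has been compiled as an ordinary function call. In other words, the inline keyword has had no effect at all.
Do the same in release mode and you can still view the disassembly, but without the corresponding source lines. To get them, open the release-mode settings, go to the C++ options, and enable debugging information (any choice except "Program Database for Edit and Continue"). This does not affect how the release build compiles or runs, and the source now appears alongside the assembly. We can then see clearly that the inline functions really have been expanded in place and are no longer function calls.
That is the cause: function-call overhead has a huge impact on performance in this program, because every call involves: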
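Copying the arguments (writing them to the stack, reading them back from the stack, and restoring the stack when the function returns).
Saving and restoring the execution context, including registers, the return address, and so on.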
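Verifying this conclusion is easy: in debug mode, add the /Ob1 compiler option (which turns on inline expansion) to force the compiler to treat inline functions as inline. With this option the debugging information again cannot be "Program Database for Edit and Continue", and you can no longer step into the inlined functions. Run the program now and it is as fast as in release mode.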
(NOTE: the tests and comparisons were done using MSVC 6.0.)
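This example teaches us several things:
1. Performance in VC debug and release modes can differ by a factor of 10 or more. Do not take this lightly.
2. A class's inline functions can be a major factor in performance. Test whether they are the dominant one.
3. A debug build can be optimized, and a release build can be debugged.
4. It pays for programmers to know at least a little about the compiler options.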
Default options in VC2008 (I found these on the web):
· /W1, Sets warning level to 1
· /Ze, Turns on Microsoft extensions to C and C++
· /ZB64, Sets maximum integral types to 64 bits (undocumented)
· /Zp8, Sets struct or class packing to 8 bytes
· /Gs4096, Causes stack probe code to be generated for functions with locals greater than 4096 bytes
· /Ot, Optimise for time rather than space
· /FoHello.obj, Name for object file output
· /Fdvc90.pdb, Name for pdb file generated by compiler
· /GS, Generate stack frame cookies
· /Ob0, Disable inline expansion
· /MT, Link to multithreaded static CRT library
· /ZM, Sets the compiler to not completely finish processing a compilation unit (cpp/c file) before starting to process the next (undocumented)
Compiler options:
C/C++ COMPILER OPTIONS
-OPTIMIZATION-
/O1 minimize space                        /Op[-] improve floating-pt consistency
/O2 maximize speed                        /Os favor code space
/Oa assume no aliasing                    /Ot favor code speed
/Ob<n> inline expansion (default n=0)     /Ow assume cross-function aliasing
/Od disable optimizations (default)       /Ox maximum opts. (/Ogityb1 /Gs)
/Og enable global optimization            /Oy[-] enable frame pointer omission
/Oi enable intrinsic functions
-CODE GENERATION-
/G3 optimize for 80386                    /Gy separate functions for linker
/G4 optimize for 80486                    /Ge force stack checking for all funcs
/G5 optimize for Pentium                  /Gs[num] disable stack checking calls
/G6 optimize for Pentium Pro              /Gh enable hook function call
/GB optimize for blended model (default)  /GR[-] enable C++ RTTI
/Gd __cdecl calling convention            /GX[-] enable C++ EH (same as /EHsc)
/Gr __fastcall calling convention         /Gi[-] enable incremental compilation
/Gz __stdcall calling convention          /Gm[-] enable minimal rebuild
/GA optimize for Windows Application      /EHs enable synchronous C++ EH
/GD optimize for Windows DLL              /EHa enable asynchronous C++ EH
/Gf enable string pooling                 /EHc extern "C" defaults to nothrow
/GF enable read-only string pooling       /QIfdiv[-] enable Pentium FDIV fix
/GZ enable runtime debug checks           /QI0f[-] enable Pentium 0x0f fix
-OUTPUT FILES-
/Fa[file] name assembly listing file      /Fo<file> name object file
/FA[sc] configure assembly listing        /Fp<file> name precompiled header file
/Fd[file] name .PDB file                  /Fr[file] name source browser file
/Fe<file> name executable file            /FR[file] name extended .SBR file
/Fm[file] name map file
-PREPROCESSOR-
/C don’t strip comments                   /FI<file> name forced include file
/D<name>{=|#}<text> define macro          /U<name> remove predefined macro
/E preprocess to stdout                   /u remove all predefined macros
/EP preprocess to stdout, no #line        /I<dir> add to include search path
/P preprocess to file                     /X ignore "standard places"
-LANGUAGE-
/Zi enable debugging information          /Zl omit default library name in .OBJ
/ZI enable Edit and Continue debug info   /Zg generate function prototypes
/Z7 enable old-style debug info           /Zs syntax check only
/Zd line number debugging info only       /vd{0|1} disable/enable vtordisp
/Zp[n] pack structs on n-byte boundary    /vm<x> type of pointers to members
/Za disable extensions (implies /Op)      /noBool disable "bool" keyword
/Ze enable extensions (default)
-MISCELLANEOUS-
/?, /help print this help message         /V<string> set version string
/c compile only, no link                  /w disable all warnings
/H<num> max external name length          /W<n> set warning level (default n=1)
/J default char type is unsigned          /WX treat warnings as errors
/nologo suppress copyright message        /Yc[file] create .PCH file
/Tc<source file> compile file as .c       /Yd put debug info in every .OBJ
/Tp<source file> compile file as .cpp     /Yu[file] use .PCH file
/TC compile all files as .c               /YX[file] automatic .PCH
/TP compile all files as .cpp             /Zm<n> max memory alloc (% of default)
-LINKING-
/MD link with MSVCRT.LIB                  /MDd link with MSVCRTD.LIB debug lib
/ML link with LIBC.LIB                    /MLd link with LIBCD.LIB debug lib
/MT link with LIBCMT.LIB                  /MTd link with LIBCMTD.LIB debug lib
/LD Create .DLL                           /F<num> set stack size
/LDd Create .DLL debug library            /link [linker options and libraries]