Example program (lock.h):
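/* Test-and-set: returns 0 if the lock was acquired, nonzero otherwise. */
static __inline__ int
tas(volatile slock_t *lock)
{
    register volatile slock_t *_l = lock;
    register int _res;
    register int _tmp;

    __asm__ __volatile__(
        "    .set push        \n"
        "    .set mips2       \n"
        "    .set noreorder   \n"
        "    .set nomacro     \n"
        "    ll   %0, %2      \n"   /* _res = *lock (load-linked) */
        "    or   %1, %0, 1   \n"   /* _tmp = _res | 1 */
        "    sc   %1, %2      \n"   /* try *lock = _tmp; _tmp = 1 on success, 0 on failure */
        "    xori %1, 1       \n"   /* _tmp = 1 if the store failed, else 0 */
        "    or   %0, %0, %1  \n"   /* _res is nonzero if the lock was held or sc failed */
        "    sync             \n"   /* memory barrier */
        "    .set pop           "
        : "=&r" (_res), "=&r" (_tmp), "+R" (*_l)
        : /* no inputs */
        : "memory");
    return _res;
}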
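Judging by the code itself, there is no room left for optimization. The MIPS architecture implements atomic operations with the ll/sc (load-linked/store-conditional) instruction pair.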
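For readers without a MIPS toolchain, the contract of tas can be sketched in portable C. This is not the code from lock.h: it assumes GCC or Clang and swaps the hand-written ll/sc sequence for the __sync_lock_test_and_set builtin, and the names tas_portable, spin_lock and spin_unlock are made up for illustration. One difference worth noting: the assembly above reports a failed sc as "lock busy" and leaves the retry to the caller, while the builtin only reports the previous lock value.

```c
#include <assert.h>

typedef int slock_t;   /* assumption: the lock is a plain int, 0 = free */

/* Returns 0 if the lock was free and is now ours, nonzero if it was held. */
static __inline__ int
tas_portable(volatile slock_t *lock)
{
    /* Atomically store 1 and return the previous value (acquire barrier). */
    return __sync_lock_test_and_set(lock, 1);
}

/* Spin until tas_portable() reports that we took the lock. */
static __inline__ void
spin_lock(volatile slock_t *lock)
{
    while (tas_portable(lock) != 0)
        ;   /* busy-wait; real code would back off or yield here */
}

/* Release the lock by storing 0 with release semantics. */
static __inline__ void
spin_unlock(volatile slock_t *lock)
{
    assert(*lock != 0);   /* unlocking a free lock is a caller bug */
    __sync_lock_release(lock);
}
```

A caller would wrap its critical section between spin_lock(&l) and spin_unlock(&l); every such call site is a place where the compiler either inlines tas or emits a real function call.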
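However, gcc does not optimize by default (-O0), and at that level the __inline__ declaration on the function above has no effect, so every use of tas becomes a real function call with extra call and return instructions. Building with optimization enabled, for example with “-O3”, lets __inline__ take effect and reduces the number of instructions executed at run time. But if tas is called frequently, the aggressive inlining of “-O3” may bloat your code, and larger code tends to lower the cache hit rate while the program runs.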
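So we need to assess the whole software project, both how often tas is used and how large the code is, before deciding whether to use the “-O3” option.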
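If changing the optimization level for the whole project is too blunt, one alternative is to force inlining for this one function only. The sketch below assumes GCC or Clang, both of which provide the always_inline function attribute; tas_forced and try_acquire are hypothetical names, and the body uses the __sync_lock_test_and_set builtin as a stand-in for the MIPS assembly so that it compiles on any target.

```c
typedef int slock_t;   /* assumption: 0 = free, nonzero = held */

/* always_inline asks the compiler to inline this function even at -O0. */
static __inline__ __attribute__((always_inline)) int
tas_forced(volatile slock_t *lock)
{
    /* Stand-in for the ll/sc sequence: store 1, return the old value. */
    return __sync_lock_test_and_set(lock, 1);
}

/* Hypothetical caller: returns 1 if the lock was acquired, 0 otherwise. */
int
try_acquire(volatile slock_t *lock)
{
    return tas_forced(lock) == 0;
}
```

Compiling this with "gcc -O0 -S" and checking that try_acquire contains no call to tas_forced is a quick way to confirm the effect. The trade-off is the same as before, only narrower: each call site grows by the size of the tas body, but the rest of the project keeps its current optimization level.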
Appendix: how the gcc manual defines -O2 and -O3:
-O2
Optimize even more. GCC performs nearly all supported optimizations that do not involve a space-speed tradeoff. As compared to ‘-O’, this option increases both compilation time and the performance of the generated code.
‘-O2’ turns on all optimization flags specified by ‘-O’. It also turns on the
following optimization flags:
-fthread-jumps
-falign-functions -falign-jumps
-falign-loops -falign-labels
-fcaller-saves
-fcrossjumping
-fcse-follow-jumps -fcse-skip-blocks
-fdelete-null-pointer-checks
-fdevirtualize -fdevirtualize-speculatively
-fexpensive-optimizations
-fgcse -fgcse-lm
-fhoist-adjacent-loads
-finline-small-functions
-findirect-inlining
-fipa-sra
-fisolate-erroneous-paths-dereference
-foptimize-sibling-calls
-fpartial-inlining
-fpeephole2
-freorder-blocks -freorder-functions
-frerun-cse-after-loop
-fsched-interblock -fsched-spec
-fschedule-insns -fschedule-insns2
-fstrict-aliasing -fstrict-overflow
-ftree-switch-conversion -ftree-tail-merge
-ftree-pre
-ftree-vrp
-O3
Optimize yet more.
‘-O3’ turns on all optimizations specified by ‘-O2’ and also turns on the
‘-finline-functions’,
‘-funswitch-loops’,
‘-fpredictive-commoning’,
‘-fgcse-after-reload’,
‘-ftree-loop-vectorize’, ‘-ftree-slp-vectorize’, ‘-fvect-cost-model’,
‘-ftree-partial-pre’ and ‘-fipa-cp-clone’ options.
-finline-functions
Consider all functions for inlining, even if they are not declared inline. The compiler heuristically decides which functions are worth integrating in this way. If all calls to a given function are integrated, and the function is declared static, then the function is normally not output as assembler code in its own right.