MIPS instruction optimization: inline has no effect

Latest recommended article published on 2025-04-30 21:04:00

海棠花败

Category: MIPS/ARM architecture/assembly

Article link: https://blog.youkuaiyun.com/weixin_38669561/article/details/104926707

Example program (lock.h):

static __inline__ int
tas(volatile slock_t *lock)
{
	register volatile slock_t *_l = lock;
	register int _res;
	register int _tmp;

	__asm__ __volatile__(
		"       .set push           \n"
		"       .set mips2          \n"
		"       .set noreorder      \n"
		"       .set nomacro        \n"
		"       ll      %0, %2      \n"
		"       or      %1, %0, 1   \n"
		"       sc      %1, %2      \n"
		"       xori    %1, 1       \n"
		"       or      %0, %0, %1  \n"
		"       sync                \n"
		"       .set pop              "
:		"=&r" (_res), "=&r" (_tmp), "+R" (*_l)
:		/* no inputs */
:		"memory");
	return _res;
}

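Taken on its own, this code leaves no room for further optimization. The MIPS architecture implements atomic operations with the ll/sc (load-linked / store-conditional) instruction pair.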
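
To make the contract of tas easier to see, here is a minimal portable sketch that uses GCC's `__sync_lock_test_and_set` builtin instead of hand-written assembly. It is not the code from lock.h: the `slock_t` typedef and the names `tas_portable`, `spin_lock`, `spin_unlock` and `lock_demo` are assumptions made up for illustration. Like tas, the function returns 0 when the caller obtained the lock and non-zero when the lock was already held.

```c
#include <assert.h>

typedef int slock_t;	/* assumption: stand-in for the type lock.h uses */

/* Atomically store 1 into *lock and return the previous value:
 * 0 means the lock was free and now belongs to the caller. */
static inline int
tas_portable(volatile slock_t *lock)
{
	return __sync_lock_test_and_set(lock, 1);
}

/* Spin until the test-and-set reports that the lock was free. */
static inline void
spin_lock(volatile slock_t *lock)
{
	while (tas_portable(lock) != 0)
		;	/* busy-wait */
}

/* Release the lock by atomically writing 0. */
static inline void
spin_unlock(volatile slock_t *lock)
{
	__sync_lock_release(lock);
}

/* Single-threaded walk-through of the lock states. */
int
lock_demo(void)
{
	volatile slock_t lock = 0;

	spin_lock(&lock);			/* lock was free: acquired at once */
	assert(tas_portable(&lock) != 0);	/* already held: tas reports failure */
	spin_unlock(&lock);
	assert(tas_portable(&lock) == 0);	/* free again: acquired */
	return 1;
}
```

Each call site of a function like this costs only a handful of instructions once it is inlined, which is why the call overhead discussed here matters for a spinlock primitive.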

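However, GCC's default optimization level is -O0 (no optimization), at which functions declared __inline__ are not inlined. Every use of the function above therefore becomes a real function call, which adds extra instructions. Adding “-O3” at compile time optimizes the code and lets __inline__ take effect, reducing the number of instructions executed at run time. But if tas is called from many places, “-O3” may inflate your code size, and larger code generally lowers the cache hit rate while the program runs.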
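
If only tas needs to be inlined, one possible alternative to raising the optimization level for the whole project is GCC's `always_inline` function attribute, which asks the compiler to inline a function even when it is not optimizing. The sketch below only illustrates that idea and is not part of the original lock.h: the `FORCE_INLINE` macro and the names `tas_forced`, `try_acquire` and `forced_inline_demo` are hypothetical, and the function body uses a GCC builtin as a stand-in for the ll/sc assembly.

```c
#include <assert.h>

typedef int slock_t;	/* assumption: stand-in for the type lock.h uses */

/* Hypothetical helper macro: request inlining at every call site,
 * independent of the -O level. */
#define FORCE_INLINE static __inline__ __attribute__((always_inline))

FORCE_INLINE int
tas_forced(volatile slock_t *lock)
{
	/* Stand-in body; the ll/sc assembly could be placed here instead. */
	return __sync_lock_test_and_set(lock, 1);
}

/* Returns 1 if the lock was acquired, 0 if it was already held. */
int
try_acquire(volatile slock_t *lock)
{
	return tas_forced(lock) == 0;
}

int
forced_inline_demo(void)
{
	volatile slock_t lock = 0;

	assert(try_acquire(&lock) == 1);	/* lock was free */
	assert(try_acquire(&lock) == 0);	/* lock is now held */
	return 1;
}
```

Whether the attribute is preferable to “-O3” depends on the same trade-off: every call site still receives its own copy of the instructions, so code size grows with the number of callers.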

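We therefore need to assess, across the whole software project, how often tas is used and how much code is involved before deciding whether to use the “-O3” option.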

Appendix: the GCC manual's definition of -O3 (with -O2, on which it builds):

-O2
Optimize even more. GCC performs nearly all supported optimizations that do not involve a space-speed tradeoff. As compared to ‘-O’, this option increases both compilation time and the performance of the generated code.
‘-O2’ turns on all optimization flags specified by ‘-O’. It also turns on the following optimization flags:
-fthread-jumps
-falign-functions -falign-jumps
-falign-loops -falign-labels
-fcaller-saves
-fcrossjumping
-fcse-follow-jumps -fcse-skip-blocks
-fdelete-null-pointer-checks
-fdevirtualize -fdevirtualize-speculatively
-fexpensive-optimizations
-fgcse -fgcse-lm
-fhoist-adjacent-loads
-finline-small-functions
-findirect-inlining
-fipa-sra
-fisolate-erroneous-paths-dereference
-foptimize-sibling-calls
-fpartial-inlining
-fpeephole2
-freorder-blocks -freorder-functions
-frerun-cse-after-loop
-fsched-interblock -fsched-spec
-fschedule-insns -fschedule-insns2
-fstrict-aliasing -fstrict-overflow
-ftree-switch-conversion -ftree-tail-merge
-ftree-pre
-ftree-vrp

-O3
Optimize yet more. ‘-O3’ turns on all optimizations specified by ‘-O2’ and also turns on the ‘-finline-functions’, ‘-funswitch-loops’, ‘-fpredictive-commoning’, ‘-fgcse-after-reload’, ‘-ftree-loop-vectorize’, ‘-ftree-slp-vectorize’, ‘-fvect-cost-model’, ‘-ftree-partial-pre’ and ‘-fipa-cp-clone’ options.

-finline-functions
Consider all functions for inlining, even if they are not declared inline. The compiler heuristically decides which functions are worth integrating in this way. If all calls to a given function are integrated, and the function is declared static, then the function is normally not output as assembler code in its own right.
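
As a small illustration of the last option, the sketch below contains a static function that is not declared inline, which is the kind of function ‘-finline-functions’ lets GCC consider for integration, next to a function marked with GCC's `noinline` attribute to keep one out-of-line copy and limit code growth. All function names are made up for this example, and whether GCC actually inlines `square` depends on its heuristics and the compiler version.

```c
#include <assert.h>

/* Not declared inline: with -finline-functions GCC may still
 * integrate it into its callers. */
static int
square(int x)
{
	return x * x;
}

/* noinline keeps a single out-of-line copy, even at -O3. */
static __attribute__((noinline)) int
absolute(int x)
{
	return x < 0 ? -x : x;
}

int
sum_of_squares(int a, int b)
{
	return square(absolute(a)) + square(absolute(b));
}

int
inline_candidates_demo(void)
{
	assert(sum_of_squares(-3, 4) == 25);	/* 9 + 16 */
	return 1;
}
```

The program's result is the same at every optimization level; only the generated code differs, which can be checked by compiling with -S and looking for calls to `square` in the assembly output.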