Finally figured out what LOCK_PREFIX means in the Linux kernel

geometriclife

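This article takes a close look at how atomic integer operations are implemented on x86 under an SMP configuration, in particular how the LOCK_PREFIX macro is used and how the kernel optimizes for uniprocessor machines.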

When the x86 kernel implements atomic integers, it uses a macro called LOCK_PREFIX:

static __inline__ void atomic_add(int i, atomic_t *v)
{
	__asm__ __volatile__(
		LOCK_PREFIX "addl %1,%0"
		:"+m" (v->counter)
		:"ir" (i));
}

With CONFIG_SMP enabled:

#define LOCK_PREFIX \
	".section .smp_locks,\"a\"\n" \
	" .align 4\n" \
	" .long 661f\n" /* address */ \
	".previous\n" \
	"661:\n\tlock; "

After expansion it looks like this:

.section .smp_locks,"a"
 .align 4
 .long 661f
.previous
661:
	lock;

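At first I thought simply adding the lock prefix would be enough, and I never understood what all the stuff in front of it was for. I finally decided to get to the bottom of it, so I opened the as manual and looked everything up. Here is a line-by-line explanation: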

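.section .smp_locks,"a"

Emit what follows into the .smp_locks section, with attribute "a" (allocatable). See as 7.76, ".section name".

.align 4

Align to a four-byte boundary.

.long 661f

Emit a 32-bit integer whose value is the actual address of the 661 label below. The f suffix means a forward reference; if the 661 label appeared earlier, you would write 661b.

.previous

Switch code generation back to the previous section, which here is .text.

661:

A numeric label is a local label. See 5.3, "Symbol Names".

lock;

Instruction generation starts here, with the lock prefix.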
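To make the f and b suffixes concrete, here is a minimal sketch in GCC-style inline assembly. It assumes an x86 target and a GCC or Clang compiler; the function name demo_count_down is made up for illustration and does not come from the kernel.

```c
#include <assert.h>

/*
 * Hypothetical demo, not kernel code: count how many times the loop body
 * runs, using numeric local labels for both jump directions.
 */
int demo_count_down(int n)
{
	int iterations = 0;

	assert(n >= 0);		/* a negative n would loop for a very long time */

	__asm__ __volatile__(
		"1:\n\t"
		"testl %1,%1\n\t"
		"jz 2f\n\t"	/* 2f: the nearest "2:" label after this point */
		"incl %0\n\t"
		"decl %1\n\t"
		"jmp 1b\n"	/* 1b: the nearest "1:" label before this point */
		"2:\n"
		: "+r" (iterations), "+r" (n)
		:
		: "cc");

	return iterations;
}
```

Because numeric labels can be redefined any number of times, the same snippet can be inlined at many call sites without label clashes, which is presumably why LOCK_PREFIX uses 661 rather than a named label.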

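After this code is assembled, the .text section gets a lock instruction prefix byte, 0xf0, and the .smp_locks section gets the four-byte address of that lock prefix. At link time all the .smp_locks sections are merged into a single array holding the address of every lock prefix, so by examining .smp_locks you can tell how many locked instructions were generated in the code. My guess was that this is for debugging.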
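The "array of lock prefix addresses" idea can be tried out in user space. The sketch below is an approximation under several assumptions, not the kernel's code: it assumes x86 Linux with GCC or Clang and a GNU-style linker; all demo_* names are made up; the section is named smp_locks without the leading dot so that the linker provides __start_smp_locks and __stop_smp_locks; and each entry stores an offset relative to itself instead of an absolute address, so that the program also links as a position-independent executable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space variant of LOCK_PREFIX (see assumptions above). */
#define DEMO_LOCK_PREFIX \
	".section smp_locks,\"a\"\n" \
	" .balign 4\n" \
	" .long 661f - .\n"	/* offset from this entry to the lock byte */ \
	".previous\n" \
	"661:\n\tlock; "

/* Same shape as atomic_add() in the article, on a plain int. */
void demo_atomic_add(int i, int *counter)
{
	__asm__ __volatile__(
		DEMO_LOCK_PREFIX "addl %1,%0"
		: "+m" (*counter)
		: "ir" (i));
}

/* The linker defines these for sections whose name is a valid C identifier. */
extern const int32_t __start_smp_locks[];
extern const int32_t __stop_smp_locks[];

/* Number of lock-prefixed instructions recorded in the table. */
size_t demo_count_lock_sites(void)
{
	assert(__start_smp_locks <= __stop_smp_locks);
	return (size_t)(__stop_smp_locks - __start_smp_locks);
}

/* Returns 1 if every recorded location really holds the 0xf0 prefix byte. */
int demo_all_sites_are_lock_prefixes(void)
{
	const int32_t *entry;

	for (entry = __start_smp_locks; entry < __stop_smp_locks; entry++) {
		/* Entry value is relative to the entry's own address. */
		const uint8_t *insn = (const uint8_t *)entry + *entry;

		if (*insn != 0xf0)
			return 0;
	}
	return 1;
}
```

Calling demo_atomic_add(3, &counter) and then demo_count_lock_sites() should report at least one site; the exact count depends on how often the compiler inlines the function.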

----

The search finished, and sure enough I found where it is used:

linux-2.6.23.12\arch\i386\kernel\module.c

This function is called once a kernel module has been loaded:

int module_finalize(const Elf_Ehdr *hdr,
		    const Elf_Shdr *sechdrs,
		    struct module *me)
{
	const Elf_Shdr *s, *text = NULL, *alt = NULL, *locks = NULL,
		*para = NULL;
	char *secstrings = (void *)hdr + sechdrs[hdr->e_shstrndx].sh_offset;

	for (s = sechdrs; s < sechdrs + hdr->e_shnum; s++) {
		if (!strcmp(".text", secstrings + s->sh_name))
			text = s;
		if (!strcmp(".altinstructions", secstrings + s->sh_name))
			alt = s;
		if (!strcmp(".smp_locks", secstrings + s->sh_name))
			locks= s;
		if (!strcmp(".parainstructions", secstrings + s->sh_name))
			para = s;
	}

	if (alt) {
		/* patch .altinstructions */
		void *aseg = (void *)alt->sh_addr;
		apply_alternatives(aseg, aseg + alt->sh_size);
	}
	if (locks && text) {
		void *lseg = (void *)locks->sh_addr;
		void *tseg = (void *)text->sh_addr;
		alternatives_smp_module_add(me, me->name,
					    lseg, lseg + locks->sh_size,
					    tseg, tseg + text->sh_size);
	}

	if (para) {
		void *pseg = (void *)para->sh_addr;
		apply_paravirt(pseg, pseg + para->sh_size);
	}

	return module_bug_finalize(hdr, sechdrs, me);
}

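The code above says that if the module has both a .text and a .smp_locks section, it calls the following function to process them. What does that function do?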

void alternatives_smp_module_add(struct module *mod, char *name,
				 void *locks, void *locks_end,
				 void *text, void *text_end)
{
	struct smp_alt_module *smp;
	unsigned long flags;

	if (noreplace_smp)
		return;

	if (smp_alt_once) {
		if (boot_cpu_has(X86_FEATURE_UP))
			alternatives_smp_unlock(locks, locks_end,
						text, text_end);
		return;
	}

	...
}

The code above says that on a uniprocessor (UP) machine, it calls this:

static void alternatives_smp_unlock(u8 **start, u8 **end, u8 *text, u8 *text_end)
{
	u8 **ptr;
	char insn[1];

	if (noreplace_smp)
		return;

	add_nops(insn, 1);
	for (ptr = start; ptr < end; ptr++) {
		if (*ptr < text)
			continue;
		if (*ptr > text_end)
			continue;
		text_poke(*ptr, insn, 1);
	};
}

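At this point it all makes sense: when the kernel is configured for SMP but actually runs on a single processor, it patches itself at run time, using the records in .smp_locks to replace each lock prefix with a nop and eliminate the cost of locked instructions. That is optimization taken to the extreme. It was probably done specifically for x86/x64 because many users run a prebuilt kernel that was configured with SMP support.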
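The patching step can be imitated on an ordinary byte buffer. The following is a minimal sketch, not the kernel's implementation: demo_smp_unlock and the two constants are made-up names, the buffer stands in for .text, and a plain store stands in for text_poke(). It relies only on two facts about x86 encoding: the lock prefix is the byte 0xf0 and the one-byte nop is 0x90.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_LOCK_PREFIX_BYTE	0xf0	/* x86 lock prefix */
#define DEMO_NOP_BYTE		0x90	/* x86 one-byte nop */

/*
 * Hypothetical stand-in for alternatives_smp_unlock(): walk a table of
 * pointers to lock prefixes and overwrite each one that lies inside
 * [text, text_end) with a nop. Returns the number of bytes patched.
 */
size_t demo_smp_unlock(uint8_t **start, uint8_t **end,
		       uint8_t *text, uint8_t *text_end)
{
	uint8_t **ptr;
	size_t patched = 0;

	assert(start <= end);

	for (ptr = start; ptr < end; ptr++) {
		/* Demo choice: treat text_end as exclusive to stay in bounds. */
		if (*ptr < text || *ptr >= text_end)
			continue;
		/* Extra safety check that the kernel listing does not have. */
		if (**ptr != DEMO_LOCK_PREFIX_BYTE)
			continue;
		**ptr = DEMO_NOP_BYTE;
		patched++;
	}
	return patched;
}
```

Given a buffer that contains two lock-prefixed instructions and a table pointing at both, restricting the text range to the first one patches only that byte and leaves the second untouched, which mirrors how each module is patched only within its own .text range.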

Looks like learning a bit of assembly is useful after all.

While I was at it, I ran a quick search:

http://www.google.cn/search?hl=zh-CN&newwindow=1&q=alternative+kernel+smp&meta=&aq=f&oq=

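And then I came across this comment. If I had just read it carefully in the first place, I would have understood everything. What a runaround.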

 * Alternative inline assembly for SMP.
 *
 * The LOCK_PREFIX macro defined here replaces the LOCK and
 * LOCK_PREFIX macros used everywhere in the source tree.
 *
 * SMP alternatives use the same data structures as the other
 * alternatives and the X86_FEATURE_UP flag to indicate the case of a
 * UP system running a SMP kernel. The existing apply_alternatives()
 * works fine for patching a SMP kernel for UP.
 *
 * The SMP alternative tables can be kept after boot and contain both
 * UP and SMP versions of the instructions to allow switching back to
 * SMP at runtime, when hotplugging in a new CPU, which is especially
 * useful in virtualized environments.
 *
 * The very common lock prefix is handled as special case in a
 * separate table which is a pure address list without replacement ptr
 * and size information. That keeps the table sizes small.