Programming Reference - Understanding Atomics and Memory Ordering

Understanding Atomics and Memory Ordering
Atomics and memory ordering always feel like an unapproachable topic. In the sea of poor explanations, I wish to add another by describing how I reason about all of this mess. This is only my understanding, so if you need a better or more formal explanation, I recommend reading through the memory model for your programming language. In this case, that would be the C11 memory model as described at cppreference.com.
Shared Memory
Software and hardware are getting closer to the limits of performance when it comes to single-threaded execution of code. To continue scaling compute performance, a popular solution is to introduce multiple single-threaded execution units, also known as multi-threading. This form of computation shows up at different levels of abstraction, from multiple cores in a CPU to multiple CPUs in a machine and even multiple machines across a network. This post focuses on cores in a CPU and refers to them as "threads".
For some workloads, the tasks can be divided cleanly and split off to the threads for execution. Such tasks are known as embarrassingly parallel and need not communicate with each other. This is the ideal that multithreaded algorithms should strive for, since it takes advantage of all the existing optimizations available for single-threaded execution. However, this isn't always possible. Sometimes tasks need to communicate and coordinate with each other, which is why we need to share memory between threads.
Communication is hard when your code runs in a preemptive scheduling setting. In such an environment, your code can be interrupted at any point so that other code can run. In applications, the operating system kernel can decide to switch from running your program to running another. In the kernel, hardware can switch from running kernel code to running interrupt handler code. Switching tasks around like this is known as concurrency, and in order to synchronize or communicate we need a way to exclude that concurrency for a small time frame, or we risk operating on incomplete or partial data.
Atomics
Fortunately, CPUs supply software with special instructions for operating on shared memory that can't be interrupted. These are known as atomic memory operations and fit into three categories: Loads, Stores, and ReadModifyWrites (RMW). The first two are self-explanatory. RMW is also pretty descriptive: it lets you load data from memory, operate on the data, and store the result back into memory, all atomically. You may know RMW operations as atomic increment, swap, or compare-and-swap.
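As a rough illustration of the three categories, here is a minimal sketch using C11's `<stdatomic.h>`. The variable `counter` and the wrapper function names are hypothetical, chosen only for this example. Each wrapper uses the default (sequentially consistent) form of the corresponding atomic operation.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical shared variable; static storage starts at zero. */
static atomic_int counter;

/* Store: atomically replace the value. */
void set_counter(int value) {
    atomic_store(&counter, value);
}

/* Load: atomically read the value. */
int get_counter(void) {
    return atomic_load(&counter);
}

/* RMW (increment): add 1 and return the value seen before the add. */
int increment_counter(void) {
    return atomic_fetch_add(&counter, 1);
}

/* RMW (swap): write a new value and return the old one. */
int swap_counter(int value) {
    return atomic_exchange(&counter, value);
}

/* RMW (compare-and-swap): write `desired` only if the current value
   equals `expected`; returns true when the write happened. */
bool cas_counter(int expected, int desired) {
    return atomic_compare_exchange_strong(&counter, &expected, desired);
}
```

Note that `atomic_fetch_add` and `atomic_exchange` return the previous value, which is often what lets a thread find out whether it "won" an update.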
To do something "atomically" means that it must happen (or be observed to happen) in its entirety or not at all. This implies that it cannot be interrupted. When something is "atomic", tearing (i.e. partial completion) of the operation cannot be observed. Atomic operations let us write code that works with shared memory in a way that's safe against concurrent interruption.
Another thing about atomics is that they're the only sound (i.e. correctly defined) way to interact with shared memory when there's at least one writer and possibly multiple readers/writers. Trying to do so without atomics is a data race, which is undefined behavior (UB). UB is the act of relying on an assumption outside of your target program model (in our case, the C11 memory model). Doing so is unreliable, as the compiler or CPU is allowed to do anything outside of its model.
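To make the "at least one writer" case concrete, here is a minimal sketch in which several threads increment one shared counter. It assumes a POSIX system with pthreads (on some toolchains this needs the `-pthread` flag). The names `hits`, `worker`, and `count_hits` are hypothetical. If `hits` were a plain `long` updated with `hits++`, the same program would contain a data race.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

enum { MAX_THREADS = 16 };

/* Shared counter written by every worker thread. */
static atomic_long hits;

/* Each worker performs *arg atomic increments. */
static void *worker(void *arg) {
    int per_thread = *(const int *)arg;
    for (int i = 0; i < per_thread; i++) {
        atomic_fetch_add(&hits, 1);
    }
    return NULL;
}

/* Starts up to MAX_THREADS workers, waits for all of them, and
   returns the final counter value. */
long count_hits(int thread_count, int per_thread) {
    pthread_t threads[MAX_THREADS];
    if (thread_count > MAX_THREADS) {
        thread_count = MAX_THREADS;
    }
    atomic_store(&hits, 0);

    int started = 0;
    while (started < thread_count &&
           pthread_create(&threads[started], NULL, worker, &per_thread) == 0) {
        started++;
    }
    for (int i = 0; i < started; i++) {
        pthread_join(threads[i], NULL);
    }
    return atomic_load(&hits);
}
```

Because every increment is a single atomic RMW, no update can be lost, so the result equals the number of started threads times `per_thread`.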
Data races and the UB they imply aren't just a theoretical issue. One of the single-threaded optimizations I mentioned earlier involves either the CPU or the compiler caching memory reads and writes. If you don't use atomic operations, the operation itself could be elided and replaced with its cached result, which could break the logic of your code fairly easily:
# should be an atomic_load(), but as written it's a data race
while (not load(bool)):
    continue

# a potential single-threaded optimization
cached = load(bool)
while (not cached): # possibly infinite loop!
    continue
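For comparison, here is a minimal C11 sketch of the same wait loop written with an atomic load. The names `ready_flag`, `signal_ready`, `is_ready`, and `wait_until_ready` are hypothetical. Because the flag is an `atomic_bool` and is read through `atomic_load`, there is no data race, and in practice each iteration performs a fresh load rather than reusing one cached result.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical flag shared between a signalling and a waiting thread. */
static atomic_bool ready_flag;

/* Called by the thread that wants to release the waiter. */
void signal_ready(void) {
    atomic_store(&ready_flag, true);
}

/* Non-blocking check of the flag. */
bool is_ready(void) {
    return atomic_load(&ready_flag);
}

/* Spins until some other thread has called signal_ready(). */
void wait_until_ready(void) {
    while (!atomic_load(&ready_flag)) {
        /* busy-wait; every pass re-reads the flag atomically */
    }
}
```

A real program would likely yield or block instead of spinning; the loop is kept only to mirror the pseudocode.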
Reordering
Atomics solve communication only for atomically accessed memory, but not all memory being communicated can be accessed atomically. CPUs generally expose atomic operations only for memory that's at most a few bytes large. Any other sort of general-purpose memory communication means we need some other way to make that memory available to threads.
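One way to get a feel for the size limit is to ask the C implementation which atomic types it treats as lock-free. The sketch below uses the standard `ATOMIC_*_LOCK_FREE` macros from `<stdatomic.h>`; the helper names `lock_free_label` and `print_lock_free_report` are hypothetical, and the results depend on the compiler and target.

```c
#include <stdatomic.h>
#include <stdio.h>

/* Maps a standard lock-free level (0, 1, or 2) to a description. */
const char *lock_free_label(int level) {
    switch (level) {
    case 0:  return "never lock-free";
    case 1:  return "sometimes lock-free";
    case 2:  return "always lock-free";
    default: return "unknown";
    }
}

/* Prints the size and lock-free level of a few built-in types. */
void print_lock_free_report(void) {
    printf("char      (%zu bytes): %s\n", sizeof(char),
           lock_free_label(ATOMIC_CHAR_LOCK_FREE));
    printf("int       (%zu bytes): %s\n", sizeof(int),
           lock_free_label(ATOMIC_INT_LOCK_FREE));
    printf("long long (%zu bytes): %s\n", sizeof(long long),
           lock_free_label(ATOMIC_LLONG_LOCK_FREE));
    printf("pointer   (%zu bytes): %s\n", sizeof(void *),
           lock_free_label(ATOMIC_POINTER_LOCK_FREE));
}
```

Types larger than what the hardware handles natively can still be declared `_Atomic` in C, but the implementation is then free to fall back to a lock, which is not the same thing as a single atomic instruction.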
Making memory available to other threads is actually trickier than it sounds. Let's look at this code example:
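One common shape of this problem is handing a plain (non-atomic) value to another thread behind an atomic flag. The following is a minimal sketch of that pattern, assuming C11 atomics with their default sequentially consistent ordering. The names `struct message`, `payload`, `payload_ready`, `publish`, and `try_consume` are hypothetical and not taken from the original example.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Data that does not fit in a single atomic operation. */
struct message {
    int id;
    double value;
};

static struct message payload;     /* plain, non-atomic memory */
static atomic_bool payload_ready;  /* atomic flag guarding the payload */

/* Producer: fill in the payload first, then set the flag. */
void publish(int id, double value) {
    payload.id = id;
    payload.value = value;
    atomic_store(&payload_ready, true);
}

/* Consumer: read the payload only after observing the flag as set.
   Returns false, leaving *out untouched, if nothing was published. */
bool try_consume(struct message *out) {
    if (!atomic_load(&payload_ready)) {
        return false;
    }
    *out = payload;
    return true;
}
```

The sketch relies on the plain writes in `publish` staying ahead of the flag store, and the plain reads in `try_consume` staying behind the flag load. The default ordering of `atomic_store` and `atomic_load` provides that; with a weaker ordering such as `memory_order_relaxed`, it would no longer be guaranteed.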