Understanding Atomics and Memory Ordering
Atomics and Memory Ordering always feel like an unapproachable topic. In the sea of poor explanations, I wish to add another by describing how I reason about all of this mess. This is only my understanding, so if you need a better/more formal explanation, I recommend reading through the memory model for your given programming language. In this case, it would be the C11 Memory Model described at cppreference.com.
Shared Memory
Software and hardware are getting closer to the limits of performance when it comes to single-threaded execution of code. In order to continue scaling compute performance, a popular solution is to introduce multiple single-threaded execution units - or multi-threading. This form of computation manifests itself at different abstraction levels, from multiple cores in a CPU to multiple CPUs in a machine and even multiple machines across a network. This post will be focusing more on cores in a CPU, referring to them as “threads”.
For some workloads, the tasks can be divided cleanly and split off to the threads for execution. Such tasks are known as embarrassingly parallel and need not communicate with each other. This is the ideal that multithreaded algorithms should strive for since it takes advantage of all the existing optimizations available for single-threaded execution. However, this isn't always possible and it's sometimes necessary for tasks to communicate and coordinate with each other, which is why we need to share memory between threads.
Communication is hard when your code is running in a preemptive scheduling setting. Such an environment means that, at any point, your code can be interrupted in order for other code to run. In applications, the operating system kernel can decide to switch from running your program to running another. In the kernel, hardware can switch from running kernel code to running interrupt handler code. Switching tasks around like this is known as concurrency, and in order to synchronize/communicate we need a way to exclude that concurrency for a small time frame, or we risk operating with incomplete/partial data.
Atomics
Fortunately, CPUs supply software with special instructions to operate on shared memory which can't be interrupted. These are known as atomic memory operations and fit into three categories: Loads, Stores, and ReadModifyWrites (RMW). The first two are self explanatory. RMW is also pretty descriptive: it allows you to load data from memory, operate on the data, and store the result back into memory - all atomically. You may know RMW operations as atomic increment, swap, or compare and swap.
To do something "atomically" means that it must happen (or be observed to happen) in its entirety or not at all. This implies that it cannot be interrupted. When something is "atomic", tearing (i.e. partial completion) of the operation cannot be observed. Atomic operations allow us to write code that can work with shared memory in a way that's safe against concurrent interruption.
Another thing about atomics is that they're the only sound (i.e. correctly defined) way to interact with shared memory when there's at least one writer and possibly multiple readers/writers to the shared memory. Trying to do so without atomics is considered a data race, which is undefined behavior (UB). UB is the act of relying on an assumption outside of your target program model (in our case, the C11 memory model). Doing so is unreliable as the compiler or CPU is allowed to do anything outside of its model.
Data races and the UB they imply aren't just a theoretical issue. One of the single-threaded optimizations I mentioned earlier involves either the CPU or the compiler caching memory reads and writes. If you don’t use atomic operations, the operation itself could be elided and replaced with its cached result, which could break the logic of your code fairly easily:
# should be an atomic_load(), but as written it's a data race
while (not load(bool)):
    continue

# a potential single-threaded optimization: the load is
# hoisted out of the loop and its cached result is reused
cached = load(bool)
while (not cached):  # possibly an infinite loop!
    continue
Reordering
Atomics solve communication only for atomically accessed memory, but not all memory being communicated can be accessed atomically. CPUs generally expose atomic operations only for memory that's at most a few bytes large. Trying to do any other sort of general purpose memory communication means we need a way to make this memory available to threads by other means.
|
Making memory available to other threads is actually trickier than it sounds.