Memory Model
1. C++ Memory Model Synchronization Modes
1.1 std::memory_order_seq_cst
This is the default mode used when none is specified, and it is the most restrictive. 1
From a practical point of view, this amounts to all atomic operations acting as optimization barriers: 1
- It's OK to reorder things between atomic operations, but not across the operation.
- Thread-local data is also unaffected, since it has no visibility to other threads.
1.2 std::memory_order_relaxed
This model allows for much less synchronization by removing the happens-before restrictions. 1
Without any happens-before edges:
- No thread can count on a specific ordering from another thread.
- The only ordering imposed is that once a value for a variable from thread 1 is observed in thread 2, thread 2 cannot see an “earlier” value for that variable from thread 1. 1
There is also the presumption that relaxed stores from one thread are seen by relaxed loads in another thread within a reasonable amount of time. 1
- That means that on non-cache-coherent architectures, relaxed operations need to flush the cache (although these flushes can be merged across several relaxed operations).
The relaxed mode is most commonly used when the programmer simply wants a variable to be atomic in nature rather than using it to synchronize threads for other shared memory data. 1
1.3 std::memory_order_acquire/release/acq_rel
The third mode is a hybrid between the other two. The acquire/release mode is similar to the sequentially consistent mode, except it only applies a happens-before relationship to dependent variables. This allows for a relaxing of the synchronization required between independent reads of independent writes. 1
The interactions of non-atomic variables are still the same. Any store before an atomic operation must be seen in other threads that synchronize. 1
1.4 std::memory_order_consume
std::memory_order_consume is a further subtle refinement of the release/acquire memory model that relaxes the requirements slightly by removing the happens-before ordering on non-dependent shared variables as well. 1
2. atomic
2.1 Atomic Variable
Atomic variables are primarily used to synchronize shared memory accesses between threads. 1
2.2 Atomic Operation
A memory operation can be non-atomic even when performed by a single CPU instruction. 2
2.3 Operations on The Same Atomic Variable
Hence the memory model was designed to disallow visible reordering of operations on the same atomic variable:
- All changes to a single atomic variable appear to occur in a single total modification order, specific to that variable. This is introduced in 1.10p5, and the last non-note sentence of 1.10p10 states that loads of that variable must be consistent with this modification order. 3
3. Sequentially Consistent
Sequential consistency means that all threads agree on the order in which memory operations occurred, and that order is consistent with the order of operations in the program source code. 4
4. Sequentially Consistent Memory Model
In a sequentially consistent memory model, there is no memory reordering. It’s as if the entire program execution is reduced to a sequential interleaving of instructions from each thread. In particular, the result r1 = r2 = 0 from Memory Reordering Caught in the Act becomes impossible. 5
In any case, sequential consistency only really becomes interesting as a software memory model, when working in higher-level programming languages. In Java 5 and higher, you can declare shared variables as volatile. In C++11, you can use the default ordering constraint, memory_order_seq_cst, when performing operations on atomic library types. If you do those things, the toolchain will restrict compiler reordering and emit CPU-specific instructions which act as the appropriate memory barrier types. In this way, a sequentially consistent memory model can be “emulated” even on weakly-ordered multicore devices. 5
5. Sequenced Before
If a and b are performed by the same thread, and a “comes first”, we say that a is sequenced before b. 3
C++ allows a number of different evaluation orders for each thread, notably as a result of varying argument evaluation order, and this choice may vary each time an expression is evaluated. Here we assume that each thread has already chosen its argument evaluation orders in some way, and we simply define which multi-threaded executions are consistent with this choice. Even then, there m