On atomic types (atomic_t, atomic64_t and atomic_long_t).

The atomic type provides an interface to the architecture's means of atomic
RMW operations between CPUs (atomic operations on MMIO are not supported and
can lead to fatal traps on some platforms).

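For reference, the atomic types themselves are thin wrappers around a plain
integer. Slightly simplified (the authoritative definitions live in
include/linux/types.h and friends, and details vary by kernel version):

  typedef struct {
    int counter;
  } atomic_t;

  typedef struct {
    s64 counter;
  } atomic64_t;

  /*
   * atomic_long_t is a typedef for atomic_t or atomic64_t, depending on
   * BITS_PER_LONG.
   */
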
API
---

The 'full' API consists of (atomic64_ and atomic_long_ prefixes omitted for
brevity):

Non-RMW ops:

  atomic_read(), atomic_set()
  atomic_read_acquire(), atomic_set_release()

RMW atomic operations:

Arithmetic:

  atomic_{add,sub,inc,dec}()
  atomic_{add,sub,inc,dec}_return{,_relaxed,_acquire,_release}()
  atomic_fetch_{add,sub,inc,dec}{,_relaxed,_acquire,_release}()

Bitwise:

  atomic_{and,or,xor,andnot}()
  atomic_fetch_{and,or,xor,andnot}{,_relaxed,_acquire,_release}()

Swap:

  atomic_xchg{,_relaxed,_acquire,_release}()
  atomic_cmpxchg{,_relaxed,_acquire,_release}()
  atomic_try_cmpxchg{,_relaxed,_acquire,_release}()

Reference count (but please see refcount_t):

  atomic_add_unless(), atomic_inc_not_zero()
  atomic_sub_and_test(), atomic_dec_and_test()

Misc:

  atomic_inc_and_test(), atomic_add_negative()
  atomic_dec_unless_positive(), atomic_inc_unless_negative()

Barriers:

  smp_mb__{before,after}_atomic()

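As a quick usage sketch (hedged: the counter and function names below are
made up for illustration, not taken from any real kernel code):

  #include <linux/atomic.h>
  #include <linux/types.h>

  static atomic_t nr_pending = ATOMIC_INIT(0);

  static void submit_work(void)
  {
    atomic_inc(&nr_pending);		/* plain RMW, no return value */
  }

  static bool complete_work(void)
  {
    /* value-returning RMW; true when the count drops to zero */
    return atomic_dec_and_test(&nr_pending);
  }
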
SEMANTICS
---------

Non-RMW ops:

The non-RMW ops are (typically) regular LOADs and STOREs and are canonically
implemented using READ_ONCE(), WRITE_ONCE(), smp_load_acquire() and
smp_store_release() respectively.

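A hedged sketch of what those canonical implementations look like (modelled
on the kernel's generic fallbacks; the generic_ function names here are
illustrative):

  static inline int generic_atomic_read(const atomic_t *v)
  {
    return READ_ONCE(v->counter);
  }

  static inline void generic_atomic_set(atomic_t *v, int i)
  {
    WRITE_ONCE(v->counter, i);
  }

  static inline int generic_atomic_read_acquire(const atomic_t *v)
  {
    return smp_load_acquire(&v->counter);
  }

  static inline void generic_atomic_set_release(atomic_t *v, int i)
  {
    smp_store_release(&v->counter, i);
  }
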
The one detail to this is that atomic_set{}() should be observable to the RMW
ops. That is:

  C atomic-set

  {
    atomic_set(v, 1);
  }

  P1(atomic_t *v)
  {
    atomic_add_unless(v, 1, 0);
  }

  P2(atomic_t *v)
  {
    atomic_set(v, 0);
  }

  exists
  (v=2)

In this case we would expect the atomic_set() from P2 to either happen
before the atomic_add_unless() on P1, in which case that latter one would
no-op, or _after_ in which case we'd overwrite its result. In no case is "2"
a valid outcome.

This is typically true on 'normal' platforms, where a regular competing STORE
will invalidate a LL/SC or fail a CMPXCHG.

The obvious case where this is not so is when we need to implement atomic ops
with a lock:

  CPU0                                          CPU1

  atomic_add_unless(v, 1, 0);
    lock();
    ret = READ_ONCE(v->counter); // == 1
                                                atomic_set(v, 0);
    if (ret != u)                                 WRITE_ONCE(v->counter, 0);
      WRITE_ONCE(v->counter, ret + 1);
    unlock();

the typical solution is to then implement atomic_set{}() with atomic_xchg().

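A hedged sketch of such a lock-based fallback, with atomic_set() routed
through the same serialization via an unconditional exchange (the lock and
the _fallback names are illustrative, not an actual kernel API):

  #include <linux/spinlock.h>
  #include <linux/types.h>

  static DEFINE_RAW_SPINLOCK(atomic_fallback_lock);

  static int atomic_add_unless_fallback(atomic_t *v, int a, int u)
  {
    unsigned long flags;
    int ret;

    raw_spin_lock_irqsave(&atomic_fallback_lock, flags);
    ret = v->counter;
    if (ret != u)
      v->counter = ret + a;
    raw_spin_unlock_irqrestore(&atomic_fallback_lock, flags);

    return ret != u;		/* true if the addition happened */
  }

  static int atomic_xchg_fallback(atomic_t *v, int new)
  {
    unsigned long flags;
    int ret;

    raw_spin_lock_irqsave(&atomic_fallback_lock, flags);
    ret = v->counter;
    v->counter = new;
    raw_spin_unlock_irqrestore(&atomic_fallback_lock, flags);

    return ret;
  }

  /* atomic_set() takes the same lock by way of an unconditional exchange */
  static void atomic_set_fallback(atomic_t *v, int i)
  {
    (void)atomic_xchg_fallback(v, i);
  }
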
RMW ops:

These come in various forms:

 - plain operations without return value: atomic_{}()

 - operations which return the modified value: atomic_{}_return()

   these are limited to the arithmetic operations because those are
   reversible. Bitops are irreversible and therefore the modified value
   is of dubious utility.

 - operations which return the original value: atomic_fetch_{}()

 - swap operations: xchg(), cmpxchg() and try_cmpxchg()

 - misc; the special purpose operations that are commonly used and would,
   given the interface, normally be implemented using (try_)cmpxchg loops
   (one such loop is sketched below) but are time critical and can,
   (typically) on LL/SC architectures, be more efficiently implemented.

All these operations are SMP atomic; that is, the operations (for a single
atomic variable) can be fully ordered and no intermediate state is lost or
visible.

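For reference, a hedged sketch of the kind of (try_)cmpxchg loop mentioned
in the 'misc' bullet above, along the lines of an atomic_add_unless()
fallback (the function name is illustrative):

  static inline int fetch_add_unless_sketch(atomic_t *v, int a, int u)
  {
    int c = atomic_read(v);

    do {
      if (c == u)		/* hit the exception value: do nothing */
        break;
    } while (!atomic_try_cmpxchg(v, &c, c + a));

    /* original value; c != u means the addition was performed */
    return c;
  }
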
ORDERING (go read memory-barriers.txt first)
--------

The rule of thumb:

 - non-RMW operations are unordered;

 - RMW operations that have no return value are unordered;

 - RMW operations that have a return value are fully ordered;

 - RMW operations that are conditional are unordered on FAILURE,
   otherwise the above rules apply.

Except of course when an operation has an explicit ordering like:

 {}_relaxed: unordered
 {}_acquire: the R of the RMW (or atomic_read) is an ACQUIRE
 {}_release: the W of the RMW (or atomic_set) is a RELEASE

Where 'unordered' is against other memory locations. Address dependencies are
not defeated.

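As a hedged illustration of these rules ('data' and 'ready' are made-up
example variables):

  int data;
  atomic_t ready = ATOMIC_INIT(0);

  void publish(void)
  {
    data = 42;
    /* a plain atomic_set() would be unordered; the _release form orders
     * the store to 'data' before the store to 'ready' */
    atomic_set_release(&ready, 1);
  }

  int consume(void)
  {
    /* the _acquire form orders the load of 'ready' before the load of 'data' */
    if (atomic_read_acquire(&ready))
      return data;
    return -1;
  }
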
Fully ordered primitives are ordered against everything prior and everything
subsequent. Therefore a fully ordered primitive is like having an smp_mb()
before and an smp_mb() after the primitive.

The barriers:

  smp_mb__{before,after}_atomic()

only apply to the RMW ops and can be used to augment/upgrade the ordering
inherent to the used atomic op. These barriers provide a full smp_mb().

These helper barriers exist because architectures have varying implicit
ordering on their SMP atomic primitives. For example our TSO architectures
provide fully ordered atomics and these barriers are no-ops.

NOTE: when the atomic RMW ops are fully ordered, they should also imply a
compiler barrier.

Thus:

  atomic_fetch_add();

is equivalent to:

  smp_mb__before_atomic();
  atomic_fetch_add_relaxed();
  smp_mb__after_atomic();

However the atomic_fetch_add() might be implemented more efficiently.

Further, while something like:

  smp_mb__before_atomic();
  atomic_dec(&X);

is a 'typical' RELEASE pattern, the barrier is strictly stronger than
a RELEASE. Similarly, something like:

  atomic_inc(&X);
  smp_mb__after_atomic();

is an ACQUIRE pattern (though very much not typical), but again the barrier is
strictly stronger than ACQUIRE. As illustrated:

  C strong-acquire

  {
  }

  P1(int *x, atomic_t *y)
  {
    r0 = READ_ONCE(*x);
    smp_rmb();
    r1 = atomic_read(y);
  }

  P2(int *x, atomic_t *y)
  {
    atomic_inc(y);
    smp_mb__after_atomic();
    WRITE_ONCE(*x, 1);
  }

  exists
  (r0=1 /\ r1=0)

This should not happen; but a hypothetical atomic_inc_acquire() --
(void)atomic_fetch_inc_acquire() for instance -- would allow the outcome,
since then:

  P1                            P2

                                t = LL.acq *y (0)
                                t++;
                                *x = 1;
  r0 = *x (1)
  RMB
  r1 = *y (0)
                                SC *y, t;

is allowed.