CONTROL DEPENDENCIES
====================

A major difficulty with control dependencies is that current compilers
do not support them. One purpose of this document is therefore to
help you prevent your compiler from breaking your code. However,
control dependencies also pose other challenges, which leads to the
second purpose of this document, namely to help you to avoid breaking
your own code, even in the absence of help from your compiler.

One such challenge is that control dependencies order only later stores.
Therefore, a load-load control dependency will not preserve ordering
unless a read memory barrier is provided. Consider the following code:

	q = READ_ONCE(a);
	if (q)
		p = READ_ONCE(b);

This is not guaranteed to provide any ordering because some types of CPUs
are permitted to predict the result of the load from "b". This prediction
can cause other CPUs to see this load as having happened before the load
from "a". This means that an explicit read barrier is required, for example
as follows:

	q = READ_ONCE(a);
	if (q) {
		smp_rmb();
		p = READ_ONCE(b);
	}

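For experimenting outside the kernel, this pattern can be approximated
with C11 atomics. The following is only a sketch under stated
assumptions: smp_rmb() is modelled by an acquire fence, the READ_ONCE()
calls by relaxed atomic loads, and the publish() writer is hypothetical,
since the text above shows only the reading side. The C11 and
Linux-kernel memory models differ in detail, so treat this as an
illustration rather than an exact equivalent:

```c
#include <stdatomic.h>

static atomic_int a;	/* flag, stand-in for "a" above */
static atomic_int b;	/* payload, stand-in for "b" above */

/* Hypothetical writer: store the payload, then set the flag. */
static void publish(int val)
{
	atomic_store_explicit(&b, val, memory_order_relaxed);
	atomic_store_explicit(&a, 1, memory_order_release);
}

/* Reader: returns the payload if the flag was seen set, else -1. */
static int consume(void)
{
	int q = atomic_load_explicit(&a, memory_order_relaxed);

	if (q) {
		/* Plays the role of smp_rmb(): orders the two loads. */
		atomic_thread_fence(memory_order_acquire);
		return atomic_load_explicit(&b, memory_order_relaxed);
	}
	return -1;
}
```

Without the fence, nothing in this sketch would order the load from "b"
after the load from "a", which is the same problem described above.
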
However, stores are not speculated. This means that ordering is
(usually) guaranteed for load-store control dependencies, as in the
following example:

	q = READ_ONCE(a);
	if (q)
		WRITE_ONCE(b, 1);

Control dependencies can pair with each other and with other types
of ordering. But please note that neither the READ_ONCE() nor the
WRITE_ONCE() are optional. Without the READ_ONCE(), the compiler might
fuse the load from "a" with other loads. Without the WRITE_ONCE(),
the compiler might fuse the store to "b" with other stores. Worse yet,
the compiler might convert the store into a load and a check followed
by a store, and this compiler-generated load would not be ordered by
the control dependency.

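The store conversion mentioned above can be hard to picture, so here
is a sketch of one shape it could take. Both functions and the plain
"int" globals are hypothetical stand-ins, and the second function is
hand-written to mimic what a compiler is permitted to generate. It is
not the output of any particular compiler:

```c
static int a, b;	/* plain variables, deliberately unmarked */

/* As written, without READ_ONCE() or WRITE_ONCE(). */
static void as_written(void)
{
	int q = a;

	if (q)
		b = 1;
}

/* One transformation a compiler may apply to the plain store. */
static void as_transformed(void)
{
	int q = a;

	if (q) {
		/* Compiler-invented load of b, followed by a check. */
		if (b != 1)
			b = 1;
	}
}
```

Both functions leave "b" with the same value, which is why the compiler
considers the change harmless. The invented load from "b" is what falls
outside the ordering that the control dependency provides.
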
Furthermore, if the compiler is able to prove that the value of variable
"a" is always non-zero, it would be well within its rights to optimize
the original example by eliminating the "if" statement as follows:

	q = a;
	b = 1;  /* BUG: Compiler and CPU can both reorder!!! */

So don't leave out either the READ_ONCE() or the WRITE_ONCE().
In particular, although READ_ONCE() does force the compiler to emit a
load, it does *not* force the compiler to actually use the loaded value.

It is tempting to try to use control dependencies to enforce ordering on
identical stores on both branches of the "if" statement as follows:

	q = READ_ONCE(a);
	if (q) {
		barrier();
		WRITE_ONCE(b, 1);
		do_something();
	} else {
		barrier();
		WRITE_ONCE(b, 1);
		do_something_else();
	}

Unfortunately, current compilers will transform this as follows at high
optimization levels:

	q = READ_ONCE(a);
	barrier();
	WRITE_ONCE(b, 1);  /* BUG: No ordering vs. load from a!!! */
	if (q) {
		/* WRITE_ONCE(b, 1); -- moved up, BUG!!! */
		do_something();
	} else {
		/* WRITE_ONCE(b, 1); -- moved up, BUG!!! */
		do_something_else();
	}

Now there is no conditional between the load from "a" and the store to
"b", which means that the CPU is within its rights to reorder them: The
conditional is absolutely required, and must be present in the final
assembly code, after all of the compiler and link-time optimizations
have been applied. Therefore, if you need ordering in this example,
you must use explicit memory ordering, for example, smp_store_release():

	q = READ_ONCE(a);
	if (q) {
		smp_store_release(&b, 1);
		do_something();
	} else {
		smp_store_release(&b, 1);
		do_something_else();
	}

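A userspace approximation of this fix might look like the sketch below.
It assumes that a C11 release store is an acceptable stand-in for
smp_store_release(), and it replaces do_something() and
do_something_else() with two hypothetical counters so that the example
is self-contained:

```c
#include <stdatomic.h>

static atomic_int a;
static atomic_int b;
static int took_then, took_else;	/* hypothetical stand-ins */

static void update(void)
{
	int q = atomic_load_explicit(&a, memory_order_relaxed);

	if (q) {
		/* Release store: ordered after the earlier load from a. */
		atomic_store_explicit(&b, 1, memory_order_release);
		took_then++;
	} else {
		/* Identical store, so the ordering must be explicit. */
		atomic_store_explicit(&b, 1, memory_order_release);
		took_else++;
	}
}
```

Even if the compiler merges the two identical stores and hoists the
result above the "if", the merged store is still a release store, so
the ordering no longer depends on the conditional surviving.
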
Without explicit memory ordering, control-dependency-based ordering is
guaranteed only when the stores differ, for example:

	q = READ_ONCE(a);
	if (q) {
		WRITE_ONCE(b, 1);
		do_something();
	} else {
		WRITE_ONCE(b, 2);
		do_something_else();
	}

The initial READ_ONCE() is still required to prevent the compiler from
knowing too much about the value of "a".

But please note that you need to be careful what you do with the local
variable "q", otherwise the compiler might be able to guess the value
and again remove the conditional branch that is absolutely required to
preserve ordering. For example:

	q = READ_ONCE(a);
	if (q % MAX) {
		WRITE_ONCE(b, 1);
		do_something();
	} else {
		WRITE_ONCE(b, 2);
		do_something_else();
	}

If MAX is compile-time defined to be 1, then the compiler knows that
(q % MAX) must be equal to zero, regardless of the value of "q".
The compiler is therefore within its rights to transform the above code
into the following:

	q = READ_ONCE(a);
	WRITE_ONCE(b, 2);
	do_something_else();

Given this transformation, the CPU is not required to respect the ordering
between the load from variable "a" and the store to variable "b". It is
tempting to add a barrier(), but this does not help. The conditional
is gone, and the barrier won't bring it back. Therefore, if you need
to rely on control dependencies to produce this ordering, you should
make sure that MAX is greater than one, perhaps as follows:

	q = READ_ONCE(a);
	BUILD_BUG_ON(MAX <= 1); /* Order load from a with store to b. */
	if (q % MAX) {
		WRITE_ONCE(b, 1);
		do_something();
	} else {
		WRITE_ONCE(b, 2);
		do_something_else();
	}

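Outside the kernel, where BUILD_BUG_ON() is not available, the same
build-time check can be sketched with C11's _Static_assert. The value
of MAX and the READ_ONCE()/WRITE_ONCE() definitions below are
hypothetical userspace stand-ins (volatile casts using the GCC/Clang
__typeof__ extension), not the kernel's own definitions:

```c
#define MAX 4	/* hypothetical value; must be greater than one */

/* Fails the build if MAX would let the compiler fold the condition. */
_Static_assert(MAX > 1, "MAX must exceed 1 to keep the conditional");

/* Simplified stand-ins for the kernel macros (assumption). */
#define READ_ONCE(x)     (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, v) (*(volatile __typeof__(x) *)&(x) = (v))

static int a, b;

static void update(void)
{
	int q = READ_ONCE(a);

	if (q % MAX)
		WRITE_ONCE(b, 1);
	else
		WRITE_ONCE(b, 2);
}
```

Changing MAX to 1 in this sketch makes compilation fail at the
_Static_assert line, which is the point: the problem is caught at
build time instead of silently losing the conditional.
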
Please note once again that each leg of the "if" statement absolutely
must store different values to "b". As in previous examples, if the two
values were identical, the compiler could pull this store outside of the
"if" statement, destroying the control dependency's ordering properties.

You must also be careful to avoid relying too much on boolean short-circuit
evaluation. Consider this example:

	q = READ_ONCE(a);
	if (q || 1 > 0)
		WRITE_ONCE(b, 1);

Because the first condition cannot fault and the second condition is
always true, the compiler can transform this example as follows, again
destroying the control dependency's ordering:

	q = READ_ONCE(a);
	WRITE_ONCE(b, 1);

This is yet another example showing the importance of preventing the
compiler from out-guessing your code. Again, although READ_ONCE() really
does force the compiler to emit code for a given load, the compiler is
within its rights to discard the loaded value.

In addition, control dependencies apply only to the then-clause and
else-clause of the "if" statement in question. In particular, they do
not necessarily order the code following the entire "if" statement:

	q = READ_ONCE(a);
	if (q) {
		WRITE_ONCE(b, 1);
	} else {
		WRITE_ONCE(b, 2);
	}
	WRITE_ONCE(c, 1);  /* BUG: No ordering against the read from "a". */

It is tempting to argue that there in fact is ordering because the
compiler cannot reorder volatile accesses and also cannot reorder
the writes to "b" with the condition. Unfortunately for this line
of reasoning, the compiler might compile the two writes to "b" as
conditional-move instructions, as in this fanciful pseudo-assembly
language:

	ld r1,a
	cmp r1,$0
	cmov,ne r4,$1
	cmov,eq r4,$2
	st r4,b
	st $1,c

The control dependencies would then extend only to the pair of cmov
instructions and the store depending on them. This means that a weakly
ordered CPU would have no dependency of any sort between the load from
"a" and the store to "c". In short, control dependencies provide ordering
only to the stores in the then-clause and else-clause of the "if" statement
in question (including functions invoked by those two clauses), and not
to code following that "if" statement.

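If the store to "c" really must be ordered after the load from "a", it
needs explicit ordering of its own. The sketch below shows one way this
might look in userspace C11, assuming a release store is an acceptable
stand-in for smp_store_release(). The stores to "b" are left relaxed
only to keep the focus on "c". C11 itself makes no control-dependency
promise for them:

```c
#include <stdatomic.h>

static atomic_int a, b, c;

static void update(void)
{
	int q = atomic_load_explicit(&a, memory_order_relaxed);

	if (q)
		atomic_store_explicit(&b, 1, memory_order_relaxed);
	else
		atomic_store_explicit(&b, 2, memory_order_relaxed);

	/* Release store: ordered after the load from a by itself. */
	atomic_store_explicit(&c, 1, memory_order_release);
}
```

Here the ordering of the store to "c" comes from the release semantics
alone, so it holds whether or not the compiler turns the "if" into
conditional moves.
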

In summary:

 (*) Control dependencies can order prior loads against later stores.
     However, they do *not* guarantee any other sort of ordering:
     Not prior loads against later loads, nor prior stores against
     later anything. If you need these other forms of ordering, use
     smp_load_acquire(), smp_store_release(), or, in the case of prior
     stores and later loads, smp_mb().

 (*) If both legs of the "if" statement contain identical stores to
     the same variable, then you must explicitly order those stores,
     either by preceding both of them with smp_mb() or by using
     smp_store_release(). Please note that it is *not* sufficient to use
     barrier() at the beginning and end of each leg of the "if" statement
     because, as shown by the example above, optimizing compilers can
     destroy the control dependency while respecting the letter of the
     barrier() law.

 (*) Control dependencies require at least one run-time conditional
     between the prior load and the subsequent store, and this
     conditional must involve the prior load. If the compiler is able
     to optimize the conditional away, it will have also optimized
     away the ordering. Careful use of READ_ONCE() and WRITE_ONCE()
     can help to preserve the needed conditional.

 (*) Control dependencies require that the compiler avoid reordering the
     dependency into nonexistence. Careful use of READ_ONCE() or
     atomic{,64}_read() can help to preserve your control dependency.

 (*) Control dependencies apply only to the then-clause and else-clause
     of the "if" statement containing the control dependency, including
     any functions that these two clauses call. Control dependencies
     do *not* apply to code beyond the end of that "if" statement.

 (*) Control dependencies pair normally with other types of barriers.

 (*) Control dependencies do *not* provide multicopy atomicity. If you
     need all the CPUs to agree on the ordering of a given store against
     all other accesses, use smp_mb().

 (*) Compilers do not understand control dependencies. It is therefore
     your job to ensure that they do not break your code.