[pypy-commit] extradoc extradoc: merge
fijal
noreply at buildbot.pypy.org
Tue Sep 4 11:11:37 CEST 2012
Author: Maciej Fijalkowski <fijall at gmail.com>
Branch: extradoc
Changeset: r4737:ebef81cd119a
Date: 2012-09-04 11:11 +0200
http://bitbucket.org/pypy/extradoc/changeset/ebef81cd119a/
Log: merge
diff --git a/talk/stm2012/stmimpl.rst b/talk/stm2012/stmimpl.rst
--- a/talk/stm2012/stmimpl.rst
+++ b/talk/stm2012/stmimpl.rst
@@ -2,8 +2,9 @@
STM implementation model
========================
+
Overview
---------
+============================================================
Objects are either global (visible to everybody, and read-only), or
they are local (visible only to the current thread).
@@ -47,6 +48,11 @@
compare-and-exchange operation.
+
+The STM implementation
+============================================================
+
+
Object header
-------------
@@ -55,6 +61,7 @@
- h_global (boolean)
- h_possibly_outdated (boolean)
- h_written (boolean)
+- h_local_copy (boolean)
- h_revision (unsigned integer)
The h_revision is an unsigned "revision number" that can also
@@ -70,9 +77,14 @@
version in any transaction.
- ``h_written`` is set on local objects that have been written to.
+ It is false on global objects.
+
+- ``h_local_copy`` is set on local objects that are a copy of a global
+ object. It is undefined on global objects.
- ``h_revision`` on local objects points to the global object that they
- come from, if any; otherwise it is 1.
+ come from, if any (i.e. if ``h_local_copy``); otherwise it is
+ undefined (and can be used for other purposes, e.g. by the GC).
- ``h_revision`` on global objects depends on whether the object is the
head of the chained list of revisions or not. If it is, then
@@ -89,6 +101,7 @@
the transaction it has so far. The following data is transaction-local:
- start_time
+- is_inevitable
- global_to_local
- list_of_read_objects
- recent_reads_cache
@@ -99,6 +112,8 @@
the state at time ``start_time``. The global "time" is a single global
number that is atomically incremented whenever a transaction commits.
+``is_inevitable`` is a flag described later.
+
``global_to_local`` is a dictionary-like mapping of global objects to
their corresponding local objects.
@@ -157,7 +172,7 @@
- All barriers ensure that ``global_to_local`` satisfies the following
property for any local object ``L``: either ``L`` was created by
- this transaction (``L->h_revision == 1``) or else satisfies
+ this transaction (``L->h_revision`` is undefined) or else satisfies
``global_to_local[L->h_revision] == L``.
@@ -171,7 +186,8 @@
W->h_global = False
W->h_possibly_outdated = False
W->h_written = True
- W->h_revision = 0
+ W->h_local_copy = False
+ #W->h_revision can be left uninitialized
return W
@@ -186,8 +202,10 @@
while not (v := R->h_revision) & 1:# "is a pointer", i.e.
R = v # "has a more recent revision"
if v > start_time: # object too recent?
+ if V >= LOCKED: # object actually locked?
+ goto retry # spin-loop to start of func
ValidateDuringTransaction() # try to move start_time forward
- return LatestGlobalRevision(R) # restart searching from R
+ goto retry # restart searching from R
PossiblyUpdateChain(G, R, ...) # see below
return R
@@ -231,20 +249,29 @@
``L = Localize(R)`` is an operation that takes a read-ready pointer to a
-global object and returns a corresponding pointer to a local object::
+*global* object and returns a corresponding pointer to a local object::
def Localize(R):
+ assert R->h_global
if R in global_to_local:
return global_to_local[R]
L = malloc(sizeof R)
L->h_global = False
L->h_possibly_outdated = False
L->h_written = False
+ L->h_local_copy = True
L->h_revision = R # back-reference to the original
L->objectbody... = R->objectbody...
global_to_local[R] = L
return L
+ def LocalizeReadReady(R):
+ if R->h_global:
+ L = Localize(R)
+ else:
+ L = R
+ return L
+
``W = WriteBarrier(P)`` and ``W = WriteBarrierFromReadReady(R)`` are
two versions of the write barrier::
@@ -352,7 +379,7 @@
to the latest version::
def PossiblyUpdateChain(G, R, R_Container, FieldName):
- if R != G:
+ if R != G and Rarely():
# compress the chain
while G->h_revision != R:
G_next = G->h_revision
@@ -369,45 +396,58 @@
the modified values. It works because the original and each modified
value are all interchangeable as far as correctness goes.
+``Rarely`` uses a thread-local counter to return True only rarely. We
+do the above update only rarely, rather than always, although it would
+naively seem that doing the update always is a good idea. The problem
+is that it generates a lot of write traffic to global data that is
+potentially shared between CPUs. We will need more measurements, but it
+seems that doing it too often causes CPUs to stall. It is probable that
+updates done by one CPU are sent to other CPUs at high cost, even though
+these updates are not so important in this particular case (i.e. the
+program would work fine if the other CPUs didn't see such updates at all
+and instead repeated the same update logic locally).
+
Validation
------------------------------------
``ValidateDuringTransaction`` is called during a transaction to update
``start_time``. It makes sure that none of the read objects have been
-modified since ``start_time``::
+modified since ``start_time``. If one of these objects is modified by
+another commit in parallel, then we want this transaction to eventually
+fail. More precisely, it will fail the next time one of the
+``ValidateDuring*`` functions is called.
+
+Note a subtle point: if an object is currently locked, we have to wait
+until it gets unlocked, because it might turn out to point to a more
+recent version that is still older than the current global time.
+
+Here is ``ValidateDuringTransaction``::
def ValidateDuringTransaction():
- start_time = global_cur_time # copy from the global time
+ start_time = GetGlobalCurTime() # copy from the global time
for R in list_of_read_objects:
- if not (R->h_revision & 1): # "is a pointer", i.e.
+ v = R->h_revision
+ if not (v & 1): # "is a pointer", i.e.
AbortTransaction() # "has a more recent revision"
-
-If such an object is modified by another commit, then this transaction
-will eventually fail --- hopefully, the next time
-``ValidateDuringTransaction`` is called.
+ if v >= LOCKED: # locked
+ spin loop retry # jump back to the "v = ..." line
The last detection for inconsistency is during commit, when
``ValidateDuringCommit`` is called. It is a slightly more complex
version than ``ValidateDuringTransaction`` because it has to handle
-"locks" correctly::
+"locks" correctly. It also returns a True/False result instead of
+aborting::
def ValidateDuringCommit():
for R in list_of_read_objects:
v = R->h_revision
if not (v & 1): # "is a pointer", i.e.
- AbortTransaction() # "has a more recent revision"
+ return False # "has a more recent revision"
if v >= LOCKED: # locked
if v != my_lock: # and not by me
- spin loop retry OR # jump back to the "v = ..." line
- AbortTransaction() # ...or just abort
-
-The choice of waiting or aborting when encountering a read of a locked
-object needs to be done carefully to avoid deadlocks. Always aborting
-would be correct, but a bit too restrictive. Always entering a spin
-loop could lead to deadlocks with two transactions that each locked
-objects from the other's ``list_of_read_objects``. So for the purposes
-of this explanation we will always assume that it aborts.
+ return False
+ return True
Local garbage collection
@@ -447,10 +487,10 @@
Hand-wavy pseudo-code::
- def TransactionEnd():
+ def FinishTransaction():
FindRootsForLocalCollect()
PerformLocalCollect()
- TransactionCommit() # see below
+ CommitTransaction() # see below
def FindRootsForLocalCollect():
for (R, L) in global_to_local:
@@ -465,7 +505,18 @@
collect from the roots...
for all reached local object,
change h_global False->True
- and h_written True->False
+ change h_written True->False
+ if not h_local_copy:
+ h_revision = 1
+
+Note that non-written local objects are just shadow copies of existing
+global objects. For the sequel we just replace them with the original
+global objects again. This is done by tweaking the local objects'
+header.
+
+Note also that ``h_revision`` is free to be (ab)used on newly allocated
+objects (the GC of PyPy does this), but it should be set to 1 just
+before calling ``CommitTransaction``.
Committing
@@ -494,11 +545,14 @@
In pseudo-code::
def CommitTransaction():
+ # (see below for the full version with inevitable transactions)
AcquireLocks()
cur_time = global_cur_time
while not CMPXCHG(&global_cur_time, cur_time, cur_time + 2):
cur_time = global_cur_time # try again
- ValidateDuringCommit()
+ if cur_time != start_time:
+ if not ValidateDuringCommit(): # only call it if needed
+ AbortTransaction() # last abort point
UpdateChainHeads(cur_time)
Note the general style of usage of CMPXCHG: we first read normally the
@@ -514,33 +568,29 @@
``h_revision`` field; it does not involve OS-specific thread locks::
def AcquireLocks():
- for (R, L, 0) in gcroots:
+ for (R, L, 0) in gcroots SORTED BY R:
v = R->h_revision
if not (v & 1): # "is a pointer", i.e.
AbortTransaction() # "has a more recent revision"
if v >= LOCKED: # already locked by someone else
- spin loop retry OR # jump back to the "v = ..." line
- AbortTransaction()
+ spin loop retry # jump back to the "v = ..." line
if not CMPXCHG(&R->h_revision, v, my_lock):
spin loop retry # jump back to the "v = ..." line
save v into the third item in gcroots, replacing the 0
-(Note that for non-written local objects, we skip this locking entirely;
-instead, we turn the object into a "global but outdated" object, keeping
-the same ``h_revision`` but with a different meaning.)
-
We use CMPXCHG to store the lock. This is required, because we must not
conflict with another CPU that would try to write its own lock in the
-same field --- in that case, only one CPU can succeed. The order of
-enumeration of ``global_to_local`` must be the same one --- for example,
-following the numeric order of ``R``. This is needed to avoid
-deadlocks. Alternatively we could consider this case rare, and abort
-instead of waiting.
+same field --- in that case, only one CPU can succeed.
+
+Acquiring multiple locks comes with the question of how to avoid
+deadlocks. In this case, it is prevented by ordering the lock
+acquisitions in the numeric order of the R pointers. This should be
+enough to prevent deadlocks even if two threads have several objects in
+common in their gcroots.
The lock's value ``my_lock`` is, precisely, a very large odd number, at
-least LOCKED (which should be some value like 0xFFFF0000). As we can
-check, this is enough to cause ``LatestGlobalRevision`` to spin loop,
-calling ``ValidateDuringTransaction`` over and over again, until the
+least LOCKED (which should be some value like 0xFFFF0000).
+Such a value causes ``LatestGlobalRevision`` to spin loop until the
lock is released (i.e. another value is written in ``h_revision``).
@@ -550,10 +600,14 @@
done by writing back the original timestamps in the ``h_revision``
fields::
- def AbortTransaction():
+ def CancelLocks():
for (R, L, v) in gcroots:
if v != 0:
R->h_revision = v
+ reset the entry in gcroots to v=0
+
+ def AbortTransaction():
+ CancelLocks()
# call longjmp(), which is the function from C
# going back to the transaction start
longjmp()
@@ -592,3 +646,219 @@
``LatestGlobalRevision``, in the loop. It was omitted here because this
is always a no-op (i.e. the CPUs always provide this effect for us), not
only on x86 but on all modern CPUs.
+
+
+Inevitable transactions
+------------------------------------
+
+A transaction is "inevitable" when it cannot abort any more. It occurs
+typically when the transaction tries to do I/O or a similar effect that
+we cannot roll back. Such effects are O.K., but they mean that we have
+to guarantee the transaction's eventual successful commit.
+
+The main restriction is that there can be only one inevitable
+transaction at a time. Right now the model doesn't allow any other
+transaction to start or commit when there is an inevitable transaction;
+this restriction could be lifted with additional work.
+
+For now, the hint that the system has currently got an inevitable
+transaction running is given by the value stored in ``global_cur_time``:
+the largest positive number (equal to the ``INEVITABLE`` constant).
+
+``BecomeInevitable`` is called from the middle of a transaction to
+(attempt to) make the current transaction inevitable::
+
+ def BecomeInevitable():
+ inevitable_mutex.acquire()
+ cur_time = global_cur_time
+ while not CMPXCHG(&global_cur_time, cur_time, INEVITABLE):
+ cur_time = global_cur_time # try again
+ if start_time != cur_time:
+ start_time = cur_time
+ if not ValidateDuringCommit():
+ global_cur_time = cur_time # must restore the value
+ inevitable_mutex.release()
+ AbortTransaction()
+ is_inevitable = True
+
+We use a normal OS mutex to allow other threads to really sleep instead
+of spin-looping until the inevitable transaction finishes. So the
+function ``GetGlobalCurTime`` is defined to return ``global_cur_time``
+after waiting for other inevitable transaction to finish::
+
+ def GetGlobalCurTime():
+ assert not is_inevitable # must not be myself inevitable
+ t = global_cur_time
+ if t == INEVITABLE: # there is another inevitable tr.?
+ inevitable_mutex.acquire() # wait
+ inevitable_mutex.release()
+ return GetGlobalCurTime() # retry
+ return t
+
+Then we extend ``CommitTransaction`` for inevitable support::
+
+ def CommitTransaction():
+ AcquireLocks()
+ if is_inevitable:
+ cur_time = start_time
+ if not CMPXCHG(&global_cur_time, INEVITABLE, cur_time + 2):
+ unreachable: no other thread changed global_cur_time
+ inevitable_mutex.release()
+ else:
+ cur_time = GetGlobalCurTimeInCommit()
+ while not CMPXCHG(&global_cur_time, cur_time, cur_time + 2):
+ cur_time = GetGlobalCurTimeInCommit() # try again
+ if cur_time != start_time:
+ if not ValidateDuringCommit(): # only call it if needed
+ AbortTransaction() # last abort point
+ UpdateChainHeads(cur_time)
+
+ def GetGlobalCurTimeInCommit():
+ t = global_cur_time
+ if t == INEVITABLE:
+ CancelLocks()
+ inevitable_mutex.acquire() # wait until released
+ inevitable_mutex.release()
+ AcquireLocks()
+ return GetGlobalCurTimeInCommit()
+ return t
+
+
+
+Barrier placement in the source code
+============================================================
+
+
+Overview
+-----------
+
+Placing the read/write barriers in the source code is not necessarily
+straightforward, because there are a lot of object states to choose
+from. The barriers described above are just the most common cases.
+
+We classify here the object categories more precisely. A pointer to an
+object in the category ``R`` might actually point to one that is in the
+more precise category ``L`` or ``W``. Conversely, a pointer to an
+object in the category ``L`` is also always in the categories ``R`` or
+``O``. This can be seen more generally in the implication
+relationships::
+
+ W => L => R => O => P G => P (I)
+
+A letter X is called *more general than* a letter Y if ``Y => X``, and
+*more precise than* a letter Y if ``X => Y``.
+
+Barriers are used to make an object's category more precise. Here are
+all 12 interesting conversions, with the five functions from the section
+`Read/write barriers design`_ (abbreviated as DRB, RRB, LRR, WrB and
+WFR) as well as seven more potential conversions (written ``*``) that
+could be implemented efficiently with slight variations:
+
+ +--------+-----------------------------------+
+ | | From |
+ +--------+-----+-----+-----+-----+-----+-----+
+ | To | P | G | O | R | L | W |
+ +========+=====+=====+=====+=====+=====+=====+
+ | R | DRB |``*``| RRB | |
+ +--------+-----+-----+-----+-----+-----------+
+ | L |``*``|``*``|``*``| LRR | |
+ +--------+-----+-----+-----+-----+-----+-----+
+ | W | WrB |``*``|``*``| WFR |``*``| |
+ +--------+-----+-----+-----+-----+-----+-----+
+
+In the sequel we will refer to each of the 12 variations as *X2Y*
+for X in ``P, G, O, R, L`` and Y in ``R, L, W``.
+
+
+Constraints
+-----------
+
+The source code's pointer variables are each assigned one letter
+from ``P, G, O, R, L, W`` such that:
+
+* A variable is only passed into another variable with either the same
+ or a more general letter. This holds for intra- as well as
+ inter-procedural definitions of "being passed" (i.e. also for
+ arguments and return value).
+
+* Read/write barriers can be inserted at any point, returning a variable
+ of a more precise letter.
+
+* Any read must be done on an object in category ``R, L, W``. Any write
+ must be done on an object in category ``W``. Moreover an object must
+ only be in category ``W`` if we can prove that a write necessarily
+ occurs on the object.
+
+* The ``L2W`` barrier is very cheap. It is also the only barrier which
+ doesn't need to return a potentially different pointer. However,
+ converting objects to the ``L`` category in first place (rather than
+ ``R``) has a cost. It should be done only for the objects on which we
+ are *likely* to perform a write.
+
+* An object in the ``R`` category falls back automatically to the ``O``
+ category if we perform an operation (like a call to an unrelated
+ function) that might potentially cause it to be written to.
+
+* If we do a call that might cause the current transaction to end and
+ the next one to start, then all live variables fall back to the ``P``
+ category.
+
+* The ``G`` category is only used by prebuilt constants. In all
+ other cases we don't know that a pointer is definitely not a local
+ pointer. The ``NULL`` constant is in all categories; ``G`` and ``L``
+ have only ``NULL`` in common.
+
+* In general, it is useful to minimize the number of executed barriers,
+ and have the cheapest barriers possible. If, for example, we have a
+ control flow graph with two paths that reach (unconditionally) the
+ same write location, but on one path the object is a ``R`` (because we
+ just read something out of it) and on the other path the object is a
+ ``G`` (because it is a global on which we did not perform any read),
+ then we should insert the ``R2W`` barrier at the end of the first path
+ and the ``G2W`` barrier at the end of the second path, rather than the
+ ``P2W`` barrier only once after the control flow merges.
+
+Pseudo-code for some of the remaining barriers::
+
+ def G2R(G):
+ assert G->h_global
+ return P2R(G) # the fast-path never works
+
+ def G2W(G):
+ assert G->h_global
+ assert not G->h_written
+ if G->h_possibly_outdated:
+ R = LatestGlobalRevision(G)
+ else:
+ R = G
+ W = Localize(R)
+ W->h_written = True
+ R->h_possibly_outdated = True
+ return W
+
+ def L2W(L):
+ if L->h_written: # fast-path
+ return L
+ L->h_written = True
+ L->h_revision->h_possibly_outdated = True
+ return L
+
+Pointer equality: a comparison ``P1 == P2`` needs special care, because
+there are several physical pointers corresponding logically to the same
+object. If ``P1`` or ``P2`` is the constant ``NULL`` then no special
+treatment is needed. Likewise if ``P1`` and ``P2`` are both known to be
+local. Otherwise, we need in general the following code (which could be
+specialized as well if needed)::
+
+ def PtrEq(P1, P2):
+ return GlobalizeForComparison(P1) == GlobalizeForComparison(P2)
+
+ def GlobalizeForComparison(P):
+ if P == NULL:
+ return NULL
+ elif P->h_global:
+ return LatestGlobalRevision(P)
+ elif P->h_local_copy:
+ return P->h_revision # return the original global obj
+ else:
+ return P
More information about the pypy-commit
mailing list