[pypy-commit] extradoc extradoc: merge

hakanardo noreply at buildbot.pypy.org
Fri Aug 17 12:21:08 CEST 2012


Author: Hakan Ardo <hakan at debian.org>
Branch: extradoc
Changeset: r4663:04439fef5415
Date: 2012-08-17 12:20 +0200
http://bitbucket.org/pypy/extradoc/changeset/04439fef5415/

Log:	merge

diff --git a/talk/stm2012/stmimpl.rst b/talk/stm2012/stmimpl.rst
--- a/talk/stm2012/stmimpl.rst
+++ b/talk/stm2012/stmimpl.rst
@@ -19,7 +19,8 @@
 done in a local copy.  If this transaction successfully commits, the
 original global object is *not* changed --- it is really immutable.  But
 the copy becomes global, and the old global object's header is updated
-with a pointer to the new global object.
+with a pointer to the new global object.  We thus make a chained list
+of global versions.
 
 
 CPUs model
@@ -31,9 +32,9 @@
 be delayed and only show up later in main memory.  The delayed stores
 are always flushed to main memory in program order.
 
-Of course if the same CPU loads a value just stored, it will see the
+Of course if the same CPU loads a value it just stored, it will see the
 value as modified (self-consistency); but other CPUs might temporarily
-still see the old value.
+see the old value.
 
 The MFENCE instruction waits until all delayed stores from this CPU have
 been flushed.  (A CPU has no built-in way to wait until *other* CPU's
@@ -49,12 +50,23 @@
 Every object starts with three fields:
 
 - h_global (boolean)
-- h_nonmodified (boolean)
+- h_possibly_outdated (boolean)
+- h_written (boolean)
 - h_version (unsigned integer)
 
 The h_version is an unsigned "version number".  More about it below.
-The other two fields are flags.  (In practice they are just two bits
-of the GC h_tid field.)
+The other fields are flags.  (In practice they are just bits inside the
+GC h_tid field.)
+
+- ``h_global`` means that the object is a global object.
+
+- ``h_possibly_outdated`` is used as an optimization: it means that the
+  object is possibly outdated.  It is False for all local objects.  It
+  is also False if the object is a global object, is the most recent of
+  its chained list of versions, and is known to have no ``global2local``
+  entry in any transaction.
+
+- ``h_written`` is set on local objects that have been written to.
 
 
 Transaction details
@@ -65,6 +77,8 @@
 
 - start_time
 - global2local
+- list_of_read_objects
+- recent_reads_cache
 
 The ``start_time`` is the "time" at which the transaction started.  All
 reads and writes done so far in the transaction appear consistent with
@@ -74,8 +88,17 @@
 ``global2local`` is a dictionary-like mapping of global objects to their
 corresponding local objects.
 
+``list_of_read_objects`` is a set of all global objects read from, in
+the version that was used for reading.  It is actually implemented as a
+list, but the order or repeated presence of elements in the list is
+irrelevant.
 
-Pseudo-code during transactions
+``recent_reads_cache`` is a fixed-size cache that remembers recent
+additions to the preceeding list, in order to avoid inserting too much
+repeated entries into the list, as well as keep lightweight statistics.
+
+
+Pseudo-code: read/write barriers
 ---------------------------------------
 
 Variable names:
@@ -87,19 +110,19 @@
 * ``R`` is a pointer to an object that was checked for being
   *read-ready*: reading its fields is ok.
 
-* ``L`` is a pointer to a *local* object.  Reading its fields is
-  always ok, but not necessarily writing.
+* ``L`` is a pointer to a *local* object.  We can always read from
+  but not necessarily write to local objects.
 
-* ``W`` is a pointer to a local object ready to *write*.
+* ``W`` is a pointer to a *writable* local object.
 
 
-``W = Allocate(size)`` allocates a local object, and as the name of
-the variable suggests, returns it ready to write::
+``W = Allocate(size)`` allocates a local object::
 
     def Allocate(size):
         W = malloc(size)
         W->h_global = False
-        W->h_nonmodified = False
+        W->h_possibly_outdated = False
+        W->h_written = True
         W->h_version = 0
         return W
 
@@ -115,7 +138,7 @@
         while (v := R->h_version) & 1:    # "has a more recent version"
             R = v & ~ 1
         if v > start_time:                # object too recent?
-            validate_fast()               # try to move start_time forward
+            ValidateFast()                # try to move start_time forward
             return LatestGlobalVersion(G) # restart searching from G
         PossiblyUpdateChain(G)
         return R
@@ -125,107 +148,113 @@
 It takes a random pointer ``P`` and returns a possibly different pointer
 ``R`` out of which we can read from the object.  The result ``R``
 remains valid for read access until either the current transaction ends,
-or until a write into the same object is done.
-
-::
+or until a write into the same object is done.  Pseudo-code::
 
     def DirectReadBarrier(P):
         if not P->h_global:         # fast-path
             return P
-        R = LatestGlobalVersion(P)
+        if not P->h_possibly_outdated:
+            R = P
+        else:
+            R = LatestGlobalVersion(P)
+            if R->h_possibly_outdated and R in global2local:
+                L = global2local[R]
+                return L
+        R = AddInReadSet(R)         # see below
+        return R
+
+
+A simple optimization is possible.  If ``R`` is returned by a previous
+call to ``DirectReadBarrier`` and the current transaction is still
+running, but we could have written to ``R`` in the meantime, then we
+need to repeat only part of the logic, because we don't need
+``AddInReadSet`` again.  It gives this::
+
+    def RepeatReadBarrier(R):
+        if not R->h_possibly_outdated:   # fast-path
+            return R
+        # LatestGlobalVersion(R) would either return R or abort
+        # the whole transaction, so omitting it is not wrong
         if R in global2local:
             L = global2local[R]
             return L
-        else:
-            AddInReadSet(R)         # see below
-            return R
+        return R
 
 
-``L = Localize(R)`` is an operation that takes a read-ready pointer to
-a global object and returns a corresponding pointer to a local object.
-
-::
+``L = Localize(R)`` is an operation that takes a read-ready pointer to a
+global object and returns a corresponding pointer to a local object::
 
     def Localize(R):
-        if P in global2local:
-            return global2local[P]
+        if R in global2local:
+            return global2local[R]
         L = malloc(sizeof R)
-        L->h_nonmodified = True
-        L->h_version = P
+        L->h_global = False
+        L->h_possibly_outdated = False
+        L->h_written = False
+        L->h_version = R          # back-reference to the original
         L->objectbody... = R->objectbody...
         global2local[R] = L
         return L
 
 
-``L = LocalizeReadBarrier(P)`` is a different version of the read
-barrier that works by returning a local object.
+``W = WriteBarrier(P)`` and ``W = WriteBarrierFromReadReady(R)`` are
+two versions of the write barrier::
 
-::
-
-    def LocalizeReadBarrier(P):
+    def WriteBarrier(P):
         if not P->h_global:       # fast-path
             return P
-        R = LatestGlobalVersion(P)
-        L = Localize(R)
-        return L
-
-
-``W = WriteBarrier(P)`` is the write barrier.
-
-::
-
-    def WriteBarrier(P):
-        W = LocalizeReadBarrier(P)
-        W->h_nonmodified = False
+        if P->h_possibly_outdated:
+            R = LatestGlobalVersion(P)
+        else:
+            R = P
+        W = Localize(R)
+        W->h_written = True
+        R->h_possibly_outdated = True
         return W
 
+    def WriteBarrierFromReadReady(P):
+        if not R->h_global:       # fast-path
+            return R
+        W = Localize(R)
+        W->h_written = True
+        R->h_possibly_outdated = True
+        return W
 
-``R = AdaptiveReadBarrier(P)`` is the adaptive read barrier.  It can use
-the technique of either ``DirectReadBarrier`` or
-``LocalizeReadBarrier``, based on heuristics for better performance::
 
-    def AdaptiveReadBarrier(P):
-        if not P->h_global:       # fast-path
-            return P
-        R = LatestGlobalVersion(P)
-        if R in global2local:
-            return global2local[R]
-        if R seen often enough in readset:
-            L = Localize(R)       # LocalizeReadBarrier
-            return L
+Auto-localization of some objects
+----------------------------------------
+
+The "fast-path" markers above are quick checks that are supposed to be
+inlined in the caller, so that we only have to pay for a full call to a
+barrier implementation when the fast-path fails.
+
+However, even the fast-path of ``DirectReadBarrier`` fails repeatedly
+when the ``DirectReadBarrier`` is invoked repeatedly on the same set of
+global objects.  This occurs in example of code that repeatedly
+traverses the same data structure, visiting the same objects over and
+over again.
+
+If the objects that make up the data structure were local, then we would
+completely avoid triggering the read barrier's implementation.  So
+occasionally, it is better to *localize* global objects even when they
+are only read from.
+
+This is done by tweaking ``AddInReadSet``, whose main purpose is to
+record the read object in a set (actually a list)::
+
+    def AddInReadSet(R):
+        if R not in recent_reads_cache:
+            list_of_read_objects.append(R)
+            recent_reads_cache[R] = 1
+            # the cache is fixed-size, so the line above
+            # possibly evinces another older entry
+            return R
         else:
-            AddInReadSet(R)       # DirectReadBarrier
-            return R
-
-
-This adaptive localization of read-only objects is useful for example in
-the following situation: we have a pointer ``P1`` to some parent object,
-out of which we repeatedly try to read the same field ``Field`` and use
-the result ``P`` in some call.  Because the call may possibly have write
-effects to the parent object, we normally need to redo
-``DirectReadBarrier`` on ``P1`` every time.  If instead we do
-``AdaptiveReadBarrier`` then after a few iterations it will localize the
-object and return ``L1``.  On ``L1`` no read barrier is needed any more.
-
-Moreover, if we also need to read the subobject ``P``, we also need to
-call a read barrier on it every time.  It may return ``L`` after a few
-iterations, but this time we win less, because during the next iteration
-we again read ``P`` out of ``L1``.  The trick is that when we read a
-field out of a local object ``L1``, and it is a pointer on which we
-subsequently do a read barrier, then afterwards we can update the
-original pointer directly in ``L1``.
-
-Similarily, if we start with a global ``R1`` and read a pointer ``P``
-which is updated to its latest global version ``R``, then we can update
-the original pointer in-place.
-
-The only case in which it is not permitted xxx
-
-::
-
-    def DependentUpdate(R1, Field, R):
-        if R1->h_global:     # can't modify R1 unless it is local
-            return
-        R1->Field = R        # possibly update the pointer
-
-
+            count = recent_reads_cache[R]
+            count += 1
+            recent_reads_cache[R] = count
+            if count < THRESHOLD:
+                return R
+            else:
+                L = Localize(R) 
+                return L


More information about the pypy-commit mailing list