[pypy-commit] extradoc extradoc: Update with the current state, the next things to work on, and more "to

arigo noreply at buildbot.pypy.org
Sat Mar 31 14:27:43 CEST 2012


Author: Armin Rigo <arigo at tunes.org>
Branch: extradoc
Changeset: r4166:d80dd2d3c300
Date: 2012-03-30 18:20 +0200
http://bitbucket.org/pypy/extradoc/changeset/d80dd2d3c300/

Log:	Update with the current state, the next things to work on, and more
	"to do later".

diff --git a/planning/stm.txt b/planning/stm.txt
--- a/planning/stm.txt
+++ b/planning/stm.txt
@@ -2,7 +2,10 @@
 STM planning
 ============
 
-Comments in << >> describe the next thing to work on.
+|
+| Bars on the left describe the next thing to work on.
+| On the other hand, "TODO" means "to do later".
+|
 
 
 Overview
@@ -23,34 +26,43 @@
 access to a global object, we need to make a whole copy of it into our
 nursery.
 
-The RPython program should have at least one hint: "force local copy",
+| The "global area" should be implemented by reusing gc/minimarkpage.py.
+
+The RPython program can use this hint: 'x = hint(x, stm_write=True)',
 which is like writing to an object in the sense that it forces a local
 copy.
 
-We need annotator support to track which variables contain objects that
-are known to be local.  It lets us avoid the run-time check.  That's
-useful for all freshly malloc'ed objects, which we know are always
-local; and that's useful for special cases like the PyFrames, on which
-we would use the "force local copy" hint before running the
-interpreter.  In both cases the result is: no STM code is needed any
-more.
+In translator.stm.transform, we track which variables contain objects
+that are known to be local.  It lets us avoid the run-time check.
+That's useful for all freshly malloc'ed objects, which we know are
+always local; and that's useful for special cases like the PyFrames, on
+which we used the "stm_write=True" hint before running the interpreter.
+In both cases the result is: no STM code is needed any more.
 
 When a transaction commits, we do a "minor collection"-like process,
 called an "end-of-transaction collection": we move all surviving objects
-from the nursery to the global area, either as new objects, or as
-overwrites of their previous version.  Unlike the minor collections in
-other GCs, this one occurs at a well-defined time, with no stack roots
-to scan.
+from the nursery to the global area, either as new objects (first step
+done by stmgc.py), or as overwrites of their previous version (second
+step done by et.c).  Unlike the minor collections in other GCs, this one
+occurs at a well-defined time, with no stack roots to scan.
 
-Later we'll need to consider what occurs if a nursery grows too big
-while the transaction is still not finished.  Probably somehow run a
-collection of the nursery itself, not touching the global area.
-
-Of course we also need to do from time to time a major collection.  We
-will need at some point some concurrency here, to be able to run the
-major collection in a random thread t but detecting changes done by the
-other threads overwriting objects during their own end-of-transaction
-collections.
+| We also need to consider what occurs if a nursery grows too big while
+| the transaction is still not finished.  In this case we need to run a
+| similar collection of the nursery, but with stack roots to scan.  We
+| call this a local collection.
+|
+| This can also occur before or after we call transaction.run(), when
+| there is only the main thread running.  In this mode, we run the main
+| thread with a nursery too.  It can fill up, needing a local collection.
+| When transaction.run() is called, we also do a local collection to
+| ensure that the nursery of the main thread is empty while the
+| transactions execute.
+|
+| Of course we also need to do from time to time a major collection.  We
+| will need at some point some concurrency here, to be able to run the
+| major collection in a random thread t but detecting changes done by the
+| other threads overwriting objects during their own end-of-transaction
+| collections.  See below.
 
 
 GC flags
@@ -68,16 +80,13 @@
 
 (Optimization: objects declared immutable don't need a version number.)
 
-GC_WAS_COPIED should rather be some counter, counting how many threads
+TODO: GC_WAS_COPIED should rather be some counter, counting how many threads
 have a local copy; something like 2 or 3 bits, where the maximum value
 means "overflowed" and is sticky (maybe until some global
 synchronization point, if we have one).  Or, we can be more advanced and
 use 4-5 bits, where in addition we use some "thread hash" value if there
 is only one copy.
 
-<< NOW: implemented a minimal GC model with these properties.  We have
-GC_GLOBAL, a single bit of GC_WAS_COPIED, and the version number. >>
-
 
 stm_read
 --------
@@ -102,8 +111,8 @@
   depending on cases).  And if the read is accepted then we need to
   remember in a local list that we've read that object.
 
-<< NOW: the thread's local dictionary is in C, as a search tree.
-The rest of the logic here is straightforward. >>
+For now the thread's local dictionary is in C, as a widely-branching
+search tree.
 
 
 stm_write
@@ -123,10 +132,9 @@
 consistent copy (i.e. nobody changed the object in the middle of us
 reading it).  If it is too recent, then we might have to abort.
 
-<< NOW: done, straightforward >>
-
 TODO: how do we handle MemoryErrors when making a local copy??
 Maybe force the transaction to abort, and then re-raise MemoryError
+--- for now it's just a fatal error.
 
 
 End-of-transaction collections
@@ -146,61 +154,73 @@
 We need to check that each of these global objects' versions have not
 been modified in the meantime.
 
-<< NOW: done, kind of easy >>
 
-
-Annotator support
------------------
+Static analysis support
+-----------------------
 
 To get good performance, we should as much as possible use the
 'localobj' version of every object instead of the 'obj' one.  At least
 after a write barrier we should replace the local variable 'obj' with
-'localobj', and someone (the annotator? or later?) should propagate the
+'localobj', and translator.stm.transform propagates the
 fact that it is now a localobj that doesn't need special stm support
 any longer.  Similarly, all mallocs return a localobj.
 
-The "force local copy" hint should be used on PyFrame before the main
+The "stm_write=True" hint is used on PyFrame before the main
 interpreter loop, so that we can then be sure that all accesses to
-'frame' are to a local obj.  Ideally, we could even track which fields
+'frame' are to a local obj.
+
+TODO: Ideally, we could even track which fields
 of a localobj are themselves localobjs.  This would be useful for
 'PyFrame.fastlocals_w': it should also be known to always be a localobj.
 
-<< NOW: done in the basic form by translator/stm/transform.py.
-Runs late (just before C databasing).  Should work well enough to
-remove the maximum number of write barriers, but still missing
-PyFrame.fastlocals_w. >>
-
 
 Local collections
 -----------------
 
-If the nursery fills up too much during a transaction, it needs to be
-locally collected.  This is supposed to be a generally rare occurrance.
+|
+| This needs to be done.
+|
+
+If a nursery fills up too much during a transaction, it needs to be
+locally collected.  This is supposed to be a generally rare occurrance,
+with the exception of long-running transactions --- including the main
+thread before transaction.run().
+
+Surviving local objects are moved to the global area.  However, the
+GC_GLOBAL flag is still not set on them, because they are still not
+visible from more than one thread.  For now we have to put all such
+objects in a list: the list of old-but-local objects.  (Some of these
+objects can still have the GC_WAS_COPIED flag and so be duplicates of
+other really global objects.  The dict maintained by et.c must be
+updated when we move these objects.)
+
 Unlike end-of-transaction collections, we need to have the stack roots
-of the current transaction.  Because such collections are more rare than
-in previous GCs, we could use for performance a completely different
-approach: conservatively scan the stack, finding everything that looks
-like a pointer to an object in the nursery; mark these objects as roots;
-and do a local collection from there.  We need either a non-moving GC or
-at least to pin the potential roots.  Pinning is better in the sense
-that it should ideally pin a small number of objects, and all other
-objects can move away; this would free most of the nursery again.
-Afterwards we can still use a bump-pointer allocation technique, to
-allocate within each area between the pinned objects.  The objects are
-pinned just for one local collection, which means that number of such
-pinned objects should remain roughly constant as time passes.
+of the current transaction.  For now we just use
+"gcrootfinder=shadowstack" with thread-local variables.  At the end of
+the local collection, we do a sweep: all objects that were previously
+listed as old-but-local but don't survive the present collection are
+marked as free.
 
-The local collection is also a good time to compress the local list of
-all global reads done --- "compress" in the sense of removing
-duplicates.
+TODO: Try to have a generational behavior here.  Could probably be done
+by (carefully) promoting part of the surviving objects to GC_GLOBAL.
 
-<< do later; memory usage grows unboundedly during one transaction for
-now. >>
+If implemented like minimarkpage.py, the global area has for each size a
+chained list of pages that are (at least partially) free.  We make the
+heads of the chained lists thread-locals; so each thread reserves one
+complete page at a time, reducing cross-thread synchronizations.
+
+TODO: The local collection would also be a good time to compress the
+local list of all global reads done --- "compress" in the sense of
+removing duplicates.
 
 
 Global collections
 ------------------
 
+|
+| This needs to be done.
+|
+
 We will sometimes need to do a "major" collection, called global
 collection here.  The issue with it is that there might be live
 references to global objects in the local objects of any thread.  The
@@ -208,30 +228,29 @@
 some system call.  As an intermediate solution that should work well
 enough, we could try to acquire a lock for every thread, a kind of LIL
 (local interpreter lock).  Every thread releases its LIL around
-potentially-blocking system calls.  At the end of a transaction and
-maybe once per local collection, we also do the equivalent of a
-release-and-require-the-LIL.
+potentially-blocking system calls.  At the end of a transaction and once
+per local collection, we also do the equivalent of a
+release-and-require-the-LIL.  The point is that when a LIL is released,
+another thread can acquire it temporarily and read the shadowstack of
+that thread.
 
-The major collection could be orchestrated by either the thread that
-noticed one should start, or by its own thread.  We first acquire all
-the LILs, and for every LIL, we ask the corresponding thread to do a
-local marking, starting from their own stacks and scanning their local
-nurseries.  Out of this, we obtain a list of global objects.
+The major collection is orchestrated by whichever thread noticed one
+should start; let's call this thread tg.  So tg first acquires all the
+LILs.  (A way to force another thread to "soon" release its LIL is to
+artifically mark its nursery as exhausted.)  For each thread t, tg
+performs a local collection for t.  This empties all the nurseries and
+gives tg an up-to-date point of view on the liveness of the objects: the
+various lists of old-but-local objects for all the t's.  tg can use
+these --- plus external roots like prebuilt objects --- as the roots of
+a second-level, global mark-and-sweep.
 
-Then we can resume running the threads while at the same time doing a
-mark-n-sweep collection of the global objects.  There is never any
-pointer from a global object to a local object, but some global objects
-are duplicated in one or several local nurseries.  To simplify, these
-duplicates should be considered as additional roots for local marking,
-and the original objects should be additional roots for global marking.
-At some point we might figure out a way to allow duplicated objects to
-be freed too.
+For now we release the LILs only when the major collection is finished.
 
-The global objects are read-only, at least if there is no commit.  If we
-don't want to block the other threads we need support for detecting
-commit-time concurrent writes.  Alternatively, we can ask the threads to
-do all together a parallel global marking; this would have a
-stop-the-world effect, but require no concurrency detection mechanism.
+TODO: either release the LILs earlier, say after we processed the lists
+of old-but-local objects but before we went on marking and sweeping ---
+but we need support for detecting concurrent writes done by concurrent
+commits; or, ask all threads currently waiting on the LIL to help with
+doing the global mark-and-sweep in parallel.
 
 Note: standard terminology:
 
@@ -242,12 +261,6 @@
 * Parallelism: there are multiple threads all doing something GC-related,
   like all scanning the heap together.
 
-<< at first the global area keeps growing unboundedly.  The next step
-will be to add the LIL but run the global collection by keeping all
-other threads blocked.  NOW: think about, at least, doing "minor
-collections" on the global area *before* we even start running
-transactions. >>
-
 
 When not running transactively
 ------------------------------
@@ -255,25 +268,13 @@
 The above describes the mode during which there is a main thread blocked
 in transaction.run().  The other mode is mostly that of "start-up",
 before we call transaction.run().  Of course no STM is needed in that
-mode, but it's still running the same STM-enabled interpreter.  We need
-to figure out how to tweak the above concepts for that mode.
+mode, but it's still running the same STM-enabled interpreter.
 
-We can probably abuse the notion of nursery above, by running with one
-nursery (corresponding to the only thread running, the main thread).  We
-would need to do collections that are some intermediate between "local
-collections" and "end-of-transaction collections".  Likely, a scheme
-that might work would be similar to local collections (with some pinned
-objects) but where surviving non-pinned objects are moved to become
-global objects.
-
-This needs a bit more thinking: the issue is that when transaction.run()
-is called, we can try to do such a collection, but what about the pinned
-objects?
-
-<< NOW: the global area is just the "nursery" for the main thread.
-stm_writebarrier of 'obj' return 'obj' in the main thread.  All
-allocations get us directly a global object, but allocated from
-the "nursery" of the main thread, with bump-pointer allocation. >>
+| In this mode, we just have one nursery and the global area.  When
+| transaction.run() is called, we do a local collection to empty it, then
+| make sure to flag all surviving objects as GC_GLOBAL in preparation for
+| starting actual transactions.  Then we can reuse the nursery itself for
+| one of the threads.
 
 
 Pointer equality
@@ -284,18 +285,11 @@
 This is all llops of the form ``ptr_eq(x, y)`` or ``ptr_ne(x, y)``.
 
 If we know statically that both copies are local copies, then we can
-just compare the pointers.  Otherwise we need to check their GC_GLOBAL
-and GC_WAS_COPIED flag, and potentially if they both have GC_WAS_COPIED
-but only one of them has GC_GLOBAL, we need to check in the local
-dictionary if they map to each other.  And we need to take care of the
-cases of NULL pointers.
-
-<< NOW: done, without needing the local dictionary:
-stm_normalize_global(obj) returns globalobj if obj is a local,
-WAS_COPIED object.  Then a pointer comparison 'x == y' becomes
-stm_normalize_global(x) == stm_normalize_global(y).  Moreover
-the call to stm_normalize_global() can be omitted for constants. >>
-
+just compare the pointers.  Otherwise, we compare
+``stm_normalize_global(x)`` with ``stm_normalize_global(y)``, where
+``stm_normalize_global(obj)`` returns ``globalobj`` if ``obj`` is a
+local, GC_WAS_COPIED object.  Moreover the call to
+``stm_normalize_global()`` can be omitted for constants.
 
 
 notes


More information about the pypy-commit mailing list