[pypy-commit] pypy stm-gc: Documentation, documentation, documentation. It helps a lot
arigo
noreply at buildbot.pypy.org
Mon Apr 23 21:11:11 CEST 2012
Author: Armin Rigo <arigo at tunes.org>
Branch: stm-gc
Changeset: r54694:c4fe5a7ab290
Date: 2012-04-23 21:10 +0200
http://bitbucket.org/pypy/pypy/changeset/c4fe5a7ab290/
Log: Documentation, documentation, documentation. It helps a lot to make
my confused mind a bit clearer, and it should trigger a few
simplifications too.
diff --git a/pypy/rpython/memory/gc/stmgc.py b/pypy/rpython/memory/gc/stmgc.py
--- a/pypy/rpython/memory/gc/stmgc.py
+++ b/pypy/rpython/memory/gc/stmgc.py
@@ -28,7 +28,7 @@
# - the non-small raw-malloced objects
#
# - The GLOBAL objects are all located in the shared area.
-# All global objects are non-movable.
+# All GLOBAL objects are non-movable.
#
# - The LOCAL objects might be YOUNG or OLD depending on whether they
# already survived a collection. YOUNG LOCAL objects are either in
@@ -38,11 +38,112 @@
# is not actually generational (slow when running long transactions
# or before running transactions at all).
#
+# - A few details are different depending on the running mode:
+# either "transactional" or "non-transactional". The transactional
+# mode is where we have multiple threads, in a transaction.run()
+# call. The non-transactional mode has got only the main thread.
+#
+# GC Flags on objects:
+#
+# - GCFLAG_GLOBAL: identifies GLOBAL objects. All prebuilt objects
+# start as GLOBAL; conversely, all freshly allocated objects start
+# as LOCAL. But they may switch between the two; see below.
+# All objects that are or have been GLOBAL are immortal for now
+# (global_collect() will be done later).
+#
+# - GCFLAG_WAS_COPIED: means that the object is either a LOCAL COPY
+# or, if GLOBAL, then it has or had at least one LOCAL COPY. Used
+# in transactional mode only; see below.
+#
+# - GCFLAG_VISITED: used during collections to flag objects found to be
+# surviving. Between collections, it must be set on LOCAL COPY objects
+# and only on them.
+#
+# - GCFLAG_HAS_SHADOW: set on nursery objects whose id() or identityhash()
+# was taken. Means that we already have a corresponding object allocated
+# outside the nursery.
+#
+# - GCFLAG_FIXED_HASH: only on some prebuilt objects. For identityhash().
+#
+# When the mutator (= the program outside the GC) wants to write to an
+# object, stm_writebarrier() does something special on GLOBAL objects:
+#
+# - In non-transactional mode, the write barrier turns the object LOCAL
# and adds it to the list 'main_thread_tls.mt_global_turned_local'.
+# This list contains all previously-GLOBAL objects that have been
+# modified. Objects turned LOCAL are changed back to GLOBAL and
+# removed from 'mt_global_turned_local' by the next collection,
+# unless they are also found in the stack (the reason being that if
+# they are in the stack and stm_writebarrier() has already been
+# called, then it might not be called a second time if they are
+# changed again after collection).
+#
+# - In transactional mode, the write barrier creates a LOCAL COPY of
+# the object and returns it (or, if already created by the same
+# transaction, finds it again). The list of LOCAL COPY objects has
+# a role similar to 'mt_global_turned_local', but is maintained by C
+# code (see tldict_lookup()).
+#
+# Invariant: between two transactions, all objects visible from the current
+# thread are always GLOBAL. In particular:
+#
# - The LOCAL objects of a thread are not visible at all from other threads.
+# This means that in transactional mode there is *no* pointer from a
+# GLOBAL object directly to a LOCAL object.
+#
+# - At the end of enter_transactional_mode(), and at the beginning of
+# leave_transactional_mode(), *all* objects everywhere are GLOBAL.
+#
+# Collection: for now we have only local_collection(), which ignores all
+# GLOBAL objects.
+#
+# - In non-transactional mode, we use 'mt_global_turned_local' as a list
+# of roots, together with the stack. By construction, all objects that
+# are still GLOBAL can be ignored, because they cannot point to a LOCAL
+# object (except to a 'mt_global_turned_local' object).
+#
+# - In transactional mode, we similarly use the list maintained by C code
+# of the LOCAL COPY objects of the current transaction, together with
+# the stack. Again, GLOBAL objects can be ignored because they have no
+# pointer to any LOCAL object at all in that mode.
+#
+# - A special case is the end-of-transaction collection, done by the same
+# local_collection() with a twist: all pointers to a LOCAL COPY object
# are replaced with pointers to the corresponding GLOBAL original. When
+# it is done, we mark all surviving LOCAL objects as GLOBAL too, and we
+# are back to the situation where this thread sees only GLOBAL objects.
+# What we leave to the C code to do "as a finishing touch" is to copy
+# transactionally the content of the LOCAL COPY objects back over the
+# GLOBAL originals; before this is done, the transaction can be aborted
+# at any point with no visible side-effect on any object that other
+# threads can see.
+#
+# All objects have an address-sized 'version' field in their header. On
+# GLOBAL objects, it is used as a version by C code to handle STM (it must
+# be set to 0 when the object first turns GLOBAL). On the LOCAL objects,
+# though, it is abused here in the GC:
+#
+# - if GCFLAG_WAS_COPIED, it points to the GLOBAL original.
+#
# - if GCFLAG_HAS_SHADOW, it points to the shadow object outside the nursery.
+# (It is not used on any other nursery object.)
+#
+# - it contains the 'next' object of the 'mt_global_turned_local' list.
+#
+# - it contains the 'next' object of the 'sharedarea_tls.chained_list'
+# list, which describes all LOCAL objects malloced outside the nursery
+# (excluding the ones that were GLOBAL at some point).
+#
+# - for nursery objects, during collection, if they are copied outside
+# the nursery, they grow GCFLAG_VISITED and their 'version' points
+# to the fresh copy.
+#
+
GCFLAG_GLOBAL = first_gcflag << 0 # keep in sync with et.c
GCFLAG_WAS_COPIED = first_gcflag << 1 # keep in sync with et.c
-GCFLAG_HAS_SHADOW = first_gcflag << 2
-GCFLAG_FIXED_HASH = first_gcflag << 3
-GCFLAG_VISITED = first_gcflag << 4
+GCFLAG_VISITED = first_gcflag << 2
+GCFLAG_HAS_SHADOW = first_gcflag << 3
+GCFLAG_FIXED_HASH = first_gcflag << 4
def always_inline(fn):
@@ -320,8 +421,8 @@
# The raw copy done above does not include the header fields.
hdr = self.header(obj)
localhdr = self.header(localobj)
- GCFLAGS = (GCFLAG_GLOBAL | GCFLAG_WAS_COPIED)
- ll_assert(hdr.tid & GCFLAGS == GCFLAGS,
+ GCFLAGS = (GCFLAG_GLOBAL | GCFLAG_WAS_COPIED | GCFLAG_VISITED)
+ ll_assert(hdr.tid & GCFLAGS == (GCFLAG_GLOBAL | GCFLAG_WAS_COPIED),
"stm_write: bogus flags on source object")
#
# Remove the GCFLAG_GLOBAL from the copy, and add GCFLAG_VISITED
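
A rough Python model of the renumbered flag bits and of the flag handling that the tightened assertion in stm_write implies: the GLOBAL source must carry GLOBAL and WAS_COPIED but never VISITED, and the LOCAL COPY drops GLOBAL and gains VISITED. The value of `first_gcflag`, the `Obj` class and the `make_local_copy` helper are hypothetical stand-ins for illustration; the real code works on raw GC headers and is not shown in full here.

```python
# Hypothetical model of the stm-gc header flags; not the real stmgc.py code.
first_gcflag = 1 << 16  # assumption: the real value comes from the GC base class

GCFLAG_GLOBAL = first_gcflag << 0      # keep in sync with et.c
GCFLAG_WAS_COPIED = first_gcflag << 1  # keep in sync with et.c
GCFLAG_VISITED = first_gcflag << 2
GCFLAG_HAS_SHADOW = first_gcflag << 3
GCFLAG_FIXED_HASH = first_gcflag << 4


class Obj:
    """Hypothetical stand-in for an object: header fields plus a payload."""

    def __init__(self, tid, version=None, payload=None):
        self.tid = tid
        self.version = version
        self.payload = dict(payload or {})


def make_local_copy(glob):
    """Sketch of the flag handling done around the raw copy in stm_write."""
    # The source must be GLOBAL with WAS_COPIED, and must not be VISITED
    # (VISITED is reserved for LOCAL COPY objects between collections).
    mask = GCFLAG_GLOBAL | GCFLAG_WAS_COPIED | GCFLAG_VISITED
    if glob.tid & mask != (GCFLAG_GLOBAL | GCFLAG_WAS_COPIED):
        raise AssertionError("stm_write: bogus flags on source object")
    # The copy drops GLOBAL and gains VISITED; Obj() copies the payload dict.
    local = Obj((glob.tid & ~GCFLAG_GLOBAL) | GCFLAG_VISITED,
                payload=glob.payload)
    # On a LOCAL object with WAS_COPIED, 'version' points to the original.
    local.version = glob
    return local
```

For example, an original flagged GLOBAL | WAS_COPIED yields a copy flagged WAS_COPIED | VISITED whose 'version' field refers back to the original, while a source that already carries VISITED is rejected.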
diff --git a/pypy/rpython/memory/gc/stmtls.py b/pypy/rpython/memory/gc/stmtls.py
--- a/pypy/rpython/memory/gc/stmtls.py
+++ b/pypy/rpython/memory/gc/stmtls.py
@@ -50,7 +50,7 @@
# --- main thread only: this is the list of GLOBAL objects that
# have been turned into LOCAL objects
if in_main_thread:
- self.main_thread_was_global_objects = NULL
+ self.mt_global_turned_local = NULL
#
self._register_with_C_code()
@@ -111,11 +111,11 @@
self.stop_transaction()
#
# We must also mark the following objects as GLOBAL again
- obj = self.main_thread_was_global_objects
- self.main_thread_was_global_objects = NULL
+ obj = self.mt_global_turned_local
+ self.mt_global_turned_local = NULL
self._promote_list_to_globals(obj)
if not we_are_translated():
- del self.main_thread_was_global_objects # don't use any more
+ del self.mt_global_turned_local # don't use any more
def leave_transactional_mode(self):
"""Restart using the main thread for mallocs."""
@@ -133,7 +133,7 @@
# and will not be called again before writing. But such objects
# are right now directly in the stack. So to fix this issue, we
# conservatively mark as local all objects directly from the stack.
- self.main_thread_was_global_objects = NULL
+ self.mt_global_turned_local = NULL
self.gc.root_walker.walk_current_stack_roots(
StmGCTLS._remark_object_as_local, self)
@@ -212,7 +212,7 @@
if not self.in_main_thread:
self.collect_roots_from_tldict()
else:
- self.collect_from_main_thread_was_global_objects()
+ self.collect_from_mt_global_turned_local()
#
# Now repeatedly follow objects until 'pending' is empty.
self.collect_flush_pending()
@@ -293,8 +293,8 @@
"write in main thread: unexpected GCFLAG_WAS_COPIED")
hdr.tid &= ~GCFLAG_GLOBAL
# add the object into this linked list
- hdr.version = self.main_thread_was_global_objects
- self.main_thread_was_global_objects = obj
+ hdr.version = self.mt_global_turned_local
+ self.mt_global_turned_local = obj
# ------------------------------------------------------------
@@ -500,14 +500,14 @@
#
self.trace_and_drag_out_of_nursery(localobj)
- def collect_from_main_thread_was_global_objects(self):
- # NB. all objects in the 'main_thread_was_global_objects' list are
+ def collect_from_mt_global_turned_local(self):
+ # NB. all objects in the 'mt_global_turned_local' list are
# currently immortal (because they were once GLOBAL)
- obj = self.main_thread_was_global_objects
+ obj = self.mt_global_turned_local
while obj:
hdr = self.gc.header(obj)
ll_assert(hdr.tid & GCFLAG_GLOBAL == 0,
- "unexpected GLOBAL in main_thread_was_global_objects")
+ "unexpected GLOBAL in mt_global_turned_local")
if hdr.tid & GCFLAG_VISITED == 0:
hdr.tid |= GCFLAG_VISITED
self.pending.append(obj)
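
A simplified Python model of the non-transactional bookkeeping renamed in this diff: the main-thread write barrier turns a GLOBAL object LOCAL and pushes it onto the intrusive list 'mt_global_turned_local', reusing the 'version' header field as the 'next' link, and the collection later walks that list as a set of roots. The `Obj` and `MainThreadTLS` classes and the `first_gcflag` value are hypothetical. The last hunk above stops before the loop advances; this sketch assumes it follows the 'version' link to the next element, as the header comment describes.

```python
# Hypothetical model of main-thread (non-transactional) bookkeeping.
first_gcflag = 1 << 16  # assumption: the real value comes from the GC base class
GCFLAG_GLOBAL = first_gcflag << 0
GCFLAG_WAS_COPIED = first_gcflag << 1
GCFLAG_VISITED = first_gcflag << 2

NULL = None


class Obj:
    def __init__(self, tid):
        self.tid = tid
        self.version = NULL  # reused as the 'next' link of the list


class MainThreadTLS:
    def __init__(self):
        self.mt_global_turned_local = NULL  # head of the intrusive list
        self.pending = []

    def write_barrier(self, obj):
        """Turn a GLOBAL object LOCAL and chain it into the list."""
        if obj.tid & GCFLAG_GLOBAL:
            assert obj.tid & GCFLAG_WAS_COPIED == 0, \
                "write in main thread: unexpected GCFLAG_WAS_COPIED"
            obj.tid &= ~GCFLAG_GLOBAL
            obj.version = self.mt_global_turned_local
            self.mt_global_turned_local = obj
        return obj

    def collect_from_mt_global_turned_local(self):
        """Use every object of the list as a root, marking it VISITED once."""
        obj = self.mt_global_turned_local
        while obj is not NULL:
            assert obj.tid & GCFLAG_GLOBAL == 0, \
                "unexpected GLOBAL in mt_global_turned_local"
            if obj.tid & GCFLAG_VISITED == 0:
                obj.tid |= GCFLAG_VISITED
                self.pending.append(obj)
            obj = obj.version  # assumed: advance along the 'next' link
```

Because an object stops being GLOBAL the first time the barrier sees it, a second write to it does not push it onto the list again; this is also why objects still reachable from the stack must stay LOCAL across a collection, as the header comment explains.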