[pypy-svn] r51430 - pypy/dist/pypy/doc/discussion
arigo at codespeak.net
arigo at codespeak.net
Wed Feb 13 12:18:04 CET 2008
Author: arigo
Date: Wed Feb 13 12:18:02 2008
New Revision: 51430
Added:
pypy/dist/pypy/doc/discussion/finalizer-order.txt (contents, props changed)
Log:
Finalizer ordering draft.
Added: pypy/dist/pypy/doc/discussion/finalizer-order.txt
==============================================================================
--- (empty file)
+++ pypy/dist/pypy/doc/discussion/finalizer-order.txt Wed Feb 13 12:18:02 2008
@@ -0,0 +1,152 @@
+Ordering finalizers in the SemiSpace GC
+=======================================
+
+Goal
+----
+
+After a collection, the SemiSpace GC should call the finalizers on
+*some* of the objects that have one and that have become unreachable.
+Basically, if there is a reference chain from an object a to an object b
+then it should not call the finalizer for b immediately, but just keep b
+alive and try again to call its finalizer after the next collection.
+
+This basic idea fails when there are cycles. It's not a good idea to
+keep the objects alive forever or to never call any of the finalizers.
+The model we came up with is that in this case, we could just call the
+finalizer of one of the objects in the cycle -- but only, of course, if
+there are no other objects outside the cycle that has a finalizer and a
+reference to the cycle.
+
+More precisely, given the graph of references between objects::
+
+ for each strongly connected component C of the graph:
+ if C has at least one object with a finalizer:
+ if there is no object outside C which has a finalizer and
+ indirectly references the objects in C:
+ mark one of the objects of C that has a finalizer
+ copy C and all objects it references to the new space
+
+ for each marked object:
+ detach the finalizer (so that it's not called more than once)
+ call the finalizer
+
+Algorithm
+---------
+
+During deal_with_objects_with_finalizers(), each object x can be in 4
+possible states::
+
+ state[x] == 0: unreachable
+ state[x] == 1: (temporary state, see below)
+ state[x] == 2: reachable from any finalizer
+ state[x] == 3: alive
+
+Initially, objects are in state 0 or 3 depending on whether they have
+been copied or not by the regular sweep done just before. The invariant
+is that if there is a reference from x to y, then state[y] >= state[x].
+
+The state 2 is used for objects that are reachable from a finalizer but
+that may be in the same strongly connected component than the finalizer.
+The state of these objects goes to 3 when we prove that they can be
+reached from a finalizer which is definitely not in the same strongly
+connected component. Finalizers on objects with state 3 must not be
+called.
+
+Let closure(x) be the list of objects reachable from x, including x
+itself. Pseudo-code (high-level) to get the list of marked objects::
+
+ marked = []
+ for x in objects_with_finalizers:
+ if state[x] != 0:
+ continue
+ marked.append(x)
+ for y in closure(x):
+ if state[y] == 0:
+ state[y] = 2
+ elif state[y] == 2:
+ state[y] = 3
+ for x in marked:
+ assert state[x] >= 2
+ if state[x] != 2:
+ marked.remove(x)
+
+This does the right thing independently on the order in which the
+objects_with_finalizers are enumerated. First assume that [x1, .., xn]
+are all in the same unreachable strongly connected component; no object
+with finalizer references this strongly connected component from
+outside. Then:
+
+* when x1 is processed, state[x1] == .. == state[xn] == 0 independently
+ of whatever else we did before. So x1 gets marked and we set
+ state[x1] = .. = state[xn] = 2.
+
+* when x2, ... xn are processed, their state is != 0 so we do nothing.
+
+* in the final loop, only x1 is marked and state[x1] == 2 so it stays
+ marked.
+
+Now, let's assume that x1 and x2 are not in the same strongly connected
+component and there is a reference path from x1 to x2. Then:
+
+* if x1 is enumerated before x2, then x2 is in closure(x1) and so its
+ state gets at least >= 2 when we process x1. When we process x2 later
+ we just skip it ("continue" line) and so it doesn't get marked.
+
+* if x2 is enumerated before x1, then when we process x2 we mark it and
+ set its state to >= 2 (before x2 is in closure(x2)), and then when we
+ process x1 we set state[x2] == 3. So in the final loop x2 gets
+ removed from the "marked" list.
+
+I think that it proves that the algorithm is doing what we want.
+
+The next step is to remove the use of closure() in the algorithm in such
+a way that the new algorithm has a reasonable performance -- linear in
+the number of objects whose state it manipulates::
+
+ marked = []
+ for x in objects_with_finalizers:
+ if state[x] != 0:
+ continue
+ marked.append(x)
+ recursing on the objects y starting from x:
+ if state[y] == 0:
+ state[y] = 1
+ follow y's children recursively
+ elif state[y] == 2:
+ state[y] = 3
+ follow y's children recursively
+ else:
+ don't need to recurse inside y
+ recursing on the objects y starting from x:
+ if state[y] == 1:
+ state[y] = 2
+ follow y's children recursively
+ else:
+ don't need to recurse inside y
+ for x in marked:
+ assert state[x] >= 2
+ if state[x] != 2:
+ marked.remove(x)
+
+In this algorithm we follow the children of each object at most 3 times,
+when the state of the object changes from 0 to 1 to 2 to 3. In a visit
+that doesn't change the state of an object, we don't follow its children
+recursively.
+
+In practice we can encode the 4 states with a single extra bit in the
+header:
+
+ ===== ============= ======== ====================
+ state is_forwarded? bit set? bit set in the copy?
+ ===== ============= ======== ====================
+ 0 no no n/a
+ 1 no yes n/a
+ 2 yes yes yes
+ 3 yes whatever no
+ ===== ============= ======== ====================
+
+So the loop above that does the transition from state 1 to state 2 is
+really just a copy(x) followed by scan_copied(). We must also clear the
+bit in the copy at the end, to clean up before the next collection
+(which means recursively bumping the state from 2 to 3 in the final
+loop).
More information about the Pypy-commit
mailing list