[Python-Dev] Garbage collecting closures

Tim Peters tim_one@email.msn.com
Wed, 16 Apr 2003 00:51:27 -0400


This is a multi-part message in MIME format.

------=_NextPart_000_0006_01C303B2.4FA6CF60
Content-Type: text/plain;
	charset="iso-8859-1"
Content-Transfer-Encoding: 7bit

[Guido]
> I'm glazing over the details now, but there seems to be a kernel of
> useful cleanup in here somehow; I hope that someone will be able to
> contribute a prototype of such code at least!

I'll attach a head start, a general implementation of Tarjan's SCC algorithm
that produces a list of SCCs already in a topsort order.  I haven't tested
this enough, and Tarjan's algorithm is subtle -- user beware.

The trygc() function at the end is an example application that appears to
work, busting all the objects gc knows about into SCCs and displaying them.
This requires Python CVS (for the new gc.get_referents function).  Note that
you'll get a very large SCC at the start.  This isn't an error!  Each module
that imports sys ends up in this SCC, due to that the module has the module
sys in its module dict, and sys has the module in its sys.modules dict.
>From there, modules have their top-level functions in their dict, while the
top level functions point back to the module dict via func_globals.  Etc.
Everything in this giant blob is reachable from everything else.

For the gc application, it would probably be better (run faster and consume
less memory) if dfs() simply ignored objects with no successors.
Correctness shouldn't be harmed if def started with

    succs = successors(v)
    if not succs:
        return

except that objects with no successors would no longer be considered
singleton SCCs, and the recursive call to dfs() would need to be fiddled to
skip trying to update id2lowest[v_id] then (so dfs should be changed to
return a bool saying whether it took the early return).  This would save the
current work of trying to chase pointless things like ints and strings.
Still, it's pretty zippy as-is!

------=_NextPart_000_0006_01C303B2.4FA6CF60
Content-Type: text/plain;
	name="scc.py"
Content-Transfer-Encoding: quoted-printable
Content-Disposition: attachment;
	filename="scc.py"

# This implements Tarjan's linear-time algorithm for finding the maximal
# strongly connected components.  It takes time proportional to the sum
# of the number of nodes and arcs.
#
# Two functions must be passed to the constructor:
#     node2id     graph node -> a unique integer
#     successors  graph node -> sequence of immediate successor graph =
nodes
#
# Call method getsccs() with an iterable producing the root nodes of the =
graph.
# The result is a list of SCCs, each of which is a list of graph nodes.
# This is a partitioning of all graph nodes reachable from the roots,
# where each SCC is a maximal subset such that each node in an SCC is
# reachable from all other nodes in the SCC.  Note that the derived =
graph
# where each SCC is a single "supernode" is necessarily acyclic (else if
# SCC1 and SCC2 were in a cycle, each node in SCC1 would be reachable =
from
# each node in SCC1 and SCC2, contradicting that SCC1 is a maximal =
subset).
# The list of SCCs returned by getsccs() is in a topological sort order =
wrt
# this derived DAG.

class SCC(object):
    def __init__(self, node2id, successors):
        self.node2id =3D node2id
        self.successors =3D successors

    def getsccs(self, roots):
        import sys

        node2id, successors =3D self.node2id, self.successors
        get_dfsnum =3D iter(xrange(sys.maxint)).next
        id2dfsnum =3D {}
        id2lowest =3D {}
        stack =3D []
        id2stacki =3D {}
        sccs =3D []

        def dfs(v, v_id):
            id2dfsnum[v_id] =3D id2lowest[v_id] =3D v_dfsnum =3D =
get_dfsnum()
            id2stacki[v_id] =3D len(stack)
            stack.append((v, v_id))
            for w in successors(v):
                w_id =3D node2id(w)
                if w_id not in id2dfsnum:   # first time we saw w
                    dfs(w, w_id)
                    id2lowest[v_id] =3D min(id2lowest[v_id], =
id2lowest[w_id])
                else:
                    w_dfsnum =3D id2dfsnum[w_id]
                    if w_dfsnum < v_dfsnum and w_id in id2stacki:
                        id2lowest[v_id] =3D min(id2lowest[v_id], =
w_dfsnum)

            if id2lowest[v_id] =3D=3D v_dfsnum:
                i =3D id2stacki[v_id]
                scc =3D []
                for w, w_id in stack[i:]:
                    del id2stacki[w_id]
                    scc.append(w)
                del stack[i:]
                sccs.append(scc)

        for v in roots:
            v_id =3D node2id(v)
            if v_id not in id2dfsnum:
                dfs(v, v_id)
        sccs.reverse()
        return sccs

_basic_tests =3D """
>>> succs =3D {1: [2], 2: []}
>>> s =3D SCC(int, lambda i: succs[i])

The order in which the roots are listed doesn't matter:  we get the =
unique
topsort regardless.

>>> s.getsccs([1])
[[1], [2]]
>>> s.getsccs([1, 2])
[[1], [2]]
>>> s.getsccs([2, 1])
[[1], [2]]

But note that 1 isn't reachable from 2, so giving 2 as the only root =
won't
find 1.

>>> s.getsccs([2])
[[2]]

>>> succs =3D {1: [2],
...          2: [3, 5],
...          3: [2, 4],
...          4: [3],
...          5: [2]}
>>> s =3D SCC(int, lambda i: succs[i])
>>> s.getsccs([1])
[[1], [2, 3, 4, 5]]
>>> s.getsccs(range(1, 6))
[[1], [2, 3, 4, 5]]

Break the link from 4 back to 2.
>>> succs[4] =3D []
>>> s.getsccs([1])
[[1], [2, 3, 5], [4]]
"""

__test__ =3D {'basic': _basic_tests}

def _test():
    import doctest
    doctest.testmod()

if __name__ =3D=3D '__main__':
    _test()

def trygc():
    import gc
    gc.collect()
    s =3D SCC(id, gc.get_referents)
    for scc in s.getsccs(gc.get_objects()):
        if len(scc) =3D=3D 1:
            continue
        print "SCC w/", len(scc), "objects"
        for x in scc:
            print "   ", hex(id(x)), type(x),
            if hasattr(x, "__name__"):
                print x.__name__,
            print

------=_NextPart_000_0006_01C303B2.4FA6CF60--