[Python-Dev] Garbage collecting closures

Guido van Rossum guido@python.org
Tue, 15 Apr 2003 15:23:48 -0400


> [Guido]
> > ...
> > This makes me think that Python should run the garbage collector
> > before exiting, so that finalizers on objects that were previously
> > kept alive by cycles are called (even if finalizers on objects that
> > are *part* of a cycle still won't be called).

[Tim]
> What about finalizers on objects that are alive at exit because
> they're still reachable?  We seem to leave a lot of stuff alive at
> the end.  For example, here are the pymalloc stats at the end under
> current CVS, after opening an interactive shell then exiting
> immediately; this is produced at the end of Py_Finalize(), and only
> call_ll_exitfuncs() is done after this (and that probably shouldn't
> free anything):
> 
> Small block threshold = 256, in 32 size classes.
> 
> class   size   num pools   blocks in use  avail blocks
> -----   ----   ---------   -------------  ------------
>     2     24           1               1           168
>     5     48           1               2            82
>     6     56          13             170           766
>     7     64          13             445           374
>     8     72           5              25           255
>     9     80           1               1            49
>    15    128           1               2            29
>    20    168           5              25            95
>    23    192           1               1            20
>    25    208           1               2            17
>    29    240           1               2            14
>    31    256           1               1            14
> 
> # times object malloc called       =               17,119
> 3 arenas * 262144 bytes/arena      =              786,432
> 
> # bytes in allocated blocks        =               45,800
> # bytes in available blocks        =              131,072
> 145 unused pools * 4096 bytes      =              593,920
> # bytes lost to pool headers       =                1,408
> # bytes lost to quantization       =                1,944
> # bytes lost to arena alignment    =               12,288
> Total                              =              786,432
> 
> "size" here is 16 bytes larger than in a release build, because of
> the 8-byte padding added by PYMALLOC_DEBUG on each end of each block
> requested.  So, e.g., there's one (true size) 8-byte object still
> living at the end, and 445 48-byte objects.  Unreclaimed ints and
> floats aren't counted here (they've got their own free lists, and
> don't go thru pymalloc).
> 
> I don't know what all that stuff is, but I bet there are about 25
> dicts still alive at the end.

Close!  I moved the debugging code that can print the list of all
objects still alive at the end around so that it is now next to the
code that prints the above malloc stats.  (If you're following CVS
email you might have noticed this. :-)  The full output is way too
large to post; you can see for yourself by creating a debug build and
running this (on Unix; windows users use their imagination or upgrade
their OS):

  PYTHONDUMPREFS= ./python -S -c pass

When I run this, I see 23 dictionaries.  One is the dict of interned
strings that are still alive; the others are the tp_dicts of the
various built-in type objects.  Some interned strings appear to be
kept alive by various static globals holding names for faster name
lookup; there isn't much we can do about that.  I also don't think we
should bother un-initializing the built-in types.  Apart from that, I
don't think I see anything that looks suspect.  Of course, running a
larger program with the same setup might reveal real leaks.

> > I also think that if a strongly connected component (a stronger
> > concept than cycle) has exactly one object with a finalizer in it,
> > that finalizer should be called, and then the object should
> > somehow be marked as having been finalized (maybe a separate GC
> > queue could be used for this) in case it is resurrected by its
> > finalizer.
> 
> With the addition of gc.get_referents() in 2.3, it's easy to compute
> SCCs via Python code now; it's a PITA in C.  OTOH, figuring out
> which finalizers to call seems a PITA in Python:
> 
>     A<->F1 -> F2<->B
> 
> F1 and F2 have finalizers; A and B don't.  Python code can easily
> determine that there are 2 SCCs here, each with 1 finalizer (I
> suppose gc's has_finalizer() would need to be exposed, to determine
> whether __del__ exists correctly).  A tricky bit then is that
> running F1.__del__ may end up deleting F2 by magic (this is
> *possible* since F2 is reachable from F1, and F1.__del__ may break
> the link to F2), but it's hard for pure-Python code to know that.
> So that part seems easier done in C, and creating new gc lists in C
> is very easy thanks to the nice doubly-linked-list C API Neil coded
> in gcmodule.
> 
> Note a subtlety: the finalizers in SCCs should be run in a topsort
> ordering of the derived SCC graph (since F1.__del__ can ask F2 to do
> stuff, despite that F1 and F2 are in different SCCs, F1 should be
> finalized before F2).  Finding a topsort order is also easy in
> Python (and also a PITA in C).
> 
> So I picture computing a topsorted list of suitable objects (those
> that have a finalizer, and have the only finalizer in their SCC) in
> Python, and passing that on to a new gcmodule entry point.  The
> latter can link those objects into a doubly-linked C list in the
> same order, and then run finalizers "left to right".  It's a nice
> property of the gc lists that, e.g., if F1.__del__ does end up
> deleting F2, F2 simply vanishes from the list.
> 
> Another subtlety: suppose F1.__del__ resurrects F1, and doesn't
> delete F2.  Should F2.__del__ be called anyway?  Probably not, since
> if F1 is alive, everything reachable from it is also alive, and F1
> -> F2.  I've read that Java can get into a state where it's only
> able to reclaim 1 object per full gc collection due to headaches
> like this, despite that everything is trash.  There's really no way
> to tell whether F1.__del__ resurrects F1 short of starting gc over
> again (in particular, looking at F1's refcount before and after
> running F1.__del__ isn't reliable evidence for either conclusion,
> unless the "after" refcount is 0).

I'm glazing over the details now, but there seems to be a kernel of
useful cleanup in here somehow; I hope that someone will be able to
contribute a prototype of such code at least!

--Guido van Rossum (home page: http://www.python.org/~guido/)