[Python-Dev] Design question: call __del__ for cyclical garbage?
Mon, 6 Mar 2000 04:09:45 -0500
> I'm beginning to believe that handing cycles with finalizers to the
> user is better than calling __del__ with a different meaning,
You won't be sorry: Python has the chance to be the first language that's
both useful and sane here!
> and I tentatively withdraw my proposal to change the rules for when
> __del__ is called (even when __init__ fails; I haven't had any complaints
> about that either).
Well, everyone liked the parenthetical half of that proposal, although
Jack's example did point out a real surprise with it.
> There seem to be two competing suggestions for solutions: (1) call
> some kind of __cleanup__ (Marc-Andre) or tp_clean (Greg) method of the
> object; (2) Tim's proposal of an interface to ask the garbage
> collector for a trash cycle with a finalizer (or for an object with a
> finalizer in a trash cycle?).
Or a maximal strongly-connected component, or *something* -- unsure.
> Somehow Tim's version looks less helpful to me, because it *seems*
> that whoever gets to handle the cycle (the main code of the program?)
> isn't necessarily responsible for creating it (some library you didn't
> even know was used under the covers of some other library you called).
Yes, to me too. This is the Scheme "guardian" idea in a crippled form
(Scheme supports as many distinct guardians as the programmer cares to
create), and even in its full-blown form it supplies "a perfectly general
mechanism with no policy whatsoever".
Greg convinced me (although I haven't admitted this yet <wink>) that "no
policy whatsoever" is un-Pythonic too. *Some* policy is helpful, so I won't
be pushing the guardian idea any more (although see immediately below for an
immediate backstep on that <wink>).
> What keeps nagging me though is what to do when there's a finalizer
> but no cleanup method. I guess the trash cycle remains alive. Is
> this acceptable? (I guess so, because we've given the programmer a
> way to resolve the trash: provide a cleanup method.)
BDW considers it better to leak than to risk doing a wrong thing, and I
agree wholeheartedly with that.  GC is one place you want a "100% safe,
never sorry" guarantee.
This is where something like a guardian can remain useful: while leaking is
OK because you've given them an easy & principled alternative, leaking
without giving them a clear way to *know* about it is not OK. If gc pushes
the leaked stuff off to the side, the gc module should (say) supply an entry
point that returns all the leaked stuff in a list. Then users can *know*
they're leaking, know how badly they're leaking, and examine exactly the
objects that are leaking. Then they've got the info they need to repair
their program (or at least track down the 3rd-party module that's leaking).
As with a guardian, they *could* also build a reclamation scheme on top of
it, but that would no longer be the main (or even an encouraged) thrust.
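To make that concrete -- a toy sketch of how such an entry point might be
used, borrowing the gc module's DEBUG_SAVEALL flag to park everything the
collector finds unreachable in a gc.garbage list instead of freeing it
(names and details illustrative, not a spec):

```python
import gc

class Node:
    # In the proposal these would be cycle members whose class
    # defines __del__; the cycle structure is what matters here.
    def __init__(self, name):
        self.name = name
        self.peer = None

# Park unreachable objects in gc.garbage rather than freeing them,
# so the program can *know* it's leaking and examine exactly what.
gc.set_debug(gc.DEBUG_SAVEALL)

a, b = Node("a"), Node("b")
a.peer, b.peer = b, a          # a <=> b trash cycle
del a, b
gc.collect()

leaked = [o for o in gc.garbage if isinstance(o, Node)]
print(sorted(n.name for n in leaked))   # -> ['a', 'b']
```

Users can walk that list, see how badly they're leaking, and track the
leaks back to whatever created them.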
> If we detect individual cycles (the current algorithm doesn't do that
> yet, though it seems easy enough to do another scan), could we
> special-case cycles with only one finalizer and no cleaner-upper?
> (I'm tempted to call the finalizer because it seems little harm can be
> done -- but then of course there's the problem of the finalizer being
> called again when the refcount really goes to zero. :-( )
"Better safe than sorry" is my immediate view on this -- you can't know that
the finalizer won't resurrect the cycle, and "finalizer called iff refcount
hits 0" is a wonderfully simple & predictable rule. That's worth a lot to
preserve, unless & until it proves to be a disaster in practice.
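The resurrection hazard is trivial to demonstrate in CPython, where
refcounts hit zero deterministically -- nothing stops a finalizer from
stashing a fresh strong reference to its object:

```python
# A finalizer is free to resurrect its object, which is why blindly
# running __del__ on cycle members is unsafe: the "trash" may come
# back to life after the collector has decided it's dead.
resurrected = []

class Phoenix:
    def __del__(self):
        resurrected.append(self)   # store a new strong reference

p = Phoenix()
del p                       # refcount hits zero, __del__ runs...
print(len(resurrected))     # -> 1: the object is alive again
```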
As to the details of cleanup, I haven't succeeded in making the time to
understand all the proposals. But I've done my primary job here if I've
harassed everyone into not repeating the same mistakes all previous
languages have made <0.9 wink>.
> I wish we didn't have to special-case finalizers on class instances
> (since each dealloc function is potentially a combination of a
> finalizer and a deallocation routine), but the truth is that they
> *are* special -- __del__ has no responsibility for deallocating
> memory, only for deallocating external resources (such as temp files).
And the problem is that __del__ can do anything whatsoever that can be
expressed in Python, so there's not a chance in hell of outguessing it.
> Another practical consideration is that now there are cycles of the form
> <function object> <=> <module dict>
> which suggests that we should make function objects traceable. Also,
> modules can cross-reference, so module objects should be made
> traceable. I don't think that this will grow the sets of traced
> objects by too much (since the dicts involved are already traced, and
> a typical program has way fewer functions and modules than it has
> class instances). On the other hand, we may also have to trace
> (un)bound method objects, and these may be tricky because they are
> allocated and deallocated at high rates (once per typical method call).
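That function <=> module-dict cycle is easy to see from Python itself
(spelled f.func_globals in the Python of the day; __globals__ in modern
Python):

```python
# Every module-level function keeps its module's dict alive through
# __globals__, and that same dict holds a reference back to the
# function -- a cycle pure refcounting can never reclaim.
def f():
    pass

print(f.__globals__ is globals())   # -> True: function -> module dict
print(f.__globals__["f"] is f)      # -> True: module dict -> function
```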
This relates to what I was trying to get at with my response to your gc
implementation sketch: mark-&-sweep needs to chase *everything*, so the set
of chased types is maximal from the start. Adding chased types to the
"indirectly infer what's unreachable via accounting for internal refcounts
within the transitive closure" scheme can end up touching nearly as much as
a full M-&-S pass per invocation. I don't know where the break-even point
is, but the more stuff you chase in the latter scheme the less often you
want to run it.
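For concreteness, here's the accounting idea in miniature -- a toy model
over explicit dicts, nothing like the real data structures: cancel every
reference that originates *inside* the candidate set, treat anything whose
count stays positive as externally reachable, propagate reachability, and
what's left is cyclic trash.

```python
def find_unreachable(objs, refs, refcount):
    """objs: candidate objects; refs[o]: list of objects o references;
    refcount[o]: o's true reference count."""
    internal = {o: refcount[o] for o in objs}
    for o in objs:
        for target in refs[o]:
            if target in internal:
                internal[target] -= 1     # cancel in-set references
    # Seed with externally reachable objects, then propagate.
    reachable = {o for o in objs if internal[o] > 0}
    stack = list(reachable)
    while stack:
        o = stack.pop()
        for target in refs[o]:
            if target in internal and target not in reachable:
                reachable.add(target)
                stack.append(target)
    return set(objs) - reachable

# Cycle a <-> b with no outside references; c is referenced externally.
refs = {"a": ["b"], "b": ["a"], "c": []}
refcount = {"a": 1, "b": 1, "c": 2}   # c has one external reference
print(sorted(find_unreachable(["a", "b", "c"], refs, refcount)))
# -> ['a', 'b']
```

Note the whole transitive closure gets touched either way -- which is
exactly why the break-even against M-&-S is unclear.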
About high rates, so long as a doubly-linked list allows efficient removal
of stuff that dies via refcount exhaustion, you won't actually *chase* many
bound method objects (i.e., they'll usually go away by themselves).
Note in passing that bound method objects often showed up in cycles in IDLE,
although you usually managed to break those in other ways.
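The IDLE-style cycle is the classic callback pattern -- a minimal sketch:

```python
# Storing self.method on self (a Tkinter-style callback registration)
# makes instance -> bound method -> instance, invisible to pure
# refcounting.
class Widget:
    def __init__(self):
        self.callback = self.on_click   # bound method holds self

    def on_click(self):
        print("clicked")

w = Widget()
print(w.callback.__self__ is w)   # -> True: the cycle's back edge
```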
> Back to the drawing board...
Good! That means you're making real progress <wink>.
glad-someone-is-ly y'rs - tim