Solution to finalisation problem [Re: fork()]

Sun Jun 13 22:32:18 EDT 1999

[Tim]
> the combo of finalizers, cycles and resurrection seems to be a
> philosophical mess even in languages that take it all <ahem> seriously.

[Greg Ewing]
> Indeed, which leads me to believe that the idea of giving objects
> a __del__ method is fundamentally flawed in the first place.

Except it works so nicely in "the usual" case (no cycle, no resurrection):
everyone agrees on what should be done then, and it's remarkably
surprise-free!

Toss resurrection in, and it's still not *much* of a debate:  Python invokes
__del__ each time the refcount hits 0, Java does it only once, and the
choice seems arbitrary in the sense that there's no apparent killer argument
either way.

It's cycles that break its back, in Python and Java:  the language is then
in the business of choosing an order in a context where it can't possibly be
smart enough to choose wisely.  Java defines just enough to keep its own
internals sane, explicitly warning the programmer that it's not going to be
of much use to them.  Guido wisely picked reference-counting so Python
didn't have to disappoint users similarly <wink>.
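
To make the ordering problem concrete, here is a minimal sketch of two
objects in a cycle, each with a __del__.  It assumes a CPython that has the
gc module and that finalizes cyclic trash at all (PEP 442), both of which
postdate this thread; the Node class and the log list are made up for
illustration.

```python
import gc

log = []  # records the order in which finalizers actually ran

class Node:
    def __init__(self, name):
        self.name = name
        self.peer = None

    def __del__(self):
        # Each finalizer might like to use self.peer while it is still
        # "unfinalized", but in a cycle both cannot go first.
        log.append(self.name)

a, b = Node("a"), Node("b")
a.peer, b.peer = b, a   # build the cycle
del a, b                # refcounts never reach 0 on their own
gc.collect()            # the collector has to pick some order
```

Both finalizers run, but nothing in the program says which goes first, so
neither __del__ can count on its peer still being in a usable state.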

Anyway, I'm not sure __del__ deserves the blame for not being able to handle
every insane complication a user can throw at it.  Like many another
feature, it's *fine* for what it's designed for, and simply can't be pushed
beyond that.

Hisao Suzuki earlier posted a link to an excellent paper describing proposed
"guardians" in Scheme.  Like many things Schemeish, it defines a perfectly
general mechanism and absolutely no policy.  Want to finalize cycles?  Fine,
here's a hammer, build it yourself.  Want to clean up I/O channels?  Fine,
be our guest.  20,000 nails, and one hammer to pound them all in one at a
time <wink>.

> Fortunately, there *is* a way to do finalisation that avoids all
> these problems. Instead of an object doing its own finalisation,
> it designates some *other* object to do its finalisation on its
> behalf after it is dead.
>
> The benefits of this are:
>
> (1) The object doing the finalisation is fully alive and operates
> in a predictable environment.
>
> (2) The object which triggered the finalisation is fully dead
> (its memory has already been reclaimed by the time the finaliser
> is called) and there is no possibility of it being resurrected.

This appears isomorphic to the "register-for-finalization" approach dissed
in the aforementioned Scheme paper.

I'll leave it to Schemers to argue the Scheme case, but at least in Python
point (2) is a drawback for people who use resurrection on purpose, e.g. to avoid
high initialization costs for large objects connecting to the outside world.
A __del__ in that case can consist of just enough to close the connection,
then put the expensively constructed object on a global "free list" for
immediate reuse (one reason to prefer Python's treatment of resurrection
over Java's -- in this case you definitely *want* __del__ to get called each
time the object is about to die).

> In Python, it might look something like this from the
> user's point of view:
>
> class FileWrapper:
>    # Example of a class needing finalisation.
>    # Has an instance variable 'file' which needs
>    # to be closed.
>    def __init__(self, file):
>       self.file = file
>       register_finaliser(self, FileWrapperFinaliser(file))
>
> class FileWrapperFinaliser:
>    def __init__(self, file):
>       self.file = file
>    def __finalise__(self):
>       self.file.close()
>
> In this example, register_finaliser() is a new built-in
> method which stores the object and its finaliser in some
> special global dict. Whenever an object is reclaimed (either
> by refcount or M&S) the dict is checked for a finaliser for
> that object. If one is found, its __finalise__ method is
> called and it is removed from the dict.

Presumably this needs to be some form of weak dict, else the very act of
registering would render the object immortal ... OK, that's easily doable
under the covers.

> Note that the finaliser object shouldn't be given a reference
> to the original object (doing so would prevent it from ever
> becoming unreachable), but only enough information to enable
> the finalisation to be carried out.

In an object with mutable state this can be quite inconvenient, don't you
think?  Seems the "interesting state" would need to be factored out into
another container, shared by the object and its finalizer, referenced
indirectly by both.
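
A sketch of that factoring, assuming the standard library's
weakref.finalize (a later addition that follows the same
register-for-finalization idea).  _State, _flush and Buffered are
hypothetical names.  The mutable part lives in a separate container that
both the object and its finalizer reach indirectly, so the finalizer never
needs a reference to the object itself.

```python
import weakref

class _State:
    # The "interesting" mutable state, factored out of the object.
    def __init__(self):
        self.pending = []

def _flush(state, sink):
    # Finalizer: sees the shared state and the sink, never the Buffered.
    sink.extend(state.pending)
    state.pending.clear()

class Buffered:
    def __init__(self, sink):
        self._state = _State()
        # Register the finalizer with the shared container, not self.
        weakref.finalize(self, _flush, self._state, sink)

    def write(self, item):
        self._state.pending.append(item)

sink = []
b = Buffered(sink)
b.write("a")
b.write("b")
del b   # on CPython the finalizer runs here and flushes to sink
```

The price is exactly the inconvenience noted above: every method has to go
through self._state instead of touching attributes directly.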

> If a scheme like this were adopted, it should completely
> replace the existing __del__ method mechanism, so that there
> would be no difference between the finalisation of cyclic
> and non-cyclic trash. As such it would have to wait for
> Python 2.0.
>
> A-final-solution-to-the-finalisation-problem,
> Greg

Seems OK and doable so far as it goes.  Doubt people would like giving up
the easy convenience of __del__ in the vast bulk of cases where it's
non-problematic.  In a sense, Python (& Java) get in trouble now by trying
to provide policy as well as mechanism; I've become convinced that there is
no sensible policy in the presence of cycles, so punting on policy there is
fine; but I don't want the language to make users "build it all themselves" in
the simpler cases where a reasonable policy is clear.

don't-throw-the-del-out-with-the-cycle-water-ly y'rs  - tim