[Python-Dev] Re: [Python-checkins] python/dist/src/Modules gcmodule.c,,

Tim Peters tim_one@email.msn.com
Sun, 6 Apr 2003 20:47:53 -0400

[Jeremy Hylton]
> I think I'll second the thought that there are no satisfactory answers
> here.  We've made a big step forward by fixing the core dumps.
> If we want to document the current behavior, we would say that garbage
> collection may leave reachable objects in an "invalid state" in the
> presence of "problematic objects."  A "problematic object" is an
> instance of a classic class that defines a getattr hook (__getattr__)
> but not a finalizer (__del__).  An object in an "invalid state" has had
> its tp_clear slot executed; in the case of instances, this means the
> __dict__ will be empty.  Specifically, if a problematic object is part
> of an unreachable cycle, the garbage collector will execute the code in its
> getattr hook; if executing that code makes any object in the cycle
> reachable again, it will be left in an invalid state.

I expect that documenting it comprehensibly is impossible.  For example, the
referent of "it" in your last sentence is unclear, and hard to flesh out.
A problematic object doesn't need to be part of a cycle to cause problems,
and when it does cause problems the things that end up in an unexpected
state needn't be part of cycles either.  It's more that the problematic
object needs to be reachable only from an unreachable cycle (the unreachable
cycle needn't contain problematic objects), and then it's all the objects
reachable only from the unreachable cycle and from the problematic object
that may be in trouble (and regardless of whether they're in cycles).
Here's a concrete example, where the instance of the problematic D isn't in
a cycle, and neither are the list or the dict that get magically cleared
(.mylist and .mydict) despite being resurrected:

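class C:
    pass

class D:
    def __init__(self):
        self.mydict = {'a': 1, 'b': 2}
        self.mylist = range(100)

    def __getattr__(self, attribute):
        global alist
        if attribute == "__del__":
            # gc's check for a finalizer lands here; resurrect the containers
            alist.append(self.mydict)
            alist.append(self.mylist)
        raise AttributeError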

import gc

a = C()
a.loop = a  # make a cycle
a.d_instance = D()  # an instance of D hangs *off* the cycle

alist = []
del a
print gc.collect()  # 6: a, a.d_instance, their __dicts__, and D()'s
                    # mydict and mylist

print alist  # [{}, []]

If we had enough words to explain that, it still wouldn't be enough, because
the effect of calling tp_clear isn't defined by the language for any type.
If, for example, D also defined a .mytuple attr and resurrected it in
__getattr__, the user would see that *that* one survived OK (tuples happen
to have a NULL tp_clear slot).
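
The reachability structure in the example above (an object that hangs off an unreachable cycle without being in one) can be observed on its own. This is only a minimal sketch, assuming a current CPython 3, where classic classes no longer exist, so it does not reproduce the clearing bug. The names Node and Leaf are hypothetical stand-ins for C and D, and a weak reference is used to watch the hanging object without keeping it alive.

```python
import gc
import weakref


class Node:
    pass


class Leaf:
    pass


gc.disable()               # keep automatic collections out of the demonstration
try:
    a = Node()
    a.loop = a             # make a cycle
    a.leaf = Leaf()        # hangs *off* the cycle, is not part of it
    leaf_ref = weakref.ref(a.leaf)

    del a                  # the cycle is now unreachable but not yet freed
    alive_before = leaf_ref() is not None

    gc.collect()           # frees the cycle and whatever is reachable only from it
    alive_after = leaf_ref() is not None
finally:
    gc.enable()

print(alive_before, alive_after)
```

The Leaf instance is never in a cycle, yet its fate is tied to the cycle's: it goes away only when the collector deals with the unreachable Node.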

> If we document this for 2.2, it's more complicated because instances of
> new-style classes are also affected.  What's worse, a new-style class
> with a __getattribute__ hook is affected regardless of whether it has a
> finalizer.

In 2.2 but not 2.3, right?  I haven't tried anything with __getattribute__.
For that matter, in my own Python programming, I've never even defined a
__getattr__ method -- I spend most of my life tracking down bugs in things I
don't use <wink>.

> Here are a couple of thoughts about how to avoid leaving objects in an
> invalid state.

I'd much rather pursue that than write docs nobody will understand.

> It's pretty unlikely for it to happen, but speaking from
> experience <wink> it's baffling when it does.
> #1.  (I think this was Fred's suggestion on Friday.)  Don't do a
> hasattr() check on the object, do it on the class.  This is what happens
> with new-style classes in Python 2.3:  If a new-style class doesn't
> define a __del__ method, then its instances don't have a finalizer.  It
> doesn't matter whether the specific instance has a __del__ attribute.
> Limitations: This is a change in semantics, although it only covers a
> nearly insane corner case.  The other limitation is that things could
> still go wrong, although only in the presence of a classic metaclass!

I'm not sure I followed the last sentence.  If I did, screw calling
hasattr() -- do a string lookup for "__del__" in the classic class's
__dict__, and that's it.  Anything that ends up executing arbitrary Python
code is going to leave holes.
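
The difference between the two checks can be sketched as follows. This is an illustration only, written for Python 3 (an assumption, since the thread is about classic classes in 2.2/2.3); the helper class_defines_del and the classes Hooked and WithDel are hypothetical names, and walking __mro__ stands in for searching a classic class and its bases.

```python
calls = []  # records every attribute name that reaches the getattr hook


class Hooked:
    def __getattr__(self, name):
        calls.append(name)          # arbitrary user code runs here
        raise AttributeError(name)


class WithDel(Hooked):
    def __del__(self):
        pass


def class_defines_del(cls):
    """Look for '__del__' in the class dicts only; runs no user code."""
    return any("__del__" in vars(klass) for klass in cls.__mro__)


obj = Hooked()

via_hasattr = hasattr(obj, "__del__")     # falls through to __getattr__
calls_after_hasattr = list(calls)

via_dict = class_defines_del(type(obj))   # plain dictionary lookups
calls_after_dict = list(calls)

print(via_hasattr, via_dict, calls_after_hasattr, calls_after_dict)
```

Both checks agree that there is no finalizer, but only hasattr() had to run the hook to find out, which is exactly the hole being discussed.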

> #2.  If an object has a getattr hook and it's involved in a cycle, just
> put it in gc.garbage.  Forget about checking for a finalizer.  That
> seems fine for 2.3, since we're only talking about classic classes with
> getattr hooks.  But it doesn't sound very pleasant for 2.2, since it
> covers any class instance with a getattr hook.

I'd like to avoid expanding the definition of what ends up in gc.garbage.
The relationship to __del__ and unreachable cycles is explainable now,
modulo the __getattr__ insanity.  Getting rid of the latter is a lot more
attractive than folding it into the former.

> I think #1 is pretty reasonable.  I'd like to see something fixed for
> 2.2.3, but I worry that the semantic change may be unacceptable for a
> bug fix release.  (But maybe not, the semantics are pretty insane right
> now :-).

I have no problem with changing this for 2.2.3.  I doubt any Python app will
be affected, except possibly to rid 1 in 10,000 of a subtle bug.  There's
certainly no defensible app that relied on Python segfaulting here <wink>,
and I can't imagine any relying on containers getting magically cleared at
unpredictable times.

BTW, I'm still wondering why the ZODB thread test failed the way it did for
Tres and Barry and me:  you saw corrupt gc lists, but the rest of us never
did.  We saw a Connection instance with a mysteriously cleared __dict__.
That's consistent with the __getattr__-hook-resurrects-an-
object-reachable-only-from-an-unreachable-cycle examples I posted, but did
you guys figure out on Friday whether that's what was actually happening?
The corrupt-gc-lists symptom was explained by the __getattr__ hook deleting
unreachable objects while gc was still crawling over them, and that's a
different (albeit related) problem than __dicts__ getting cleared by magic.