[Python-Dev] More fun with Python shutdown

Tim Peters tim.one at comcast.net
Tue Nov 11 19:01:08 EST 2003


>     http://www.python.org/sf/839548

[Thomas Heller]
> Is the problem I currently have the same,

Probably not.

> I also use weakrefs (although Jim's patch doesn't seem to help)?

I guess your problem and Jim's both have in common that you and Zope3 use
assignment statements too <wink>.

> It is triggered when I have set the gc threshold to small values in a
> 2.3.2 debug build under Windows.  When some containers in my program
> are destroyed Python crashes with an access violation in
> _Py_ForgetReference() because op->_ob_next and
> op->_ob_prev are both NULL:

That's a list of "all objects".  Deallocating an object removes it from that
list.  Trying to deallocate it a second time tries to remove it from the
list a second time, which barfs in just this way.
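
To make the "remove it twice" failure concrete, here is a toy model in
Python 3 syntax.  It is only a sketch: Node, AllObjects and forget() are
made-up names standing in for the debug build's object list and
_Py_ForgetReference(), and the toy raises an exception where the real C code
would dereference a NULL pointer.

```python
class Node:
    """Hypothetical stand-in for an object header with list links."""
    def __init__(self, name):
        self.name = name
        self.prev = None
        self.next = None


class AllObjects:
    """Toy circular doubly-linked list with a sentinel head node."""
    def __init__(self):
        self.head = Node("<head>")
        self.head.prev = self.head.next = self.head

    def add(self, node):
        # Link the node in right after the sentinel.
        node.next = self.head.next
        node.prev = self.head
        self.head.next.prev = node
        self.head.next = node

    def forget(self, node):
        # A node that was already unlinked has both links cleared.
        if node.prev is None or node.next is None:
            raise RuntimeError("%s is not in the list" % node.name)
        node.prev.next = node.next
        node.next.prev = node.prev
        node.prev = node.next = None


def demo_double_forget():
    registry = AllObjects()
    obj = Node("X()")
    registry.add(obj)
    registry.forget(obj)        # first deallocation: fine
    try:
        registry.forget(obj)    # second deallocation: links are gone
    except RuntimeError as exc:
        return str(exc)
    return None
```

Calling demo_double_forget() returns the error message from the second
forget(), which is the toy's version of the access violation.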

> PS: Here is the stack trace as displayed in MSVC6:
>
> _Py_ForgetReference(_object * 0x01101bd0) line 2001 + 15 bytes
> _Py_Dealloc(_object * 0x01101bd0) line 2021 + 9 bytes

...

> _Py_Dealloc(_object * 0x01101bd0) line 2022 + 7 bytes

Bingo:  _Py_Dealloc with the same object pointer appears twice in the
stack.

That's almost certainly a bug in Python, but is almost certainly unrelated
to the problem Jim is having.

I was able to make your test case substantially smaller.  The key is that
the "remove" callback triggers gc.  Apart from that, it doesn't matter at all
what "remove" does.  I don't know what the bug is, though, and since the
last of these consumed more than a day to track down and fix, I don't
anticipate having time to do that again:

"""
import weakref
import gc

_boundMethods = weakref.WeakKeyDictionary()

def safeRef(object):
    selfkey = object.im_self
    funckey = object.im_func
    _boundMethods[selfkey] = weakref.WeakKeyDictionary()
    _boundMethods[selfkey][funckey] = BoundMethodWeakref(object)

class BoundMethodWeakref:
    def __init__(self, boundMethod):
        def remove(object):
            gc.collect()
        self.weakSelf = weakref.ref(boundMethod.im_self, remove)

class X(object):
    def test(self):
        pass

def test():
    print "A"
    safeRef(X().test)
    print "B"

if __name__ == "__main__":
    test()
"""

As far as I can get without stopping:

It's dying when the anonymous bound method (X().test) is getting cleaned up.
That decrefs the anonymous X(), marking the end of its life too, which
triggers a weakref callback, which calls gc.collect() (in your original
program, a .keys() method created a list, which was enough to trigger gc
because you set the gc threshold to 1).  The anonymous X() then shows up in
gc's list of garbage, and the Py_DECREF in this part of gc:

			if ((clear = op->ob_type->tp_clear) != NULL) {
				Py_INCREF(op);
				clear(op);
				Py_DECREF(op);
			}

then knocks the refcount on the anonymous X() back to 0 a second time,
triggering the fatal attempt to deallocate an object that's already in the
process of being deallocated.
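
The sequence above can be mimicked with a toy refcount model (Python 3
syntax).  Everything here is hypothetical and heavily simplified: ToyObject,
incref(), decref(), dealloc() and toy_gc_clear() are stand-ins I made up, not
CPython's functions, and the "weakref callback" is just a stored function.

```python
class ToyObject:
    def __init__(self):
        self.refcnt = 1
        self.dealloc_calls = 0
        self.on_teardown = None   # stands in for a weakref callback


def incref(op):
    op.refcnt += 1


def decref(op):
    op.refcnt -= 1
    if op.refcnt == 0:
        dealloc(op)


def dealloc(op):
    op.dealloc_calls += 1
    # Fire the callback once, while the object is still being torn down.
    callback, op.on_teardown = op.on_teardown, None
    if callback is not None:
        callback(op)


def toy_gc_clear(op):
    # Mirrors the INCREF / clear / DECREF pattern quoted from gc.
    incref(op)    # 0 -> 1
    # clear(op) would run here
    decref(op)    # 1 -> 0 again, so dealloc() is entered a second time


def demo():
    obj = ToyObject()
    obj.on_teardown = toy_gc_clear   # callback that "triggers gc"
    decref(obj)                      # last real reference goes away
    return obj.dealloc_calls
```

In this toy, demo() reports two calls to dealloc() for one object, with the
second call nested inside the first.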

This *may* be a deep problem.  gc doesn't expect that the refcount on
anything it knows about is already 0 at the time gc gets started.  The way
Python works <wink>, anything whose refcount falls to 0 is recycled without
cyclic gc's help.  Nevertheless, the anonymous X() container *is* in gc's
lists when gc starts here, with a refcount of 0, and gc correctly concludes
that X() isn't reachable from "outside".  That's why it tries to delete X()
itself.

Anyway, the only thing weakrefs have to do with this is that they managed to
trigger gc between the time a gc-tracked container became dead and the time
the container untracked itself from gc.  I'll note that the anonymous bound
method object *did* untrack itself from gc before the fatal part began.
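
For anyone who wants to see the timing for themselves, here is a minimal
sketch in Python 3 syntax (so not the 2.3.2 build discussed here).  It only
shows that the callback runs during the referent's teardown and that
gc.collect() can be called from inside it; run_demo() and the events list are
names I made up, and the immediate callback on "del" assumes CPython's
refcounting.

```python
import gc
import weakref


class X:
    def test(self):
        pass


def run_demo():
    events = []

    def remove(ref):
        # The referent is no longer reachable through the weakref here.
        events.append(("referent_gone", ref() is None))
        gc.collect()                 # gc runs in the middle of the teardown
        events.append(("gc_done", True))

    x = X()
    r = weakref.ref(x, remove)       # keep r alive so the callback can fire
    del x                            # refcount hits 0; callback runs now
    return events
```

On a current CPython this is expected to run cleanly, recording both events
in order.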

Hmm.  subtype_dealloc() *also* untracked the anonymous X() before the fatal
part began, but then it *re*tracked it:

	/* UnTrack and re-Track around the trashcan macro, alas */
	/* See explanation at end of function for full disclosure */
	PyObject_GC_UnTrack(self);
	++_PyTrash_delete_nesting;
	Py_TRASHCAN_SAFE_BEGIN(self);
	--_PyTrash_delete_nesting;
	_PyObject_GC_TRACK(self); /* We'll untrack for real later */

It's just a few lines later that the suicidal weakref callback gets
triggered.

The good news is that Guido must have spent days in all trying to
bulletproof subtype_dealloc(), so it's not like a bug in this part of the
code is a big surprise <wink>.  It's possible that temporarily incref'ing
self before the PyObject_ClearWeakRefs() call would be a correct fix (that
would prevent gc from believing the object is collectible, and offhand I
don't see anything other than PyObject_ClearWeakRefs here that could trigger
a round of gc).

If that's a correct analysis, this is a very serious bug:
double-deallocation will normally go undetected in a release build, and will
lead to memory corruption.  It will happen only when a weakref callback
happens to trigger gc, *and* the object being torn down at the time happens
to be in a generation gc collects at the time gc is triggered.  So the
conditions that trigger it are rare and unpredictable, and the effects of
the memory corruption it leads to are equally bad (anything can happen, at
any time later).



