[Python-Dev] Problem with _PyTrash_destroy_chain ?

Thu Aug 30 19:38:00 CEST 2012

Hello,

On Thu, 30 Aug 2012 14:39:41 +0200
Manu <cupcicm at gmail.com> wrote:
> Hi,
> 
> I am currently hitting http://bugs.python.org/issue13992.
> 
> I have a scenario that reproduces the bug after 1 to 2 hours (intensive
> sqlalchemy and threading). I get the same stack trace as described in the
> bug.
> 
[...]
> 
> The thing is that this deallocator (from what I understood) is also
> bracketed with Py_TRASHCAN macros. It could potentially cause a long
> deallocation chain, that will be added to the _PyTrash_delete_later linked
> list (if it's bigger than the PyTrash_UNWIND_LEVEL). If that happens, it
> seems that the _PyTrash_delete_later list is going to contain twice the
> same object, which could in turn cause the double free ?

I don't see how that can happen. The following piece of logic in
_PyTrash_destroy_chain():

        PyObject *op = _PyTrash_delete_later;
        destructor dealloc = Py_TYPE(op)->tp_dealloc;

        _PyTrash_delete_later =
            (PyObject*) _Py_AS_GC(op)->gc.gc_prev;

ensures that the object is moved out of the list before it is
potentially re-added to it.
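
To make the ordering argument concrete, here is a minimal sketch of the
same unlink-before-dealloc pattern on a toy list. It is not CPython
code: Node, deposit(), node_dealloc() and the defer_once flag are
hypothetical names made up for this illustration, and the real
trashcan decides whether to re-queue an object based on the nesting
level, not on a flag.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a GC-tracked object, not CPython's layout. */
typedef struct Node {
    struct Node *prev;  /* list link, plays the role of gc.gc_prev */
    int defer_once;     /* nonzero: the first dealloc re-queues the node */
    int freed;          /* how many times the node was really freed */
} Node;

static Node *delete_later = NULL;  /* models _PyTrash_delete_later */

/* Push a node onto the delete-later list. */
static void deposit(Node *n)
{
    n->prev = delete_later;
    delete_later = n;
}

/* Count how many times a node is currently linked into the list. */
static int count_on_list(const Node *n)
{
    int c = 0;
    for (const Node *p = delete_later; p != NULL; p = p->prev)
        if (p == n)
            c++;
    return c;
}

/* Toy deallocator: may put the node back on the list once. */
static void node_dealloc(Node *n)
{
    if (n->defer_once) {
        n->defer_once = 0;
        deposit(n);
        return;
    }
    n->freed++;
}

/* Same shape as the excerpt above: pop the head, then deallocate it. */
static void destroy_chain(void)
{
    while (delete_later != NULL) {
        Node *op = delete_later;
        delete_later = op->prev;          /* unlink first */
        assert(count_on_list(op) == 0);   /* op is off the list now */
        node_dealloc(op);                 /* may re-add op at most once */
        assert(count_on_list(op) <= 1);
    }
}
```

Depositing two nodes, one of them with defer_once set, and then
calling destroy_chain() frees each node exactly once in this model,
because a node is always off the list by the time its deallocator can
re-add it.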

However, there is a potential pitfall that can produce double
deallocations. It is described in subtype_dealloc() in typeobject.c,
under the following comment:

       Q. Why the bizarre (net-zero) manipulation of
          _PyTrash_delete_nesting around the trashcan macros?

(I'm not copying the answer since it's quite long-winded; you can find
it here:
http://hg.python.org/cpython/file/2dde5a7439fd/Objects/typeobject.c#l1066
)

The bottom line is that subtype_dealloc() mutates
_PyTrash_delete_nesting to avoid being called a second time, but it
seems to miss the fact that another thread can run in between and also
mutate _PyTrash_delete_nesting in the other direction. The GIL protects
us as long as it is not released, but it can be released inside the
functions called by a non-trivial deallocator such as subtype_dealloc().
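
The suspected interleaving can be sketched with a toy model. This is
a deliberate simplification, not the code in typeobject.c: it assumes
the nesting counter is a single process-wide integer, that a guarded
deallocator defers its work once the counter reaches a limit, and that
the other thread's effect can be modelled as a hook called at the
point where the GIL would be released. TOY_UNWIND_LEVEL,
toy_subtype_dealloc() and the hook functions are hypothetical names.

```c
#include <stddef.h>

/* Hypothetical small limit standing in for PyTrash_UNWIND_LEVEL. */
#define TOY_UNWIND_LEVEL 3

/* Models the process-wide _PyTrash_delete_nesting counter. */
static int delete_nesting = 0;

typedef void (*gil_release_hook)(void);

/* Trashcan-style guard: nonzero means "run now", zero means "defer". */
static int guard_passes(void)
{
    return delete_nesting < TOY_UNWIND_LEVEL;
}

/* Another thread entering / leaving its own guarded deallocator. */
static void other_thread_enters(void) { ++delete_nesting; }
static void other_thread_leaves(void) { --delete_nesting; }

/*
 * Toy subtype-style deallocator.
 * Returns  1 if the base deallocator ran immediately,
 *          0 if the base deallocator was deferred (the hazard),
 *         -1 if the outer deallocator itself was deferred.
 */
static int toy_subtype_dealloc(gil_release_hook hook)
{
    int base_ran;

    ++delete_nesting;              /* pre-bump: test at the base's level */
    if (!guard_passes()) {
        --delete_nesting;
        return -1;
    }
    ++delete_nesting;              /* entering the guarded block */
    --delete_nesting;              /* undo the pre-bump */

    if (hook != NULL)
        hook();                    /* GIL may be released here */

    base_ran = guard_passes();     /* the base deallocator's own guard */

    ++delete_nesting;              /* mirror image on the way out */
    --delete_nesting;              /* leaving the guarded block */
    --delete_nesting;
    return base_ran;
}
```

Starting from delete_nesting == 1, calling toy_subtype_dealloc(NULL)
returns 1 in this model, while passing other_thread_enters as the hook
returns 0: the outer deallocator has already run its body, yet the
base deallocator defers the object, which is the situation the
net-zero manipulation was meant to rule out.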

This is only a hypothesis, but we see that this traceback involves
subtype_dealloc() and deallocators running from multiple threads:
http://bugs.python.org/file26717/thread_all_apply_bt.txt

Regards

Antoine.

-- 
Software development and contracting: http://pro.pitrou.net