[Numpy-discussion] Ticket review: #848, leak in PyArray_DescrFromType

Charles R Harris charlesr.harris at gmail.com
Fri Jul 18 14:03:57 EDT 2008


On Fri, Jul 18, 2008 at 5:15 AM, Michael Abbott <michael at araneidae.co.uk>
wrote:

> I'm afraid this is going to be a long one, and I don't see any good way to
> cut down the quoted text either...
>
> Charles, I'm going to plead with you to read what I've just written and
> think about it.  I'm trying to make the case as clear as I can.  I think
> the case is actually extremely simple: the existing @name@_arrtype_name
> code is broken.
>
> <snip>

Heh. As the macro is undefined after the code is generated, it should
probably be moved into the code.
I would actually like to get rid of the ifdef's (almost everywhere), but
that is a later stage of cleanup.


>
> 3.      Here's the reference count we're responsible for.


Yep.


>
> 4.      If obj is NULL we use the typecode
> 5.       otherwise we pass it to PyArray_FromAny.
> 6.      The first early return
> 7.      All paths (apart from 6) come together here.
>
> So at this point let's take stock.  typecode is in one of three states:
> NULL (path 2, or if creation failed), allocated with a single reference
> count (path 4), or lost (path 5).  This is not good.


It still has a single reference after 5 if PyArray_FromAny succeeded; that
reference is held by arr and transferred to robj. If the transfer fails, the
reference to arr is decremented and NULL is returned by PyArray_Return. When
arr is garbage collected, the reference to typecode will be decremented.
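
A toy model may make the ownership transfer described above easier to
follow. This is only a sketch in Python, not the NumPy C code: Counted,
ToyArray and from_any_steals are hypothetical stand-ins, and the behaviour
on failure (the callee releases the reference it was handed) is an
assumption about how a reference-stealing call is expected to behave.

```python
class Counted:
    """Toy stand-in for a C object with a manual reference count."""

    def __init__(self, name):
        self.name = name
        self.refcount = 1       # the creator owns one reference
        self.alive = True

    def incref(self):
        assert self.alive, "touched a destroyed object"
        self.refcount += 1

    def decref(self):
        assert self.alive, "touched a destroyed object"
        self.refcount -= 1
        if self.refcount == 0:
            self.alive = False  # models deallocation


class ToyArray:
    """Toy array that owns the reference handed to it."""

    def __init__(self, descr):
        self.descr = descr      # no incref: the caller's reference is taken over

    def dealloc(self):
        self.descr.decref()     # the finalizer releases what the array holds


def from_any_steals(typecode, succeed=True):
    """Hypothetical model of a call that steals the caller's reference."""
    if not succeed:
        typecode.decref()       # assumption: released by the callee on failure
        return None
    return ToyArray(typecode)


def demo():
    typecode = Counted("float32")
    arr = from_any_steals(typecode)
    held = typecode.refcount    # still 1, but now owned by arr
    arr.dealloc()
    return held, typecode.refcount, typecode.alive
```

In this model demo() returns (1, 0, False): the count never rises above one,
and the single reference is released when the toy array is deallocated. The
caller no longer owns anything after the call, which is why touching
typecode afterwards is only safe if the caller took its own reference first.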


>
>
> LET ME EMPHASISE THIS: the state of the code at the finish label is
> dangerous and simply broken.
>
> The original state at the finish label is indeterminate: typecode has
> either been lost by passing it to PyArray_FromAny (in which case we're not
> allowed to touch it again), or else it has reference count that we're
> still responsible for.
>
> There seems to be a fantasy expressed in a comment in a recent update to
> this routine that PyArray_Scalar has stolen a reference, but fortunately a
> quick code inspection (of arrayobject.c) quickly refutes this frightening
> possibility.
>

No, no, PyArray_Scalar doesn't do anything to the reference count. Where did
you see otherwise?


>
> So, the only way to fix the problem at (7) is to unify the two non-NULL
> cases.  One answer is to add a DECREF at (4), but we see at (11) that we
> still need typecode at (7) -- so the only solution is to add an extra
> ADDREF just before (5).  This then of course sadly means that we also need
> an extra DECREF at (6).
>
> PLEASE don't suggest moving the ADDREF until after (6) -- at this point
> typecode is lost and may have been destroyed, and relying on any
> possibility to the contrary is a recipe for continued screw ups.
>
> The rest is easy.  Once we've established the invariant that typecode is
> either NULL or has a single reference count at (7) then the two early
> returns at (8) and (9) unfortunately need to be augmented with DECREFs.
>
> And we're done.
>
>
> Responses to your original comments:
>
> > By the time we hit finish, robj is NULL or holds a reference to typecode
> > and the NULL case is taken care of up front.
>

> robj has nothing to do with the lifetime management of typecode, the only
> issue is the early return.  After the finish label typecode is either NULL
> (no problem) or else has a single reference count that needs to be
> accounted for.
>
> > Later on, the reference to typecode might be decremented,
> That *might* is at the heart of the problem.  You can't be so cavalier
> about managing references.
>
> > perhaps leaving robj crippled, but in that case robj itself is marked
> > for deletion upon exit.
> Please ignore robj in this discussion, it's beside the point.
>
> > If the garbage collector can handle zero reference counts I think
> > we are alright.
> No, no, no.  This is nothing to do with the garbage collector.  If we
> screw up our reference counts here then the garbage collector isn't going
> to dig us out of the hole.


The garbage collector destroys the object and should decrement all the
references it holds. If that is not the case then there are bigger problems
afoot. The finalizer for the object should hold the knowledge of what needs
to be decremented.
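
Both halves of this exchange can be observed from Python on CPython. The
sketch below is illustrative only: Holder and observe are hypothetical
names, and the unbalanced increment is simulated through
ctypes.pythonapi.Py_IncRef, standing in for a missing DECREF in C code.

```python
import ctypes
import gc
import sys

# Declare the signatures so ctypes passes the Python object itself.
ctypes.pythonapi.Py_IncRef.argtypes = [ctypes.py_object]
ctypes.pythonapi.Py_IncRef.restype = None
ctypes.pythonapi.Py_DecRef.argtypes = [ctypes.py_object]
ctypes.pythonapi.Py_DecRef.restype = None


class Holder:
    """Object that holds one reference to its payload."""

    def __init__(self, payload):
        self.payload = payload


def observe():
    payload = object()
    base = sys.getrefcount(payload)

    # Destroying the holder releases the reference it holds.
    h = Holder(payload)
    while_held = sys.getrefcount(payload) - base
    del h
    after_dealloc = sys.getrefcount(payload) - base

    # An unmatched increment has no owner, so collection cannot undo it.
    ctypes.pythonapi.Py_IncRef(payload)
    gc.collect()
    after_leak = sys.getrefcount(payload) - base

    # Balance the count again so the sketch itself does not leak.
    ctypes.pythonapi.Py_DecRef(payload)
    after_fix = sys.getrefcount(payload) - base
    return while_held, after_dealloc, after_leak, after_fix
```

On a standard CPython build observe() is expected to return (1, 0, 1, 0):
the reference held by an object goes away with the object, while a stray
increment survives gc.collect() until something explicitly decrements it.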


>
>
> > I admit I haven't quite followed all the subroutines and
> > macros, which descend into the hazy depths without the slightest bit of
> > documentation, but at this point I'm inclined to leave things alone unless
> > you have a test that shows a leak from this source.
>

> Part of my point is that proper reference count discipline should not
> require any descent into subroutines (except for the very nasty case of
> reference theft, which I think is generally agreed to be a bad thing).
>

Agreed. But that is not the code we are looking at. My personal schedule for
this sort of cleanup/refactoring looks like this.

1) Format the code into readable C. (ongoing)
2) Document the functions so we know what they do.
3) Understand the code.
4) Fix up functions starting from the bottom layers.
5) Flatten the code -- the calls go too deep for my taste and make
understanding difficult. My attempts to generate a call graph have all run
into problems.

At that point consider the reference counting model.

There aren't many people working with the C code apart from myself and
Travis (who wrote the original), so I want to encourage you to keep looking
at the code and working with it. However, we can't just start from scratch
and we have to be pretty conservative to keep from breaking things.


>
> As for the test case, try this one (you'll need a debug build):
>
> import numpy
> import sys
> refs = 0
> r = range(100)
> refs = sys.gettotalrefcount()
> for i in r: numpy.float32()
> print(sys.gettotalrefcount() - refs)
>

Simpler test case:

import gc
import sys
import numpy as np

def main():
    t = np.dtype(np.float32)
    print(sys.getrefcount(t))
    for i in range(100):
        np.float32()
        gc.collect()
    print(sys.getrefcount(t))

if __name__ == "__main__":
    main()

Result

$[charris@f8 ~]$ python debug.py
5
105

So there is a leak. The question is the proper fix. I want to take a closer
look at PyArray_Return and also float32() and relations.
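
The same measurement can be wrapped in a small helper so that other
constructors are easy to check. This is a sketch only: refcount_growth and
self_check are hypothetical names, and the self-check uses a deliberately
leaky list instead of NumPy so that it runs without a particular build.

```python
import gc
import sys


def refcount_growth(obj, func, n=100):
    """Return how much sys.getrefcount(obj) grows over n calls of func."""
    gc.collect()
    before = sys.getrefcount(obj)
    for _ in range(n):
        func()
    gc.collect()
    return sys.getrefcount(obj) - before


def self_check():
    sentinel = object()
    kept = []
    # Each call stores one more reference to sentinel: growth equals n.
    leaky = refcount_growth(sentinel, lambda: kept.append(sentinel))
    # The temporary list is discarded at once: no growth.
    clean = refcount_growth(sentinel, lambda: [sentinel])
    return leaky, clean
```

Here self_check() returns (100, 0). With NumPy installed,
refcount_growth(np.dtype(np.float32), np.float32) corresponds to the test
above, and a result of 100 would match the 5 -> 105 jump shown in the output.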

Chuck