[Numpy-discussion] How to debug reference counting errors

Mark Wiebe mwwiebe at gmail.com
Fri Aug 31 20:56:34 EDT 2012


On Fri, Aug 31, 2012 at 5:35 PM, Ondřej Čertík <ondrej.certik at gmail.com> wrote:

> Hi Dag,
>
> On Fri, Aug 31, 2012 at 4:22 AM, Dag Sverre Seljebotn
> <d.s.seljebotn at astro.uio.no> wrote:
> > On 08/31/2012 09:03 AM, Ondřej Čertík wrote:
> >> Hi,
> >>
> >> There is a segfault reported here:
> >>
> >> http://projects.scipy.org/numpy/ticket/1588
> >>
> >> I've managed to isolate the problem and even provide a simple patch
> >> that fixes it here:
> >>
> >> https://github.com/numpy/numpy/issues/398
> >>
> >> however, the patch simply skips the decref in question, so it
> >> might leak. I used bisection (it took the whole evening,
> >> unfortunately...), but the good news is that I've isolated the
> >> commits that actually broke it. See the github issue #398 for
> >> details, diffs, etc.
> >>
> >> Unfortunately, it's 12 commits from Mark, and the individual commits
> >> raise an exception on the segfaulting code, so I can't pinpoint the
> >> problem further.
> >>
> >> In general, how can I debug this sort of problem? I tried to use
> >> valgrind with a debugging build of numpy, but it reports tons of
> >> false (?) positives: https://gist.github.com/3549063
> >>
> >> Mark, by looking at the changes that broke it, as well as at my "fix",
> >> do you see where the problem could be?
> >>
> >> I suspect it is something with the changes in PyArray_FromAny() or
> >> PyArray_FromArray() in ctors.c.
> >> But I don't see anything so far that could cause it.
> >>
> >> Thanks for any help. This is one of the issues blocking the 1.7.0
> >> release.
> >
> > IIRC, you can recompile Python with some support for detecting memory
> > leaks. One of the issues with using Valgrind, after suppressing the
> > false positives, is that Python uses its own memory allocator, which
> > sits between the bug and what Valgrind detects. So at the very least,
> > recompile Python so that it doesn't use it.
>
> Right. Compiling with "--without-pymalloc" (per README.valgrind, as
> suggested above by Richard) should improve things a lot. Thanks for the tip.
>
> >
> > As for hardening the NumPy source in general, you should at least be
> > aware of these two options:
> >
> > 1) David Malcolm (dmalcolm at redhat.com) was writing a static code
> > analysis plugin for gcc that would check, for every routine, that the
> > reference-counting semantics were correct. (I don't know how far he's
> > got with that.)
> >
> > 2) In Cython we have a "reference count nanny". This requires changes to
> > all the code, though, so it's not an option just for finding this bug; I
> > just thought I'd mention it. In addition to INCREF/DECREF, you need to
> > insert new "GIVEREF" and "GOTREF" calls (which are no-ops in a normal
> > compile) to declare where you get and give away a reference. When
> > Cython-generated sources are compiled with -DCYTHON_REFNANNY,
> > INCREF/DECREF/GIVEREF/GOTREF are tracked within each function, and an
> > error is raised if the function violates any contract.
>
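To make the nanny's per-function contract concrete, here is a minimal C++ sketch of the idea; it is not Cython's actual implementation. `Obj` is a hypothetical mock stand-in for a Python object, and the hypothetical `RefNanny` methods `gotref`/`giveref`/`incref`/`decref`/`finish` are illustrative names modeled on the GOTREF/GIVEREF/INCREF/DECREF calls described above. The nanny counts how many references the current function owns per object, and reports a leak or an over-release when that bookkeeping doesn't balance.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical mock stand-in for a Python object: only the refcount matters.
struct Obj {
    long refcnt = 1;
};

// Toy "reference nanny" (hypothetical, loosely modeled on the idea behind
// Cython's CYTHON_REFNANNY): tracks how many references the current function
// owns for each object and throws on contract violations.
class RefNanny {
public:
    explicit RefNanny(std::string func) : func_(std::move(func)) {}

    // GOTREF: we received a reference that we now own (e.g. a new reference).
    void gotref(Obj* o) { ++owned_[o]; }

    // GIVEREF: we handed an owned reference to someone else (e.g. returned it).
    void giveref(Obj* o) { release(o, "GIVEREF"); }

    // INCREF: take an additional reference, which we then own.
    void incref(Obj* o) {
        ++o->refcnt;
        ++owned_[o];
    }

    // DECREF: drop a reference we own. The mock object is never freed here.
    void decref(Obj* o) {
        release(o, "DECREF");
        --o->refcnt;
    }

    // Called at function exit: every owned reference must have been
    // released with DECREF or handed off with GIVEREF.
    void finish() const {
        for (const auto& entry : owned_) {
            if (entry.second != 0)
                throw std::logic_error(func_ + ": leaked reference");
        }
    }

private:
    // Throws if the function releases a reference it does not own, the kind
    // of bookkeeping error that can end in a premature free and a segfault.
    void release(Obj* o, const char* what) {
        auto it = owned_.find(o);
        if (it == owned_.end() || it->second == 0)
            throw std::logic_error(func_ + ": " + what + " of a reference not owned");
        --it->second;
    }

    std::string func_;
    std::map<Obj*, int> owned_;
};

// Correct: take a reference and give it to the caller.
Obj* good(Obj* o) {
    RefNanny nanny("good");
    nanny.incref(o);
    nanny.giveref(o);
    nanny.finish();
    return o;
}

// Buggy: take a reference and forget to release it.
void leaky(Obj* o) {
    RefNanny nanny("leaky");
    nanny.incref(o);
    nanny.finish();  // throws std::logic_error
}
```

With this kind of tracking, `leaky` fails at its own `finish()` call, so the error is reported in the function that caused it rather than as a crash somewhere else later.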
> I see. That's a nice option. For my own code, I never manage reference
> counting by hand; I just use Cython instead.
>
>
> In the meantime, Mark fixed it:
>
> https://github.com/numpy/numpy/pull/400
> https://github.com/numpy/numpy/pull/405
>
> Mark, thanks again for this. That saved me a lot of time.
>

No problem. The way I prefer to deal with this kind of error is to use C++
smart pointers. C++11's unique_ptr and Boost's intrusive_ptr are both
useful for painlessly managing this kind of reference-counting headache.

-Mark
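The smart-pointer approach can be sketched in C++ as follows, under stated assumptions. `Obj`, `incref` and `decref` are hypothetical mock stand-ins for `PyObject`, `Py_INCREF` and `Py_DECREF`, so the sketch compiles without Python headers. `Ref` is a hand-rolled stand-in for `boost::intrusive_ptr` (copying adds a reference, destruction drops one), used here to avoid a Boost dependency. `std::unique_ptr` with a custom deleter owns exactly one reference. In both cases the reference is released on every exit path, including early error returns.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical mock stand-in for a PyObject. `live` counts objects that
// have not been freed yet, so a leak shows up as a nonzero count.
struct Obj {
    long refcnt = 1;
    static int live;
    Obj() { ++live; }
    ~Obj() { --live; }
};
int Obj::live = 0;

// Mock stand-ins for Py_INCREF / Py_DECREF (hypothetical).
void incref(Obj* o) { ++o->refcnt; }
void decref(Obj* o) {
    if (--o->refcnt == 0) delete o;
}

// std::unique_ptr owning exactly one reference: the deleter drops it.
struct DecrefDeleter {
    void operator()(Obj* o) const { decref(o); }
};
using OwnedRef = std::unique_ptr<Obj, DecrefDeleter>;

// Minimal intrusive pointer in the spirit of boost::intrusive_ptr:
// copies add a reference, destruction drops one.
class Ref {
public:
    // add_ref = false adopts a reference the caller already owns.
    explicit Ref(Obj* p = nullptr, bool add_ref = true) : p_(p) {
        if (p_ && add_ref) incref(p_);
    }
    Ref(const Ref& other) : p_(other.p_) {
        if (p_) incref(p_);
    }
    Ref(Ref&& other) noexcept : p_(other.p_) { other.p_ = nullptr; }
    // Copy-and-swap: the old reference is dropped when `other` is destroyed.
    Ref& operator=(Ref other) noexcept {
        std::swap(p_, other.p_);
        return *this;
    }
    ~Ref() {
        if (p_) decref(p_);
    }
    Obj* get() const { return p_; }

private:
    Obj* p_;
};

// C-API-style function with an early error return: the wrappers release
// their references on every path, so there is no DECREF to forget.
bool process(bool fail) {
    OwnedRef tmp(new Obj);   // new reference (refcnt 1), owned by tmp
    if (fail) return false;  // early exit: tmp still drops its reference
    Ref shared(tmp.get());   // second owner (refcnt 2)
    return shared.get()->refcnt == 2;
}
```

`unique_ptr` suits a single owned reference, while the intrusive pointer suits a reference shared between several owners. In both cases ownership is encoded in the type, so the compiler emits the release that is easy to miss by hand.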

>
> Ondrej

