[Numpy-discussion] How to debug reference counting errors

Ondřej Čertík ondrej.certik at gmail.com
Fri Aug 31 20:35:49 EDT 2012


Hi Dag,

On Fri, Aug 31, 2012 at 4:22 AM, Dag Sverre Seljebotn
<d.s.seljebotn at astro.uio.no> wrote:
> On 08/31/2012 09:03 AM, Ondřej Čertík wrote:
>> Hi,
>>
>> There is segfault reported here:
>>
>> http://projects.scipy.org/numpy/ticket/1588
>>
>> I've managed to isolate the problem and even provide a simple patch
>> that fixes it here:
>>
>> https://github.com/numpy/numpy/issues/398
>>
>> however, the patch simply doesn't decrement the relevant reference count,
>> so it might leak. I've used bisection (which unfortunately took the whole
>> evening...), but the good news is that I've isolated the commits
>> that actually broke it. See the GitHub issue #398 for details, diffs etc.
>>
>> Unfortunately, it's a series of 12 commits from Mark, and the individual
>> commits raise an exception on the segfaulting code,
>> so I can't pinpoint the problem further.
>>
>> In general, how can I debug this sort of problem? I tried to use
>> valgrind with a debug build of NumPy,
>> but it reports tons of false (?) positives: https://gist.github.com/3549063
>>
>> Mark, by looking at the changes that broke it, as well as at my "fix",
>> do you see where the problem could be?
>>
>> I suspect it is something with the changes in PyArray_FromAny() or
>> PyArray_FromArray() in ctors.c.
>> But I don't see anything so far that could cause it.
>>
>> Thanks for any help. This is one of the issues blocking the 1.7.0 release.
>
> IIRC you can recompile Python with some support for detecting memory
> leaks. One of the issues with using Valgrind, after suppressing the
> false positives, is that Python uses its own memory allocator, which
> sits between the bug and what Valgrind detects. So at the very least,
> recompile Python without it.

Right. Compiling with "--without-pymalloc" (per README.valgrind, as suggested
earlier in the thread by Richard) should improve things a lot. Thanks for the tip.
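Before reaching for valgrind, a cheap first check can be done from Python
itself. The sketch below is only an illustration, not a NumPy or CPython
tool: `refcount_delta` and `built_with_pymalloc` are hypothetical helper
names. It calls a function many times and reports how far the reference
count of an argument drifted (a positive drift hints at a missing DECREF,
a negative one at a missing INCREF, the kind of bug that eventually ends in
a segfault), and it reads the WITH_PYMALLOC build variable through
sysconfig, assuming the platform exposes it (it may be None, e.g. on Windows).

```python
import sys
import sysconfig


def refcount_delta(func, obj, n=1000):
    """Call func(obj) n times and return the change in obj's reference count.

    Positive: func (or the C code it wraps) keeps references it never
    releases, e.g. a missing Py_DECREF. Negative: it drops references it
    never owned, e.g. a missing Py_INCREF, which can eventually free the
    object while it is still in use.
    """
    before = sys.getrefcount(obj)
    for _ in range(n):
        func(obj)
    after = sys.getrefcount(obj)
    return after - before


def built_with_pymalloc():
    """True/False from the build config, or None if the variable is unavailable."""
    value = sysconfig.get_config_var("WITH_PYMALLOC")
    return None if value is None else bool(value)


if __name__ == "__main__":
    kept = []
    # kept.append stores every reference it receives, so the count grows by n.
    print(refcount_delta(kept.append, [], n=10))
    print(built_with_pymalloc())
```

To use it against a suspect NumPy call, wrap that call in a small function
and pass a freshly created object: since Python 3.12, objects such as None
and small ints are immortal, so their reported counts do not move.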

>
> As for hardening the NumPy source in general, you should at least be
> aware of these two options:
>
> 1) David Malcolm (dmalcolm at redhat.com) was writing a static code
> analysis plugin for gcc that would check, for every routine, that the
> reference-counting semantics were correct. (I don't know how far he's got
> with that.)
>
> 2) In Cython we have a "reference count nanny". This requires changes to
> all the code, though, so it's not an option just for finding this bug; I
> just thought I'd mention it. In addition to INCREF/DECREF, you need to
> insert new "GIVEREF" and "GOTREF" calls (which are no-ops in a normal
> compile) to declare where you get and give away a reference. When
> Cython-generated sources are compiled with -DCYTHON_REFNANNY,
> INCREF/DECREF/GIVEREF/GOTREF are tracked within each function and a
> failure is raised if the function violates any contract.

I see. That's a nice option. For my own code, I never touch reference
counting by hand; I just use Cython instead.
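The refnanny contract Dag describes can be illustrated with a toy model.
This is a hypothetical Python sketch of the bookkeeping idea only, not
Cython's implementation or API: the names `RefNanny` and `RefNannyError`
and their methods are made up here. Each function gets its own tracker;
INCREF and GOTREF add an owned reference, DECREF and GIVEREF release one,
releasing a reference that is not owned is an error, and anything still
owned when the function finishes is reported as a leak.

```python
from collections import Counter


class RefNannyError(Exception):
    """Raised when a tracked function breaks its reference-count contract."""


class RefNanny:
    """Toy per-function reference tracker (hypothetical; not Cython's API).

    Counts, per object name, how many references the function currently owns.
    """

    def __init__(self, funcname):
        self.funcname = funcname
        self.owned = Counter()

    def incref(self, name):
        # The function takes an extra reference of its own.
        self.owned[name] += 1

    def gotref(self, name):
        # The function received a new reference (e.g. returned by a call).
        self.owned[name] += 1

    def decref(self, name):
        self._release(name, "DECREF")

    def giveref(self, name):
        # Ownership is handed off (e.g. returned, or stolen by a container).
        self._release(name, "GIVEREF")

    def _release(self, name, op):
        # Releasing a reference the function does not own breaks the contract.
        if self.owned[name] <= 0:
            raise RefNannyError(
                f"{self.funcname}: {op} of {name!r} without an owned reference"
            )
        self.owned[name] -= 1

    def finish(self):
        # At function exit, every owned reference must have been released.
        leaked = {name: count for name, count in self.owned.items() if count}
        if leaked:
            raise RefNannyError(f"{self.funcname}: leaked references {leaked}")


if __name__ == "__main__":
    ok = RefNanny("make_result")
    ok.gotref("result")
    ok.giveref("result")  # returned to the caller
    ok.finish()           # no error: everything owned was handed off

    buggy = RefNanny("double_release")
    buggy.gotref("tmp")
    buggy.decref("tmp")
    try:
        buggy.decref("tmp")  # second DECREF of a reference no longer owned
    except RefNannyError as exc:
        print(exc)
```

The check is per function rather than global, which is what lets this kind
of tool point at the routine that broke the contract instead of the place
where the freed object is finally touched.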


In the meantime, Mark fixed it:

https://github.com/numpy/numpy/pull/400
https://github.com/numpy/numpy/pull/405

Mark, thanks again for this. That saved me a lot of time.

Ondrej


