[Numpy-discussion] "Reference count error detected" bug appears with multithreading (OpenMP & TBB)

Wed Jan 18 10:15:31 EST 2012

On Wed, Jan 18, 2012 at 14:59, Malcolm Reynolds
<malcolm.reynolds at gmail.com> wrote:
> Hi,
>
> I've built a system which allocates numpy arrays and processes them in
> C++ code (this is because I'm building a native code module using
> boost.python and it makes sense to use numpy data storage to then deal
> with outputs in python, without having to do any copying). Everything
> seems fine except when I parallelise the main loop (OpenMP and TBB
> give the same results) in which case I see a whole bunch of messages
> saying
>
> "reference count error detected: an attempt was made to deallocate 12 (d)"
>
> sometimes during the running of the program, sometimes all at the end
> (presumably when all the destructors in my program run).
>
> To clarify, the loop I am now running parallel takes read-only
> parameters (enforced by the C++ compiler using 'const') and as far as
> I can tell there are no race conditions with multiple threads writing
> to the same numpy arrays at once or anything obvious like that.
>
> I recompiled numpy (I'm using 1.6.1 from the official git repository)
> to print out some extra information with the reference count message,
> namely a pointer to the thing which is being erroneously deallocated.
> Surprisingly, it is always the same address for any run of the
> program, considering this is a message printed out hundreds of times.
>
> I've looked into this a little with GDB and as far as I can see the
> object which the message pertains to is an "array descriptor", or at
> least that's what I conclude from backtraces similar to the following:
>
> Breakpoint 1, arraydescr_dealloc (self=0x1029083a0) at descriptor.c:1501
> 1501            fprintf(stderr, "*** Reference count error detected: \n" \
> (gdb) bt
> #0  arraydescr_dealloc (self=0x1029083a0) at descriptor.c:1501
> #1  0x0000000102897fc4 in array_dealloc (self=0x117e0e850) at arrayobject.c:271
> #2  0x0000000103e592d7 in
> boost::detail::sp_counted_impl_p<garf::multivariate_normal<double>
> const>::dispose (this=<value temporarily unavailable, due to
> optimizations>) at refcount.hpp:36
> #3 .... my code

I suspect there is some problem with the reference counting that you
are doing at the C++ level that is causing you to apply too many
Py_DECREFs to the numpy objects, and this is being caught by the
arraydescr_dealloc() routine. (By the way, arraydescrs are the C-level
implementation of dtype objects.) The comments just before
descriptor.c:1501 point out that this warning is printed because
something is trying to deallocate the builtin np.dtype('d') ==
np.dtype('float64') dtype. This should never happen. The refcount for
these objects should always be > 0 because numpy itself holds
references to them. It would also explain why you always see the same
address: there is only one such builtin dtype object, and every
float64 array refers to it.

I suspect that you are obtaining each numpy object (1 Py_INCREF) before
you split into multiple threads but releasing it in each thread
(multiple Py_DECREFs). This is probably being hidden from you by the
boost.python interface and/or the boost::detail::sp_counted_impl_p<>
smart(ish) pointer. Check the backtrace where your code starts to
verify whether this looks to be the case.
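
To make the suspected imbalance concrete, here is a toy model of it in
plain C++. It is only a sketch: ToyObject, toy_incref() and toy_decref()
are hypothetical stand-ins and are not numpy's or CPython's code. The
real Py_INCREF/Py_DECREF are not atomic and must be called with the GIL
held, which this model does not try to reproduce.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical stand-in for a shared, refcounted object such as a
// builtin dtype. Not numpy's implementation.
struct ToyObject {
    std::atomic<long> refcount{1};      // the library itself holds one reference
    std::atomic<int> bogus_deallocs{0}; // times the count wrongly reached <= 0
};

void toy_incref(ToyObject& o) { o.refcount.fetch_add(1); }

void toy_decref(ToyObject& o) {
    // fetch_sub returns the previous value, so subtract 1 to get the new one.
    // A new value <= 0 means a reference was released that was never owned.
    if (o.refcount.fetch_sub(1) - 1 <= 0) {
        o.bogus_deallocs.fetch_add(1);
    }
}

// Suspected pattern: ONE incref before the fan-out, one decref PER thread.
int run_unbalanced(int n_threads) {
    assert(n_threads > 0);
    ToyObject obj;
    toy_incref(obj);
    std::vector<std::thread> workers;
    for (int i = 0; i < n_threads; ++i) {
        workers.emplace_back([&obj] { toy_decref(obj); });
    }
    for (auto& w : workers) w.join();
    return obj.bogus_deallocs.load();
}

// Balanced pattern: one incref for every decref a thread will perform.
int run_balanced(int n_threads) {
    assert(n_threads > 0);
    ToyObject obj;
    std::vector<std::thread> workers;
    for (int i = 0; i < n_threads; ++i) {
        toy_incref(obj); // taken before the worker that releases it starts
        workers.emplace_back([&obj] { toy_decref(obj); });
    }
    for (auto& w : workers) w.join();
    return obj.bogus_deallocs.load();
}
```

In this model, run_unbalanced(4) counts three bogus deallocation
attempts (the count goes 2, 1, 0, -1, -2), while run_balanced(4) counts
none. If the backtrace shows one acquisition feeding many per-thread
releases, that is the same shape as the unbalanced case.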

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco