I've built a system which allocates numpy arrays and processes them in
C++ code (this is because I'm building a native code module using
boost.python and it makes sense to use numpy data storage to then deal
with outputs in python, without having to do any copying). Everything
seems fine except when I parallelise the main loop (OpenMP and TBB
give the same results), in which case I see a whole bunch of messages
"reference count error detected: an attempt was made to deallocate 12 (d)"
sometimes during the running of the program, sometimes all at the end
(presumably when all the destructors in my program run).

To clarify, the loop I am now running in parallel takes read-only
parameters (enforced by the C++ compiler using 'const') and as far as
I can tell there are no race conditions with multiple threads writing
to the same numpy arrays at once or anything obvious like that.
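A minimal sketch of why 'const' on its own may not rule out shared writes, assuming the arrays are held through some reference-counted handle (as the refcount.hpp frame in the backtrace suggests). FakeObject, Handle and refcount_seen_inside_copy are hypothetical names for illustration only; this is not NumPy's or boost.python's actual code.

```cpp
#include <cassert>

// Hypothetical stand-in for a reference-counted Python object.
// This is NOT NumPy's real object layout.
struct FakeObject {
    long refcnt = 1;  // one reference held by the creator
};

// Hypothetical handle in the spirit of a smart pointer around a PyObject*.
class Handle {
public:
    // Adopts an existing reference; the count is not incremented here.
    explicit Handle(FakeObject* obj) : obj_(obj) { assert(obj_ != nullptr); }
    // Copying writes to the shared object with a plain, non-atomic increment.
    Handle(const Handle& other) : obj_(other.obj_) { ++obj_->refcnt; }
    Handle& operator=(const Handle&) = delete;
    // Destruction writes to the shared object as well.
    ~Handle() { --obj_->refcnt; }

private:
    FakeObject* obj_;
};

// The parameter is a const reference, yet the body still modifies the
// shared count, because copying the handle is allowed on a const handle.
long refcount_seen_inside_copy(const Handle& h, const FakeObject& obj) {
    Handle local(h);    // copy: refcnt goes up by one
    return obj.refcnt;  // value observed while the copy is alive
}
```

If something like this happens inside the parallel loop, every iteration performs an unsynchronised read-modify-write on the same counter even though all arguments are 'const'.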
I recompiled numpy (I'm using 1.6.1 from the official git repository)
to print out some extra information with the reference count message,
namely a pointer to the thing which is being erroneously deallocated.
Surprisingly, within any given run of the program it is always the
same address, even though the message is printed out hundreds of times.

I've looked into this a little with GDB and as far as I can see the
object which the message pertains to is an "array descriptor", or at
least that's what I conclude from backtraces similar to the following:

Breakpoint 1, arraydescr_dealloc (self=0x1029083a0) at descriptor.c:1501
1501 fprintf(stderr, "*** Reference count error detected: \n" \
#0 arraydescr_dealloc (self=0x1029083a0) at descriptor.c:1501
#1 0x0000000102897fc4 in array_dealloc (self=0x117e0e850) at arrayobject.c:271
#2 0x0000000103e592d7 in
const>::dispose (this=<value temporarily unavailable, due to
optimizations>) at refcount.hpp:36
#3 .... my code

Obviously I can turn off the parallelism to make this problem go away,
but since my underlying algorithm is trivially parallelisable I was
counting on being able to achieve linear speedup across cores.
Currently I can, and as far as I know there are no actual incorrect
results being produced by the program. However, in my field (Machine
Learning) it's difficult enough to know whether the numbers calculated
are sensible even without the presence of these kinds of warnings, so
I'd like to get a handle on at least why this is happening, so that I
know whether I can safely ignore it.

My guess at what might be happening is that the multiple threads are
dealing with some object concurrently and the updates to the reference
count are not processed atomically, meaning that there are too many
DECREFs which happen later on. I had presumed that allocating different
numpy matrices in different threads, with all of them reading from
central numpy matrices, would work fine, but apparently there is
something I missed, pertaining to descriptors.
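To make the guess above concrete, here is a single-threaded replay of one unlucky interleaving of two non-atomic increments followed by two decrements. It is only a sketch of the hypothesis, not a diagnosis: FakeDescr and replay_lost_increment are hypothetical names, and the real descriptor object and INCREF/DECREF macros are not used here.

```cpp
#include <cassert>

// Hypothetical stand-in for a shared descriptor's reference count.
// This is NOT NumPy's real struct.
struct FakeDescr {
    long refcnt;
};

// Replays, step by step on one thread, what two threads could do if each
// "increment" is really a separate load followed by a store.
// Returns the final count.
long replay_lost_increment(FakeDescr& d) {
    assert(d.refcnt >= 1);  // the original owner still holds a reference

    // Both "threads" load the count before either stores it back.
    long seen_by_a = d.refcnt;
    long seen_by_b = d.refcnt;

    d.refcnt = seen_by_a + 1;  // thread A stores its increment
    d.refcnt = seen_by_b + 1;  // thread B overwrites it: one increment is lost

    // Later, each thread releases the reference it believes it took.
    d.refcnt -= 1;  // thread A
    d.refcnt -= 1;  // thread B

    return d.refcnt;
}
```

Starting from a count of 1, the replay ends at 0 although the original owner never released its reference, which is the kind of state in which a deallocation would be attempted on an object that should still be alive. It would also be consistent with the same address showing up every time, if all arrays of one dtype share a single descriptor.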
Can anyone offer any guidance, or at least tell me this is safe to
ignore? I can reproduce the problem reliably, so if you need me to do
some digging with GDB at the point the error takes place, I can do so.