"Reference count error detected" bug appears with multithreading (OpenMP & TBB)
Hi,

I've built a system which allocates numpy arrays and processes them in C++ code (I'm building a native code module using boost.python, and it makes sense to use numpy data storage so I can deal with the outputs in Python without doing any copying). Everything seems fine except when I parallelise the main loop (OpenMP and TBB give the same results), in which case I see a whole bunch of messages saying

"reference count error detected: an attempt was made to deallocate 12 (d)"

sometimes during the running of the program, sometimes all at the end (presumably when all the destructors in my program run).

To clarify, the loop I am now running in parallel takes read-only parameters (enforced by the C++ compiler using 'const'), and as far as I can tell there are no race conditions such as multiple threads writing to the same numpy arrays at once, or anything obvious like that.

I recompiled numpy (I'm using 1.6.1 from the official git repository) to print out some extra information with the reference count message, namely a pointer to the thing which is being erroneously deallocated. Surprisingly, it is always the same address for any run of the program, even though the message is printed out hundreds of times.

I've looked into this a little with GDB, and as far as I can see the object the message pertains to is an "array descriptor", or at least that's what I conclude from backtraces similar to the following:

Breakpoint 1, arraydescr_dealloc (self=0x1029083a0) at descriptor.c:1501
1501        fprintf(stderr, "*** Reference count error detected: \n" \
(gdb) bt
#0  arraydescr_dealloc (self=0x1029083a0) at descriptor.c:1501
#1  0x0000000102897fc4 in array_dealloc (self=0x117e0e850) at arrayobject.c:271
#2  0x0000000103e592d7 in boost::detail::sp_counted_impl_p<garf::multivariate_normal<double> const>::dispose (this=<value temporarily unavailable, due to optimizations>) at refcount.hpp:36
#3  ... my code

Obviously I can turn off the parallelism to make this problem go away, but since my underlying algorithm is trivially parallelisable I was counting on being able to achieve linear speedup across cores. Currently I can, and as far as I know there are no actual incorrect results being produced by the program. However, in my field (machine learning) it's difficult enough to know whether the numbers calculated are sensible even without these kinds of warnings, so I'd like to understand why this is happening, so I know whether I can safely ignore it.

My guess at what might be happening is that multiple threads are dealing with some object concurrently and the updates to the reference count are not processed atomically, meaning that too many DECREFs happen later on. I had presumed that allocating different numpy matrices in different threads, while all of them read from central numpy matrices, would work fine, but apparently there is something I missed pertaining to descriptors.

Can anyone offer any guidance, or at least tell me this is safe to ignore? I can reproduce the problem reliably, so if you need me to do some digging with GDB at the point the error takes place, I can do that.

Many thanks,
Malcolm
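P.S. In case it helps, here is a stripped-down, hypothetical sketch of the shape of my setup, not the real code: 'gaussian' stands in for garf::multivariate_normal<double>, the array handling uses the raw NumPy C API rather than the boost.python wrappers, and it is built with -fopenmp. The point is simply that the numpy arrays are allocated up front and only ever read inside the parallel loop, while the shared_ptr machinery decides which thread runs the destructor:

// Hypothetical, stripped-down sketch -- not my real code. 'gaussian' stands
// in for garf::multivariate_normal<double>. Build with something like:
//   g++ -fopenmp sketch.cpp $(python-config --includes) -I<numpy-include> ...
#include <Python.h>
#include <numpy/arrayobject.h>
#include <boost/shared_ptr.hpp>
#include <cstdio>

struct gaussian {
    PyObject* mean;   // float64 numpy array; this object owns one reference

    explicit gaussian(npy_intp n)
        : mean(PyArray_ZEROS(1, &n, NPY_DOUBLE, 0)) {}  // new reference

    // If a worker thread were the last to drop its shared_ptr copy, this
    // Py_XDECREF would run in that thread, without the GIL held.
    ~gaussian() { Py_XDECREF(mean); }

    double first() const {   // read-only access to the array data
        return *static_cast<const double*>(
            PyArray_DATA(reinterpret_cast<PyArrayObject*>(mean)));
    }
};

int main() {
    Py_Initialize();
    import_array1(1);   // initialise the NumPy C API

    boost::shared_ptr<const gaussian> g(new gaussian(16));
    double total = 0.0;

    // shared_ptr's own count is updated atomically, but any Py_INCREF or
    // Py_DECREF reached from inside this region is not, unless the GIL is
    // held -- which it isn't in plain OpenMP worker threads.
    #pragma omp parallel for reduction(+:total)
    for (int i = 0; i < 1000; ++i) {
        boost::shared_ptr<const gaussian> local(g);  // read-only, const access
        total += local->first();
    }

    std::printf("total = %f\n", total);
    Py_Finalize();
    return 0;
}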
On Wed, Jan 18, 2012 at 14:59, Malcolm Reynolds <malcolm.reynolds@gmail.com> wrote:
Hi,
I've built a system which allocates numpy arrays and processes them in C++ code (I'm building a native code module using boost.python, and it makes sense to use numpy data storage so I can deal with the outputs in Python without doing any copying). Everything seems fine except when I parallelise the main loop (OpenMP and TBB give the same results), in which case I see a whole bunch of messages saying
"reference count error detected: an attempt was made to deallocate 12 (d)"
sometimes during the running of the program, sometimes all at the end (presumably when all the destructors in my program run).
To clarify, the loop I am now running in parallel takes read-only parameters (enforced by the C++ compiler using 'const'), and as far as I can tell there are no race conditions such as multiple threads writing to the same numpy arrays at once, or anything obvious like that.
I recompiled numpy (I'm using 1.6.1 from the official git repository) to print out some extra information with the reference count message, namely a pointer to the thing which is being erroneously deallocated. Surprisingly, it is always the same address for any run of the program, even though the message is printed out hundreds of times.
I've looked into this a little with GDB, and as far as I can see the object the message pertains to is an "array descriptor", or at least that's what I conclude from backtraces similar to the following:

Breakpoint 1, arraydescr_dealloc (self=0x1029083a0) at descriptor.c:1501
1501        fprintf(stderr, "*** Reference count error detected: \n" \
(gdb) bt
#0  arraydescr_dealloc (self=0x1029083a0) at descriptor.c:1501
#1  0x0000000102897fc4 in array_dealloc (self=0x117e0e850) at arrayobject.c:271
#2  0x0000000103e592d7 in boost::detail::sp_counted_impl_p<garf::multivariate_normal<double> const>::dispose (this=<value temporarily unavailable, due to optimizations>) at refcount.hpp:36
#3  ... my code
I suspect there is some problem with the reference counting that you are doing at the C++ level that is causing you to do too many Py_DECREFs on the numpy objects, and this is being identified by the arraydescr_dealloc() routine. (By the way, arraydescrs are the C-level implementation of dtype objects.)

Reading the comments just before descriptor.c:1501 points out that this warning is being printed because something is trying to deallocate the builtin np.dtype('d') == np.dtype('float64') dtype (12 is the NPY_DOUBLE type number, 'd' its type character). This should never happen. The refcount for these objects should always be > 0 because numpy itself holds references to them.

I suspect that you are obtaining the numpy object (1 Py_INCREF) before you split into multiple threads but releasing it in each thread (multiple Py_DECREFs). This is probably being hidden from you by the boost.python interface and/or the boost::detail::sp_counted_impl_p<> smart(ish) pointer. Check the backtrace where your code starts to verify if this looks to be the case.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
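P.S. In code, the failure mode I have in mind looks roughly like the following. This is purely illustrative, not taken from your module; 'broken_parallel_use' is a made-up name:

#include <Python.h>
#include <numpy/arrayobject.h>

void broken_parallel_use(npy_intp n)
{
    PyObject* arr = PyArray_ZEROS(1, &n, NPY_DOUBLE, 0);  /* refcount == 1 */

    #pragma omp parallel
    {
        /* ... read-only work on arr ... */
        Py_DECREF(arr);  /* WRONG: each of the N threads releases the single
                            reference taken above, leaving refcount 1 - N */
    }

    /* The DECREF that reaches zero triggers array_dealloc(), which in turn
       Py_DECREFs the array's descriptor. Over-release enough float64 arrays
       this way and the refcount of the shared builtin dtype('d') descriptor
       is itself driven to zero, tripping the guard in arraydescr_dealloc()
       -- hence the "deallocate 12 (d)" message. */
}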
I suspect that you are obtaining the numpy object (1 Py_INCREF) before you split into multiple threads but releasing it in each thread (multiple Py_DECREFs). This is probably being hidden from you by the boost.python interface and/or the boost::detail::sp_counted_impl_p<> smart(ish) pointer. Check the backtrace where your code starts to verify if this looks to be the case.
Thank you for your quick reply. This makes a lot of sense; I'm just having trouble seeing where this could be happening, as everything I pass into each parallel computation strand is passed down as either pointer-to-const or reference-to-const. The only things that need to be modified (for example, random number generator objects) are created uniquely inside each iteration of the for loop, so it can't be that.

This information about which object has the reference count problem helps though; I will keep digging. I'm vaguely planning to track every INCREF and DECREF so I can pin down which object has an unbalanced count. To do this I want to know the address of the array, rather than the associated datatype descriptor. I assume I want to pay attention to the (self=0x117e0e850) in this line, and that is the address of the array I am mishandling?

#1  0x0000000102897fc4 in array_dealloc (self=0x117e0e850) at arrayobject.c:271

Malcolm
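P.S. For the tracking I'm planning something minimal along these lines ('log_refcnt' is a hypothetical helper of my own, sprinkled around the suspect call sites):

#include <Python.h>
#include <cstdio>

/* Hypothetical helper: log the current refcount of a suspect object, to be
   called just before and after handing it to the parallel section. */
static void log_refcnt(PyObject* obj, const char* where)
{
    std::fprintf(stderr, "[%s] obj=%p refcnt=%ld\n",
                 where, static_cast<void*>(obj),
                 static_cast<long>(Py_REFCNT(obj)));
}

Alternatively, if the array happens to land at a stable address across runs the way the descriptor does, a GDB watchpoint such as "watch ((PyObject *) 0x117e0e850)->ob_refcnt" should break on every change to that count.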
On Wed, Jan 18, 2012 at 16:14, Malcolm Reynolds <malcolm.reynolds@gmail.com> wrote:
I suspect that you are obtaining the numpy object (1 Py_INCREF) before you split into multiple threads but releasing it in each thread (multiple Py_DECREFs). This is probably being hidden from you by the boost.python interface and/or the boost::detail::sp_counted_impl_p<> smart(ish) pointer. Check the backtrace where your code starts to verify if this looks to be the case.
Thank you for your quick reply. This makes a lot of sense; I'm just having trouble seeing where this could be happening, as everything I pass into each parallel computation strand is passed down as either pointer-to-const or reference-to-const. The only things that need to be modified (for example, random number generator objects) are created uniquely inside each iteration of the for loop, so it can't be that.
My C++-fu is fairly weak, so I'm never really sure what the smart pointers are doing, and when. If there are tracing features that you can turn on, try that. Is this deallocation of the smart pointer to the "garf::multivariate_normal<double> const" happening inside the loop, or outside, back in the main thread? Where did it get created?
This information about which object has the reference count problem helps though; I will keep digging. I'm vaguely planning to track every INCREF and DECREF so I can pin down which object has an unbalanced count. To do this I want to know the address of the array, rather than the associated datatype descriptor. I assume I want to pay attention to the (self=0x117e0e850) in this line, and that is the address of the array I am mishandling?
#1 0x0000000102897fc4 in array_dealloc (self=0x117e0e850) at arrayobject.c:271
Yes.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
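P.S. If the stray Py_DECREF does turn out to fire from a worker thread, one common remedy is to route every release through a deleter that takes the GIL first. This is only a sketch ('gil_guarded_decref' and 'hold' are hypothetical helpers, not part of boost or numpy), and it only serializes the refcount updates; an actual imbalance would still need one DECREF per INCREF:

#include <Python.h>
#include <boost/shared_ptr.hpp>

/* Hypothetical deleter: take the GIL before releasing a Python reference,
   so the final Py_DECREF is safe no matter which worker thread drops it.
   (Requires PyEval_InitThreads() to have been called once at startup.) */
struct gil_guarded_decref {
    void operator()(PyObject* p) const {
        PyGILState_STATE s = PyGILState_Ensure();
        Py_XDECREF(p);
        PyGILState_Release(s);
    }
};

/* Wrap a new reference so that any thread may safely drop it. */
inline boost::shared_ptr<PyObject> hold(PyObject* new_ref)
{
    return boost::shared_ptr<PyObject>(new_ref, gil_guarded_decref());
}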
participants (2)
- Malcolm Reynolds
- Robert Kern