[Numpy-discussion] numpy threads crash when allocating arrays

Nathaniel Smith njs at pobox.com
Mon Jun 13 22:07:38 EDT 2016


Hi Burlen,

On Jun 13, 2016 5:24 PM, "Burlen Loring" <bloring at lbl.gov> wrote:
>
> Hi All,
>
> I'm working on a threaded pipeline where we want the end user to be able to code up Python functions to do numerical work. Threading is all done in C++11, and in each thread we acquire the GIL before we invoke the user-provided Python callback and release it only when the callback returns. We've used SWIG to expose bindings to C++ objects in Python.
>
> When run with more than one thread I get intermittent segfaults in various numpy routines, and occasionally see a *** Reference count error detected: an attempt was made to deallocate 11 (f) ***.
>
> To pass data from C++ to the Python callback as a numpy array, we have:
>> // ****************************************************************************
>> template <typename NT>
>> PyArrayObject *new_object(teca_variant_array_impl<NT> *varrt)
>> {
>>     // allocate a buffer
>>     npy_intp n_elem = varrt->size();
>>     size_t n_bytes = n_elem*sizeof(NT);
>>     NT *mem = static_cast<NT*>(malloc(n_bytes));
>>     if (!mem)
>>     {
>>         PyErr_Format(PyExc_RuntimeError,
>>             "failed to allocate %lu bytes", n_bytes);
>>         return nullptr;
>>     }
>>
>>     // copy the data
>>     memcpy(mem, varrt->get(), n_bytes);
>>
>>     // put the buffer into a new numpy object that owns it
>>     PyArrayObject *arr = reinterpret_cast<PyArrayObject*>(
>>         PyArray_SimpleNewFromData(1, &n_elem, numpy_tt<NT>::code, mem));
>>     PyArray_ENABLEFLAGS(arr, NPY_ARRAY_OWNDATA);
>>
>>     return arr;
>> }

This code would probably be much simpler if you let numpy allocate the
buffer with PyArray_SimpleNew and then did the memcpy. I doubt that's
your problem, though. Numpy should be assuming that "owned" data was
allocated using malloc(), and if it were using a different allocator
then I think you'd be seeing crashes much sooner.
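
For concreteness, a rough sketch of that simplification (reusing your
teca_variant_array_impl and numpy_tt helpers, so untested against your
tree):

template <typename NT>
PyArrayObject *new_object(teca_variant_array_impl<NT> *varrt)
{
    // let numpy allocate the buffer itself, so numpy both owns it and
    // frees it with the matching allocator
    npy_intp n_elem = varrt->size();
    PyArrayObject *arr = reinterpret_cast<PyArrayObject*>(
        PyArray_SimpleNew(1, &n_elem, numpy_tt<NT>::code));
    if (!arr)
        return nullptr;

    // copy the data into the numpy-owned buffer
    memcpy(PyArray_DATA(arr), varrt->get(), n_elem*sizeof(NT));

    return arr;
}

That removes the malloc/OWNDATA handoff entirely, so there's no question
of whose allocator has to free the memory.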

> This is the only place we create numpy objects on the C++ side.
>
> In my demo the Python callback is as follows:
>>
>>  33 def get_execute(rank, var_names):
>>  34     def execute(port, data_in, req):
>>  35         sys.stderr.write('descriptive_stats::execute MPI %d\n'%(rank))
>>  36
>>  37         mesh = as_teca_cartesian_mesh(data_in[0])
>>  38
>>  39         table = teca_table.New()
>>  40         table.copy_metadata(mesh)
>>  41
>>  42         table.declare_columns(['step','time'], ['ul','d'])
>>  43         table << mesh.get_time_step() << mesh.get_time()
>>  44
>>  45         for var_name in var_names:
>>  46
>>  47             table.declare_columns(['min '+var_name, 'avg '+var_name, \
>>  48                 'max '+var_name, 'std '+var_name, 'low_q '+var_name, \
>>  49                 'med '+var_name, 'up_q '+var_name], ['d']*7)
>>  50
>>  51             var = mesh.get_point_arrays().get(var_name).as_array()
>>  52
>>  53             table << float(np.min(var)) << float(np.average(var)) \
>>  54                 << float(np.max(var)) << float(np.std(var)) \
>>  55                 << map(float, np.percentile(var, [25.,50.,75.]))
>>  56
>>  57         return table
>>  58     return execute
>
> This callback is the only spot where numpy is used; the as_array call is implemented by the new_object template above.
> Further, if I remove our use of PyArray_SimpleNewFromData by replacing line 51 in the Python code above with var = np.array(range(1, 1100), 'f'), the problem disappears. It must have something to do with the use of PyArray_SimpleNewFromData.
>
> I'm at a loss to see why things are going south. I'm using the GIL, and I thought that would serialize the Python code. I suspect that numpy is using global or static variables somewhere internally and that it's inherently thread-unsafe. Can anyone confirm or deny this, or point me in the right direction?

Numpy does use global/static variables, and it is unsafe to call into
numpy simultaneously from different threads. But that's ok, because
you're not allowed to call numpy functions simultaneously from
different threads -- you have to hold the GIL first, and that
serializes access to all of numpy's internal state. Numpy is very
commonly used in threaded code and most people aren't seeing random
segfaults, so the problem is most likely in your code. Sorry I can't
help much more than that... I guess I'd start by triple-checking that
the code really truly does hold the GIL every time that it calls into
numpy/python APIs. I'd also try running it under valgrind in case it's
some other random memory corruption that's just showing up in a weird
way.
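
For reference, the standard pattern for calling back into Python from a
native thread looks something like this (a generic sketch, not your code;
invoke_callback is just a placeholder name):

void invoke_callback(PyObject *callback, PyObject *args)
{
    // safe to call from any thread once the interpreter is initialized,
    // whether or not this thread already holds the GIL
    PyGILState_STATE state = PyGILState_Ensure();

    PyObject *ret = PyObject_CallObject(callback, args);
    if (!ret)
        PyErr_Print();
    Py_XDECREF(ret);

    // every Ensure must be paired with exactly one Release on this thread
    PyGILState_Release(state);
}

For the valgrind run, CPython ships a suppressions file
(Misc/valgrind-python.supp in the source tree) that cuts down on false
positives from the interpreter's own small-object allocator, e.g.
valgrind --suppressions=valgrind-python.supp python your_demo.py
(the script name is just a placeholder).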

-n


