[Numpy-discussion] Removing masked arrays for 1.7? (Was 1.7 blockers)

Frédéric Bastien nouiz at nouiz.org
Fri Apr 20 10:55:48 EDT 2012


Hi,

I just discovered that the NA mask will modify the base ndarray
object, so I read about it to find out the consequences for our C
code. So far I have fully read:

http://docs.scipy.org/doc/numpy/reference/arrays.maskna.html

and partially read:

https://github.com/numpy/numpy/blob/master/doc/neps/missing-data.rst
https://github.com/njsmith/numpy/wiki/NA-discussion-status

In those documents, I see a problem with legacy code that will receive
an NA masked array as input. If I missed something, tell me.


All our C functions check their input arrays with PyArray_Check and
PyArray_ISALIGNED. If the NA mask is stored inside the ndarray C
object, our C functions, which don't know about it, will treat those
inputs as unmasked. So the user will get unexpected results: the
output will be an ndarray without a mask, but the code will have used
the masked values.

This will also happen with all other C code that uses ndarray.

In our case, all the input checking is done in one place, so adding
a check with "PyArray_HasNASupport(PyArrayObject* obj)" to raise an
error will be easy for us. But I don't think this is the case for most
other C code.
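
To illustrate the problem and the kind of centralized check described
above, here is a minimal self-contained sketch in plain C. It does not
use the real NumPy C API: toy_array, legacy_sum, check_input and
checked_sum are hypothetical names invented for this example, and the
mask convention (1 = valid, 0 = NA) is an assumption. In real code the
rejection test would be the PyArray_HasNASupport call mentioned above.

```c
#include <stddef.h>

/* Hypothetical stand-in for an array object. This is NOT the real
 * PyArrayObject; it only models "data plus an optional mask". */
typedef struct {
    const double *data;         /* element values */
    const unsigned char *mask;  /* NULL if unmasked; assumed 1 = valid, 0 = NA */
    size_t size;                /* number of elements */
} toy_array;

/* Legacy routine written before masks existed: it reads every element
 * and never looks at the mask, so NA values leak into the result. */
double legacy_sum(const toy_array *a)
{
    double total = 0.0;
    for (size_t i = 0; i < a->size; i++) {
        total += a->data[i];
    }
    return total;
}

/* Centralized input check: returns 0 if legacy code may use the array,
 * -1 otherwise. Masked inputs are rejected instead of silently used. */
int check_input(const toy_array *a)
{
    if (a == NULL || a->data == NULL) {
        return -1;
    }
    if (a->mask != NULL) {
        return -1;  /* the legacy code cannot honor the mask */
    }
    return 0;
}

/* Legacy routine guarded by the check. On success the sum is stored in
 * *out and 0 is returned; on rejection *out is left untouched. */
int checked_sum(const toy_array *a, double *out)
{
    if (check_input(a) != 0) {
        return -1;
    }
    *out = legacy_sum(a);
    return 0;
}
```

With values {1.0, 2.0, 100.0} and a mask marking the last element as
NA, legacy_sum still includes the 100.0, while checked_sum refuses the
masked array. Code that calls legacy_sum directly, without going
through such a check, stays exposed.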

So I would prefer a separate object, to protect users from code that
has not been updated to reject NA masked inputs.

An alternative would be to have PyArray_Check return False for NA
masked arrays, but I don't like that, as it breaks the semantics that
it checks for the class.

A last option I see would be to make the NPY_ARRAY_BEHAVED flag also
check that the array is not an NA masked array. I suppose a lot of C
code does this check. But this is not a bulletproof check, as not all
code (ours, for example) uses it.


Also, I don't mind the added pointers to the structure as we use big arrays.

thanks

Frédéric


