[Numpy-discussion] Masked Array for NumPy 1.7

Mark Wiebe mwwiebe at gmail.com
Sat May 19 11:21:32 EDT 2012


On Sat, May 19, 2012 at 10:00 AM, David Cournapeau <cournape at gmail.com> wrote:

> On Sat, May 19, 2012 at 3:17 PM, Charles R Harris <
> charlesr.harris at gmail.com> wrote:
>
>> On Fri, May 18, 2012 at 3:47 PM, Travis Oliphant <travis at continuum.io> wrote:
>>
>>> Hey all,
>>>
>>> After reading all the discussion around masked arrays and getting input
>>> from as many people as possible, it is clear that there is still
>>> disagreement about what to do, but there have been some fruitful
>>> discussions that ensued.
>>>
>>> This isn't really new, as there was significant disagreement about what
>>> to do when the masked array code was initially checked in to master. So,
>>> in order to move forward, Mark and I are going to work together with
>>> whoever else is willing to help on an effort that is in the spirit of my
>>> third proposal but has a few adjustments.
>>>
>>> The idea will be fleshed out in more detail as it progresses, but the
>>> basic concept is to create an (experimental) ndmasked object in NumPy 1.7
>>> and leave the actual ndarray object unchanged. While the details need to
>>> be worked out, a goal is to have the C-API work with both ndmasked
>>> arrays and array objects (possibly by defining a base-class C-level
>>> structure that both inherit from). This might also be a good way for
>>> Dag to experiment with his ideas, but that is not an explicit goal.
>>>
>>> One way this could work, for example, is to have PyArrayObject * be the
>>> base-class array (essentially the same C-structure we have now with a
>>> HASMASK flag). Then the ndmasked object could inherit from PyArrayObject *
>>> as well but add more members to the C-structure. I think this is the
>>> easiest thing to do and requires the least amount of code change. It
>>> is also possible to define an abstract base-class PyArrayObject * that both
>>> ndarray and ndmasked inherit from. That way ndarray and ndmasked are
>>> siblings even though the ndarray would essentially *be* the PyArrayObject *
>>> --- just with a different type hierarchy on the Python side.
>>>
>>> This work will take some time and, therefore, I don't expect 1.7 to be
>>> released prior to SciPy Austin with an end of June target date. The
>>> timing will largely depend on what time is available from people interested
>>> in resolving the situation. Mark and I will have some availability for
>>> this work in June but not a great deal (about 2 man-weeks total between
>>> us). If there are others who can step in and help, it will help
>>> accelerate the process.
>>>
>>>
>> This will be a difficult thing for others to help with since the concept
>> is vague, the design decisions seem to be in your and Mark's hands, and you
>> say you don't have much time. It looks to me like 1.7 will keep slipping
>> and I don't think that is a good thing. Why not go for option 2, which will
>> get 1.7 out there and push the new masked array work into 1.8? Breaking
>> the flow of development and release has consequences, few of them good.
>>
>
> Agreed. 1.6.0 was released one year ago already, so let's focus on polishing
> what's in there *now*. I have not followed closely what the decision was
> for an LTS release, but if 1.7 is supposed to be it, that's another argument
> against changing anything there for 1.7.
>

The motivation behind splitting the mask out into a separate ndmasked is
primarily so that pre-existing code will not silently operate on NA-masked
arrays and produce incorrect results. The problem centres on code that uses
PyArray_DATA to get at the data after manually checking flags, instead of
calling PyArray_FromAny. Maybe a reasonable solution is to tweak the
behavior of PyArray_DATA? It could work as follows:

- If an ndarray has no mask, PyArray_DATA returns the data pointer as it
does currently.
- If the ndarray has an NA-mask, PyArray_DATA sets an exception and returns
NULL.
- Create a new accessor, PyArray_DATAPTR or PyArray_RAWDATA, which returns
the array data under all circumstances.

This way, code which currently uses the data pointer through PyArray_DATA
will fail on NA-masked arrays instead of silently working with the wrong
interpretation of the data. What do people think about this idea?
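
To make the three rules above concrete, here is a rough, standalone
sketch of the intended behavior. It deliberately does not use the real
NumPy headers: mock_array, MOCK_HASMASK, mock_data, mock_rawdata and
mock_error are all hypothetical stand-ins invented for illustration, and
the real change would set a Python exception (for example through
PyErr_SetString) rather than a global string.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the real NumPy structs or macros. */
#define MOCK_HASMASK 0x1u

typedef struct {
    void *data;          /* raw data buffer */
    unsigned int flags;  /* MOCK_HASMASK is set when an NA-mask is attached */
} mock_array;

/* Stand-in for Python's error indicator; NULL means "no error set". */
static const char *mock_error = NULL;

/* Sketch of the proposed raw accessor: returns the data unconditionally. */
static void *mock_rawdata(const mock_array *arr)
{
    assert(arr != NULL);
    return arr->data;
}

/* Sketch of the proposed checked accessor: unchanged for unmasked arrays,
 * but sets an error and returns NULL when the array carries an NA-mask. */
static void *mock_data(const mock_array *arr)
{
    assert(arr != NULL);
    if (arr->flags & MOCK_HASMASK) {
        mock_error = "array has an NA-mask; use the raw accessor instead";
        return NULL;
    }
    return arr->data;
}
```

With this shape, old code that dereferences the result of the checked
accessor without looking at the mask gets a NULL and an error on a
masked array, while mask-aware code opts in explicitly through the raw
accessor.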

-Mark


> David