[Numpy-discussion] Masked Array for NumPy 1.7

Charles R Harris charlesr.harris at gmail.com
Sat May 19 12:45:03 EDT 2012

On Sat, May 19, 2012 at 10:02 AM, Charles R Harris <
charlesr.harris at gmail.com> wrote:

> On Sat, May 19, 2012 at 9:21 AM, Mark Wiebe <mwwiebe at gmail.com> wrote:
>> On Sat, May 19, 2012 at 10:00 AM, David Cournapeau <cournape at gmail.com> wrote:
>>> On Sat, May 19, 2012 at 3:17 PM, Charles R Harris <
>>> charlesr.harris at gmail.com> wrote:
>>>> On Fri, May 18, 2012 at 3:47 PM, Travis Oliphant <travis at continuum.io> wrote:
>>>>> Hey all,
>>>>> After reading all the discussion around masked arrays and getting
>>>>> input from as many people as possible, it is clear that there is still
>>>>> disagreement about what to do, but there have been some fruitful
>>>>> discussions that ensued.
>>>>> This isn't really new as there was significant disagreement about what
>>>>> to do when the masked array code was initially checked in to master. So,
>>>>> in order to move forward, Mark and I are going to work together with
>>>>> whomever else is willing to help with an effort that is in the spirit of my
>>>>> third proposal but has a few adjustments.
>>>>> The idea will be fleshed out in more detail as it progresses, but the
>>>>> basic concept is to create an (experimental) ndmasked object in NumPy 1.7
>>>>> and leave the actual ndarray object unchanged. While the details need to
>>>>> be worked out here, a goal is to have the C-API work with both ndmasked
>>>>> arrays and arrayobjects (possibly by defining a base-class C-level
>>>>> structure that both inherit from). This might also be a good
>>>>> way for Dag to experiment with his ideas, but that is not an
>>>>> explicit goal.
>>>>> One way this could work, for example, is to have PyArrayObject * be the
>>>>> base-class array (essentially the same C-structure we have now with a
>>>>> HASMASK flag). Then, the ndmasked object could inherit from PyArrayObject *
>>>>> as well but add more members to the C-structure. I think this is the
>>>>> easiest thing to do and requires the least amount of code-change. It
>>>>> is also possible to define an abstract base-class PyArrayObject * that both
>>>>> ndarray and ndmasked inherit from. That way ndarray and ndmasked are
>>>>> siblings even though the ndarray would essentially *be* the PyArrayObject *
>>>>> --- just with a different type-hierarchy on the Python side.
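The sibling layout described above can be sketched roughly in Python. This is only an illustration of the type relationships: the class names ArrayBase, PlainArray and MaskedArray and the helper describe() are made up here and are not NumPy types, and the real proposal concerns C structures, which this does not try to reproduce.

```python
class ArrayBase:
    """Hypothetical stand-in for the shared base structure (not a NumPy type)."""

    has_mask = False  # plays the role of the HASMASK flag

    def __init__(self, data):
        self.data = list(data)


class PlainArray(ArrayBase):
    """Plays the role of ndarray: the base layout with nothing added."""


class MaskedArray(ArrayBase):
    """Plays the role of ndmasked: the base layout plus extra members."""

    has_mask = True

    def __init__(self, data, mask):
        super().__init__(data)
        mask = list(mask)
        if len(mask) != len(self.data):
            raise ValueError("mask and data must have the same length")
        self.mask = mask  # True marks an element as missing


def describe(arr):
    """Code written against the base class accepts either sibling."""
    kind = "masked" if arr.has_mask else "unmasked"
    return f"{kind} array of length {len(arr.data)}"
```

In this arrangement neither sibling is a subclass of the other, so a check for the plain type does not accidentally accept a masked array.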
>>>>> This work will take some time and, therefore, I don't expect 1.7 to be
>>>>> released prior to SciPy Austin with an end of June target date. The
>>>>> timing will largely depend on what time is available from people interested
>>>>> in resolving the situation. Mark and I will have some availability for
>>>>> this work in June but not a great deal (about 2 man-weeks total between
>>>>> us). If there are others who can step in and help, it will help
>>>>> accelerate the process.
>>>> This will be a difficult thing for others to help with since the
>>>> concept is vague, the design decisions seem to be in your and Mark's hands,
>>>> and you say you don't have much time. It looks to me like 1.7 will keep
>>>> slipping and I don't think that is a good thing. Why not go for option 2,
>>>> which will get 1.7 out there and push the new masked array work in to 1.8?
>>>> Breaking the flow of development and release has consequences, few of them
>>>> good.
>>> Agreed. 1.6.0 was released one year ago already, let's focus on
>>> polishing what's in there *now*. I have not followed closely what the
>>> decision was for a LTS release, but if 1.7 is supposed to be it, that's
>>> another argument against changing anything there for 1.7.
>> The motivation behind splitting the mask out into a separate ndmasked is
>> primarily so that pre-existing code will not silently function on NA-masked
>> arrays and produce incorrect results. This centres around using
>> PyArray_DATA to get at the data after manually checking flags, instead of
>> calling PyArray_FromAny. Maybe a reasonable solution is to tweak the
>> behavior of PyArray_DATA? It could work as follows:
>> - If an ndarray has no mask, PyArray_DATA returns the data pointer as it
>> does currently.
>> - If the ndarray has an NA-mask, PyArray_DATA sets an exception and
>> returns NULL
>> - Create a new accessor, PyArray_DATAPTR or PyArray_RAWDATA, which
>> returns the array data under all circumstances.
>> This way, code which currently uses the data pointer through PyArray_DATA
>> will fail instead of silently working with the wrong interpretation of the
>> data. What do people feel about this idea?
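The three rules above can be modelled in a few lines of Python. This is a sketch of the semantics only, not of the C-API: Array, array_data, array_rawdata and MaskedDataError are names invented for the illustration, and raising an exception stands in for "sets an exception and returns NULL".

```python
class MaskedDataError(Exception):
    """Hypothetical error; stands in for the exception the C accessor would set."""


class Array:
    """Toy array: a list of values plus an optional NA-mask."""

    def __init__(self, data, mask=None):
        self.data = list(data)
        self.mask = None if mask is None else list(mask)


def array_data(arr):
    """Models the proposed PyArray_DATA: refuse arrays that carry an NA-mask."""
    if arr.mask is not None:
        raise MaskedDataError("array has an NA-mask; use array_rawdata instead")
    return arr.data


def array_rawdata(arr):
    """Models the proposed raw accessor: return the data under all circumstances."""
    return arr.data


def legacy_sum(arr):
    """Pre-existing code that knows nothing about masks."""
    return sum(array_data(arr))
```

With this model, legacy_sum keeps working on unmasked input and fails loudly on masked input, rather than summing values that were meant to be treated as missing.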
> Code working with the wrong interpretation of the data doesn't bother me
> much at this point in development. Long term it matters, but in the short
> term we can't expect code not explicitly written to work with masked arrays
> to do the right thing. I think we are looking at a period of several years
> before things settle out and get accepted. First, the implementation and
> its interface need to get close to final form, and then the long slow
> process of adoption into things like matplotlib needs to take place. I'd
> guess three to five years for that process.
> That said, my main concern is to move forward and not spend the next year
> waiting. I see splitting the masked code out as rather like the python
> types having pointers to sequence/numerical/etc methods, i.e., ndarray then
> looks something like an abstract class. I don't have a problem with that
> and it does avoid base object bloat. As to having PyArray_DATA fail for
> masked arrays and providing new functions for unrestricted access, I'd be
> tempted to have PyArray_DATA continue to behave as it does and let the new
> functions return the error for masked arrays. Making third party
> applications fail for masked arrays is going to make masked arrays very
> unpopular. Most likely no one would use them and third party applications
> would feel no pressure to support them. Another possibility might be to
> have a compile flag that determines whether or not PyArray_DATA returns an
> error for masked arrays, something like we do now for deprecating old
> macros.
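The compile-flag idea can be pictured with a small Python model, where a boolean argument plays the part of the build-time switch. The names make_data_accessor and strict are invented for this sketch, and arrays are plain dicts; nothing here corresponds to an existing NumPy macro or flag.

```python
def make_data_accessor(strict):
    """Build a data accessor; `strict` models the hypothetical compile flag."""

    def data(arr):
        # With the flag off, the accessor behaves as it always has and
        # hands back the data whether or not a mask is attached.
        if strict and arr.get("mask") is not None:
            raise TypeError("masked array passed to mask-unaware code")
        return arr["data"]

    return data


# A project opts in to the stricter behaviour when it is ready for it.
permissive_data = make_data_accessor(strict=False)
strict_data = make_data_accessor(strict=True)
```

The point of the switch is that existing third party code keeps running by default, while projects that want the safety check can turn it on for themselves.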
My own plan for the near term would be as follows:

1) Put in the experimental option and get the 1.7 release out. This gets us
through the next couple of months and keeps things moving.

2) Look at what hooks/low-level functions would let us reimplement np.ma.
Because there are so many different mask uses out there, this would be a
good way to discover what low-level support is likely to provide a good
basis for others to build on.

3) Revisit the idea of making all ndarrays masked by default, but do so
with the experience and feedback from current mask users.
