[Numpy-discussion] Masked Array for NumPy 1.7

Travis Oliphant travis at continuum.io
Sun May 20 01:02:25 EDT 2012


On May 19, 2012, at 10:21 AM, Mark Wiebe wrote:

> On Sat, May 19, 2012 at 10:00 AM, David Cournapeau <cournape at gmail.com> wrote:
> On Sat, May 19, 2012 at 3:17 PM, Charles R Harris <charlesr.harris at gmail.com> wrote:
> On Fri, May 18, 2012 at 3:47 PM, Travis Oliphant <travis at continuum.io> wrote:
> Hey all,
> 
> After reading all the discussion around masked arrays and getting input from as many people as possible, it is clear that there is still disagreement about what to do, but some fruitful discussions have ensued.
> 
> This isn't really new, as there was significant disagreement about what to do when the masked array code was initially checked in to master. So, in order to move forward, Mark and I are going to work together with whoever else is willing to help on an effort that is in the spirit of my third proposal but has a few adjustments.
> 
> The idea will be fleshed out in more detail as it progresses, but the basic concept is to create an (experimental) ndmasked object in NumPy 1.7 and leave the actual ndarray object unchanged. While the details need to be worked out here, a goal is to have the C-API work with both ndmasked arrays and arrayobjects (possibly by defining a base-class C-level structure that both array types inherit from). This might also be a good way for Dag to experiment with his ideas, but that is not an explicit goal.
> 
> One way this could work, for example, is to have PyArrayObject * be the base-class array (essentially the same C-structure we have now with a HASMASK flag). Then, the ndmasked object could inherit from PyArrayObject * as well but add more members to the C-structure. I think this is the easiest thing to do and requires the least amount of code-change. It is also possible to define an abstract base-class PyArrayObject * that both ndarray and ndmasked inherit from. That way ndarray and ndmasked are siblings even though the ndarray would essentially *be* the PyArrayObject * --- just with a different type-hierarchy on the Python side.
> 
> This work will take some time and, therefore, I don't expect 1.7 to be released prior to SciPy Austin with an end of June target date.   The timing will largely depend on what time is available from people interested in resolving the situation.   Mark and I will have some availability for this work in June but not a great deal (about 2 man-weeks total between us).    If there are others who can step in and help, it will help accelerate the process.
> 
> 
> This will be a difficult thing for others to help with since the concept is vague, the design decisions seem to be in your and Mark's hands, and you say you don't have much time. It looks to me like 1.7 will keep slipping and I don't think that is a good thing. Why not go for option 2, which will get 1.7 out there and push the new masked array work into 1.8? Breaking the flow of development and release has consequences, few of them good.
> 
> Agreed. 1.6.0 was released one year ago already, let's focus on polishing what's in there *now*. I have not followed closely what the decision was for an LTS release, but if 1.7 is supposed to be it, that's another argument against changing anything there for 1.7.
> 
> The motivation behind splitting the mask out into a separate ndmasked is primarily so that pre-existing code will not silently function on NA-masked arrays and produce incorrect results. This centres around code that uses PyArray_DATA to get at the data after manually checking flags, instead of calling PyArray_FromAny. Maybe a reasonable solution is to tweak the behavior of PyArray_DATA? It could work as follows:
> 
> - If an ndarray has no mask, PyArray_DATA returns the data pointer as it does currently.
> - If the ndarray has an NA-mask, PyArray_DATA sets an exception and returns NULL.
> - Create a new accessor, PyArray_DATAPTR or PyArray_RAWDATA, which returns the array data under all circumstances.
> 
> This way, code which currently uses the data pointer through PyArray_DATA will fail instead of silently working with the wrong interpretation of the data. What do people feel about this idea?

The problem with this is that PyArray_DATA calls typically don't do error checking, as the API could not fail before. I could see introducing a new API that can fail and then encouraging its use.
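
To make the concern concrete, here is a minimal sketch in plain C. It does not use the NumPy headers: MockArray, MOCK_HASMASK, mock_data_raw, mock_data_checked, mock_sum and mock_last_error are all hypothetical stand-ins invented for illustration, and the error string merely stands in for a Python exception being set. It shows a checked accessor next to a raw one, and the NULL test that a caller would need but that existing PyArray_DATA callers typically do not have.

```c
#include <stddef.h>

/* Hypothetical stand-ins for illustration; none of these names exist in NumPy. */
#define MOCK_HASMASK 0x1u

typedef struct {
    double *data;     /* raw element buffer */
    unsigned flags;   /* MOCK_HASMASK is set when an NA-mask is attached */
    size_t size;      /* number of elements */
} MockArray;

/* Stand-in for "an exception has been set". */
static const char *mock_last_error = NULL;

/* Raw accessor: returns the buffer under all circumstances. */
static double *mock_data_raw(const MockArray *arr)
{
    return arr->data;
}

/* Checked accessor: refuses to hand out the buffer of a masked array. */
static double *mock_data_checked(const MockArray *arr)
{
    if (arr->flags & MOCK_HASMASK) {
        mock_last_error = "array has an NA-mask";
        return NULL;
    }
    return arr->data;
}

/* A caller written for the checked accessor has to test for NULL.
   Returns 0 on success and -1 on failure; the sum is stored in *out. */
static int mock_sum(const MockArray *arr, double *out)
{
    const double *p = mock_data_checked(arr);
    size_t i;

    if (p == NULL) {
        return -1;  /* code without this test would dereference NULL below */
    }
    *out = 0.0;
    for (i = 0; i < arr->size; i++) {
        *out += p[i];
    }
    return 0;
}
```

Under these assumptions, a caller that omits the NULL test crashes on a masked array instead of silently misreading it, which is the trade-off being weighed here.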

Ultimately, the motivation to split the mask out is that the idea of *all* arrays being masked arrays at their core is a very new idea for NumPy, one that has downstream consequences and needs to be phased in more slowly. I'd rather not be stuck supporting the changes to PyArrayObject that it implies when it is not clear how masked arrays should really be handled or if it's appropriate that *all* NumPy arrays should be secretly masked arrays.

Another thing I've been wondering... I presume that any accessors to the mask fields in the current NumPy code base are going through function calls. Does this mean that we could in 1.8 re-purpose those fields for some other feature and still have code compiled against 1.7 work with 1.8?

If those calls are inlined in an extension module that uses the NumPy C-API, doesn't this mean that it's effectively the same (from an ABI perspective) as a macro access? If so, then I don't see how inlined function calls are any better from an ABI perspective than a macro access. The only benefit seems to be the tendency to have fewer pre-processor inspired bugs. But, ultimately, it seems a point of style rather than function.
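
A small sketch of the ABI question, again in plain C without the NumPy headers. LayoutOld, LayoutNew, OLD_MASKFLAGS, old_maskflags and view_through_old_header are hypothetical names invented for illustration, not the real PyArrayObject layout or accessors. The two structs stand for the same object as described by an older and a newer header, with one field re-purposed. An extension built against the older header is simulated by reinterpreting the newer object's bytes through the older struct.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical layouts for illustration; not the real PyArrayObject. */
typedef struct { int nd; int maskflags; } LayoutOld;     /* older header */
typedef struct { int nd; int otherfeature; } LayoutNew;  /* field re-purposed */

/* Macro accessor from the older header. */
#define OLD_MASKFLAGS(p) ((p)->maskflags)

/* Inline accessor from the older header. */
static inline int old_maskflags(const LayoutOld *p)
{
    return p->maskflags;
}

/* Simulates an extension compiled against the older header that is handed
   an object laid out by the newer library. The bytes are copied so the
   reinterpretation is well defined (both structs are two ints). */
static int view_through_old_header(const LayoutNew *obj, int use_macro)
{
    LayoutOld seen;
    memcpy(&seen, obj, sizeof seen);
    return use_macro ? OLD_MASKFLAGS(&seen) : old_maskflags(&seen);
}
```

In this sketch both accessors read whatever sits at the old field's offset, so both return the re-purposed value. An accessor exported by the library as an ordinary out-of-line function would, by contrast, be replaced along with the library.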

-Travis


> 
> -Mark
>  
> David
> 
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
> 
