[Numpy-discussion] Missing data wrap-up and request for comments

Travis Oliphant travis at continuum.io
Fri May 11 01:14:22 EDT 2012

On May 10, 2012, at 12:21 AM, Charles R Harris wrote:

> On Wed, May 9, 2012 at 11:05 PM, Benjamin Root <ben.root at ou.edu> wrote:
> On Wednesday, May 9, 2012, Nathaniel Smith wrote:
> My only objection to this proposal is that committing to this approach
> seems premature. The existing masked array objects act quite
> differently from numpy.ma, so why do you believe that they're a good
> foundation for numpy.ma, and why will users want to switch to their
> semantics over numpy.ma's semantics? These aren't rhetorical
> questions; it seems like they must have concrete answers, but I don't
> know what they are.
> Based on the design decisions made in the original NEP, a re-made numpy.ma would have to lose _some_ features, particularly the ability to share masks. Save for that and some very obscure, undocumented behaviors, it is possible to remake numpy.ma as a compatibility layer.
> That being said, I think that there are some fundamental questions that have raised concerns. If I recall, there were unresolved questions about behaviors surrounding assignments to elements of a view.
> I see the project as broken down like this:
> 1.) internal architecture (largely abi issues)
> 2.) external architecture (hooks throughout numpy to utilize the new features where possible, such as the where= argument)
> 3.) getter/setter semantics
> 4.) mathematical semantics
> At this moment, I think we have pieces of 2 and they are fairly non-controversial. It is 1 that I see as being the immediate hold-up here. 3 & 4 are non-trivial, but because they are mostly about interfaces, I think we can be willing to accept some very basic, fundamental, barebones components here in order to lay the groundwork for a more complete API later.
> To talk of Travis's proposal, doing nothing is a no-go. Not moving forward would dishearten the community. Making an ndmasked type is very intriguing. I see it as a step towards eventually deprecating ndarray? Also, how would it behave with np.asarray() and np.asanyarray()? My other concern is a possible violation of DRY. How difficult would it be to maintain two ndarrays in parallel?
> As for the flag approach, this still doesn't solve the problem of legacy code (or did I misunderstand?)
> My understanding of the flag is to allow the code to stay in and get reworked and experimented with while keeping it from contaminating conventional use.
> The whole point of putting the code in was to experiment and adjust. The rather bizarre idea that it needs to be perfect from the get-go is disheartening, and is seldom how new things get developed. Sure, there is a plan up front, but there needs to be feedback and change. And in fact, I haven't seen much feedback about the actual code; I don't even know that the people complaining have tried using it to see where it hurts. I'd like that sort of feedback.

I don't think anyone is saying it needs to be perfect from the get-go. What I am saying is that the array object is fundamental enough to downstream users that a change of this kind is best done as a separate object. The flag could still be used to make all Python-level array constructors build ndmasked objects.

But this doesn't address the C-level story, where quite a bit of downstream code uses the NumPy array as just a pointer to memory, without considering that there might be a mask attached that should be inspected as well.

The NEP addresses this a little bit for those C or C++ consumers of the ndarray who always use PyArray_FromAny, which can fail if the array has non-NULL mask contents. However, it is *not* true that all downstream users use PyArray_FromAny.

A large number of users just use something like PyArray_Check and then PyArray_DATA to get the pointer to the data buffer, and then go from there, thinking of their data as a strided memory chunk only (no extra mask). The NEP fundamentally changes this simple invariant, which has held in NumPy, and in Numeric before it, for a long, long time.
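
To make the hazard concrete, here is a minimal Python-level sketch. It uses today's numpy.ma as a stand-in for "an array with a mask attached"; the NEP keeps its mask inside the ndarray itself, which this sketch does not model. The function sum_raw_buffer is a hypothetical consumer, and np.asarray plays the role of C code that grabs the data buffer without looking for a mask.

```python
import numpy as np

# numpy.ma stands in for an array that carries a mask; this is an
# assumption for illustration, not the NEP's implementation.
m = np.ma.masked_array([1.0, 2.0, 3.0], mask=[False, True, False])


def sum_raw_buffer(arr):
    # Hypothetical consumer: treats the input as plain strided memory,
    # much like C code calling PyArray_DATA without inspecting a mask.
    raw = np.asarray(arr)  # drops the MaskedArray subclass and its mask
    return float(raw.sum())


mask_aware = float(m.sum())     # skips the masked element
mask_blind = sum_raw_buffer(m)  # silently includes it

print(mask_aware, mask_blind)
```

The two results differ because the mask-blind path adds in the element that was supposed to be ignored, and nothing signals the error.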

I really don't see how we can do this in a 1.7 release. It has too many unknown and, I think, unknowable downstream effects. But I think we could introduce another array object that is the masked_array, with a Python-level flag that makes it the default array in Python.

There are a few more subtleties. PyArray_Check by default will pass sub-classes, so if the new ndmasked array were a sub-class then it would pass the check (just as current numpy.ma arrays and matrices pass that check today). However, there is a PyArray_CheckExact macro which could be used to ensure the object is actually of PyArray_Type. There is also the PyArg_ParseTuple function with the "O!" format, which I have seen used many times to ensure an exact NumPy array.
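
A rough Python analogue of the two checks, again using numpy.ma as the stand-in sub-class. The helpers check and check_exact are hypothetical names for illustration; they only approximate what the C macros do (a type check that admits sub-classes versus an identity test against the base type).

```python
import numpy as np

m = np.ma.masked_array([1, 2, 3], mask=[False, True, False])
plain = np.array([1, 2, 3])


def check(obj):
    # Rough analogue of PyArray_Check: sub-classes of ndarray pass.
    return isinstance(obj, np.ndarray)


def check_exact(obj):
    # Rough analogue of PyArray_CheckExact: only the base type passes.
    return type(obj) is np.ndarray


for name, obj in (("plain", plain), ("masked", m)):
    print(name, check(obj), check_exact(obj))
```

Code guarded only by the first kind of check would accept a masked sub-class and go on to read its data buffer; code guarded by the exact check would reject it.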


> Chuck
