[Numpy-discussion] Missing data wrap-up and request for comments

Richard Hattersley rhattersley at gmail.com
Mon May 14 15:24:09 EDT 2012


For what it's worth, I'd prefer ndmasked.

As has been mentioned elsewhere, some algorithms can't really cope with
missing data. I'd very much rather they fail than silently give incorrect
results. Working in the climate prediction business (as with many other
domains I'm sure), even the *potential* for incorrect results can be
damaging.
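A minimal sketch of the fail-loudly behaviour described above. It assumes NumPy's existing numpy.ma module as the source of masks, and the helper name `strict_mean` is hypothetical, not a NumPy API.

```python
import numpy as np


def strict_mean(values):
    """Hypothetical helper: refuse to compute if any entry is masked (missing)."""
    # np.ma.is_masked() returns False for plain ndarrays, lists, and masked
    # arrays whose mask is entirely False, so only real gaps trigger the error.
    if np.ma.is_masked(values):
        raise ValueError("input contains missing (masked) values")
    return float(np.mean(np.asarray(values)))


print(strict_mean([1.0, 2.0, 3.0]))  # 2.0
try:
    strict_mean(np.ma.array([1.0, 2.0, 3.0], mask=[False, True, False]))
except ValueError as exc:
    print("refused:", exc)
```

The caller gets an exception instead of a number that silently ignores (or silently includes) the missing entry.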


On 11 May 2012 06:14, Travis Oliphant <travis at continuum.io> wrote:

>
> On May 10, 2012, at 12:21 AM, Charles R Harris wrote:
>
>
>
> On Wed, May 9, 2012 at 11:05 PM, Benjamin Root <ben.root at ou.edu> wrote:
>
>>
>>
>> On Wednesday, May 9, 2012, Nathaniel Smith wrote:
>>
>>>
>>>
>>> My only objection to this proposal is that committing to this approach
>>> seems premature. The existing masked array objects act quite
>>> differently from numpy.ma, so why do you believe that they're a good
>>> foundation for numpy.ma, and why will users want to switch to their
>>> semantics over numpy.ma's semantics? These aren't rhetorical
>>> questions, it seems like they must have concrete answers, but I don't
>>> know what they are.
>>>
>>
>> Based on the design decisions made in the original NEP, a re-made
>> numpy.ma would have to lose _some_ features, particularly the ability to
>> share masks. Save for that and some very obscure behaviors that are
>> undocumented, it is possible to remake numpy.ma as a compatibility layer.
>>
>> That being said, I think that there are some fundamental questions that
>> have raised concerns. If I recall correctly, there were unresolved questions
>> about behaviors surrounding assignments to elements of a view.
>>
>> I see the project as broken down like this:
>> 1.) internal architecture (largely abi issues)
>> 2.) external architecture (hooks throughout numpy to utilize the new
>> features where possible, such as the where= argument)
>> 3.) getter/setter semantics
>> 4.) mathematical semantics
>>
>> At this moment, I think we have pieces of 2 and they are fairly
>> non-controversial. It is 1 that I see as being the immediate hold-up here.
>> 3 & 4 are non-trivial, but because they are mostly about interfaces, I
>> think we can be willing to accept some very basic, fundamental, barebones
>> components here in order to lay the groundwork for a more complete API
>> later.
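Item 4 is where the two camps differ most visibly. The sketch below contrasts the existing numpy.ma "skip" behaviour with the "propagate" behaviour of NaN used as a missing marker in a plain ndarray. It only illustrates the two choices; it is not the NEP's proposed NA semantics.

```python
import numpy as np

# numpy.ma semantics: masked entries are skipped by reductions.
m = np.ma.array([1.0, 2.0, 3.0], mask=[False, True, False])
print(m.sum())  # 4.0

# NaN-as-missing semantics: the missing value propagates...
x = np.array([1.0, np.nan, 3.0])
print(x.sum())  # nan

# ...unless the caller explicitly opts into skipping.
print(np.nansum(x))  # 4.0
```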
>>
>> As for Travis's proposal, doing nothing is a no-go. Not moving forward
>> would dishearten the community. Making an ndmasked type is very intriguing.
>> Is it meant as a step towards eventually deprecating ndarray? Also, how
>> would it behave with np.asarray() and np.asanyarray()? My other concern is
>> a possible violation of DRY. How difficult would it be to maintain two
>> ndarrays in parallel?
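On the np.asarray()/np.asanyarray() question, today's numpy.ma subclass shows what is at stake. This sketch assumes an ndmasked type would be treated like existing ndarray subclasses, which is an assumption rather than a settled design.

```python
import numpy as np

m = np.ma.array([1.0, 2.0, 3.0], mask=[False, True, False])

plain = np.asarray(m)    # base-class ndarray: the mask is dropped
kept = np.asanyarray(m)  # subclasses pass through unchanged

print(type(plain).__name__, plain.tolist())     # ndarray [1.0, 2.0, 3.0]
print(type(kept).__name__, kept.mask.tolist())  # MaskedArray [False, True, False]
```

Code that calls np.asarray() therefore loses the mask without any error, which is the legacy-code concern in a nutshell.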
>>
>> As for the flag approach, this still doesn't solve the problem of legacy
>> code (or did I misunderstand?)
>>
>
> My understanding of the flag is to allow the code to stay in and get
> reworked and experimented with while keeping it from contaminating
> conventional use.
>
> The whole point of putting the code in was to experiment and adjust. The
> rather bizarre idea that it needs to be perfect from the get-go is
> disheartening, and is seldom how new things get developed. Sure, there is a
> plan up front, but there needs to be feedback and change. And in fact, I
> haven't seen much feedback about the actual code, I don't even know that
> the people complaining have tried using it to see where it hurts. I'd like
> that sort of feedback.
>
>
> I don't think anyone is saying it needs to be perfect from the get-go.
>  What I am saying is that this is fundamental enough to downstream users
> that this kind of thing is best done as a separate object.  The flag could
> still be used to make all Python-level array constructors build ndmasked
> objects.
>
> But this doesn't address the C-level story, where there is quite a bit of
> downstream code that uses the NumPy array as just a pointer to
> memory without considering that there might be a mask attached that should
> be inspected as well.
>
> The NEP addresses this a little bit for those C or C++ consumers of the
> ndarray who always use PyArray_FromAny, which can fail if the array has
> non-NULL mask contents.   However, it is *not* true that all downstream
> users use PyArray_FromAny.
>
> A large number of users just use something like PyArray_Check and then
> PyArray_DATA to get the pointer to the data buffer and then go from there
> thinking of their data as a strided memory chunk only (no extra mask).
>  The NEP fundamentally changes this simple invariant that has been in NumPy
> and Numeric before it for a long, long time.
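The invariant described here can be illustrated at the Python level with numpy.ma, which already keeps a mask beside an ordinary data buffer. The function below is a hypothetical stand-in for a C extension that checks the type and then reads the buffer directly; it is an analogy, not the C API itself.

```python
import numpy as np


def legacy_mean(arr):
    """Hypothetical legacy consumer: treats the input as plain strided memory,
    the Python analogue of PyArray_Check followed by PyArray_DATA."""
    if not isinstance(arr, np.ndarray):
        raise TypeError("expected an ndarray")
    # The mask, if any, is never consulted.
    return float(np.asarray(arr).mean())


m = np.ma.array([1.0, 2.0, 1000.0], mask=[False, False, True])
print(float(m.mean()))  # 1.5, mask-aware
print(legacy_mean(m))   # about 334.33, silently includes the masked slot
```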
>
> I really don't see how we can do this in a 1.7 release.    It has too many
> unknown, and I think unknowable, downstream effects.    But I think we could
> introduce another arrayobject that is the masked_array with a Python-level
> flag that makes it the default array in Python.
>
> There are a few more subtleties: PyArray_Check accepts subclasses by
> default, so if the new ndmasked array were a subclass it would pass that
> check (just as current numpy.ma arrays and matrices pass it today).
> However, there is a PyArray_CheckExact macro which could be used to ensure
> the object was actually of PyArray_Type.   There is also the
> PyArg_ParseTuple "O!" format, which I have seen used many times to
> require a NumPy array, although like PyArray_Check it also accepts
> subclasses.
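At the Python level the same distinction exists between isinstance() and an exact type test. A minimal analogy using numpy.ma.MaskedArray, which is already an ndarray subclass; a future ndmasked subclass is assumed to behave the same way.

```python
import numpy as np

plain = np.array([1.0, 2.0])
m = np.ma.array([1.0, 2.0], mask=[False, True])

for obj in (plain, m):
    # isinstance() accepts subclasses, like PyArray_Check;
    # the exact type test mirrors PyArray_CheckExact.
    print(type(obj).__name__, isinstance(obj, np.ndarray), type(obj) is np.ndarray)
```

This prints `ndarray True True` for the plain array and `MaskedArray True False` for the masked one: only the exact check keeps a mask-carrying subclass away from code that assumes plain memory.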
>
> -Travis
>
>
>
>
>
>
> Chuck
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
>
>

