[Numpy-discussion] Missing data wrap-up and request for comments

Mark Wiebe mwwiebe at gmail.com
Wed May 9 15:35:26 EDT 2012


On Wed, May 9, 2012 at 2:15 PM, Travis Oliphant <travis at continuum.io> wrote:

>
> On May 9, 2012, at 2:07 PM, Mark Wiebe wrote:
>
> On Wed, May 9, 2012 at 11:46 AM, Travis Oliphant <travis at continuum.io> wrote:
>
>> Hey all,
>>
>> Nathaniel and Mark have worked very hard on a joint document to try and
>> explain the current status of the missing-data debate.   I think they've
>> done an amazing job at providing some context, articulating their views and
>> suggesting ways forward in a mutually respectful manner.   This is an
>> exemplary collaboration and is at the core of why open source is valuable.
>>
>> The document is available here:
>>    https://github.com/numpy/numpy.scipy.org/blob/master/NA-overview.rst
>>
>> After reading that document, it appears to me that there are some
>> fundamentally different views on how things should move forward.   I'm also
>> reading the document with my own understanding of the history of NumPy and
>> of all the users I've met and interacted with, which means I have a
>> perspective that is not necessarily incorporated into that document but
>> that informs my recommendations.    I'm not sure we can reach full
>> consensus on this.     We are also well past time for moving forward with a
>> resolution on this (perhaps we can all agree on that).
>>
>> I would like one more discussion thread where the technical discussion
>> can take place.    I will make a plea that we keep this discussion as free
>> from logical fallacies http://en.wikipedia.org/wiki/Logical_fallacy as
>> we can.   I can't guarantee that I personally will succeed at that, but I
>> can tell you that I will try.   That's all I'm asking of anyone else.    I
>> recognize that there are a lot of other issues at play here besides *just*
>> the technical questions, but we are not going to resolve every community
>> issue in this technical thread.
>>
>> We need concrete proposals and so I will start with three.   Please feel
>> free to comment on these proposals or add your own during the discussion.
>>  I will stop paying attention to this thread next Wednesday (May 16th) (or
>> earlier if the thread dies) and hope that by that time we can agree on a
>> way forward.  If we don't have agreement, then I will move forward with
>> what I think is the right approach.   I will either write the code myself
>> or convince someone else to write it.
>>
>> In all cases, we have agreement that bit-pattern dtypes should be added
>> to NumPy.      We should work on these (int32, float64, complex64, str,
>> bool) to start.    So, the three proposals are independent of this way
>> forward.   The proposals are all about the extra mask part:
>>
>> My three proposals:
>>
>> * do nothing and leave things as is
>>
>> * add a global flag that turns off masked array support by default but
>> otherwise leaves things unchanged (I'm still unclear how this would work
>> exactly)
>>
>> * move Mark's "masked ndarray objects" into a new fundamental type
>> (ndmasked), leaving the actual ndarray type unchanged.  The array_interface
>> keeps the masked array notions and the ufuncs keep the ability to handle
>> arrays like ndmasked.    Ideally, numpy.ma would be changed to use
>> ndmasked objects as its core.
>>
>> For the record, I'm currently in favor of the third proposal.   Feel free
>> to comment on these proposals (or provide your own).
>>
>
> I'm most in favour of the second proposal. It won't take very much effort,
> and more clearly marks off this code as experimental than just
> documentation notes.
>
>
> Mark will you give more details about this proposal?    How would the flag
> work, what would it modify?
>

The idea is inspired in part by the Chrome release cycle, which is described
in this presentation:

https://docs.google.com/present/view?id=dg63dpc6_4d7vkk6ch&pli=1

Some quotes:

"Features should be engineered so that they can be disabled easily (1 patch)"

and

Would large feature development still be possible?

"Yes, engineers would have to work behind flags, however they can work for
as many releases as they need to and can remove the flag when they are
done."


The current numpy codebase isn't designed for this kind of workflow, but I
think we can productively emulate the idea for a big feature like NA
support.

One way to do this flag would be to have a "numpy.experimental" namespace
which is not imported by default. To enable the NA-mask feature, you could
do:

>>> import numpy.experimental.maskna

This would trigger an ExperimentalWarning to signal that an experimental
feature has been enabled, and would add any NA-specific symbols to the
numpy namespace (NA, NAType, etc). Without this import, any operation which
would create an NA or NA-masked array raises an ExperimentalError instead
of succeeding. After this import, things would behave as they do now.
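As a rough illustration of the mechanism, here is a minimal sketch in plain
Python. It does not touch NumPy at all: ExperimentalWarning and
ExperimentalError are the names proposed above, while enable(), require(),
and make_na_masked() are hypothetical helpers invented for this example.
Calling enable("maskna") stands in for the side effect that importing
numpy.experimental.maskna would have.

```python
import warnings


class ExperimentalWarning(UserWarning):
    """Emitted once when an experimental feature is switched on."""


class ExperimentalError(RuntimeError):
    """Raised when an experimental feature is used without being enabled."""


# Names of the experimental features that have been enabled so far.
_enabled = set()


def enable(feature):
    """Hypothetical stand-in for `import numpy.experimental.maskna`."""
    if feature not in _enabled:
        _enabled.add(feature)
        warnings.warn(
            f"experimental feature {feature!r} has been enabled",
            ExperimentalWarning,
            stacklevel=2,
        )


def require(feature):
    """Guard for every entry point that would create an NA."""
    if feature not in _enabled:
        raise ExperimentalError(
            f"{feature!r} is experimental; enable it explicitly first"
        )


def make_na_masked(values):
    """Hypothetical constructor; returns (data, mask) as two plain lists."""
    require("maskna")
    return list(values), [False] * len(values)
```

Before enable("maskna") is called, make_na_masked() raises ExperimentalError;
afterwards it behaves normally. The point of the design is that the guard
lives in one place, so the feature can be disabled or the flag removed with a
small patch.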

Cheers,
Mark

> The proposal to create an ndmasked object that is separate from ndarray
> objects also won't take much effort and also marks off the object so those
> who want to use it can and those who don't are not pushed into using it
> anyway.
>
> -Travis
>
>
> Thanks,
> -Mark
>
>
>>
>> Best regards,
>>
>> -Travis
>>
>>
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion at scipy.org
>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>>
>>