[Numpy-discussion] Missing data wrap-up and request for comments

Charles R Harris charlesr.harris at gmail.com
Wed May 9 13:08:26 EDT 2012


On Wed, May 9, 2012 at 10:46 AM, Travis Oliphant <travis at continuum.io> wrote:

> Hey all,
>
> Nathaniel and Mark have worked very hard on a joint document to try and
> explain the current status of the missing-data debate.   I think they've
> done an amazing job at providing some context, articulating their views and
> suggesting ways forward in a mutually respectful manner.   This is an
> exemplary collaboration and is at the core of why open source is valuable.
>
> The document is available here:
>    https://github.com/numpy/numpy.scipy.org/blob/master/NA-overview.rst
>
> After reading that document, it appears to me that there are some
> fundamentally different views on how things should move forward.   I'm also
> reading the document in light of my understanding of the history of
> NumPy, as well as of all the users I've met and interacted with, which means
> I have my own perspective that is not necessarily incorporated into that
> document but informs my recommendations.    I'm not sure we can reach full
> consensus on this.     We are also well past time for moving forward with a
> resolution on this (perhaps we can all agree on that).
>
> I would like one more discussion thread where the technical discussion can
> take place.    I will make a plea that we keep this discussion as free from
> logical fallacies (http://en.wikipedia.org/wiki/Logical_fallacy) as we can.
>   I can't guarantee that I personally will succeed at that, but I can tell
> you that I will try.   That's all I'm asking of anyone else.    I recognize
> that there are a lot of other issues at play here besides *just* the
> technical questions, but we are not going to resolve every community issue
> in this technical thread.
>
> We need concrete proposals and so I will start with three.   Please feel
> free to comment on these proposals or add your own during the discussion.
>  I will stop paying attention to this thread next Wednesday (May 16th) (or
> earlier if the thread dies) and hope that by that time we can agree on a
> way forward.  If we don't have agreement, then I will move forward with
> what I think is the right approach.   I will either write the code myself
> or convince someone else to write it.
>
> In all cases, we have agreement that bit-pattern dtypes should be added to
> NumPy.      We should work on these (int32, float64, complex64, str, bool)
> to start.    So, the three proposals are independent of this way forward.
> The proposals are all about the extra mask part:
>
> My three proposals:
>
> * do nothing and leave things as is
>
> * add a global flag that turns off masked array support by default but
> otherwise leaves things unchanged (I'm still unclear how this would work
> exactly)
>
> * move Mark's "masked ndarray objects" into a new fundamental type
> (ndmasked), leaving the actual ndarray type unchanged.  The array_interface
> keeps the masked array notions and the ufuncs keep the ability to handle
> arrays like ndmasked.    Ideally, numpy.ma would be changed to use
> ndmasked objects as its core.
>
>
numpy.ma is unmaintained, and I don't see that changing anytime soon. As
you know, I would prefer 1), but 2) is a good compromise, and the
infrastructure for such a flag could be useful for other things, although,
like you, I'm not sure how it would be implemented. I don't understand your
proposal for 3), but from the description I don't see that it buys anything.


> For the record, I'm currently in favor of the third proposal.   Feel free
> to comment on these proposals (or provide your own).
>
>
Chuck