[Numpy-discussion] Missing data wrap-up and request for comments

Travis Oliphant travis at continuum.io
Wed May 9 12:46:53 EDT 2012


Hey all, 

Nathaniel and Mark have worked very hard on a joint document to try and explain the current status of the missing-data debate.   I think they've done an amazing job at providing some context, articulating their views and suggesting ways forward in a mutually respectful manner.   This is an exemplary collaboration and is at the core of why open source is valuable. 

The document is available here: 
   https://github.com/numpy/numpy.scipy.org/blob/master/NA-overview.rst

After reading that document, it appears to me that there are some fundamentally different views on how things should move forward.   I'm also reading the document in light of my understanding of the history of NumPy and of all the users I've met and interacted with, which means I have my own perspective that is not necessarily incorporated into that document but informs my recommendations.    I'm not sure we can reach full consensus on this.     We are also well past time for moving forward with a resolution on this (perhaps we can all agree on that).

I would like one more discussion thread where the technical discussion can take place.    I will make a plea that we keep this discussion as free from logical fallacies (http://en.wikipedia.org/wiki/Logical_fallacy) as we can.   I can't guarantee that I personally will succeed at that, but I can tell you that I will try.   That's all I'm asking of anyone else.    I recognize that there are a lot of other issues at play here besides *just* the technical questions, but we are not going to resolve every community issue in this technical thread.

We need concrete proposals and so I will start with three.   Please feel free to comment on these proposals or add your own during the discussion.    I will stop paying attention to this thread next Wednesday, May 16th (or earlier if the thread dies), and hope that by that time we can agree on a way forward.  If we don't have agreement, then I will move forward with what I think is the right approach.   I will either write the code myself or convince someone else to write it.

In all cases, we have agreement that bit-pattern dtypes should be added to NumPy.      We should work on these (int32, float64, complex64, str, bool) to start.    So the three proposals are independent of that work.   The proposals are all about the extra mask part:

My three proposals: 

	* do nothing and leave things as is 

	* add a global flag that turns off masked array support by default but otherwise leaves things unchanged (I'm still unclear how this would work exactly)

	* move Mark's "masked ndarray objects" into a new fundamental type (ndmasked), leaving the actual ndarray type unchanged.  The array_interface keeps the masked array notions and the ufuncs keep the ability to handle arrays like ndmasked.    Ideally, numpy.ma would be changed to use ndmasked objects as its core.
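
As a rough illustration of the difference between the bit-pattern approach and the mask approach (this is not code from the NA-overview document or from Mark's work), the sketch below uses only functionality that already exists in NumPy.  NaN stands in for a bit-pattern missing value in float64, and numpy.ma stands in for a mask kept next to the data.  The proposed ndmasked type does not exist, so it is not used here; the variable names are made up for the example.

```python
import numpy as np

# Bit-pattern style, approximated with NaN (an assumption for illustration):
# the "missing" marker is stored in the data buffer itself, so the original
# value at that position is gone and no separate mask is kept.
a = np.array([1.0, np.nan, 3.0])
bitpattern_sum = np.nansum(a)      # sums while skipping the NaN entry

# Mask style, using the existing numpy.ma module: the data buffer is left
# alone and a separate boolean array records which entries are missing.
m = np.ma.masked_array([1.0, 2.0, 3.0], mask=[False, True, False])
masked_sum = m.sum()               # sums while ignoring the masked entry

print(bitpattern_sum, masked_sum)
print(m.data)                      # the hidden value is still in the buffer
print(m.mask)                      # the mask lives alongside the data
```

Both sums skip the missing entry, but only the masked version can recover the hidden value later by clearing the mask, which is the "extra mask part" the three proposals are about.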

For the record, I'm currently in favor of the third proposal.   Feel free to comment on these proposals (or provide your own). 

Best regards,

-Travis
