<div class="gmail_quote">On Sat, Jun 25, 2011 at 9:21 AM, Charles R Harris <span dir="ltr"><<a href="mailto:charlesr.harris@gmail.com">charlesr.harris@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">
<div class="gmail_quote"><div class="im">On Sat, Jun 25, 2011 at 5:29 AM, Pierre GM <span dir="ltr"><<a href="mailto:pgmdevlist@gmail.com" target="_blank">pgmdevlist@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
This thread is getting quite long, innit ?<br>
And I think it's getting a tad confusing, because we're mixing two different concepts: missing values and masks.<br>
There should be support for missing values in numpy.core, I think we all agree on that.<br>
* Adding new dtypes (nafloat, naint), as has been suggested, is great, but why not make them the default, then? <br></blockquote><blockquote class="gmail_quote" style="margin:0pt 0pt 0pt 0.8ex;border-left:1px solid rgb(204, 204, 204);padding-left:1ex">
* Operations involving an NA (whatever the NA actually is, depending on the dtype of the input) should result in an NA (whatever NA is defined by the output's dtype). That could be done by overloading the existing ufuncs to support the new dtypes.<br>
* There should be some simple methods to retrieve the location of those NAs in an array. Whether we just output the indices or a full boolean array (w/ True for a NA, False for a non-NA or vice-versa) needs to be decided.<br>
* We can always re-implement masked arrays to use these NAs in a way which would be consistent with <a href="http://numpy.ma" target="_blank">numpy.ma</a> (so as not to confuse existing users of <a href="http://numpy.ma" target="_blank">numpy.ma</a>): a mask would be a boolean array with the same shape as the underlying ndarray, with True for NA.<br>
Mark, I'd suggest you modify your proposal to make it clearer that the goal is not to add all of <a href="http://numpy.ma" target="_blank">numpy.ma</a>'s functionality to the core, but just to support these missing values. The term 'mask' should be avoided as much as possible; use 'missing data' or something similar instead.<br>
</blockquote></div><div><br>I think he aims to support both. One complication with masks is keeping them tied to the data on disk. With NA values, one file can contain both the data and the missing-data markers, whereas with masks, two files would be required. I don't think that will fly in the long run unless there is some standard file format, like GeoTIFF for GIS, that combines both.<br>
</div></div></blockquote><div><br></div><div>Before I was leaning mostly towards masks, but now that I've come up with an NA bit pattern approach that feels reasonable, I think implementing both together is on the table.</div>
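<div><br></div><div>For what it's worth, here is a rough sketch of the semantics being discussed: NA propagating through a ufunc, locating the NAs as either a boolean array or indices, and the equivalent numpy.ma mask. It uses NaN in a float array as a stand-in for an NA bit pattern. That is only an assumption for illustration: the nafloat/naint dtypes suggested above are not an existing NumPy feature, and NaN only covers floating-point data.</div>
<pre>
```python
import numpy as np

# Assumption: NaN stands in for the proposed NA bit pattern. The nafloat/naint
# dtypes discussed in this thread are a proposal, not an existing NumPy API.
a = np.array([1.0, np.nan, 3.0])
b = np.array([10.0, 20.0, np.nan])

# Existing ufuncs already propagate NaN: an NA in either input gives an NA output.
total = np.add(a, b)

# Locating the NAs as a full boolean array (True for NA) ...
na_mask = np.isnan(total)
# ... or as plain indices.
na_indices = np.flatnonzero(na_mask)

# The same information expressed as a numpy.ma mask (True marks a missing value).
masked = np.ma.masked_invalid(total)

print(total)
print(na_mask)
print(na_indices)
print(masked.mask)
```
</pre>
<div>The boolean array and the numpy.ma mask agree here (True for NA), which is the convention suggested above for staying consistent with existing numpy.ma users.</div>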
<div><br></div><div>Bringing up the file format issue is good, that hasn't been covered in the NEP yet.</div><div><br></div><div>-Mark</div><div>&nbsp;</div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">
<div class="gmail_quote"><div>
<br>Chuck <br></div></div>
<br></blockquote></div><br>