[Numpy-discussion] missing data discussion round 2

Tue Jun 28 16:45:42 EDT 2011

All,
I'm not sure I understand some aspects of Mark's new proposal, sorry (blame the lack of sleep).
I'm pretty excited about the idea of a built-in NA like np.dtype(NA['float64']), provided we can come up with some shortcuts like np.nafloat64. I think that would really take care of the missing-data part in a consistent and unambiguous way.
However, I understand that if a choice had to be made, this approach would be dropped in favor of the more generic "mask way", right? (By "mask way", I mean something close to the numpy.ma approach, but actually optimized.)

So, taking this example:
>>> np.add(a, b, out=b, mask=(a > threshold))
If 'b' doesn't already have a mask, will the masked values be lost if we go the mask way, but kept if we go the bit way? If so, I prefer the latter.
Another advantage I see in the "bit way" is that it's pretty close to the 'hardmask' idea: you never risk losing the mask, as it's already "burned" into the array...
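
To make the concern concrete, here is a rough sketch in plain NumPy. It does not use the proposed mask= keyword (its semantics are exactly what is being asked about); instead it assumes NaN as a stand-in for an NA bit pattern, and the array values and threshold are made up for illustration.

```python
import numpy as np

a = np.array([1.0, 5.0, 2.0, 8.0])
b = np.array([10.0, 10.0, 10.0, 10.0])
threshold = 4.0  # illustrative value only

# "Mask way": the missing-ness lives in a separate boolean array.
# A plain ndarray used as out= has nowhere to keep it.
mask = a > threshold
b_mask_way = b.copy()
np.add(a, b_mask_way, out=b_mask_way)  # result carries no trace of `mask`

# "Bit way" (assumption: NaN stands in for the NA bit pattern).
# The missing-ness is part of the data, so it survives the operation.
a_na = np.where(mask, np.nan, a)
b_bit_way = b.copy()
np.add(a_na, b_bit_way, out=b_bit_way)

print(b_mask_way)
print(b_bit_way)
```

In the first case the output is an ordinary array of sums; in the second, the entries where 'a' was missing stay missing in 'b'.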
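
For reference, the 'hardmask' behaviour in the existing numpy.ma can be sketched as follows. This only illustrates what numpy.ma does today, not how the proposed NA dtype would behave; the sample values are arbitrary.

```python
import numpy as np
import numpy.ma as ma

data = np.array([1.0, 2.0, 3.0, 4.0])
mask = [False, True, False, False]

# Soft mask (the default): assigning to a masked element unmasks it.
soft = ma.array(data.copy(), mask=mask)
soft[1] = 10.0
soft_mask_after = ma.getmaskarray(soft).tolist()

# Hard mask: assignment to a masked element is ignored, so the mask
# (and the underlying value) cannot be lost by accident.
hard = ma.array(data.copy(), mask=mask, hard_mask=True)
hard[1] = 10.0
hard_mask_after = ma.getmaskarray(hard).tolist()

print(soft_mask_after)
print(hard_mask_after)
```

With a bit-pattern NA there is no separate mask to drop in the first place, which is the sense in which it resembles a hard mask.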

And now for something not that completely different:
* Would it be possible to internally store only the addresses of the NAs to save some space (in the metadata?), and still get a boolean array with the same shape as the underlying array when the .mask or .valid property is called?
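
A minimal sketch of that idea, under stated assumptions: SparseNAArray and its na_index attribute are hypothetical names invented here, not part of NumPy or of Mark's proposal. Only the flat indices of the NAs are stored, and the full boolean array is built on demand.

```python
import numpy as np


class SparseNAArray:
    """Hypothetical container: data plus the flat indices of its NA entries."""

    def __init__(self, data, na_mask):
        self.data = np.asarray(data)
        na_mask = np.asarray(na_mask, dtype=bool)
        if na_mask.shape != self.data.shape:
            raise ValueError("na_mask must have the same shape as data")
        # Keep only the positions of the NAs (C-order flat indices).
        self.na_index = np.flatnonzero(na_mask)

    @property
    def mask(self):
        # Rebuild a full boolean array with the shape of the data.
        full = np.zeros(self.data.size, dtype=bool)
        full[self.na_index] = True
        return full.reshape(self.data.shape)

    @property
    def valid(self):
        # True where the data is present.
        return ~self.mask


arr = SparseNAArray(
    np.arange(6.0).reshape(2, 3),
    [[False, True, False], [False, False, True]],
)
print(arr.na_index)
print(arr.mask)
```

The trade-off is that storage scales with the number of NAs rather than the array size, while every access to .mask or .valid pays the cost of rebuilding the boolean array.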