[Numpy-discussion] missing data discussion round 2

Pierre GM pgmdevlist at gmail.com
Tue Jun 28 19:56:52 EDT 2011


On Jun 29, 2011, at 1:37 AM, Mark Wiebe wrote:

> On Tue, Jun 28, 2011 at 3:45 PM, Pierre GM <pgmdevlist at gmail.com> wrote:
> ...
>  
> I think that would really take care of the missing data part in a consistent and non-ambiguous way.
> However, I understand that if a choice had to be made, this approach would be dropped in favor of the more generic "mask way", right? (By "mask way", I mean something close to the numpy.ma approach, but actually optimized.)
> 
> The NEP proposes strict NA missing value semantics, where the only way to get at the masked values is by having another view that doesn't have the value masked. If someone has use cases where this prevents some functionality they need, I'd love to hear them. 

Mmh... Would you have an example? I haven't caught up on my sleep yet...


> 
> So, taking this example
> >>> np.add(a, b, out=b, mask=(a > threshold))
> If 'b' doesn't already have a mask, masked values will be lost if we go the mask way? But kept if we go the bit way? I prefer the latter, then.
> Another advantage I see in the "bit way" is that it's pretty close to the 'hardmask' idea. You never risk losing the mask, as it's already "burned" into the array...
> 
> I've nearly finished this parameter, and decided to call it 'where' instead, because it operates like an SQL WHERE clause. Here, if neither a nor b is a masked array, it will only modify those values of b where the 'where' parameter has the value True.

OK, sounds fine. Pretty fine, actually. Just to be clear, if 'out' is not defined, the result is a masked array with 'where' as its mask? What's the value under the mask? np.NA?
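
For reference, here is a minimal sketch of the 'where' behaviour described above, using the keyword as it exists for ufuncs in released NumPy. The array values and the threshold are made up for illustration, and the semantics under discussion in this thread (masked results, np.NA) may differ from what is shown.

```python
import numpy as np

# Illustrative data; not taken from the thread.
a = np.array([1.0, 5.0, 2.0, 8.0])
b = np.array([10.0, 20.0, 30.0, 40.0])
threshold = 3.0

# Only the positions where (a > threshold) is True are written to 'out'.
# Every other element of b keeps its previous value.
np.add(a, b, out=b, where=(a > threshold))

print(b)
```

In released NumPy, calling the same ufunc without 'out' returns a plain ndarray, not a masked array, and the positions where 'where' is False are left uninitialized. Passing an explicitly filled 'out' array is the usual way to get defined values there.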

> And now for something not that completely different:
> * Would it be possible to store only the addresses of the NAs internally (in the metadata?) to save some space, and still get a boolean array with the same shape as the underlying array when the .mask or .valid property is accessed?
> 
> Something like this could be possible, but would certainly complicate the implementation. If it were desired, it would be a follow-up feature.

Oh, no problem. I was suggesting a way to save some space, but if it's too tricky to implement, forget it.
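
To make the suggestion concrete, here is a rough sketch of the idea in pure Python. The class SparseNAArray and its attribute _na_flat are hypothetical names invented for this illustration; nothing here is part of NumPy or of the NEP. Only the flat indices of the NA entries are kept, and the full boolean array is rebuilt each time .mask or .valid is accessed.

```python
import numpy as np


class SparseNAArray:
    """Hypothetical container: data plus the flat indices of its NA entries."""

    def __init__(self, data, mask):
        self.data = np.asarray(data)
        mask = np.asarray(mask, dtype=bool)
        if mask.shape != self.data.shape:
            raise ValueError("mask and data must have the same shape")
        # Keep only the positions of the NAs (C-order flat indices),
        # not a full boolean array.
        self._na_flat = np.flatnonzero(mask)

    @property
    def mask(self):
        # Rebuild a boolean array with the same shape as the data on demand.
        full = np.zeros(self.data.size, dtype=bool)
        full[self._na_flat] = True
        return full.reshape(self.data.shape)

    @property
    def valid(self):
        # Complement of the mask: True where a value is present.
        return ~self.mask


arr = SparseNAArray(
    [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
    [[False, True, False], [False, False, True]],
)
print(arr.mask)
```

This only saves space when NAs are rare: on a typical 64-bit build each stored index takes 8 bytes, against 1 byte per element for a full boolean mask, so the index list stops paying off once more than about one element in eight is NA.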

