[Numpy-discussion] Indexing a masked array with another masked array leads to unexpected results

Pierre GM pgmdevlist at gmail.com
Fri Nov 4 06:26:15 EDT 2011

On Nov 03, 2011, at 23:07, Joe Kington wrote:

> I'm not sure if this is exactly a bug, per se, but it's a very confusing consequence of the current design of masked arrays…
I would just add an "I think" between the "but" and the "it's" before I could agree.

> Consider the following example:
> import numpy as np
> x = np.ma.masked_all(10, dtype=np.float32)
> print(x)
> x[x > 0] = 5
> print(x)
> The exact results will vary depending on the contents of the empty memory the array was initialized from.

Not a surprise. But isn't it mentioned somewhere in the docs that using a masked array as an index is a very bad idea? And that you should always fill it before you use it as an array? (Actually, using a MaskedArray as an index used to raise an IndexError. But I thought that was a bit too harsh, so I dropped it.)
ma.masked_all returns an empty array with all its elements masked. I.e., you have an uninitialized ndarray as data, and a bool array of the same size, full of True. The operative word here is "uninitialized".
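
To make the two halves of that description concrete, here is a minimal sketch that inspects what ma.masked_all hands back. It only looks at the mask and at the shape and dtype of the data; the data values themselves are deliberately not printed as "expected" output, since they are whatever happened to be in memory.

```python
import numpy as np

x = np.ma.masked_all(10, dtype=np.float32)

# The mask is a boolean array of the same shape, entirely True.
mask = np.ma.getmaskarray(x)
print(mask.all())

# The data buffer is allocated without being initialized, so its
# contents are arbitrary and may differ from one run to the next.
data = x.data
print(data.shape, data.dtype)

# No element counts as valid (unmasked).
print(x.count())
```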

> This wreaks havoc when filtering the contents of masked arrays (and leads to hard-to-find bugs!). The mask of the array in question is altered at random (or, rather, based on the masked values as well as the unmasked ones).

Once again, you're working on an *uninitialized* array. What you should really do is initialize it first, e.g. with 0, or whatever would make sense in your field, and then work from that.
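
One possible way to do that (a sketch, not the only option) is to build the fully masked array from an explicitly zeroed buffer instead of calling ma.masked_all. The choice of 0 as the initial value is just an assumption for illustration; use whatever is meaningful for your data.

```python
import numpy as np

# Deterministic data buffer (zeros) with every element masked.
x = np.ma.array(np.zeros(10, dtype=np.float32), mask=True)

print(x.data)        # the underlying values are now well defined
print(x.mask.all())  # and the array is still fully masked
```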

> I can see the reasoning behind the way it works. It makes sense that "x > 0" returns a masked boolean array with potentially several elements masked, as well as the unmasked elements greater than 0.  

Well, "x > 0" is also a masked array, with its mask full of True. Not very usable by itself, and especially *not* for indexing. 
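
As a sketch of the "fill it first" advice, the masked condition can be turned into a plain boolean ndarray with filled(False) before it is used as an index, so that masked entries simply never match. The sample values and the names cond and index below are made up for the example.

```python
import numpy as np

# Small example with known data; only the third element is masked.
x = np.ma.array([-1.0, 2.0, 3.0, -4.0],
                mask=[False, False, True, False])

cond = x > 0                # a MaskedArray of booleans
index = cond.filled(False)  # plain ndarray; masked entries become False

# Only unmasked elements that satisfy the condition are assigned.
x[index] = 5
print(x)
```

With a plain ndarray as the index, the assignment no longer depends on whatever values sit underneath the masked elements.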

> However, wouldn't it make more sense to have MaskedArray.__setitem__ only operate on the unmasked elements of the "indx" passed in (at least in the case where the assigned "value" isn't a masked array)?

Normally, that should be the case. But you're not working in "normal" conditions, here. A bit like trying to boil water on a stove with a plastic pan.
