[Numpy-discussion] Indexing a masked array with another masked array leads to unexpected results

Thu Nov 3 18:07:07 EDT 2011

Forgive me if this is already a well-known oddity of masked arrays. I hadn't
seen it before, though.

I'm not sure if this is exactly a bug, per se, but it's a very confusing
consequence of the current design of masked arrays...

Consider the following example:

import numpy as np

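# All 10 elements start out masked; the underlying data is uninitialized.
x = np.ma.masked_all(10, dtype=np.float32)
print(x)
# Assign through a masked boolean index.
x[x > 0] = 5
print(x)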

The exact results will vary depending on the contents of the empty memory the
array was initialized from.

This wreaks havoc when filtering the contents of masked arrays (and leads
to hard-to-find bugs!).  The mask of the array in question is altered at
random (or, rather, based on the masked values as well as the unmasked ones).

Of course, once you're aware of this, there are a number of workarounds
(namely, filling the array or explicitly operating on "x.data" instead of
x).
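
For what it's worth, here is a minimal sketch of those two workarounds. To keep it reproducible it uses an array with explicit data and mask rather than masked_all; the values and the names "cond" and "keep" are just placeholders of mine, and "operating on x.data" is read here as comparing the raw data and excluding masked positions by hand.

```python
import numpy as np

# Deterministic stand-in for the example: explicit data, first two elements masked.
x = np.ma.array([1.0, -1.0, 2.0, -2.0], mask=[True, True, False, False])

# Workaround 1: fill the masked entries of the condition with False,
# so the index is a plain boolean ndarray with nothing hidden under a mask.
cond = np.ma.filled(x > 0, False)
x[cond] = 5

# Workaround 2: compare the raw data and exclude masked positions explicitly.
y = np.ma.array([1.0, -1.0, 2.0, -2.0], mask=[True, True, False, False])
keep = (y.data > 0) & ~np.ma.getmaskarray(y)
y[keep] = 5

print(x)
print(y)
```

In both cases only the unmasked element that is greater than 0 gets assigned, and the mask is left as it was.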

I can see the reasoning behind the way it works. It makes sense that "x >
0" returns a masked boolean array with potentially several elements masked,
as well as the unmasked elements greater than 0.
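
A small sketch of what I mean, again with made-up explicit values instead of uninitialized memory: the comparison result carries the mask of x along with it. What sits in the data underneath the masked entries of the result is an implementation detail I wouldn't rely on, so the snippet only looks at the mask and at the unmasked entries.

```python
import numpy as np

# Explicit data and mask so the result does not depend on uninitialized memory.
x = np.ma.array([1.0, -1.0, 2.0, -2.0], mask=[True, True, False, False])

# The comparison yields a masked boolean array, not a plain ndarray.
cond = x > 0

print(type(cond).__name__)          # class of the comparison result
print(np.ma.getmaskarray(cond))     # mask inherited from x
print(cond.compressed())            # comparison results for the unmasked entries only
```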

However, wouldn't it make more sense to have MaskedArray.__setitem__ only
operate on the unmasked elements of the "indx" passed in (at least in the
case where the assigned "value" isn't a masked array)?
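
To make the suggestion concrete, here is a rough sketch of the semantics I have in mind, written as a standalone helper rather than as a change to MaskedArray.__setitem__ itself. The function name "set_where_unmasked" is hypothetical and not part of numpy; it only handles boolean masked indices and leaves every other case to the normal assignment.

```python
import numpy as np

def set_where_unmasked(arr, indx, value):
    """Hypothetical helper: assign `value` only where `indx` is True and unmasked."""
    if (isinstance(indx, np.ma.MaskedArray)
            and indx.dtype == np.bool_
            and not isinstance(value, np.ma.MaskedArray)):
        # Treat masked entries of the boolean index as "not selected".
        indx = indx.filled(False)
    # Everything else falls through to the usual MaskedArray assignment.
    arr[indx] = value
    return arr

# Made-up example data: first and last elements are masked.
x = np.ma.array([3.0, -3.0, 4.0, -4.0], mask=[True, False, False, True])
set_where_unmasked(x, x > 0, 5)
print(x)
```

With this, the masked elements keep both their mask and their underlying data, whatever that data happens to be.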

Cheers,
-Joe