Mailman 3 Re: [Numpy-discussion] Indexing a masked array with another masked array leads to unexpected results - NumPy-Discussion

4 Nov 2011

On Fri, Nov 4, 2011 at 5:26 AM, Pierre GM wrote:
...
On Nov 03, 2011, at 23:07 , Joe Kington wrote:
...
I'm not sure if this is exactly a bug, per se, but it's a very confusing
consequence of the current design of masked arrays…
I would just add a "I think" between the "but" and "it's" before I could
agree.
...
Consider the following example:
import numpy as np
x = np.ma.masked_all(10, dtype=np.float32)
print(x)
x[x > 0] = 5
print(x)
The exact results will vary depending on the contents of the empty memory
the array was initialized from.
Not a surprise. But isn't it mentioned in the docs somewhere that using a
masked array as an index is a very bad idea? And that you should always fill
it before you use it as an array? (Actually, using a MaskedArray as an index
used to raise an IndexError. But I thought it was a bit too harsh, so I
dropped it.)
Not that I can find in the docs (Perhaps I just missed it?). At any rate,
it's not mentioned in the numpy.ma section on indexing:
http://docs.scipy.org/doc/numpy/reference/maskedarray.generic.html#indexing-...

The only mention of it is a comment in MaskedArray.__setitem__ where the
IndexError is commented out.
...
ma.masked_all is an empty array with all its elements masked. I.e., you have
an uninitialized ndarray as data, and a bool array of the same size, full
of True. The operative word here is "uninitialized".
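
As a rough illustration of the description above, this sketch inspects the two parts of the array that np.ma.masked_all returns. It checks only the mask and the shape and dtype of the data, since the data values are whatever happened to be in memory.

```python
import numpy as np

x = np.ma.masked_all(10, dtype=np.float32)

# The mask is a boolean array of the same shape, full of True.
mask_all_true = bool(np.ma.getmaskarray(x).all())

# The underlying data buffer exists but is uninitialized, so only its
# shape and dtype are meaningful here, not its values.
data_shape = x.data.shape
data_dtype = x.data.dtype

print(mask_all_true, data_shape, data_dtype)
```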
...
This wreaks havoc when filtering the contents of masked arrays (and
leads to hard-to-find bugs!).  The mask of the array in question is altered
at random (or, rather, based on the masked values as well as the unmasked
ones).
Once again, you're working on an *uninitialized* array. What you should
really do is to initialize it first, e.g. by 0, or whatever would make
sense in your field, and then work from that.
Sure, I shouldn't have used that as the example.

My point was that it's counter-intuitive that something like "x[x > 0] = 0"
alters the mask of x based on the values of _masked_ elements.  How it's
initialized is irrelevant (though, of course, it wouldn't be semi-random if
it were initialized in another way).
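
For reference, one possible reading of "initialized in another way" is a fully masked array whose hidden data is explicitly zero-filled. This is only a sketch; the choice of zero is an assumption, and it makes the hidden values deterministic without changing the design question raised here.

```python
import numpy as np

# Fully masked array whose underlying data is explicitly zero-filled,
# instead of the uninitialized buffer that masked_all provides.
x = np.ma.array(np.zeros(10, dtype=np.float32), mask=True)

data_is_zero = bool((x.data == 0).all())
fully_masked = bool(np.ma.getmaskarray(x).all())

print(data_is_zero, fully_masked)
```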
...
...
I can see the reasoning behind the way it works. It makes sense that
"x > 0" returns a masked boolean array with potentially several elements
masked, as well as the unmasked elements greater than 0.
Well, "x > 0" is also a masked array, with its mask full of True. Not very
usable by itself, and especially *not* for indexing.
...
...
However, wouldn't it make more sense to have MaskedArray.__setitem__
only operate on the unmasked elements of the "indx" passed in (at least in
the case where the assigned "value" isn't a masked array)?
Normally, that should be the case. But you're not working in "normal"
conditions, here. A bit like trying to boil water on a stove with a plastic
pan.
"x[x > threshold] = something" is a very common idiom for ndarrays.

I think most people would find it surprising that this operation doesn't
ignore the masked values.
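
A minimal sketch of the behaviour most people would probably expect, following the earlier advice to fill a masked array before using it as an index. The helper name set_where_unmasked is hypothetical and not part of numpy; it turns the masked condition into a plain boolean ndarray in which masked entries count as False.

```python
import numpy as np

def set_where_unmasked(arr, cond, value):
    """Assign value where cond is True and not masked (hypothetical helper)."""
    # Masked entries of the condition become False, giving a plain
    # boolean ndarray that can be used as an ordinary index.
    idx = np.ma.filled(cond, False)
    arr[idx] = value
    return arr

# The third element is masked; its hidden value (3.0) is greater than 0.
x = np.ma.array([1.0, -2.0, 3.0, 4.0], mask=[False, False, True, False])
set_where_unmasked(x, x > 0, 5.0)
print(x)
```

With a plain boolean index, the masked element is never selected, so the assignment leaves its mask untouched.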

I noticed this because one of my coworkers was complaining that a piece of
my code was "messing up" their masked arrays.  I'd never tested it with
masked arrays, but it took me ages to find, just because I wasn't looking
in places where I was just using common idioms.  In this particular case,
they'd initialized it with "masked_all", so it effectively altered the mask
of the array at random.  Regardless of how it was initialized, though, it
is surprising that the mask of "x" is changed based on masked values.

I just think it would be useful for it to be documented.

Cheers,

-Joe
