On Fri, Jul 1, 2011 at 11:40 PM, Nathaniel Smith <njs@pobox.com> wrote:
I'm not sure what you mean here. If we have masked array support at all (and some people seem to want it), then we have to say more than "it's an array with a mask". Indexing such a beast has to do *something*, so we need some kind of theory to say what, ufuncs have to do *something*, ditto. I mean, I guess we could just say that a masked array is literally an np.ndarray where you have attached a field named "mask" that doesn't do anything, but I don't think that would really satisfy most users :-).
Indexing a masked array just returns an array with np.NA in the appropriate elements. This is no different than with regular ndarray objects or in numpy.ma. As for ufuncs, the NEP already addresses this in multiple ways. For element-wise ufuncs, a "where" parameter is available for indicating which elements to skip. For reduction ufuncs, a "skipna" parameter will indicate whether or not to skip the values. On top of that, subclassed ndarrays (such as numpy.ma, I guess) can create a __ufunc_wrap__ function that can set a default value for those parameters to make things easier for masked array users.
I don't know about others, but my main objection is this: He's proposing two different implementations for NA. I only need one, so having two is redundant and confusing. Of these two, the bit-pattern one has lower memory overhead (which many people have spoken up to say matters to them), and really obvious semantics (assignment is implemented as assignment, etc.). So why force people to make this confusing choice? What does the mask implementation add? AFAICT, its only purpose is to satisfy a rather different set of use cases. (See Gary Strangman's email here for a good description of these use cases: http://www.mail-archive.com/numpy-discussion@scipy.org/msg32385.html) But AFAICT again, it's been crippled for those use cases in order to give it the NA semantics. So I just don't see who the masking part is supposed to help.
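For readers following the ufunc point above, here is a minimal sketch of what "skipping elements" looks like with calls that exist in NumPy today. It does not use np.NA or a "skipna" parameter from the NEP; as an assumption for illustration, a plain boolean array stands in for the mask and numpy.ma stands in for the skipping reduction. The array names are made up for the example.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0, 4.0])
b = np.array([10.0, 20.0, 30.0, 40.0])

# Hypothetical validity mask: True means "use this element".
valid = np.array([True, False, True, True])

# Element-wise ufunc with `where`: positions where `valid` is False are
# skipped and keep whatever `out` already held (zeros here).
out = np.zeros_like(a)
np.add(a, b, out=out, where=valid)

# Reduction that ignores masked entries, using numpy.ma as a stand-in
# for the skipping behaviour discussed in the thread.
m = np.ma.masked_array(a, mask=~valid)
total = m.sum()  # sums only the unmasked values 1.0, 3.0 and 4.0
```

Note that `where` leaves skipped positions of `out` untouched, so `out` should be initialized before the call.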
As a user of numpy.ma, I have always found masked arrays to be second-class citizens. Developing new code with numpy.ma always brought about new surprises and discoveries of strange behavior from various functions. In this sense, numpy.ma has always been crippled. By sacrificing *some* of the existing semantics (which would likely be taken care of by a re-implemented numpy.ma to preserve backwards-compatibility), the masked array community gains a first-class citizen in numpy, and numpy developers will have the masked/missing data issue in the forefront whenever developing new functions and libraries. I am more than happy with that trade-off. I am willing to learn the new semantics so long as I have a guarantee that the functions I use behave the way I expect them to.
BTW, you can't access the memory of a masked value by taking a view, at least if I'm reading this version of the NEP correctly, and it seems to be the latest:
https://github.com/m-paradox/numpy/blob/4afdb2768c4bb8cfe47c21154c4c8ca5f85e...
The only way to access the memory of a masked value is to take a view *before* you mask it. And if the array has a mask at all when you take the view, you also have to set a.flags.ownmask = True before you mask the value.
This isn't actually as bad as it sounds. From a function's perspective, it should only know the values that it has been given access to. If I -- as a user of said function -- decide that certain values should be unknown to the function, I wouldn't want the function to be able to override that decision. Remember, it is possible that the masked element never was initialized. Therefore, we wouldn't want the function to use that element. (Note, this is one of those "fun" surprises that a numpy.ma user sometimes encounters when a function uses np.asarray instead of np.asanyarray).

Ben Root
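The np.asarray versus np.asanyarray surprise mentioned above can be reproduced with current numpy.ma. This is only a small sketch: the two helper functions and the sample data are hypothetical, and the 1e20 stands in for a masked element holding garbage or uninitialized memory.

```python
import numpy as np

def mean_asarray(x):
    # np.asarray converts to a base ndarray, so the mask is dropped and
    # the masked element's underlying value leaks into the result.
    return np.asarray(x).mean()

def mean_asanyarray(x):
    # np.asanyarray passes ndarray subclasses through unchanged, so the
    # MaskedArray keeps its mask and the masked element is ignored.
    return np.asanyarray(x).mean()

# The third element is masked; its stored value should never be used.
data = np.ma.masked_array([1.0, 2.0, 1e20], mask=[False, False, True])

leaky = mean_asarray(data)     # dominated by the masked 1e20
safe = mean_asanyarray(data)   # mean of the unmasked values 1.0 and 2.0
```

A function written with np.asarray silently overrides the caller's decision to hide the third value, which is the behavior being objected to.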