On Sat, Jul 2, 2011 at 4:10 PM, Benjamin Root <ben.root@ou.edu> wrote:
On Fri, Jul 1, 2011 at 11:40 PM, Nathaniel Smith <njs@pobox.com> wrote:
I'm not sure what you mean here. If we have masked array support at all (and some people seem to want it), then we have to say more than "it's an array with a mask". Indexing such a beast has to do *something*, so we need some kind of theory to say what, ufuncs have to do *something*, ditto. I mean, I guess we could just say that a masked array is literally an np.ndarray where you have attached a field named "mask" that doesn't do anything, but I don't think that would really satisfy most users :-).
Indexing a masked array just returns an array with np.NA in the appropriate elements. This is no different than with regular ndarray objects or in numpy.ma. As for ufuncs, the NEP already addresses this in multiple ways. For element-wise ufuncs, a "where" parameter is available for indicating which elements to skip. For reduction ufuncs, a "skipna" parameter will indicate whether or not to skip the values. On top of that, subclassed ndarrays (such as numpy.ma, I guess) can create a __ufunc_wrap__ function that can set a default value for those parameters to make things easier for masked array users.
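For readers who want to see the "where" idea concretely, here is a minimal sketch using the `where` keyword as it exists in released NumPy ufuncs. It is only an illustration, not the NEP's implementation: the array names are made up, the `skipna` parameter discussed above is a proposal and is not used here, and the reduction call assumes a NumPy version recent enough to accept `where` in `reduce`.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0, 4.0])
b = np.array([10.0, 20.0, 30.0, 40.0])

# Hypothetical validity mask: True means "use this element".
valid = np.array([True, False, True, True])

# Element-wise ufunc: positions where `valid` is False are skipped,
# so `out` keeps whatever it already held there (zero in this case).
out = np.zeros_like(a)
np.add(a, b, out=out, where=valid)

# Reduction: only the elements selected by `valid` take part in the sum.
total = np.add.reduce(a, where=valid)

print(out)    # element-wise result with the skipped slot left at 0
print(total)  # sum over the selected elements only
```

Note that `out` has to be supplied and pre-initialized; otherwise the skipped positions would hold uninitialized memory.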
I don't know about others, but my main objection is this: He's proposing two different implementations for NA. I only need one, so having two is redundant and confusing. Of these two, the bit-pattern one has lower memory overhead (which many people have spoken up to say matters to them), and really obvious semantics (assignment is implemented as assignment, etc.). So why force people to make this confusing choice? What does the mask implementation add? AFAICT, its only purpose is to satisfy a rather different set of use cases. (See Gary Strangman's email here for a good description of these use cases: http://www.mail-archive.com/numpy-discussion@scipy.org/msg32385.html) But AFAICT again, it's been crippled for those use cases in order to give it the NA semantics. So I just don't see who the masking part is supposed to help.
As a user of numpy.ma, I have always found masked arrays to be a second-class citizen. Developing new code with it always brought about new surprises and discoveries of strange behavior from various functions. In this sense, numpy.ma has always been crippled. By sacrificing *some* of the existing semantics (which would likely be taken care of by a re-implemented numpy.ma to preserve backwards-compatibility), the masked array community gains a first-class citizen in numpy, and numpy developers will have the masked/missing data issue in the forefront whenever developing new functions and libraries. I am more than happy with that trade-off. I am willing to learn the semantics so long as I have a guarantee that the functions I use behave the way I expect them to.
BTW, you can't access the memory of a masked value by taking a view, at least if I'm reading this version of the NEP correctly, and it seems to be the latest:
https://github.com/m-paradox/numpy/blob/4afdb2768c4bb8cfe47c21154c4c8ca5f85e...
The only way to access the memory of a masked value is to take a view *before* you mask it. And if the array has a mask at all when you take the view, you also have to set a.flags.ownmask = True before you mask the value.
This isn't actually as bad as it sounds. From a function's perspective, it should only know the values that it has been given access to. If I -- as a user of said function -- decide that certain values should be unknown to the function, I wouldn't want the function to be able to override that decision. Remember, it is possible that the masked element never was initialized. Therefore, we wouldn't want the function to use that element. (Note, this is one of those "fun" surprises that a numpy.ma user sometimes encounters when a function uses np.asarray instead of np.asanyarray).
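The np.asarray versus np.asanyarray surprise mentioned above can be reproduced with today's numpy.ma. This is a small sketch with made-up values; it shows current numpy.ma behavior, not the semantics proposed in the NEP.

```python
import numpy as np

# A masked array whose middle element is hidden from computations.
m = np.ma.masked_array([1.0, 2.0, 3.0], mask=[False, True, False])

# np.asarray converts to a base ndarray, so the mask is silently dropped
# and the masked value becomes visible to the function again.
plain = np.asarray(m)

# np.asanyarray passes ndarray subclasses through, so the mask survives.
kept = np.asanyarray(m)

print(type(plain).__name__, plain.sum())  # masked element is included
print(type(kept).__name__, kept.sum())    # masked element is skipped
```

A function that calls np.asarray on its input therefore computes with the value the user meant to hide, which is exactly the kind of surprise described above.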
But as far as I understand, this takes away the ability to temporarily fill in the masked values with values that are neutral for a calculation, e.g. zero when taking a sum or dot product. Instead it looks like a copy of the array has to be made in the new version. (I'm thinking more of correlate, convolution, linalg, scipy.signal, not simple ufuncs. In many cases new arrays might be created anyway, so the loss from getting a copy of the non-NA data might not be so severe.) I guess the "fun" surprises will remain fun, since most functions in scipy or other libraries won't suddenly learn how to handle masked arrays or NAs. What happens if you feed the new animals to linalg.svd, or linalg.inv or fft ... that are all designed for asarray and not for asanyarray?

Josef
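As an illustration of the "neutral fill" idea, here is a minimal sketch using the existing numpy.ma `filled` method. The arrays and weights are invented for the example, and it shows the copy-based route (filling into a new plain ndarray), not in-place access to masked memory and not the NEP's proposed behavior.

```python
import numpy as np

x = np.ma.masked_array([1.0, 2.0, 3.0], mask=[False, True, False])
w = np.array([0.5, 0.5, 0.5])  # hypothetical weights

# filled() returns a plain ndarray copy with masked entries replaced by
# the given value; zero is neutral for sums and dot products.
x0 = x.filled(0.0)

# The plain array can be handed to code that only understands ndarrays.
result = np.dot(x0, w)

print(x0)      # masked slot replaced by 0.0
print(result)  # dot product ignoring the masked element
print(x.data)  # the original underlying data is unchanged
```

The cost is the extra copy; whether that matters depends on how often the downstream function would have allocated a new array anyway.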
Ben Root
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion