Re: [Numpy-discussion] alterNEP - was: missing data discussion round 2

July 2, 2011

On Fri, Jul 1, 2011 at 11:40 PM, Nathaniel Smith <njs@pobox.com> wrote:
...
> I'm not sure what you mean here. If we have masked array support at
> all (and some people seem to want it), then we have to say more than
> "it's an array with a mask". Indexing such a beast has to do
> *something*, so we need some kind of theory to say what; ufuncs have
> to do *something*, ditto. I mean, I guess we could just say that a
> masked array is literally an np.ndarray where you have attached a
> field named "mask" that doesn't do anything, but I don't think that
> would really satisfy most users :-).

Indexing a masked array just returns an array with np.NA in the appropriate
elements. This is no different from what happens with regular ndarray
objects or in numpy.ma. As for ufuncs, the NEP already addresses this in
multiple ways. For element-wise ufuncs, a "where" parameter is available
for indicating which elements to skip. For reduction ufuncs, a "skipna"
parameter will indicate whether or not to skip the NA values. On top of
that, ndarray subclasses (such as numpy.ma, I guess) can define a
__ufunc_wrap__ function that sets default values for those parameters to
make things easier for masked array users.
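For illustration only: np.NA and the skipna parameter were proposals in the NEP under discussion and are not part of released NumPy. The sketch below approximates the two behaviors described above with what released NumPy does provide: a boolean where= argument on element-wise ufuncs (NumPy 1.7+) and on reductions such as np.sum (NumPy 1.17+). The masked_sum helper is hypothetical, and NaN stands in for NA.

```python
import numpy as np

values = np.array([1.0, 2.0, 3.0, 4.0])
valid = np.array([True, False, True, True])  # False marks a "missing" element

# Element-wise ufunc with where=: positions where valid is False are skipped
# and keep whatever `out` already holds, so pre-fill `out` with NaN.
out = np.full_like(values, np.nan)
np.multiply(values, 10.0, out=out, where=valid)
# out is now [10., nan, 30., 40.]


def masked_sum(values, valid, skipna=False):
    """Hypothetical helper mimicking the NEP's proposed skipna= flag."""
    if skipna:
        # Reduce over the valid elements only (where= needs NumPy >= 1.17).
        return np.sum(values, where=valid)
    # NA-style propagation: any missing input makes the result missing.
    return np.sum(values) if valid.all() else np.nan


print(masked_sum(values, valid, skipna=True))  # 8.0 (1 + 3 + 4)
print(masked_sum(values, valid))               # nan
```

A subclass hook like the one described above would, in effect, just flip the default of a flag like skipna for its users.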

> I don't know about others, but my main objection is this: He's
> ...
> proposing two different implementations for NA. I only need one, so
> having two is redundant and confusing. Of these two, the bit-pattern
> one has lower memory overhead (which many people have spoken up to say
> matters to them), and really obvious semantics (assignment is
> implemented as assignment, etc.). So why force people to make this
> confusing choice? What does the mask implementation add? AFAICT, its
> only purpose is to satisfy a rather different set of use cases. (See
> Gary Strangman's email here for a good description of these use cases:
> http://www.mail-archive.com/numpy-discussion@scipy.org/msg32385.html)
> But AFAICT again, it's been crippled for those use cases in order to
> give it the NA semantics. So I just don't see who the masking part is
> supposed to help.

As a user of numpy.ma, I have always found masked arrays to be a
second-class citizen. Developing new code with them always brought new
surprises and discoveries of strange behavior in various functions. In
this sense, numpy.ma has always been crippled. By sacrificing *some* of
the existing semantics (which a re-implemented numpy.ma would likely take
care of to preserve backwards compatibility), the masked array community
gains a first-class citizen in numpy, and numpy developers will have the
masked/missing data issue at the forefront whenever they develop new
functions and libraries. I am more than happy with that trade-off. I am
willing to learn the semantics so long as I have a guarantee that the
functions I use behave the way I expect them to.
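To put a number on the memory-overhead argument quoted above, here is a rough sketch using released NumPy. A NaN sentinel stands in for the NEP's proposed bit-pattern NA (which never shipped, and NaN only works for float dtypes), and numpy.ma stands in for the mask implementation. numpy.ma stores its mask as a separate boolean array, one byte per element.

```python
import numpy as np

n = 1_000

# Bit-pattern style: "missing" is encoded inside the data itself.
sentinel = np.zeros(n)        # float64, 8 bytes per element
sentinel[::10] = np.nan       # marking a value missing is plain assignment

# Mask style: the data plus a separate boolean mask of the same shape.
masked = np.ma.masked_array(np.zeros(n), mask=np.zeros(n, dtype=bool))
masked[::10] = np.ma.masked   # only the mask changes; the data is untouched

sentinel_bytes = sentinel.nbytes                      # 8 * n
mask_bytes = masked.data.nbytes + masked.mask.nbytes  # 8 * n + 1 * n

print(sentinel_bytes, mask_bytes)  # 8000 9000
```

For float64 data a one-byte-per-element mask adds 12.5% on top of the values. For one-byte dtypes it doubles the footprint.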
...
> BTW, you can't access the memory of a masked value by taking a view,
> at least if I'm reading this version of the NEP correctly, and it
> seems to be the latest:
> https://github.com/m-paradox/numpy/blob/4afdb2768c4bb8cfe47c21154c4c8ca5f85e...
> The only way to access the memory of a masked value is to take a view
> *before* you mask it. And if the array has a mask at all when you take
> the view, you also have to set a.flags.ownmask = True before you mask
> the value.

This isn't actually as bad as it sounds. From a function's perspective, it
should only know about the values it has been given access to. If I, as a
user of said function, decide that certain values should be unknown to the
function, I wouldn't want the function to be able to override that
decision. Remember, the masked element may never have been initialized, so
we wouldn't want the function to use it. (Note: this is one of those "fun"
surprises that a numpy.ma user sometimes encounters when a function uses
np.asarray instead of np.asanyarray.)
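Here is a small sketch of that np.asarray surprise. It uses numpy.ma as it exists today, not the NEP's proposed semantics. numpy.ma only flips a mask bit and leaves the underlying buffer alone, so a function that calls np.asarray gets the raw values back, including the one the user masked. The two total_* functions are hypothetical stand-ins for third-party code.

```python
import numpy as np


def total_asarray(x):
    # np.asarray returns a base-class ndarray, so the mask is silently dropped.
    return np.asarray(x).sum()


def total_asanyarray(x):
    # np.asanyarray keeps the MaskedArray subclass, so masked entries are skipped.
    return np.asanyarray(x).sum()


buf = np.array([1.0, 2.0, 99.0])  # pretend 99.0 is junk the user wants hidden
m = np.ma.masked_array(buf)
m[2] = np.ma.masked               # sets the mask bit; the stored 99.0 remains

print(m.data[2])             # 99.0: the masked element's memory is still reachable
print(total_asarray(m))      # 102.0: the function used the value the user hid
print(total_asanyarray(m))   # 3.0
```

Under the NEP as described in the quoted text, masking would hide the value from both functions unless a view had been taken before masking.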

Ben Root

Benjamin Root