Re: [Numpy-discussion] alterNEP - was: missing data discussion round 2

July 2, 2011

On Sat, Jul 2, 2011 at 4:10 PM, Benjamin Root <ben.root@ou.edu> wrote:
...
On Fri, Jul 1, 2011 at 11:40 PM, Nathaniel Smith <njs@pobox.com> wrote:
...
I'm not sure what you mean here. If we have masked array support at
all (and some people seem to want it), then we have to say more than
"it's an array with a mask". Indexing such a beast has to do
*something*, so we need some kind of theory to say what, ufuncs have
to do *something*, ditto. I mean, I guess we could just say that a
masked array is literally an np.ndarray where you have attached a
field named "mask" that doesn't do anything, but I don't think that
would really satisfy most users :-).
Indexing a masked array just returns an array with np.NA in the appropriate
elements.  This is no different than with regular ndarray objects or in
numpy.ma.  As for ufuncs, the NEP already addresses this in multiple ways.
For element-wise ufuncs, a "where" parameter is available for indicating
which elements to skip.  For reduction ufuncs, a "skipna" parameter will
indicate whether or not to skip the values.  On top of that, subclassed
ndarrays (such as numpy.ma, I guess) can create a __ufunc_wrap__ function
that can set a default value for those parameters to make things easier for
masked array users.
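A rough sketch of the two mechanisms in today's NumPy terms. `np.NA` and the NEP's `skipna` argument were proposals at the time of this thread, so this example uses the `where=` argument that ufuncs accept instead. It works element-wise since NumPy 1.7 and in reductions since 1.17. A hypothetical boolean `valid` array stands in for the mask.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0, 4.0])
# Hypothetical validity mask: False marks an element treated as missing
valid = np.array([True, False, True, True])

# Element-wise ufunc: only positions where valid is True are computed;
# the others keep whatever was already in `out` (here, zeros)
out = np.zeros_like(a)
np.multiply(a, 10.0, out=out, where=valid)

# Reduction: sum over the valid elements only, roughly what skipna=True
# was meant to express (where= in reductions needs NumPy >= 1.17)
total = np.add.reduce(a, where=valid)

print(out)    # [10.  0. 30. 40.]
print(total)  # 8.0
```

A subclass could supply defaults for these arguments, as described above, so that users do not have to pass the mask by hand.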
...
I don't know about others, but my main objection is this: He's
proposing two different implementations for NA. I only need one, so
having two is redundant and confusing. Of these two, the bit-pattern
one has lower memory overhead (which many people have spoken up to say
matters to them), and really obvious semantics (assignment is
implemented as assignment, etc.). So why force people to make this
confusing choice? What does the mask implementation add? AFAICT, its
only purpose is to satisfy a rather different set of use cases. (See
Gary Strangman's email here for a good description of these use cases:
http://www.mail-archive.com/numpy-discussion@scipy.org/msg32385.html)
But AFAICT again, it's been crippled for those use cases in order to
give it the NA semantics. So I just don't see who the masking part is
supposed to help.
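The memory-overhead point can be made concrete with existing tools. This is an illustration only. NaN stands in for a bit-pattern NA, which only works for floating-point dtypes. `numpy.ma` stands in for a separate-mask design, where the boolean mask costs one extra byte per element.

```python
import numpy as np

n = 1_000_000

# Bit-pattern style: missingness lives in the value itself (NaN as a stand-in)
bitpattern = np.ones(n)
bitpattern[::10] = np.nan

# Mask style: the data buffer plus a separate boolean mask, one byte per element
masked = np.ma.masked_array(np.ones(n), mask=np.zeros(n, dtype=bool))
masked[::10] = np.ma.masked

bitpattern_bytes = bitpattern.nbytes
masked_bytes = masked.data.nbytes + masked.mask.nbytes
print(bitpattern_bytes)  # 8000000
print(masked_bytes)      # 9000000
```

For float64 data the mask adds 12.5% to the storage. For smaller dtypes such as int8 it doubles it.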
As a user of numpy.ma, masked arrays have always been a second-class citizen
to me. Developing new code with it always brought about new surprises and
discoveries of strange behavior from various functions. In this sense,
numpy.ma has always been crippled.  By sacrificing *some* of the existing
semantics (which would likely be taken care of by a re-implemented numpy.ma
to preserve backwards-compatibility), the masked array community gains a
first-class citizen in numpy, and numpy developers will have the
masked/missing data issue in the forefront whenever developing new functions
and libraries.  I am more than happy with that trade-off.  I am willing to
learn the semantics so long as I have a guarantee that the functions I use
behave the way I expect them to.
...
BTW, you can't access the memory of a masked value by taking a view,
at least if I'm reading this version of the NEP correctly, and it
seems to be the latest:
 https://github.com/m-paradox/numpy/blob/4afdb2768c4bb8cfe47c21154c4c8ca5f85e...
The only way to access the memory of a masked value is to take a view
*before* you mask it. And if the array has a mask at all when you take
the view, you also have to set a.flags.ownmask = True, before you mask
the value.
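For contrast, here is how existing `numpy.ma` behaves. Masking an element only sets its mask bit. A plain view of the data taken beforehand therefore still sees the original value. The NEP draft's `flags.ownmask` mechanism is not modeled here because it was a proposal.

```python
import numpy as np

a = np.ma.masked_array([10.0, 20.0, 30.0])
raw = a.data          # plain ndarray view sharing a's data buffer

a[1] = np.ma.masked   # numpy.ma sets the mask bit; the stored value is untouched

print(a.mask)         # [False  True False]
print(raw[1])         # 20.0
```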
This isn't actually as bad as it sounds.  From a function's perspective, it
should only know the values that it has been given access to.  If I -- as a
user of said function -- decide that certain values should be unknown to the
function, I wouldn't want the function to be able to override that
decision.  Remember, it is possible that the masked element never was
initialized.  Therefore, we wouldn't want the function to use that element.
(Note, this is one of those "fun" surprises that a numpy.ma user sometimes
encounters when a function uses np.asarray instead of np.asanyarray).
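A minimal sketch of that `np.asarray` surprise with `numpy.ma`. The two helper functions are hypothetical and differ only in the conversion call.

```python
import numpy as np

# The masked slot holds a garbage value that must not be used
m = np.ma.masked_array([1.0, 2.0, 999.0], mask=[False, False, True])

def total_asarray(x):
    # np.asarray strips the subclass, so the mask is silently discarded
    return np.asarray(x).sum()

def total_asanyarray(x):
    # np.asanyarray keeps the MaskedArray, whose sum() skips masked entries
    return np.asanyarray(x).sum()

print(total_asarray(m))     # 1002.0
print(total_asanyarray(m))  # 3.0
```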
But as far as I understand this takes away the ability to temporarily
fill in the masked values with values that are neutral for a
calculation, e.g. zero when taking a sum or dot product.
Instead it looks like a copy of the array has to be made in the new version.
(I'm thinking more of correlate, convolution, linalg, scipy.signal, not
simple ufuncs. In many cases new arrays might be created anyway so the
loss from getting a copy of the non-NA data might not be so severe.)
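With `numpy.ma`, the "neutral fill" pattern looks roughly like this. `filled()` returns a new plain ndarray with the masked slots replaced. Zero is neutral for sums, dot products, and convolutions, so the masked entry drops out at the cost of one copy.

```python
import numpy as np

x = np.ma.masked_array([1.0, 2.0, 3.0, 4.0], mask=[False, True, False, False])
w = np.array([0.5, 0.5, 0.5, 0.5])

# filled() copies the data and writes 0.0 into the masked slot
x0 = x.filled(0.0)

print(np.dot(x0, w))                # 4.0
print(np.convolve(x0, [1.0, 1.0]))  # [1. 1. 3. 7. 4.]
```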

I guess the "fun" surprises will remain fun since most functions in
scipy or other libraries won't suddenly learn how to handle masked
arrays or NAs. What happens if you feed the new animals to linalg.svd,
or linalg.inv or fft ... that are all designed for asarray and not for
asanyarray?
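This sketch shows the kind of surprise meant here. Current `np.fft.fft` is not mask-aware. Given a `numpy.ma` array, it computes on the underlying data, so the hidden value affects every coefficient and the result comes back as a plain ndarray.

```python
import numpy as np

m = np.ma.masked_array([1.0, 2.0, 999.0, 4.0], mask=[False, False, True, False])

# The mask is ignored: the hidden 999.0 enters the transform
spectrum = np.fft.fft(m)

print(type(spectrum).__name__)                    # ndarray
print(np.allclose(spectrum, np.fft.fft(m.data)))  # True
```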

Josef
...
Ben Root
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion

josef.pktd＠gmail.com