[Numpy-discussion] in the NA discussion, what can we agree on?
Nathaniel Smith
njs at pobox.com
Fri Nov 4 02:07:48 EDT 2011
On Thu, Nov 3, 2011 at 7:54 PM, Gary Strangman
<strang at nmr.mgh.harvard.edu> wrote:
> For the non-destructive+propagating case, do I understand correctly that
> this would mean I (as a user) could temporarily decide to IGNORE certain
> portions of my data, perform a series of computation on that data, and the
> IGNORED flag (or however it is implemented) would be propagated from
> computation to computation? If that's the case, I suspect I'd use it all
> the time ... to effectively perform data subsetting without generating
> (partial) copies of large datasets. But maybe I misunderstand the
> intended notion of propagation ...
I *think* it's more subtle than that, but I admit I'm somewhat
confused about how exactly people would want IGNORED to work in
various corner cases. (This is another part of why figuring out our
audience/use-cases seems like an important first step to me...
fortunately the semantics for MISSING are, I think, much more clear.)
Say we have
>>> a = np.array([1, IGNORED(2), 3])
>>> b = np.array([10, 20, 30])
(Here I'm using IGNORED(2) to mean a value that is currently
ignored, but that would have the value 2 if you unmasked it.)
Then we have:
# non-propagating *or* propagating, doesn't matter:
>>> a + 2
[3, IGNORED(2), 5]
# non-propagating:
>>> a + b
One of these, I don't know which:
[11, IGNORED(2), 33] # numpy.ma chooses this
[11, 20, 33]
"Error: shape mismatch"
(An error is maybe the most *consistent* option; the suggestion in the
alterNEP was that masks had to match on all axes that were *not*
broadcast, so a + 2 and a + a are okay, but a + b is an error. I
assume the numpy.ma approach is also useful, but note that it has the
surprising effect that addition is not commutative: IGNORED(x) +
IGNORED(y) = IGNORED(x). Try it:
import numpy as np
masked1 = np.ma.masked_array([1, 2, 3], mask=[False, True, False])
masked2 = np.ma.masked_array([10, 20, 30], mask=[False, True, False])
np.asarray(masked1 + masked2) # [11, 2, 33]
np.asarray(masked2 + masked1) # [11, 20, 33]
I don't really know what people would prefer.)
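For the case where only one operand carries a mask, here is a minimal
sketch of what numpy.ma does today. It only shows the existing
numpy.ma behaviour; it is not a proposal for how IGNORED should work,
and the variable names are just for illustration.

```python
import numpy as np

# a has its middle element masked; b is a plain array with no mask.
a = np.ma.masked_array([1, 2, 3], mask=[False, True, False])
b = np.array([10, 20, 30])

result = a + b

# The result mask is the union (logical OR) of the operand masks,
# so the element masked in a stays masked in the sum.
result_mask = np.ma.getmaskarray(result).tolist()

# compressed() drops masked entries, leaving only the visible sums.
visible = result.compressed().tolist()

print(result_mask)
print(visible)
```

The data stored underneath a masked slot is deliberately not inspected
here, since that is exactly the part that depends on operand order.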
# propagating:
>>> a + b
One of these, I don't know which:
[11, IGNORED(2), 33] # same as numpy.ma, again
[11, IGNORED(22), 33]
# non-propagating:
>>> np.sum(a)
4
# propagating:
>>> np.sum(a)
One of these, I don't know which:
IGNORED(4)
IGNORED(6)
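To make the reduction case concrete, here is a rough sketch using
tools that already exist. numpy.ma's sum behaves like the
non-propagating option. For the propagating option I'm using NaN as a
stand-in, which is only an analogy (NaN destroys the underlying value,
so it is closer to MISSING than to IGNORED).

```python
import numpy as np

# Non-propagating: numpy.ma skips the masked element when reducing.
a = np.ma.masked_array([1, 2, 3], mask=[False, True, False])
skipping_total = a.sum()

# Propagating (analogy only): one NaN poisons the whole reduction.
a_nan = np.array([1.0, np.nan, 3.0])
propagating_total = np.sum(a_nan)

# nansum gives the skipping behaviour back for the NaN encoding.
skipping_nan_total = np.nansum(a_nan)

print(skipping_total, propagating_total, skipping_nan_total)
```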
So from your description, I wouldn't say that you necessarily want
non-destructive+propagating -- it really depends on exactly what
computations you want to perform, and how you expect them to work. The
main difference is how reduction operations are treated. I kind of
feel like the non-propagating version makes more sense overall, but I
don't know if there's any consensus on that.
(You also have the option of just using the new where= argument to
your ufuncs, which avoids some of this confusion because it gives a
single mask that would apply to the whole operation. The ambiguities
here arise because it's not clear what to do when applying a binary
operation to two arrays that have different masks.)
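A minimal sketch of the where= approach, assuming a boolean array
(here called keep, a name chosen just for this example) that marks the
elements to operate on. Elements where keep is False are left
untouched in out, so out has to be initialized first. Passing where=
to np.sum is an assumption about your NumPy version; it needs a
release newer than the one under discussion here.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([10, 20, 30])

# One mask for the whole operation: True means "use this element".
keep = np.array([True, False, True])

# Pre-fill out, because skipped slots keep whatever was there before.
out = np.full_like(a, -1)
np.add(a, b, out=out, where=keep)

# Reductions accept the same argument in recent NumPy versions.
total = np.sum(a, where=keep)

print(out.tolist(), total)
```

Because there is only one mask, the question of how to combine two
different masks never comes up.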
Maybe you could give some examples of the kinds of computations you're
thinking of?
-- Nathaniel