[Numpy-discussion] in the NA discussion, what can we agree on?

T J tjhnson at gmail.com
Fri Nov 4 18:08:37 EDT 2011


On Fri, Nov 4, 2011 at 2:29 PM, Nathaniel Smith <njs at pobox.com> wrote:

> On Fri, Nov 4, 2011 at 1:22 PM, T J <tjhnson at gmail.com> wrote:
> > I agree that it would be ideal if the default were to skip IGNORED
> > values, but that behavior seems inconsistent with its propagation
> > properties (such as when adding arrays with IGNORED values).  To
> > illustrate, when we did "x+2", we were stating that:
> >
> > IGNORED(2) + 2 == IGNORED(4)
> >
> > which means that we propagated the IGNORED value.  If we were to
> > skip them by default, then we'd have:
> >
> > IGNORED(2) + 2 == 2
> >
> > To be consistent, it seems we also should have had:
> >
> > >>> x + 2
> > [3, 2, 5]
> >
> > which I think we can agree is not so desirable.   What this seems to come
> > down to is that we tend to want different behavior when we are doing
> > reductions, and that for IGNORED data, we want it to propagate in every
> > situation except for a reduction (where we want to skip over it).
> >
> > I don't know if there is a well-defined way to distinguish
> > reductions from the other operations.  Would it hold for generalized
> > ufuncs?  Would it hold for other functions which might return arrays
> > instead of scalars?
>
> Continuing my theme of looking for consensus first... there are
> obviously a ton of ugly corners in here. But my impression is that at
> least for some simple cases, it's clear what users want:
>
> >>> a = [1, IGNORED(2), 3]
> # array-with-ignored-values + unignored scalar only affects
> # unignored values
> >>> a + 2
> [3, IGNORED(2), 5]
> # reduction operations skip ignored values
> >>> np.sum(a)
> 4
>
> For example, Gary mentioned the common idiom of wanting to take an
> array and subtract off its mean, and he wants to do that while leaving
> the masked-out/ignored values unchanged. As long as the above cases
> work the way I wrote, we will have
>
> >>> np.mean(a)
> 2
> >>> a -= np.mean(a)
> >>> a
> [-1, IGNORED(2), 1]
>
> Which I'm pretty sure is the result that he wants. (Gary, is that
> right?) Also numpy.ma follows these rules, so that's some additional
> evidence that they're reasonable. (And I think part of the confusion
> between Lluís and me was that these are the rules that I meant when I
> said "non-propagating", but he understood that to mean something
> else.)
>
> So before we start exploring the whole vast space of possible ways to
> handle masked-out data, does anyone see any reason to consider rules
> that don't have, as a subset, the ones above? Do other rules have any
> use cases or user demand? (I *love* playing with clever mathematics
> and making things consistent, but there's not much point unless the
> end result is something that people will use :-).)
>

I guess I'm just confused about how one would, in principle, distinguish
the various forms of propagation that you are suggesting (i.e., for
reductions).  I also don't think it is good that we lack commutativity.
If we disallow unignoring, then yes, I agree that what you wrote above is
what people want.  But if we are allowed to unignore, then I do not.
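As a point of reference, numpy.ma (which the quoted message cites as
following these rules) already exhibits the elementwise-propagate /
reduction-skip split, so the behavior can be checked today.  A minimal
sketch (note that the payload stored under the mask is an implementation
detail in numpy.ma, so nothing is asserted about it):

```python
import numpy.ma as ma

# [1, IGNORED(2), 3] as a masked array
a = ma.masked_array([1.0, 2.0, 3.0], mask=[False, True, False])

# elementwise ops leave masked positions masked
b = a + 2            # masked_array([3.0, --, 5.0])

# reductions skip masked values
total = a.sum()      # 1 + 3 = 4.0
mean = a.mean()      # 4 / 2 = 2.0

# Gary's idiom: subtract the mean, leaving masked entries masked
c = a - a.mean()     # masked_array([-1.0, --, 1.0])
```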

Also, how does something like this get handled?

>>> a = [1, 2, IGNORED(3), NaN]

If I were to say, "What is the mean of 'a'?", then I think most of the time
people would want 1.5.  I guess if we kept nanmean around, then we could do:

>>> a -= np.nanmean(a)
>>> a
[-.5, .5, IGNORED(3), NaN]

Sorry if this is considered digging deeper than consensus.  I'm just
curious if arrays having NaNs in them, in addition to IGNORED, causes
problems.
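To make the NaN question concrete with today's tools: in numpy.ma an
unmasked NaN propagates straight through a reduction, and you only get
1.5 after masking the NaN as well.  A sketch (using explicit masked
assignment, to avoid relying on how mask-combining helpers treat an
already-masked array):

```python
import numpy as np
import numpy.ma as ma

# [1, 2, IGNORED(3), NaN]: the 3 is masked, the NaN is not
a = ma.masked_array([1.0, 2.0, 3.0, np.nan],
                    mask=[False, False, True, False])

a.mean()        # nan -- the unmasked NaN propagates through the reduction

# to get 1.5, the NaN has to be masked as well
b = a.copy()
b[np.isnan(b.filled(0.0))] = ma.masked   # mask is now [F, F, T, T]
b.mean()        # (1 + 2) / 2 = 1.5
```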

