[Numpy-discussion] in the NA discussion, what can we agree on?

Fri Nov 4 02:47:53 EDT 2011

On Friday, November 4, 2011, Nathaniel Smith <njs at pobox.com> wrote:
> On Thu, Nov 3, 2011 at 7:54 PM, Gary Strangman
> <strang at nmr.mgh.harvard.edu> wrote:
>> For the non-destructive+propagating case, do I understand correctly that
>> this would mean I (as a user) could temporarily decide to IGNORE certain
>> portions of my data, perform a series of computations on that data, and the
>> IGNORED flag (or however it is implemented) would be propagated from
>> computation to computation? If that's the case, I suspect I'd use it all
>> the time ... to effectively perform data subsetting without generating
>> (partial) copies of large datasets. But maybe I misunderstand the
>> intended notion of propagation ...
>
> I *think* it's more subtle than that, but I admit I'm somewhat
> confused about how exactly people would want IGNORED to work in
> various corner cases. (This is another part of why figuring out our
> audience/use-cases seems like an important first step to me...
> fortunately the semantics for MISSING are, I think, much more clear.)
>
> Say we have
>  >>> a = np.array([1, IGNORED(2), 3])
>  >>> b = np.array([10, 20, 30])
> (Here I'm using IGNORED(2) to mean a value that is currently
> ignored, but if you unmasked it it would have the value 2.)
>
> Then we have:
>
> # non-propagating *or* propagating, doesn't matter:
>  >>> a + 2
> [3, IGNORED(2), 5]
>
> # non-propagating:
>  >>> a + b
> One of these, I don't know which:
>  [11, IGNORED(2), 33]  # numpy.ma chooses this
>  [11, 20, 33]
>  "Error: shape mismatch"
>
> (An error is maybe the most *consistent* option; the suggestion in the
> alterNEP was that masks had to match on all axes that were *not*
> broadcast, so a + 2 and a + a are okay, but a + b is an error. I
> assume the numpy.ma approach is also useful, but note that it has the
> surprising effect that addition is not commutative: IGNORED(x) +
> IGNORED(y) = IGNORED(x). Try it:
>   masked1 = np.ma.masked_array([1, 2, 3], mask=[False, True, False])
>   masked2 = np.ma.masked_array([10, 20, 30], mask=[False, True, False])
>   np.asarray(masked1 + masked2) # [11, 2, 33]
>   np.asarray(masked2 + masked1) # [11, 20, 33]
> I don't really know what people would prefer.)
>
> # propagating:
>  >>> a + b
> One of these, I don't know which:
>  [11, IGNORED(2), 33] # same as numpy.ma, again
>  [11, IGNORED(22), 33]
>
> # non-propagating:
>  >>> np.sum(a)
> 4
>
> # propagating:
>  >>> np.sum(a)
> One of these, I don't know which:
>  IGNORED(4)
>  IGNORED(6)
>
> So from your description, I wouldn't say that you necessarily want
> non-destructive+propagating -- it really depends on exactly what
> computations you want to perform, and how you expect them to work. The
> main difference is how reduction operations are treated. I kind of
> feel like the non-propagating version makes more sense overall, but I
> don't know if there's any consensus on that.

I think this is further evidence for my idea that a mask should not be
undoable, but should still be non-destructive.  If you want to be able to
access the values after masking, keep a view of the original data, or apply
the mask only to a view.
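
Something close to this can already be sketched with numpy.ma, which by
default does not copy the data it is given. This is only an illustration
of the "apply the mask to a view" idea, not the semantics proposed in any
NEP; the names `data` and `masked_view` are made up for the example.

```python
import numpy as np

data = np.array([1, 2, 3])

# numpy.ma does not copy its input by default, so the masked array
# shares its buffer with `data` and acts as a masked view of it.
masked_view = np.ma.masked_array(data, mask=[False, True, False])

# The masked element is skipped by the reduction...
total = masked_view.sum()

# ...but the underlying value is untouched and still reachable
# through the original array.
shares = np.shares_memory(masked_view.data, data)
hidden_value = data[1]
```

Here the masking never destroys anything: the original array keeps every
value, and only the masked view skips the second element.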

Reduction ufuncs make a lot of sense because they have a well-defined
mathematical basis even when there are no values to reduce. Reduction
ufuncs are covered in great detail in Mark's NEP.
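
As a rough sketch of what that basis looks like in plain numpy (this is
my reading of the "no values" case, not something taken from the NEP):
a reduction over nothing returns the ufunc's identity when it has one,
and is an error when it does not. The last lines show what numpy.ma
does when every element is masked.

```python
import numpy as np

empty = np.array([], dtype=float)

# add and multiply have identities, so the empty reduction is defined.
sum_empty = np.add.reduce(empty)        # additive identity
prod_empty = np.multiply.reduce(empty)  # multiplicative identity

# maximum has no identity, so reducing over nothing raises.
try:
    np.maximum.reduce(empty)
    max_failed = False
except ValueError:
    max_failed = True

# With numpy.ma, a sum where every element is masked yields the
# masked constant rather than a number.
all_masked = np.ma.masked_array([1.0, 2.0], mask=[True, True])
sum_all_masked = all_masked.sum()
```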

Ben Root