[Numpy-discussion] ufunc reduceat behavior on empty slices

Stephan Hoyer shoyer at gmail.com
Fri Jul 29 16:14:03 EDT 2016

Jaime brought up the same issue recently, along with some other issues for

I completely agree with both of you that the current behavior for empty
slices is very strange and should be changed to remove the special case.
Nathaniel Smith voiced the same opinion on the GitHub issue [1].

I think the path forward here (as Nathaniel writes) is pretty clear:
1. Start issuing a FutureWarning about a future change.
2. Fix this in a release or two.

[1] https://github.com/numpy/numpy/issues/834

On Fri, Jul 29, 2016 at 11:42 AM, Erik Brinkman <erik.brinkman at umich.edu>

> Hi,
> The behavior of a ufuncs reduceat on empty slices seems a little strange,
> and I wonder if there's a reason behind it / if there's a route to
> potentially changing it. First, I'll go into a little background.
> I've been making a lot of use of ufuncs reduceat functionality on
> staggered arrays. In general, I'll have "n" arrays, each with size "s[n]"
> and I'll store them in one array "x", such that "s.sum() == x.size".
> reduceat is great because I use
> ufunc.reduceat(x, np.insert(s[:-1].cumsum(), 0, 0))
> to get some summary information about each array. However, reduceat seems
> to behave strangely for empty slices. To make things concrete, let's assume:
> import numpy as np
> s = np.array([3, 0, 2])
> x = np.arange(s.sum())
> inds = np.insert(s[:-1].cumsum(), 0, 0)
> # [0, 3, 3]
> np.add.reduceat(x, inds)
> # [3, 3, 7] not [3, 0, 7]
> # This is distinct from
> np.fromiter(map(np.add.reduce, np.array_split(x, inds[1:])), x.dtype,
> s.size - 1)
> # [3, 0, 7] what I wanted
> The current documentation
> <http://docs.scipy.org/doc/numpy/reference/generated/numpy.ufunc.reduceat.html>
> on reduceat first states:
> For i in range(len(indices)), reduceat computes
> ufunc.reduce(a[indices[i]:indices[i+1]])
> That would suggest the outcome, that I expected. However, in the examples
> section it goes into a bunch of examples which contradict that statement
> and instead suggest that the actual algorithm is more akin to:
> ufunc.reduce(a[indices[i]:indices[i+1]]) if indices[i+1] > indices[i] else
> indices[i]
> Looking at the source
> <https://github.com/numpy/numpy/blob/2af06c804931aae4b30bb3349bc60271b0b65381/numpy/core/src/umath/ufunc_object.c#L3697>,
> it seems like it's copying indices[i], and then while there are more
> elements to process it keeps reducing, resulting in this unexpected
> behavior. It seems like the proper thing to do would be start with
> ufunc.identity, and then reduce. This is slightly less performant than
> what's implemented, but more "correct." There could, of course, just be a
> switch to copy the identity only when the slice is empty.
> Is there a reason it's implemented like this? Is it just for performance,
> or is this strange behavior *useful* somewhere? It seems like "fixing"
> this would be bad because you'll be changing somewhat documented
> functionality in a backwards incompatible way. What would the best approach
> to "fixing" this be? add another function "reduceat_"? add a flag to
> reduceat to do the proper thing for empty slices?
> Finally, is there a good way to work around this? I think for now I'm just
> going to mask out the empty slices and use insert to add them back in, but
> if I'm missing an obvious solution, I'll look at that too. I need to mask
> them out because, np.add.reduceat(x, 5) would ideally return 0, but
> instead it throws an error since 5 is out of range...
> Thanks for indulging my curiosity,
> Erik
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20160729/75ccc0a8/attachment.html>

More information about the NumPy-Discussion mailing list