[Numpy-discussion] Behavior of .reduceat()

Jaime Fernández del Río jaime.frio at gmail.com
Sun Mar 27 04:36:21 EDT 2016

Two of the oldest issues in the tracker (#834
<https://github.com/numpy/numpy/issues/834> and #835
<https://github.com/numpy/numpy/issues/835>) are about how .reduceat()
handles its indices parameter. I have been taking a look at the source
code, and it would be relatively easy to modify, the hardest part being to
figure out what the exact behavior should be.

Current behavior is that np.ufunc.reduceat(x, ind) returns
[np.ufunc.reduce(x[ind[i]:ind[i+1]]) for i in range(len(ind))], with a few
caveats:

   1. if ind[i] >= ind[i+1], then x[ind[i]] is returned, rather than a
   reduction over an empty slice.
   2. an index of len(x) is appended to the indices argument, to be used
   as the endpoint of the last slice to reduce over.
   3. aside from this last case, the indices are required to be strictly
   in bounds, 0 <= index < len(x), or an error is raised.
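
To make the three caveats concrete, here is a minimal sketch using np.add.
The helper reduceat_reference is a hypothetical pure-Python model written
for this illustration, not part of NumPy; it only mirrors the behavior
described above for 1-D input.

```python
import numpy as np

x = np.arange(8)          # [0, 1, 2, 3, 4, 5, 6, 7]
ind = [0, 4, 1, 5]

# What NumPy currently returns.
current = np.add.reduceat(x, ind)

def reduceat_reference(ufunc, x, ind):
    """Hypothetical model of the current 1-D behavior described above."""
    x = np.asarray(x)
    ends = list(ind[1:]) + [len(x)]        # caveat 2: len(x) is appended
    out = []
    for start, end in zip(ind, ends):
        if start >= end:
            out.append(x[start])           # caveat 1: no empty-slice reduction
        else:
            out.append(ufunc.reduce(x[start:end]))
    return np.array(out)

# Caveat 3: an explicit index equal to len(x) is rejected.
try:
    np.add.reduceat(x, [0, len(x)])
    out_of_bounds_ok = True
except IndexError:
    out_of_bounds_ok = False
```

Here the second entry of the result is x[4] rather than the sum over the
empty slice x[4:1].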

The proposed new behavior, with some optional behaviors, would be:

   1. if ind[i] >= ind[i+1], then a reduction over an empty slice, i.e. the
   ufunc identity, is returned. This includes raising an error if the ufunc
   does not have an identity, e.g. np.minimum.
   2. to fully support the "reduction over slices" idea, some form of out
   of bounds indices should be allowed. This could mean either that:
      1. only index = len(x) is allowed without raising an error, to allow
      computing the full reduction anywhere, not just as the last entry of the
      return, or
      2. any index in -len(x) <= index <= len(x) is allowed, with the usual
      meaning given to negative values, or
      3. any index is allowed, with reduction results clipped to existing
      values (and the usual meaning for negative values).
   3. Regarding the appending of that last index of len(x) to indices, we
   could:
      1. keep appending it, or
      2. never append it, since you can now request it without an error
      being raised, or
      3. only append it if the last index is smaller than len(x).
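
As a rough sketch of what the most conservative combination (1, 2.1 and
3.1) might look like for 1-D input: the function proposed_reduceat below is
hypothetical, written only to illustrate the semantics under discussion,
and the choice of IndexError for out-of-bounds indices is an assumption.

```python
import numpy as np

def proposed_reduceat(ufunc, x, ind):
    """Hypothetical model of options 1, 2.1 and 3.1 for 1-D input."""
    x = np.asarray(x)
    n = len(x)
    ind = list(ind)
    # Option 2.1: index == len(x) is allowed, anything else out of bounds is not.
    if any(i < 0 or i > n for i in ind):
        raise IndexError("index out of bounds")   # assumed error type
    ends = ind[1:] + [n]                          # option 3.1: keep appending len(x)
    # Option 1: when start >= end the slice is empty, so ufunc.reduce returns
    # the identity, or raises ValueError if the ufunc has none (np.minimum).
    return np.array([ufunc.reduce(x[s:e]) for s, e in zip(ind, ends)])

x = np.arange(8)
with_identity = proposed_reduceat(np.add, x, [0, 4, 1, 5])
full_then_empty = proposed_reduceat(np.add, x, [0, len(x)])

try:
    proposed_reduceat(np.minimum, x, [4, 1])
    minimum_raised = False
except ValueError:
    minimum_raised = True
```

Compared with the current behavior, the entry for the empty slice x[4:1]
becomes the additive identity 0 instead of x[4].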

My thoughts on the options:

   - The minimal, most conservative approach would go with 2.1 and 3.1, plus
   1 of course: if we don't implement that, none of this makes sense.
   - I kind of think 2.2 or even 2.3 are a nice enhancement that shouldn't
   break too much stuff.
   - 3.2 I'm not sure about, probably hurts more than it helps at this
   point, although in a brand new design you probably would either not append
   the last index or also prepend a zero, as in np.split.
   - And 3.3 seems too magical, probably not a good idea, only listed it
   for completeness.
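
For reference, the np.split convention mentioned above treats the indices
as interior cut points, which amounts to implicitly prepending 0 and
appending len(x). A small comparison, under the assumption that a
split-style reduction would simply reduce each piece:

```python
import numpy as np

x = np.arange(8)

# np.split cuts at the given indices: x[:3], x[3:5], x[5:]
pieces = np.split(x, [3, 5])
split_style = np.array([np.add.reduce(p) for p in pieces])

# reduceat only appends the end point: x[3:5], x[5:]
reduceat_style = np.add.reduceat(x, [3, 5])
```

The split-style result has one more entry than the reduceat result, since
it also covers the leading piece x[:3].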

Any other thoughts or votes on what, if anything, we should implement, and
what the deprecation of current behavior should look like?


( O.o)
( > <) This is Bunny. Copy Bunny into your signature and help him with his
plans for world domination.