[Numpy-discussion] Behavior of .reduceat()
Jaime Fernández del Río
jaime.frio at gmail.com
Sun Mar 27 04:36:21 EDT 2016
Two of the oldest issues in the tracker (#834
<https://github.com/numpy/numpy/issues/834> and #835
<https://github.com/numpy/numpy/issues/835>) are about how .reduceat()
handles its indices parameter. I have been looking at the source code,
and it would be relatively easy to modify; the hardest part is figuring
out what the exact behavior should be.
Current behavior is that np.ufunc.reduceat(x, ind) returns
[np.ufunc.reduce(x[ind[i]:ind[i+1]])
 for i in range(len(ind))], with a few caveats:
1. if ind[i] >= ind[i+1], then x[ind[i]] is returned, rather than a
   reduction over an empty slice.
2. an index of len(x) is appended to the indices argument, to be used
   as the endpoint of the last slice to reduce over.
3. aside from this last case, the indices are required to be strictly
   in bounds, 0 <= index < len(x), or an error is raised.
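The three caveats above can be modelled in a few lines of plain Python. This is only a sketch for one-dimensional inputs, assuming a binary function `func` stands in for the ufunc and `x` is a Python list; `reduceat_current` is a hypothetical name, not NumPy's implementation.

```python
from functools import reduce
import operator


def reduceat_current(func, x, ind):
    """Hypothetical pure-Python model of the current 1-D reduceat semantics."""
    n = len(x)
    # Caveat 3: every explicit index must satisfy 0 <= index < len(x).
    for i in ind:
        if not 0 <= i < n:
            raise IndexError(f"index {i} out of bounds for length {n}")
    # Caveat 2: len(x) is implicitly appended as the end of the last slice.
    ends = list(ind[1:]) + [n]
    out = []
    for start, stop in zip(ind, ends):
        if start >= stop:
            # Caveat 1: a non-increasing pair yields x[start], not an empty reduction.
            out.append(x[start])
        else:
            out.append(reduce(func, x[start:stop]))
    return out


print(reduceat_current(operator.add, list(range(8)), [0, 4, 1, 5, 2, 6, 3, 7]))
# -> [6, 4, 10, 5, 14, 6, 18, 7]
```

Every second entry, [6, 10, 14, 18], is a sum over four consecutive values, while the entries in between are single elements produced by caveat 1.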
The proposed new behavior, with some alternative options, would be:
1. if ind[i] >= ind[i+1], then a reduction over an empty slice, i.e. the
   ufunc identity, is returned. This would include raising an error if the
   ufunc does not have an identity, e.g. np.minimum.
2. to fully support the "reduction over slices" idea, some form of
   out-of-bounds indices should be allowed. This could mean that:
   1. only index = len(x) is allowed without raising an error, to allow
      computing the full reduction anywhere, not just as the last entry of
      the result, or
   2. any index in -len(x) <= index <= len(x) is allowed, with the usual
      meaning given to negative values, or
   3. any index is allowed, with reduction results clipped to existing
      values (and the usual meaning for negative values).
3. Regarding the appending of that last index to indices, we could:
   1. keep appending it, or
   2. never append it, since you can now request it without an error
      being raised, or
   3. only append it if the last index is smaller than len(x).
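To compare the options side by side, here is a pure-Python sketch of the proposed semantics for 1-D inputs, assuming `x` is a Python list. The function name `reduceat_proposed`, its `identity` argument, and the string values for `bounds` ("end_only", "negative", "clip" for options 2.1 to 2.3) and `append` ("always", "never", "if_short" for options 3.1 to 3.3) are all hypothetical; NumPy has no such parameters.

```python
from functools import reduce
import operator

_NO_IDENTITY = object()  # sentinel: the function has no identity (like np.minimum)


def _normalize(i, n, bounds):
    """Map one index to a slice boundary in [0, n] under option 2.1, 2.2 or 2.3."""
    if bounds == "end_only":  # 2.1: only index == len(x) is newly allowed
        if not 0 <= i <= n:
            raise IndexError(f"index {i} out of bounds for length {n}")
        return i
    if bounds == "negative":  # 2.2: -len(x) <= index <= len(x), negatives wrap
        if not -n <= i <= n:
            raise IndexError(f"index {i} out of bounds for length {n}")
        return i + n if i < 0 else i
    if bounds == "clip":  # 2.3: any index, wrapped then clipped like Python slicing
        if i < 0:
            i += n
        return min(max(i, 0), n)
    raise ValueError(f"unknown bounds option {bounds!r}")


def reduceat_proposed(func, x, ind, identity=_NO_IDENTITY,
                      bounds="end_only", append="always"):
    """Hypothetical model of the proposed 1-D reduceat semantics."""
    n = len(x)
    idx = [_normalize(i, n, bounds) for i in ind]
    if append == "always":  # 3.1: keep appending len(x)
        edges = idx + [n]
    elif append == "never":  # 3.2: the caller supplies every endpoint
        edges = idx
    elif append == "if_short":  # 3.3: append only if the last index < len(x)
        edges = idx + [n] if not idx or idx[-1] < n else idx
    else:
        raise ValueError(f"unknown append option {append!r}")
    out = []
    for start, stop in zip(edges, edges[1:]):
        segment = x[start:stop]  # empty whenever start >= stop
        if segment:
            out.append(reduce(func, segment))
        elif identity is _NO_IDENTITY:
            # Proposal 1: an empty reduction needs an identity, otherwise it is an error.
            raise ValueError("empty slice and the function has no identity")
        else:
            out.append(identity)
    return out


print(reduceat_proposed(operator.add, list(range(8)), [0, 4, 1, 5], identity=0))
# -> [6, 0, 10, 18]
```

Under the current behavior the pair (4, 1) would return x[4], i.e. 4; under proposal 1 it becomes the identity 0. With `bounds="end_only"` an index of 8 is accepted, so `[0, 8, 0, 4]` gives the full sum 28 as the first entry instead of only at the end.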
My thoughts on the options:
- The minimal, more conservative approach would go with 2.1 and 3.1. And
  of course 1; if we don't implement that, none of this makes sense.
- I kind of think 2.2 or even 2.3 would be a nice enhancement that
  shouldn't break too much existing code.
- I'm not sure about 3.2; it probably hurts more than it helps at this
  point, although in a brand new design you would probably either not
  append the last index or also prepend a zero, as in np.split.
- And 3.3 seems too magical and probably not a good idea; I only listed it
  for completeness.
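For comparison, the "brand new design" that prepends a zero as well as appending len(x) would make the reductions line up one-to-one with the pieces np.split returns for the same cut points. A minimal pure-Python sketch, where `reduce_pieces` is a hypothetical name and the empty-piece handling follows proposal 1:

```python
from functools import reduce
import operator

_NO_IDENTITY = object()  # sentinel: no identity available


def reduce_pieces(func, x, cuts, identity=_NO_IDENTITY):
    """Hypothetical split-style reduceat: one result per piece np.split would give."""
    # Prepend 0 and append len(x), so k cut points delimit k + 1 pieces.
    edges = [0] + list(cuts) + [len(x)]
    out = []
    for start, stop in zip(edges, edges[1:]):
        piece = x[start:stop]
        if piece:
            out.append(reduce(func, piece))
        elif identity is _NO_IDENTITY:
            raise ValueError("empty piece and the function has no identity")
        else:
            out.append(identity)
    return out


# np.split(np.arange(8), [3, 5]) gives the pieces [0, 1, 2], [3, 4], [5, 6, 7].
print(reduce_pieces(operator.add, list(range(8)), [3, 5]))
# -> [3, 7, 18]
```

The cost of this design is that the result has one more entry than there are indices, which is one reason it would be hard to retrofit onto the existing reduceat.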
Any other thoughts or votes on what, if anything, we should implement, and
what the deprecation of the current behavior should look like?
Jaime
--
(\__/)
( O.o)
( > <) This is Conejo. Copy Conejo into your signature and help him with his
plans for world domination.