Fixing definition of reduceat for Numpy 2.0?
Hi folks, I don't follow numpy development in much detail these days, but I see that a 2.0 release is planned soon. Would this be an opportunity to change the behaviour of `reduceat`? This issue has been open in some form since 2006! https://github.com/numpy/numpy/issues/834 The current behaviour was originally inherited from Numeric, and it makes reduceat often unusable in practice, even where it should be the perfect, concise, efficient solution. But it has been impossible to change without breaking compatibility with existing code. As a result, horrible hacks are needed instead, e.g. my answer here: https://stackoverflow.com/questions/57694003 Is this something that could finally be fixed in 2.0? Martin
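For readers unfamiliar with the quirk being complained about, a minimal demonstration with current NumPy: when an index is not less than its successor, `reduceat` silently returns the single element at that index instead of an empty reduction.

```python
import numpy as np

# The long-standing quirk: when indices[i] >= indices[i+1], reduceat does
# not perform an empty reduction -- it returns array[indices[i]] instead.
a = np.arange(10, 15)               # [10 11 12 13 14]
r = np.add.reduceat(a, [0, 3, 3])
# segment 0: sum(a[0:3]) = 10 + 11 + 12 = 33
# segment 1: indices[1] = 3 >= indices[2] = 3, so the result is a[3] = 13
# segment 2: last index, so sum(a[3:5]) = 13 + 14 = 27
print(r)  # [33 13 27]
```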
On Fri, Dec 22, 2023 at 12:34 PM Martin Ling <martin-numpy@earth.li> wrote:
The reduceat API is certainly problematic, but I don't think fixing it is really a NumPy 2.0 thing. As discussed in that issue, the right way to fix that is to add a new API with the correct behavior, and then we can think about deprecating (and maybe eventually removing) the current reduceat method. If the new reducebins() method were available, I would say removing reduceat() would be appropriate to consider for NumPy 2, but we don't have the new method with fixed behavior yet, which is the bigger blocker.
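No `reducebins()` exists in NumPy; as a rough sketch of the semantics being proposed (the name and signature here are purely illustrative, taken from the suggestion above), one could imagine something like:

```python
import numpy as np

def reducebins(ufunc, a, bin_edges):
    """Hypothetical reducebins sketch: reduce `a` over each half-open bin
    [bin_edges[i], bin_edges[i+1]), returning the ufunc identity for empty
    bins rather than a stray element (unlike the current reduceat)."""
    out = []
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        seg = a[lo:hi]
        out.append(ufunc.reduce(seg) if len(seg) else ufunc.identity)
    return np.array(out)

# Bins (0, 4), (4, 4) and (4, 8): the empty middle bin yields 0, the
# identity of addition, instead of a[4] as reduceat would give.
print(reducebins(np.add, np.arange(8), [0, 4, 4, 8]))  # [ 6  0 22]
```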
Hi Martin, I agree it is a long-standing issue, and I was reminded of it by your comment. I have a draft PR at https://github.com/numpy/numpy/pull/25476 that does not change the old behaviour, but allows you to pass in a start-stop array which behaves more sensibly (exact API TBD). Please have a look! Marten Martin Ling <martin-numpy@earth.li> writes:
On Fri, 2023-12-22 at 18:01 -0500, Marten van Kerkwijk wrote:
That looks nice. I don't have a clear feeling on the order of items; if we think of it in terms of `(start, stop)`, there was also the idea voiced to simply add another name, in which case you could allow start and stop to be separate arrays. Of course, if we go with your `slice(start, stop)` idea that also works, although passing them as separate parameters seems nice too. Adding another name (if we can think of one at least) seems pretty good to me, since I suspect we would add docs suggesting not to use `reduceat`. One small thing about the PR: I would like to distinguish `default` and `initial`. I.e., the default value is used only for empty reductions, while the initial value should always be used (unless you pass both, which we don't do for normal reductions). I suppose the machinery isn't quite set up to do both side-by-side. - Sebastian
Hi Sebastian,
Yes, one could add another method. Or perhaps even add a new argument to `.reduce` instead (say `slices`). But this seemed the simplest route...
If we'd want to, even with the present PR it would be possible to (very slowly) deprecate the use of a list of single integers. But I'm trying to go with just making the existing method more useful.
I just followed what is done for reduce, where a default could also have made sense given that `where` can exclude all inputs along a given row. I'm not convinced it would be necessary to have both, though it would not be hard to add. All the best, Marten
On Sat, 2023-12-23 at 09:56 -0500, Marten van Kerkwijk wrote:
Yeah, I don't mind this; it doesn't stop us from adopting a better idea either. Adding to `.reduce` could be fine, but overall I actually think a new name (even `reduce_slices()`) or reusing `reduceat` is nicer than overloading `.reduce` further.
<snip>
Sorry, I misread the code: you do use `initial` the same way as in reductions; I thought it wasn't used when there were multiple elements. I.e., it is used for non-empty slices also. There is still a little annoyance when `initial=` isn't passed, since the default and initial can differ (this is the case for object add, for example: the default is `0`, but it is not used as the initial value for non-empty reductions). Anyway, it's a small detail to some degree, even if it may be finicky to get right. At the moment it seems passing `dtype=object` somehow changes the result too. - Sebastian
On Sat, 2023-12-23 at 09:56 -0500, Marten van Kerkwijk wrote:
I was looking at the PR, which still seems worthwhile, although not urgent right now. But this makes me think (loudly ;)) that `get_reduction_initial` should maybe distinguish this more fully, because there are three cases, even if we only use the first two currently:

1. True identity: default and initial are the same.
2. Default but no initial: object sum has no initial, but does use `0` as default.
3. Initial that is not a valid default: this would be useful to simplify min/max reductions, where `-inf` or `MIN_INT` are valid initial values but not valid default values.

- Sebastian
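The three cases can be seen in current NumPy behaviour (a small sketch; the case numbering mirrors the list above):

```python
import numpy as np

empty = np.zeros(0)

# Case 1 (true identity): 0 is both the default for an empty sum and a
# valid initial value for np.add.
print(np.add.reduce(empty))                       # 0.0

# Case 2 (default but no initial): object addition still defaults to 0
# for an empty reduction, even though 0 is not fed in as an initial
# value for non-empty object reductions.
print(np.add.reduce(np.array([], dtype=object)))  # 0

# Case 3 (initial that is not a valid default): -inf works fine as an
# initial value for np.maximum, but np.maximum has no identity, so an
# empty reduction without initial= raises.
print(np.maximum.reduce(np.array([1.0, 2.0]), initial=-np.inf))  # 2.0
try:
    np.maximum.reduce(empty)
except ValueError as exc:
    print("empty maximum:", exc)
```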
Hi All,

This discussion about updating reduceat went silent, but recently I came back to my PR to allow `indices` to be a 2-dimensional array of start and stop values (or a tuple of separate start and stop arrays). I thought a bit more about it and think it is the easiest way to extend the present definition. So, I have added some tests and documentation and would now like to open it for proper discussion. See https://github.com/numpy/numpy/pull/25476

From the examples there:
```
a = np.arange(12)
np.add.reduceat(a, ([1, 3, 5], [2, -1, 0]))
# array([ 1, 52,  0])
np.minimum.reduceat(a, ([1, 3, 5], [2, -1, 0]), initial=10)
# array([ 1,  3, 10])
np.minimum.reduceat(a, ([1, 3, 5], [2, -1, 0]))
# ValueError: empty slice encountered with reduceat operation for
# 'minimum', which does not have an identity. Specify 'initial'.
```

Let me know what you all think,

Marten

p.s. Rereading the thread, I see we discussed initial vs. default values in some detail. This is interesting, but somewhat orthogonal to the PR, since it just copies behaviour already present for reduce.
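Since the PR branch is needed to run those examples, here is a plain-Python emulation of the proposed start/stop semantics that reproduces them with released NumPy (the helper name `reduceat_slices` is illustrative, not the actual API):

```python
import numpy as np

def reduceat_slices(ufunc, a, starts, stops, initial=None):
    """Rough emulation of the start/stop mode proposed in numpy/numpy#25476.
    Each (start, stop) pair is treated exactly like a slice: negative
    indices and clipping follow ordinary slice semantics, and an empty
    slice yields `initial` (or the ufunc identity) instead of a[start]."""
    out = []
    for start, stop in zip(starts, stops):
        seg = a[start:stop]
        if len(seg) == 0:
            if initial is None and ufunc.identity is None:
                raise ValueError(
                    "empty slice for a ufunc without identity; pass initial=")
            out.append(ufunc.identity if initial is None else initial)
        elif initial is None:
            out.append(ufunc.reduce(seg))
        else:
            out.append(ufunc.reduce(seg, initial=initial))
    return np.array(out)

a = np.arange(12)
print(reduceat_slices(np.add, a, [1, 3, 5], [2, -1, 0]))               # [ 1 52  0]
print(reduceat_slices(np.minimum, a, [1, 3, 5], [2, -1, 0], initial=10))  # [ 1  3 10]
```

Note how `a[5:0]` is simply an empty slice, giving `0` for addition and the `initial` of 10 for minimum, with no special-casing of stop < start.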
I am not sure how I feel about this. If I understand correctly, the issue started as a corner case when the indices were incorrect, grew to dealing with initial values, and then added a desire for piecewise reduceat with multiple segments. Is that correct? Could you give a better summary of the issue the PR is trying to solve? The examples look like magic to me; it took me a long time to understand that `[1, 3, 5]` corresponds to start indices and `[2, -1, 0]` to stop indices. Perhaps we should require kwarg use instead of positional to make the code more readable. Matti On Sun, Nov 24, 2024 at 3:13 AM Marten van Kerkwijk <mhvk@astro.utoronto.ca> wrote:
Hi Matti,

I'm sorry, I should probably have started a new thread with a proper introduction. `reduceat` has always been about piecewise reductions, but in a way that is rather convoluted. From https://numpy.org/doc/stable/reference/generated/numpy.ufunc.reduceat.html one sees that the indices are interpreted as follows:
```
For i in range(len(indices)), reduceat computes
ufunc.reduce(array[indices[i]:indices[i+1]]), which becomes the i-th
generalized "row" parallel to `axis` in the final result (i.e., <snip>).
There are three exceptions to this:

* when i = len(indices) - 1 (so for the last index),
  indices[i+1] = array.shape[axis].
* if indices[i] >= indices[i + 1], the i-th generalized "row" is
  simply array[indices[i]].
* if indices[i] >= len(array) or indices[i] < 0, an error is raised.
```

The exceptions are the main issue I have with the current definition (see also other threads over the years [1]): really, the current setup is only natural for contiguous pieces; anything else requires contortion. For instance, the documentation describes how to get a running sum as follows:
```
np.add.reduceat(np.arange(8), [0,4, 1,5, 2,6, 3,7])[::2]
```
Note the slice at the end to remove the unwanted elements! And note that this *omits* the last set of 4 elements: to get it, one has to add a solitary index 4 at the end; one cannot get slices that include the last element except as the last one.

The PR arose from this unnatural way to describe slices: why can one not just pass in the start and stop values directly, with no exceptions, just interpreted as slices should be? I.e., get a running sum as
```
np.add.reduceat(np.arange(8), ((start := np.arange(0, 8//2+1)), start+8//2))
```

Currently, the updated docstring explains the new mode as follows:
```
There are two modes for how `indices` is interpreted. If it is a tuple
of 2 arrays (or an array with two rows), then these are interpreted as
start and stop values of slices over which to compute reductions,
i.e., for each row i, ``ufunc.reduce(array[indices[0, i]:indices[1, i]])``
is computed, which becomes the i-th element along `axis` in the final
result (e.g., in a 2-D array, if ``axis=0``, it becomes the i-th row,
but if ``axis=1``, it becomes the i-th column). Like for slices,
negative indices are allowed for both start and stop, and the values
are clipped to be between 0 and the shape of the array along `axis`.
```

The reason `initial` was added is that with the new layout I did not want to keep the exception currently present, where if stop < start one gets the value at start. I felt it was more logical to treat this case as an empty reduction, but then it becomes necessary to be able to pass in an initial value for reductions that do not have an identity, like np.minimum (which of course just helps make `reduceat` more similar to `reduce`).

Note that I considered requiring `slice(start, stop)`, which might be clearer. I did not do that only because, implementation-wise, just accepting a tuple or an array with two rows was super easy. I also liked that with this implementation the old way could at least in principle be described in terms of the new one, as having a default stop that just takes the next element of start (with the same exceptions as above). I ended up not describing it as such in the docstring, though.

Anyway, if in principle it is thought a good idea to make `reduceat` more flexible, the API is up for discussion. It could require `indices=slice(start, stop)` (possibly with a step too), or one could allow not passing in `indices` if `start` and `stop` are present.

Hope this clarifies things!

Marten

matti picus via NumPy-Discussion <numpy-discussion@python.org> writes:
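The running-sum trick from the current documentation can be checked directly with released NumPy, which also makes the awkwardness concrete:

```python
import numpy as np

# The documented running-sum idiom with the current API: interleave the
# start and stop indices, then discard every other (unwanted) result.
r = np.add.reduceat(np.arange(8), [0, 4, 1, 5, 2, 6, 3, 7])[::2]
print(r)  # [ 6 10 14 18] -- sums of a[0:4], a[1:5], a[2:6], a[3:7]
# The final window a[4:8] is missing: a slice ending at the last element
# can only be expressed as the final segment, by appending a lone index 4.
```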
I forgot to add links to previous discussions.

Github issue: https://github.com/numpy/numpy/issues/834
2011 thread: https://mail.python.org/archives/list/numpy-discussion@python.org/thread/DX5...
2016 thread #1: https://mail.python.org/archives/list/numpy-discussion@python.org/thread/RI7...
2016 thread #2: https://mail.python.org/archives/list/numpy-discussion@python.org/thread/RZZ...
2017 thread: https://mail.python.org/archives/list/numpy-discussion@python.org/thread/YKL... (where I suggested adding a `slice` argument to reduce instead; also an option...)
2023 thread (this one): https://mail.python.org/archives/list/numpy-discussion@python.org/thread/VWD...

-- Marten
participants (5)
- Marten van Kerkwijk
- Martin Ling
- matti picus
- Sebastian Berg
- Stephan Hoyer