Fixing definition of reduceat for Numpy 2.0?
Hi folks,
I don't follow numpy development in much detail these days but I see that there is a 2.0 release planned soon.
Would this be an opportunity to change the behaviour of 'reduceat'?
This issue has been open in some form since 2006! https://github.com/numpy/numpy/issues/834
The current behaviour was originally inherited from Numeric, and makes reduceat often unusable in practice, even where it should be the perfect, concise, efficient solution. But it has been impossible to change it without breaking compatibility with existing code.
As a result, horrible hacks are needed instead, e.g. my answer here: https://stackoverflow.com/questions/57694003
Is this something that could finally be fixed in 2.0?
Martin
On Fri, Dec 22, 2023 at 12:34 PM Martin Ling <martinnumpy@earth.li> wrote:
Is this something that could finally be fixed in 2.0?
The reduceat API is certainly problematic, but I don't think fixing it is really a NumPy 2.0 thing. As discussed in that issue, the right way to fix that is to add a new API with the correct behavior, and then we can think about deprecating (and maybe eventually removing) the current reduceat method. If the new reducebins() method were available, I would say removing reduceat() would be appropriate to consider for NumPy 2, but we don't have the new method with fixed behavior yet, which is the bigger blocker.
Martin _______________________________________________ NumPyDiscussion mailing list  numpydiscussion@python.org To unsubscribe send an email to numpydiscussionleave@python.org https://mail.python.org/mailman3/lists/numpydiscussion.python.org/ Member address: shoyer@gmail.com
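For readers who have not run into the behaviour under discussion, here is a minimal sketch of it. The array and index values are made up for illustration. The point is that a repeated index yields a single element, where an empty sum would be expected.

```python
import numpy as np

a = np.arange(8)                  # [0 1 2 3 4 5 6 7]
idx = np.array([0, 4, 4, 6])      # illustrative bin edges

# reduceat reduces a[idx[i]:idx[i+1]]; the last index runs to the end.
# When idx[i] >= idx[i+1], it returns a[idx[i]] instead of the identity.
current = np.add.reduceat(a, idx)

# What one would arguably expect for the bins [0:4], [4:4], [4:6], [6:8].
stops = np.append(idx[1:], a.size)
expected = np.array([a[s:e].sum() for s, e in zip(idx, stops)])

print(current)   # the empty bin [4:4] shows up as a[4]
print(expected)  # the empty bin [4:4] shows up as 0
```

The two results differ only in the second entry, which is the case the issue is about.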
Hi Martin,
I agree it is a longstanding issue, and I was reminded of it by your comment. I have a draft PR at https://github.com/numpy/numpy/pull/25476 that does not change the old behaviour, but allows you to pass in a start-stop array which behaves more sensibly (exact API TBD).
Please have a look!
Marten
On Fri, 2023-12-22 at 18:01 -0500, Marten van Kerkwijk wrote:
Hi Martin,
I agree it is a longstanding issue, and I was reminded of it by your comment. I have a draft PR at https://github.com/numpy/numpy/pull/25476 that does not change the old behaviour, but allows you to pass in a start-stop array which behaves more sensibly (exact API TBD).
Please have a look!
That looks nice, I don't have a clear feeling on the order of items, if we think of it in terms of `(start, stop)` there was also the idea voiced to simply add another name in which case you would allow start and stop to be separate arrays.
Of course if we go with your `slice(start, stop)` idea that also works, although passing as separate parameters seems nice too.
Adding another name (if we can think of one at least) seems pretty good to me, since I suspect we would add docs to suggest not using `reduceat`.
One small thing about the PR: I would like to distinguish `default` and `initial`. I.e. the default value is used only for empty reductions, while the initial value should always be used (unless you would pass both, which we don't for normal reductions though). I suppose the machinery isn't quite set up to do both side-by-side.
Sebastian
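The `(start, stop)` semantics being discussed can be sketched in plain Python. This is only an illustration: the helper name `reduce_slices_sketch` and its signature are hypothetical, it handles 1-D input only, and it is not the API of the draft PR.

```python
import numpy as np

def reduce_slices_sketch(ufunc, a, start, stop, initial=None):
    """Hypothetical helper: reduce a[start[i]:stop[i]] for each i (1-D only)."""
    a = np.asarray(a)
    out = []
    for s, e in zip(start, stop):
        if initial is None:
            # Empty slices fall back to the ufunc identity; ufuncs without
            # one (e.g. np.minimum) raise ValueError on empty input.
            out.append(ufunc.reduce(a[s:e]))
        else:
            out.append(ufunc.reduce(a[s:e], initial=initial))
    return np.array(out)

a = np.arange(8)
# Separate start and stop arrays allow empty, overlapping and unordered slices.
sums = reduce_slices_sketch(np.add, a, start=[0, 4, 4, 2], stop=[4, 4, 6, 3])
print(sums)
```

A loop like this is slow compared with a native ufunc method, which is presumably why a built-in variant is attractive.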
Hi Sebastian,
That looks nice, I don't have a clear feeling on the order of items, if we think of it in terms of `(start, stop)` there was also the idea voiced to simply add another name in which case you would allow start and stop to be separate arrays.
Yes, one could add another method. Or perhaps even add a new argument to `.reduce` instead (say `slices`). But this seemed the simplest route...
Of course if we go with your `slice(start, stop)` idea that also works, although passing as separate parameters seems nice too.
Adding another name (if we can think of one at least) seems pretty good to me, since I suspect we would add docs to suggest not using `reduceat`.
If we'd want to, even with the present PR it would be possible to (very slowly) deprecate the use of a list of single integers. But I'm trying to go with just making the existing method more useful.
One small thing about the PR: I would like to distinguish `default` and `initial`. I.e. the default value is used only for empty reductions, while the initial value should always be used (unless you would pass both, which we don't for normal reductions though). I suppose the machinery isn't quite set up to do both side-by-side.
I just followed what is done for reduce, where a default could also have made sense given that `where` can exclude all inputs along a given row. I'm not convinced it would be necessary to have both, though it would not be hard to add.
All the best,
Marten
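The remark about `where` excluding every input along a row can be seen with the existing `reduce` method. This is a small sketch with made-up data. The value `100` passed as `initial` is arbitrary, chosen only so its effect is visible.

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])
mask = np.array([[True, False, True],
                 [False, False, False]])  # second row is fully excluded

# add has an identity, so the fully masked row reduces to 0.
row_sums = np.add.reduce(a, axis=1, where=mask)

# minimum has no identity, so `initial` must be given when `where` is used;
# the fully masked row then simply returns that initial value.
row_mins = np.minimum.reduce(a, axis=1, where=mask, initial=100)

print(row_sums, row_mins)
```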
On Sat, 2023-12-23 at 09:56 -0500, Marten van Kerkwijk wrote:
Hi Sebastian,
That looks nice, I don't have a clear feeling on the order of items, if we think of it in terms of `(start, stop)` there was also the idea voiced to simply add another name in which case you would allow start and stop to be separate arrays.
Yes, one could add another method. Or perhaps even add a new argument to `.reduce` instead (say `slices`). But this seemed the simplest route...
Yeah, I don't mind this, doesn't stop us from a better idea either. Adding to `.reduce` could be fine, but overall I actually think a new name or using `reduceat` is nicer than overloading it more, even `reduce_slices()`.
<snip>
I suppose the machinery isn't quite set up to do both side-by-side.
I just followed what is done for reduce, where a default could also have made sense given that `where` can exclude all inputs along a given row. I'm not convinced it would be necessary to have both, though it would not be hard to add.
Sorry, I misread the code: you do use initial the same way as in reductions; I thought it wasn't used when there were multiple elements. I.e. it is used for non-empty slices also.
There is still a little annoyance when `initial=` isn't passed, since default/initial can be different (this is the case for object add for example: the default is `0`, but it is not used as initial for non-empty reductions). Anyway, it's a small detail to some degree, even if it may be finicky to get right.
At the moment it seems passing `dtype=object` somehow changes the result also.
Sebastian
On Sat, 2023-12-23 at 09:56 -0500, Marten van Kerkwijk wrote:
I just followed what is done for reduce, where a default could also have made sense given that `where` can exclude all inputs along a given row. I'm not convinced it would be necessary to have both, though it would not be hard to add.
Was looking at the PR, which still seems worthwhile, although not urgent right now.
But, this makes me think (loudly ;)) that the `get_reduction_initial` should maybe distinguish this more fully... Because there are 3 cases, even if we only use the first two currently:
1. True identity: default and initial are the same.
2. Default but no initial: Object sum has no initial, but does use `0` as default.
3. Initial is not valid default: This would be useful to simplify min/max reductions: `inf` or `MIN_INT` are valid initial values but are not valid default values.
Sebastian
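The three cases can be observed from Python with the ordinary `reduce` method. This is a rough sketch using illustrative inputs. It shows only user-visible behaviour and says nothing about how `get_reduction_initial` is implemented internally.

```python
import numpy as np

# Case 1: true identity. For integer add, 0 is both default and initial.
empty_int_sum = np.add.reduce(np.array([], dtype=np.int64))

# Case 2: default but no initial. An empty object sum gives 0, yet 0 is not
# folded into a non-empty object reduction (0 + "a" would be a TypeError).
empty_obj_sum = np.add.reduce(np.array([], dtype=object))
joined = np.add.reduce(np.array(["a", "b"], dtype=object))

# Case 3: no valid default. An empty minimum raises ValueError, while inf
# works as an explicit initial value for a non-empty reduction.
try:
    np.minimum.reduce(np.array([], dtype=np.float64))
    empty_min_raised = False
except ValueError:
    empty_min_raised = True
min_with_initial = np.minimum.reduce(np.array([3.0, 1.0]), initial=np.inf)

print(empty_int_sum, empty_obj_sum, joined, empty_min_raised, min_with_initial)
```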
participants (4)

Marten van Kerkwijk

Martin Ling

Sebastian Berg

Stephan Hoyer