On Thu, 2022-10-20 at 23:26 +0000, ntessore@pm.me wrote:
Hello,
Hi, I don't have a strong opinion yet it does seem potentially useful, but I think there would be some details to hash out in the proposal. Some thoughts: * `np.add.at` should be able to do what you want (but of course is very slow right now, and maybe hard to get as fast as bincount even if improved). * `out=` by itself usually does not mean to use the values in `out`. So I think you would need either a different name or another flag to indicate use of `out` (rather than overwriting it). * bincount resizes its output dynamically, however, if you provide an output then resizing is not really feasible. Probably you can find a design that solves this. If you always add only a moderate amount of points `np.add.at(tally, indices, weights)` may just be a good solution? (It is "very" slow, but if the problem is having the giant `tally` array, then a factor of 10 "too slow" probably doesn't matter) - Sebastian
I would like to propose adding the `out` array as an optional parameter to `bincount`. This makes `bincount` very useful when iteratively tallying data with large indices.
Consider this example tallying batches of values from some fictional source of data:
tally = np.zeros(10000**2) for indices, weights in read_sensor_data(): ... tally += np.bincount(indices, weights, 10000**2) # slow: repeatedly adding large arrays
This could be trivially sped up:
tally = np.zeros(10000**2) for indices, weights in read_sensor_data(): ... np.bincount(indices, weights, out=tally) # fast: plain sum- loop in C
As far as I can see, there is no equivalent numpy functionality. In fact, as far as I'm aware, there isn't any fast alternative outside of C/Cython/numba/... It also fits the purpose of `bincount` nicely, and does not change existing functionality. One might argue about the exact semantics if both `minlength` and `out` are given, but I think that a sensible answer exists in requiring `len(out) >= max(list.max(), minlength)`. _______________________________________________ NumPy-Discussion mailing list -- numpy-discussion@python.org To unsubscribe send an email to numpy-discussion-leave@python.org https://mail.python.org/mailman3/lists/numpy-discussion.python.org/ Member address: sebastian@sipsolutions.net