Hello,

I would like to propose adding an optional `out` array parameter to `bincount`. This would make `bincount` much more useful when iteratively tallying data with large indices.

Consider this example tallying batches of values from some fictional source of data:

    tally = np.zeros(10000**2)
    for indices, weights in read_sensor_data():
        tally += np.bincount(indices, weights, 10000**2)  # slow: repeatedly adding large arrays

This could be trivially sped up:

    tally = np.zeros(10000**2)
    for indices, weights in read_sensor_data():
        np.bincount(indices, weights, out=tally)  # fast: plain sum-loop in C

As far as I can see, there is no equivalent numpy functionality. In fact, as far as I'm aware, there isn't any fast alternative outside of C/Cython/numba/... It also fits the purpose of `bincount` nicely and does not change existing functionality. One might argue about the exact semantics if both `minlength` and `out` are given, but I think a sensible answer is to require `len(out) >= max(list.max() + 1, minlength)`, since the largest index `list.max()` needs a bin of its own.
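To make the intended semantics concrete, here is a rough pure-Python sketch of how I imagine `out` behaving. The helper name `bincount_into` is made up for illustration and is not an existing numpy function. It assumes that `out` is accumulated into rather than overwritten, and that `out` needs one bin per index up to the largest one (hence the `+ 1`). It only emulates the result with the existing `np.bincount`; it does not have the performance of a real in-place implementation.

```python
import numpy as np


def bincount_into(out, indices, weights=None, minlength=0):
    """Hypothetical stand-in for np.bincount(indices, weights, minlength, out=out).

    Adds the (weighted) counts of `indices` into `out` in place.
    """
    indices = np.asarray(indices)
    if indices.size == 0:
        needed = minlength
    else:
        # The largest index needs its own bin, so max + 1 bins are required.
        needed = max(int(indices.max()) + 1, minlength)
    if len(out) < needed:
        raise ValueError(
            f"out has length {len(out)}, but at least {needed} bins are needed"
        )
    if indices.size:
        # Emulation only: count up to the largest index with the existing
        # API, then add the result into the matching prefix of `out`.
        counts = np.bincount(indices, weights)
        out[:len(counts)] += counts
    return out


tally = np.zeros(10)
bincount_into(tally, [1, 3, 3], [0.5, 1.0, 2.0])
bincount_into(tally, [3, 9], [1.0, 4.0])
print(tally)
```

A real implementation would skip the temporary `counts` array entirely and add into `out` directly inside the C loop, which is where the speedup would come from.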