On Mon, Apr 9, 2018 at 10:24 PM, Eric Wieser <wieser.eric+numpy@gmail.com> wrote:

Numpy has three histogram functions - histogram, histogram2d, and histogramdd.

histogram is by far the most widely used, and in the absence of weights and normalization, returns an np.intp count for each bin.

histogramdd (for which histogram2d is a wrapper) returns np.float64 in all circumstances.

As a contrived comparison:

>>> x = np.linspace(0, 1)
>>> h, e = np.histogram(x*x, bins=4); h
array([25, 10,  8,  7], dtype=int64)
>>> h, e = np.histogramdd((x*x,), bins=4); h
array([25., 10.,  8.,  7.])
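
The "absence of weights and normalization" caveat for np.histogram can be checked the same way. This is only a small sketch, assuming a NumPy version whose np.histogram accepts the density= keyword; the variable names are illustrative:

```python
import numpy as np

x = np.linspace(0, 1)

# Plain counts: integer dtype (np.intp).
counts, _ = np.histogram(x * x, bins=4)

# With weights, the result takes the dtype of the weights.
weighted, _ = np.histogram(x * x, bins=4, weights=np.ones_like(x))

# With normalization, the result is floating point.
density, _ = np.histogram(x * x, bins=4, density=True)

print(counts.dtype, weighted.dtype, density.dtype)
```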

https://github.com/numpy/numpy/issues/7845 tracks this inconsistency.

The fix is now trivial: the question is, will changing the return type break people’s code?

We should either:

  1. Just change it, and hope no one is broken by it
  2. Add a dtype argument:
    • If dtype=None, behave like np.histogram
    • If dtype is not specified, emit a FutureWarning recommending dtype=None or dtype=float
    • In future, change the default to None
  3. Create a new better-named function histogram_nd, which can also be created without the mistake that is https://github.com/numpy/numpy/issues/10864.
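
For concreteness, option 2 could look roughly like the sketch below. It is written as a wrapper around the existing function rather than as a patch, and the names histogramdd_with_dtype and _NoValue are hypothetical, not part of NumPy. It also assumes a NumPy version whose histogramdd accepts density=.

```python
import warnings

import numpy as np

# Hypothetical sentinel: distinguishes "dtype not passed" from dtype=None.
_NoValue = object()


def histogramdd_with_dtype(sample, bins=10, range=None, density=False,
                           weights=None, dtype=_NoValue):
    """Hypothetical wrapper illustrating option 2; not a NumPy API."""
    if dtype is _NoValue:
        # Keep today's behaviour, but tell callers the default will change.
        warnings.warn(
            "the default dtype will change in the future; pass dtype=None "
            "or dtype=float to silence this warning",
            FutureWarning,
            stacklevel=2,
        )
        dtype = float

    hist, edges = np.histogramdd(sample, bins=bins, range=range,
                                 density=density, weights=weights)

    if dtype is None:
        # Behave like np.histogram: integer counts unless weights or
        # normalization are involved.
        if weights is None and not density:
            hist = hist.astype(np.intp)
        return hist, edges
    return hist.astype(dtype), edges


x = np.linspace(0, 1)
h, e = histogramdd_with_dtype((x * x,), bins=4, dtype=None)
print(h.dtype, h)
```

Callers that pass dtype=None or dtype=float explicitly see no warning; everyone else gets a FutureWarning on every call, and that is the cost of this option.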

Thoughts?


(1) seems like a no-go; taking such risks isn't justified by a minor inconsistency.

(2) is still fairly intrusive: you're emitting warnings for everyone and still forcing people to change their code (and if they don't, they may run into a backwards compat break).

(3) is the best of these options, but is this really worth a new function? My vote would be "do nothing".

Ralf