NumPy has three histogram functions: `histogram`, `histogram2d`, and `histogramdd`.

`histogram` is by far the most widely used and, in the absence of weights and normalization, returns an `np.intp` count for each bin. `histogramdd` (for which `histogram2d` is a wrapper) returns `np.float64` in all circumstances.

As a contrived comparison:

    >>> x = np.linspace(0, 1)
    >>> h, e = np.histogram(x*x, bins=4); h
    array([25, 10, 8, 7], dtype=int64)
    >>> h, e = np.histogramdd((x*x,), bins=4); h
    array([25., 10., 8., 7.])
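
To make the dtype rules described above easier to check, here is a small script that compares the unweighted, weighted, and normalized cases of `np.histogram` against `np.histogramdd`. The final explicit cast is only one possible workaround for callers who want integer counts today. It is not something the linked issue prescribes.

```python
import numpy as np

x = np.linspace(0, 1)  # 50 evenly spaced samples

# Unweighted, unnormalized: np.histogram returns integer (np.intp) counts
h_plain, _ = np.histogram(x * x, bins=4)

# Float weights or density=True switch np.histogram to floating point
h_weighted, _ = np.histogram(x * x, bins=4, weights=np.ones_like(x))
h_density, _ = np.histogram(x * x, bins=4, density=True)

# histogramdd takes a sequence of coordinate arrays (or an (N, D) array)
h_dd, _ = np.histogramdd((x * x,), bins=4)

print(h_plain.dtype, h_weighted.dtype, h_density.dtype, h_dd.dtype)

# Possible workaround (an assumption, not from the issue): cast explicitly
h_dd_int = h_dd.astype(np.intp)
```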

https://github.com/numpy/numpy/issues/7845 tracks this inconsistency.

The fix is now trivial: the question is, will changing the return type break people's code?

We should either:

- Just change it, and hope no one's code is broken by it
- Add a `dtype` argument:
  - If `dtype=None`, behave like `np.histogram`
  - If `dtype` is not specified, emit a future warning recommending the use of `dtype=None` or `dtype=float`
  - In future, change the default to `None`
- Create a new, better-named function `histogram_nd`, which can also be created without the mistake that is https://github.com/numpy/numpy/issues/10864.

Thoughts?
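
For concreteness, the second option (a `dtype` argument with a transition period) might look roughly like the sketch below, written as a wrapper around the existing function. The names `histogramdd_with_dtype` and `_NOT_GIVEN` are hypothetical and not part of NumPy. The sketch also assumes that "behave like `np.histogram`" means integer counts only in the unweighted case.

```python
import warnings
import numpy as np

# Hypothetical sentinel: lets us tell "dtype omitted" apart from dtype=None
_NOT_GIVEN = object()


def histogramdd_with_dtype(sample, bins=10, range=None, weights=None,
                           dtype=_NOT_GIVEN):
    """Hypothetical wrapper illustrating the proposed dtype argument."""
    if dtype is _NOT_GIVEN:
        # Transition period: warn, but keep today's float result
        warnings.warn(
            "the default dtype will change to None in a future version; "
            "pass dtype=None or dtype=float explicitly",
            FutureWarning, stacklevel=2)
        dtype = float

    hist, edges = np.histogramdd(sample, bins=bins, range=range,
                                 weights=weights)

    if dtype is None:
        # Assumption: integer counts when unweighted, otherwise leave
        # the result in whatever dtype histogramdd produced
        dtype = np.intp if weights is None else hist.dtype

    return hist.astype(dtype), edges
```

With this sketch, `histogramdd_with_dtype((x*x,), bins=4, dtype=None)` yields `np.intp` counts, while omitting `dtype` keeps the float result and emits a `FutureWarning`.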