> For precision loss of the order of float64 eps, I disagree.
I was thinking more about precision loss on the order of 1, for large
64-bit integers that can’t be represented exactly in a float64.
Note also that #10864 <https://github.com/numpy/numpy/issues/10864> incurs
deliberate precision loss of the order of 10**-6 x the smallest bin, which is
also much larger than eps.
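To make the "order of 1" point concrete, here is a minimal sketch, assuming a single sample with one large int64 weight (the values are chosen purely for illustration). It compares float64 eps with the absolute error from storing 2**53 + 1 in a float64, and then passes the same value as a weight to histogramdd:

```python
import numpy as np

# Relative spacing of float64 values near 1.0.
eps = np.finfo(np.float64).eps

# float64 has a 53-bit significand, so 2**53 + 1 is the first positive
# integer that cannot be stored exactly.
big = np.array([2**53 + 1], dtype=np.int64)
as_float = big.astype(np.float64)
error = int(big[0]) - int(as_float[0])  # absolute error, in integer units

# Illustrative input: one sample in one bin, weighted by the large integer.
# histogramdd returns float64, so the weight goes through a float.
h, edges = np.histogramdd((np.array([0.5]),), bins=1, range=[(0, 1)],
                          weights=big)

print("eps:", eps)
print("absolute error after int64 -> float64:", error)
print("histogramdd dtype:", h.dtype)
print("weight preserved exactly:", int(h[0]) == int(big[0]))
```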
> It’s also possible to refer users to scipy.stats.binned_statistic
That sounds like a good idea to do irrespective of whether histogramdd has
problems - I had no idea those existed. Is there a precedent for referring
to more feature-rich scipy functions from the basic numpy ones?
On Wed, 25 Apr 2018 at 22:51 Ralf Gommers wrote:
On Wed, Apr 25, 2018 at 10:07 PM, Eric Wieser wrote:
what does that gain over having the user do something like result.astype()
It means that the user can use integer weights without worrying about losing precision due to an intermediate float representation.
It also means they can use higher precision values (np.longdouble) or complex weights.
None of that seems particularly important to be honest.
you’re emitting warnings for everyone
When there’s a risk of precision loss, that seems like the responsible thing to do.
For precision loss of the order of float64 eps, I disagree. There will be many such places in numpy and in other core libraries.
Users passing float weights would see no warning, I suppose.
is this really worth a new function
There ought to be a function for computing histograms with integer weights that doesn’t lose precision. Either we change the existing function to do that, or we make a new function.
It's also possible to refer users to scipy.stats.binned_statistic(_2d/dd), which provides a superset of the histogram functionality and is internally consistent because the implementations of 1d/2d call the dd one.
Ralf
A possible compromise: like 1, but only change the dtype of the result if a weights argument is passed.
#10864 <https://github.com/numpy/numpy/issues/10864> seems like a worrying design flaw too, but I suppose that can be dealt with separately.
Eric
On Wed, 25 Apr 2018 at 21:57 Ralf Gommers wrote:
On Mon, Apr 9, 2018 at 10:24 PM, Eric Wieser <wieser.eric+numpy@gmail.com> wrote:
Numpy has three histogram functions - histogram, histogram2d, and histogramdd.
histogram is by far the most widely used, and in the absence of weights and normalization, returns an np.intp count for each bin.
histogramdd (for which histogram2d is a wrapper) returns np.float64 in all circumstances.
As a contrived comparison
>>> x = np.linspace(0, 1)
>>> h, e = np.histogram(x*x, bins=4); h
array([25, 10, 8, 7], dtype=int64)
>>> h, e = np.histogramdd((x*x,), bins=4); h
array([25., 10., 8., 7.])
https://github.com/numpy/numpy/issues/7845 tracks this inconsistency.
The fix is now trivial: the question is, will changing the return type break people’s code?
Either we should:
1. Just change it, and hope no one is broken by it
2. Add a dtype argument:
   - If dtype=None, behave like np.histogram
   - If dtype is not specified, emit a future warning recommending to use dtype=None or dtype=float
   - In future, change the default to None
3. Create a new better-named function histogram_nd, which can also be created without the mistake that is https://github.com/numpy/numpy/issues/10864.
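As a rough illustration of option 2, the transition could look something like the sketch below. The wrapper name histogramdd_with_dtype and the _UNSET sentinel are hypothetical and not part of NumPy. The sketch only casts the float64 result after the fact, so it shows the warning and default-handling logic rather than a precision-preserving implementation:

```python
import warnings
import numpy as np

_UNSET = object()  # hypothetical sentinel: "dtype was not passed at all"


def histogramdd_with_dtype(sample, bins=10, weights=None, dtype=_UNSET):
    """Hypothetical wrapper sketching option 2; not a NumPy API."""
    hist, edges = np.histogramdd(sample, bins=bins, weights=weights)
    if dtype is _UNSET:
        # Current behaviour (float64), plus a nudge to choose explicitly.
        warnings.warn(
            "the default dtype may change; pass dtype=None or dtype=float",
            FutureWarning,
            stacklevel=2,
        )
        return hist, edges
    if dtype is None:
        # Mimic np.histogram: intp counts without weights, otherwise the
        # dtype of the weights.
        dtype = np.intp if weights is None else np.asarray(weights).dtype
    # Casting afterwards does not recover precision lost in float64.
    return hist.astype(dtype), edges


x = np.linspace(0, 1)
h, e = histogramdd_with_dtype((x * x,), bins=4, dtype=None)
print(h.dtype, h)
```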
Thoughts?
(1) seems like a no-go; taking such risks isn't justified by a minor inconsistency.
(2) is still fairly intrusive: you're emitting warnings for everyone and still forcing people to change their code (and if they don't, they may run into a backwards-compat break).
(3) is the best of these options, however is this really worth a new function? My vote would be "do nothing".
Ralf
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion