[Numpy-discussion] Changing the return type of np.histogramdd

Eric Wieser wieser.eric+numpy at gmail.com
Thu Apr 26 01:07:56 EDT 2018


> what does that gain over having the user do something like result.astype()

It means that the user can use integer weights without worrying about
losing precision due to an intermediate float representation.
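
As a minimal sketch of the kind of precision loss meant here (the weight
values are made up for illustration, and this shows only the float64
round-trip, not the internals of any histogram function):

```python
import numpy as np

# Hypothetical integer weights that land in the same bin.
# 2**53 + 1 is the smallest positive integer float64 cannot represent.
weights = np.array([2**53, 1], dtype=np.int64)

# Accumulating in the integer dtype keeps the exact total.
exact = weights.sum()

# Accumulating through float64 rounds the total to a representable value.
via_float = weights.astype(np.float64).sum()

print(int(exact))      # exact integer total
print(int(via_float))  # total after the float64 round-trip
```

Calling result.astype() afterwards cannot recover the lost unit, because
the rounding has already happened during accumulation.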

It also means they can use higher precision values (np.longdouble) or
complex weights.

> you’re emitting warnings for everyone

When there’s a risk of precision loss, that seems like the responsible
thing to do. Users passing float weights would see no warning, I suppose.

> is this really worth a new function

There ought to be a function for computing histograms with integer weights
that doesn’t lose precision. Either we change the existing function to do
that, or we make a new function.

A possible compromise: like 1, but only change the dtype of the result if a
weights argument is passed.
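
One possible reading of that compromise, as a rough sketch: the helper
name proposed_result_dtype is hypothetical and not part of numpy, and it
only illustrates the dtype rule, not a precision-preserving accumulation.

```python
import numpy as np

def proposed_result_dtype(weights=None):
    """Hypothetical rule for the dtype of the histogramdd result."""
    if weights is None:
        # No weights: keep today's float64 result, so existing code
        # that never passes weights sees no change.
        return np.dtype(np.float64)
    # Weights given: let the result follow the dtype of the weights.
    return np.asarray(weights).dtype

print(proposed_result_dtype())
print(proposed_result_dtype(np.array([1, 2, 3], dtype=np.int64)))
print(proposed_result_dtype(np.array([1 + 2j])))
```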

#10864 <https://github.com/numpy/numpy/issues/10864> seems like a worrying
design flaw too, but I suppose that can be dealt with separately.

Eric

On Wed, 25 Apr 2018 at 21:57 Ralf Gommers <ralf.gommers at gmail.com> wrote:

> On Mon, Apr 9, 2018 at 10:24 PM, Eric Wieser <wieser.eric+numpy at gmail.com>
> wrote:
>
>> Numpy has three histogram functions - histogram, histogram2d, and
>> histogramdd.
>>
>> histogram is by far the most widely used, and in the absence of weights
>> and normalization, returns an np.intp count for each bin.
>>
>> histogramdd (for which histogram2d is a wrapper) returns np.float64 in
>> all circumstances.
>>
>> As a contrived comparison
>>
>> >>> x = np.linspace(0, 1)
>> >>> h, e = np.histogram(x*x, bins=4); h
>> array([25, 10,  8,  7], dtype=int64)
>> >>> h, e = np.histogramdd((x*x,), bins=4); h
>> array([25., 10.,  8.,  7.])
>>
>> https://github.com/numpy/numpy/issues/7845 tracks this inconsistency.
>>
>> The fix is now trivial: the question is, will changing the return type
>> break people’s code?
>>
>> Either we should:
>>
>>    1. Just change it, and hope no one is broken by it
>>    2. Add a dtype argument:
>>       - If dtype=None, behave like np.histogram
>>       - If dtype is not specified, emit a future warning recommending to
>>       use dtype=None or dtype=float
>>       - In future, change the default to None
>>    3. Create a new better-named function histogram_nd, which can also be
>>    created without the mistake that is
>>    https://github.com/numpy/numpy/issues/10864.
>>
>> Thoughts?
>>
>
> (1) seems like a no-go; taking such risks isn't justified by a minor
> inconsistency.
>
> (2) is still fairly intrusive, you're emitting warnings for everyone and
> still force people to change their code (and if they don't they may run
> into a backwards compat break).
>
> (3) is the best of these options, however is this really worth a new
> function? My vote would be "do nothing".
>
> Ralf