Hello,
Currently, the built-in Python round (which is different from np.round), when called on a np.float64, returns a np.float64, due to its __round__ method. The same holds for np.float32. However, since Python 3, the default behavior of round is to return a Python int when it operates on a Python float. This is a mismatch according to the Liskov Substitution Principle (https://en.wikipedia.org/wiki/Liskov_substitution_principle), since np.float64 subclasses Python’s float (np.float32 does not, though it behaves the same way here). This has been brought up in gh-15297 (https://github.com/numpy/numpy/issues/15297). Here is the problem summed up in code:
>>> type(round(np.float64(5)))
<class 'numpy.float64'>
>>> type(round(np.float32(5)))
<class 'numpy.float32'>
>>> type(round(float(5)))
<class 'int'>
This problem manifests itself most prominently when trying to index into collections:
>>> np.arange(6)[round(float(5))]
5
>>> np.arange(6)[round(np.float64(5))]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices
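A minimal pure-Python sketch of the mechanism behind the session above, assuming nothing about NumPy's internals: the built-in round() delegates to the operand's __round__ method, so a float subclass that overrides it can change the return type. SameTypeFloat is a hypothetical stand-in that mimics the np.float64 behavior described here; it is not NumPy's implementation.

```python
class SameTypeFloat(float):
    """Hypothetical float subclass whose __round__ keeps its own type."""

    def __round__(self, ndigits=None):
        # float.__round__(x, 0) returns a float, whereas float.__round__(x)
        # with no ndigits returns an int.
        digits = 0 if ndigits is None else ndigits
        return SameTypeFloat(float.__round__(self, digits))


plain = round(float(5))           # built-in float: round() returns an int
custom = round(SameTypeFloat(5))  # override: round() returns SameTypeFloat

items = list(range(6))
print(type(plain).__name__, items[plain])
print(type(custom).__name__)

try:
    items[custom]
    index_error = None
except TypeError as exc:
    # A float-like object is not accepted as a sequence index.
    index_error = exc
print("indexing failed:", index_error is not None)
```

Code written against float that indexes with round(x) works for the base class and breaks for the subclass, which is the substitution problem raised in this message.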
There still remains the question, do we return Python ints or np.int64s?
* Python ints have the advantage of not overflowing.
* If we decide to add __round__ to arrays in the future, Python ints may become inconsistent with our design, as such a method will return an int64 array.
This issue was discussed in the weekly triage meeting today, and the following plan of action was proposed:
* change scalar floats to return integers for __round__ (which integer type was not discussed, I propose np.int64)
* not change anything else: not 0d arrays and not other numpy functionality
Does anyone have any thoughts on the proposal?
Best regards,
Hameer Abbasi
On Wed, Feb 26, 2020 at 3:19 PM Hameer Abbasi einstein.edison@gmail.com wrote:
There still remains the question, do we return Python ints or np.int64s?
- Python ints have the advantage of not overflowing.
- If we decide to add __round__ to arrays in the future, Python ints
may become inconsistent with our design, as such a method will return an int64 array.
This issue was discussed in the weekly triage meeting today, and the following plan of action was proposed:
- change scalar floats to return integers for __round__ (which integer
type was not discussed, I propose np.int64)
- not change anything else: not 0d arrays and not other numpy
functionality
The only reason that float.__round__() was allowed to change to returning
ints was because ints became unbounded. If we also change to returning an integer type, it should be a Python int.
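The overflow concern can be made concrete without NumPy. The sketch below only compares Python's unbounded int against the signed 64-bit range; fits_in_int64 is a hypothetical helper introduced for illustration.

```python
INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1


def fits_in_int64(value: int) -> bool:
    """Return True if value is representable as a signed 64-bit integer."""
    return INT64_MIN <= value <= INT64_MAX


for x in (2.5, 2e18, 2e20, 2e200):
    rounded = round(x)  # Python int, exact for any finite float
    print(x, fits_in_int64(rounded))
```

Any fixed-width integer return type has to decide what to do with the last two inputs; a Python int does not.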
great another object array
np.asarray([round(x_i.item()) for x_i in np.array([1, 2.5, 2e20, 2e200])])
array([1, 2, 200000000000000000000,
       199999999999999993946624442502072331894900655091004725296483501900693696871108151068392676809412503736055024831947764816364271468736556969278770082094479755742047182133579963622363626612334257709776896], dtype=object)
I would rather have numpy consistent with numpy than with python
On Wed, Feb 26, 2020 at 4:38 PM Robert Kern robert.kern@gmail.com wrote:
The only reason that float.__round__() was allowed to change to returning
ints was because ints became unbounded. If we also change to returning an integer type, it should be a Python int.
-- Robert Kern _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion
On Wed, Feb 26, 2020 at 5:27 PM josef.pktd@gmail.com wrote:
great another object array
np.asarray([round(x_i.item()) for x_i in np.array([1, 2.5, 2e20, 2e200])])
array([1, 2, 200000000000000000000,
       199999999999999993946624442502072331894900655091004725296483501900693696871108151068392676809412503736055024831947764816364271468736556969278770082094479755742047182133579963622363626612334257709776896], dtype=object)
I would rather have numpy consistent with numpy than with python
Since round() (and the __round__() interface) is part of Python and not numpy, there is nothing in numpy to be consistent with. We only implement __round__() for the scalar types.
On Wed, Feb 26, 2020 at 6:09 PM Robert Kern robert.kern@gmail.com wrote:
Since round() (and the __round__() interface) is part of Python and not numpy, there is nothing in numpy to be consistent with. We only implement __round__() for the scalar types.
Maybe I misunderstand
I'm using np.round a lot. So maybe it's a question whether and how it will affect np.round.
Does the following change with the proposal?
np.round(np.array([1, 2.5, 2e20, 2e200]))
array([1.e+000, 2.e+000, 2.e+020, 2.e+200])
np.round(np.array([1, 2.5, 2e20, 2e200])).astype(int)
array([ 1, 2, -2147483648, -2147483648])
np.round(np.array([2e200])[0])
2e+200
np.round(2e200)
2e+200
round(2e200)
199999999999999993946624442502072331894900655091004725296483501900693696871108151068392676809412503736055024831947764816364271468736556969278770082094479755742047182133579963622363626612334257709776896
Josef
"around 100" sounds like "something all_close(100)"
On Wed, Feb 26, 2020 at 6:57 PM josef.pktd@gmail.com wrote:
Maybe I misunderstand
I'm using np.round a lot. So maybe it's a question whether and how it will affect np.round.
Does the following change with the proposal?
np.round(np.array([1, 2.5, 2e20, 2e200])) array([1.e+000, 2.e+000, 2.e+020, 2.e+200])
np.round(np.array([1, 2.5, 2e20, 2e200])).astype(int) array([ 1, 2, -2147483648, -2147483648])
np.round(np.array([2e200])[0]) 2e+200
np.round(2e200) 2e+200
round(2e200)
199999999999999993946624442502072331894900655091004725296483501900693696871108151068392676809412503736055024831947764816364271468736556969278770082094479755742047182133579963622363626612334257709776896
Josef "around 100" sounds like "something all_close(100)"
I guess I'm slow
It only affects this case, as long as we don't have __round__ in arrays
round(np.float64(2e200))
2e+200
round(np.array([1, 2.5, 2e20, 2e200]))
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-177-bd4a17555729> in <module>
----> 1 round(np.array([1, 2.5, 2e20, 2e200]))

TypeError: type numpy.ndarray doesn't define __round__ method
Josef
On Wed, Feb 26, 2020 at 6:59 PM josef.pktd@gmail.com wrote:
Maybe I misunderstand
I'm using np.round a lot. So maybe it's a question whether and how it will affect np.round.
Nope, not changing.
Does the following change with the proposal?
np.round(np.array([1, 2.5, 2e20, 2e200])) array([1.e+000, 2.e+000, 2.e+020, 2.e+200])
np.round(np.array([1, 2.5, 2e20, 2e200])).astype(int) array([ 1, 2, -2147483648, -2147483648])
np.round(np.array([2e200])[0]) 2e+200
np.round(2e200) 2e+200
No change.
round(2e200)
199999999999999993946624442502072331894900655091004725296483501900693696871108151068392676809412503736055024831947764816364271468736556969278770082094479755742047182133579963622363626612334257709776896
Obviously, not under our control, but no, that's not changing.
This is the only result that will change:
round(np.float64(2e200)) 2e+200
Josef "around 100" sounds like "something all_close(100)"
I know. It's meant to be read as "array-round". We prefer the `around()` spelling to avoid shadowing the built-in. Early mistake that we're still living with.
Does this mean that np.round(np.float32(5)) returns a 64-bit upcasted int?
That would be really awkward for many reasons: a pandas frame being bloated just by rounding, for example, or a numpy array growing in size for no apparent reason.
I am not really sure I understand why LSP should hold in this case, to be honest. Rounding is an operation specific to the number instance and not to the generic class.
On Wed, Feb 26, 2020, 21:38 Robert Kern robert.kern@gmail.com wrote:
The only reason that float.__round__() was allowed to change to returning
ints was because ints became unbounded. If we also change to returning an integer type, it should be a Python int.
On Wed, Feb 26, 2020 at 5:30 PM Ilhan Polat ilhanpolat@gmail.com wrote:
Does this mean that np.round(np.float32(5)) return a 64 bit upcasted int?
That would be really awkward for many reasons pandas frame size being bloated just by rounding for an example. Or numpy array size growing for no apparent reason
I am not really sure if I understand why LSP should hold in this case to be honest. Rounding is an operation specific for the number instance and not for the generic class.
On Wed, Feb 26, 2020, 21:38 Robert Kern robert.kern@gmail.com wrote:
On Wed, Feb 26, 2020 at 3:19 PM Hameer Abbasi einstein.edison@gmail.com wrote:
There still remains the question, do we return Python ints or np.int64s?
- Python ints have the advantage of not overflowing.
- If we decide to add __round__ to arrays in the future, Python ints
may become inconsistent with our design, as such a method will return an int64 array.
This issue was discussed in the weekly triage meeting today, and the following plan of action was proposed:
- change scalar floats to return integers for __round__ (which
integer type was not discussed, I propose np.int64)
- not change anything else: not 0d arrays and not other numpy
functionality
I think making numerical behavior different between arrays and numpy scalars with the same dtype, will create many happy debugging hours.
(although I don't remember having been careful about the distinction between python scalars and numpy scalars in some time. I had some fun with integers in the scipy.stats discrete distributions, until they became floats)
Josef
The only reason that float.__round__() was allowed to change to returning
ints was because ints became unbounded. If we also change to returning an integer type, it should be a Python int.
On Wed, Feb 26, 2020 at 5:41 PM josef.pktd@gmail.com wrote:
I think making numerical behavior different between arrays and numpy scalars with the same dtype, will create many happy debugging hours.
round(some_ndarray) isn't implemented, so there is no difference to worry about.
If you want the float->float rounding, use np.around(). That function should continue to behave like it currently does for both arrays and scalars.
It's not about what I want, but this changes the output of round. In my example I didn't use any arrays but a scalar type, which it looks like will be upcasted.
On Wed, Feb 26, 2020, 23:04 Robert Kern robert.kern@gmail.com wrote:
round(some_ndarray) isn't implemented, so there is no difference to worry about.
If you want the float->float rounding, use np.around(). That function should continue to behave like it currently does for both arrays and scalars.
Your example used np.round(), not the builtin round(). np.round() is not changing. If you want the dtype of the output to be the dtype of the input, you can certainly keep using np.round() (or its canonical spelling, np.around()).
On Thu, Feb 27, 2020, 12:05 AM Ilhan Polat ilhanpolat@gmail.com wrote:
It's not about what I want, but this changes the output of round. In my example I didn't use any arrays but a scalar type, which it looks like will be upcasted.
Oh sorry. That's trigger finger np-dotting.
What I mean is that if someone was using the round method on float32 or other small-bit datatypes, they would get a silent upcasting.
Maybe not a big problem but can have significant impact.
On Thu, Feb 27, 2020, 05:12 Robert Kern robert.kern@gmail.com wrote:
Your example used np.round(), not the builtin round(). np.round() is not changing. If you want the dtype of the output to be the dtype of the input, you can certainly keep using np.round() (or its canonical spelling, np.around()).
Hello, Ilhan,
From: NumPy-Discussion numpy-discussion-bounces+einstein.edison=gmail.com@python.org on behalf of Ilhan Polat ilhanpolat@gmail.com
Reply to: Discussion of Numerical Python numpy-discussion@python.org
Date: Thursday, 27. February 2020 at 08:41
To: Discussion of Numerical Python numpy-discussion@python.org
Subject: Re: [Numpy-discussion] Output type of round is inconsistent with python built-in
Oh sorry. That's trigger finger np-dotting.
What i mean is if someone was using the round method on float32 or other small bit datatypes they would have a silent upcasting.
No, they won’t. The only affected types would be scalars, and even then only with the built-in Python round. Arrays don’t define the __round__ method, and so won’t be affected. np.ndarray.round won’t be affected either. Only np_scalar_types.__round__ will be affected, which is what the Python round checks for.
For illustration, in code:
>>> type(round(np_float))
<class 'numpy.float64'>
>>> type(round(np_array_0d))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: type numpy.ndarray doesn't define __round__ method
>>> type(round(np_array_nd))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: type numpy.ndarray doesn't define __round__ method
The second and third cases would remain unaffected. Only the first case would return a builtin Python int with what Robert Kern is suggesting and a np.int64 with what I’m suggesting. I do agree with something posted elsewhere on this thread that we should warn on overflow but prefer to be self-consistent and return a np.int64, but it doesn’t matter too much to me. Furthermore, the behavior of np.[a]round and np_arr.round(…) will not change. The only upcasting problem here is if someone does this in a loop, in which case they’re probably using Python objects and don’t care about memory anyway.
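The dispatch described above can be sketched in pure Python. Scalar and Container are hypothetical stand-ins for a NumPy scalar and an array; only the presence or absence of __round__ matters here, and none of this is NumPy's actual code.

```python
class Scalar:
    """Hypothetical scalar-like type that opts in to the built-in round()."""

    def __init__(self, value):
        self.value = value

    def __round__(self, ndigits=None):
        # Delegate to the wrapped Python float.
        return round(self.value, ndigits)


class Container:
    """Hypothetical array-like type that only has its own .round() method."""

    def __init__(self, values):
        self.values = list(values)

    def round(self):
        # An ordinary method; the built-in round() never looks at it.
        return Container(float(round(v)) for v in self.values)


print(round(Scalar(2.7)))

try:
    round(Container([1.2, 2.7]))
    failed = False
except TypeError:
    # No __round__ on the type, so the built-in refuses the object.
    failed = True
print("round(Container) raised TypeError:", failed)
print(Container([1.2, 2.7]).round().values)
```

Changing what Scalar.__round__ returns has no effect on Container, which is why the proposal leaves arrays and their round method untouched.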
Maybe not a big problem but can have significant impact.
Best regards, Hameer Abbasi
On Thu, Feb 27, 2020 at 4:49 AM Hameer Abbasi einstein.edison@gmail.com wrote:
Hello, Ilhan,
Oh sorry. That's trigger finger np-dotting.
What i mean is if someone was using the round method on float32 or other small bit datatypes they would have a silent upcasting.
No they won’t. The only affected types would be scalars, and that too only with the built-in Python round.
Just to be clear, his example _did_ use numpy scalars.
On Thu, Feb 27, 2020 at 2:43 AM Ilhan Polat ilhanpolat@gmail.com wrote:
Oh sorry. That's trigger finger np-dotting.
What i mean is if someone was using the round method on float32 or other small bit datatypes they would have a silent upcasting.
Maybe not a big problem but can have significant impact.
np.round()/np.around() will still exist and behave as you would want it to in such cases (float32->float32, float64->float64).
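A short check of the dtype-preserving behavior mentioned here, assuming NumPy is installed and imported as np. The type returned by the built-in round() on a NumPy scalar depends on the NumPy version, so the sketch prints it rather than assuming it.

```python
import numpy as np

x32 = np.float32(2.5)
x64 = np.float64(2.5)

# np.around keeps the floating-point dtype of its input.
r32 = np.around(x32)
r64 = np.around(x64)
print(r32.dtype, r64.dtype)

# Arrays behave the same way through np.around or the .round() method.
arr = np.array([1.2, 2.7], dtype=np.float32)
print(np.around(arr).dtype, arr.round().dtype)

# The built-in round() goes through __round__; the result type is
# version-dependent, so it is only printed here.
print(type(round(x64)))
```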
On Wed, Feb 26, 2020 at 5:30 PM Ilhan Polat ilhanpolat@gmail.com wrote:
Does this mean that np.round(np.float32(5)) return a 64 bit upcasted int?
No. np.round() is an alias (which would be good to deprecate) for np.around(). No one has proposed changing np.around().
That would be really awkward for many reasons pandas frame size being bloated just by rounding for an example. Or numpy array size growing for no apparent reason
I am not really sure if I understand why LSP should hold in this case to be honest. Rounding is an operation specific for the number instance and not for the generic class.
The type of the return value is part of the type's interface, not the specific instance.