Casting and promotion rules (e.g. int + uint64 => float)

Hi,
I have noticed that numpy introduces some unexpected type casts that are, in some cases, problematic.
A very weird cast is
int + uint64 -> float
For instance, consider the following snippet:

import numpy as np
a = np.uint64(1)
a + 1  # -> 2.0
This cast is quite different from what other programming languages (e.g., C) would do in this case, so it is already unexpected.
Furthermore, an int64 (or a uint64) is too large to be represented exactly in a float, so this automatic conversion also results in data loss! For instance, consider:
a = np.uint64(18446744073709551614)
a + np.uint64(1)  # -> 18446744073709551615  CORRECT!
a + 1             # -> 1.8446744073709552e+19  (actually 1.84467440737095516160e+19) - LOSS OF DATA
In fact:

np.uint64(a + 1)  # -> 0
Weird, isn't it?
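The loss of data comes from float64's 53-bit significand: integers above 2**53 are no longer exactly representable, and near the top of the uint64 range neighboring integers collapse onto the same float value. A minimal sketch (plain Python, no NumPy needed):

```python
# float64 has a 53-bit significand, so it represents integers
# exactly only up to 2**53.
exact_limit = 2**53

# Below the limit, conversion round-trips exactly.
assert int(float(exact_limit)) == exact_limit

# Just above the limit, 2**53 + 1 rounds to 2**53.
assert float(exact_limit + 1) == float(exact_limit)

# Near the top of the uint64 range the representable floats are
# thousands apart: 2**64 - 2 and 2**64 are the same float64.
assert float(2**64 - 2) == float(2**64)
```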
Another issue is that variables unexpectedly change type with accumulation operators
a = np.uint64(1)
a += 1

now a is a float.
I believe that some casting/promotion rules should be revised, since they currently lead to difficult-to-catch, intermittent errors. If this cannot be done immediately, I suggest at least documenting these promotions and providing examples of how to code common tasks, e.g., incrementing an integer of unknown type:
b = a + type(a)(1)
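For reference, the `type(a)(1)` trick above keeps the scalar's type no matter which integer type `a` happens to be, since the addition then has a single dtype on both sides; a quick check (the helper name `incremented` is just for illustration):

```python
import numpy as np

def incremented(a):
    # Build the constant 1 with the same scalar type as `a`, so the
    # addition stays within that type instead of being promoted
    # (e.g. uint64 + Python int -> float64 under these rules).
    return a + type(a)(1)

b = incremented(np.uint64(1))
assert isinstance(b, np.uint64)   # type preserved
assert b == 2

c = incremented(np.int8(1))
assert isinstance(c, np.int8)     # works for any integer scalar type
```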
I have also reported this in https://github.com/numpy/numpy/issues/3118
Thanks!

On Fri, Mar 8, 2013 at 8:23 AM, Sergio Callegari sergio.callegari@gmail.com wrote:
I have noticed that numpy introduces some unexpected type casts that are, in some cases, problematic.
There has been a lot of discussion about casting on this list in the last couple of months -- I suggest you peruse that discussion and see what conclusions it has led to.
A very weird cast is
int + uint64 -> float
I think the idea here is that an int can hold negative numbers, so you can't put it in a uint64 -- but you also can't put a uint64 into a signed int64. A float64 can hold the range of numbers of both an int64 and a uint64, so it is used, even though it can't hold the full precision of a uint64 (far from it!)
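You can ask NumPy directly what it will promote a pair of dtypes to; a small sketch with `np.result_type`, which reports exactly this range-over-precision trade-off:

```python
import numpy as np

# uint64 and int64 have no common integer supertype, so NumPy
# promotes the pair to float64, trading precision for range.
assert np.result_type(np.int64, np.uint64) == np.float64

# Pairs that do fit a wider common integer type stay integral.
assert np.result_type(np.int32, np.uint32) == np.int64
assert np.result_type(np.int8, np.uint8) == np.int16
```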
Another issue is that variables unexpectedly change type with accumulation operators
a = np.uint64(1)
a += 1

now a is a float.
Yeah -- that should NEVER happen -- += is supposed to be an in-place operator; it should never change the array's type! However, what you've created here is not an array but a numpy scalar, and the rules are different there (but should they be?). I suspect that part of the issue is that array scalars behave a bit more like the built-in Python number types, and thus += is not an in-place operator but rather translates to:
a = a + 1
and as you've seen, that casts to a float64. A little test:
In [34]: d = np.int64(2)

In [35]: e = d  # e and d are the same object

In [36]: d += 1

In [37]: e is d
Out[37]: False
# they are no longer the same object -- the += created a new object

In [38]: type(d)
Out[38]: numpy.int64
# even though it's still the same type (no casting needed)
If you do use an array, you don't get casting with +=:
In [39]: a = np.array((1,), dtype=np.uint64)

In [40]: a
Out[40]: array([1], dtype=uint64)

In [41]: a + 1.0
Out[41]: array([ 2.])
# got a cast with the addition and the creation of a new array

In [42]: a += 1.0

In [43]: a
Out[43]: array([2], dtype=uint64)
# but no cast with the in-place operator
Personally, I think the "in-place" operators should be just that -- and only work for mutable objects -- but I guess the ability to easily increment an integer was just too tempting!
-Chris

Thanks for the explanation.
Chris Barker - NOAA Federal <chris.barker <at> noaa.gov> writes:
There has been a lot of discussion about casting on this list in the last couple of months -- I suggest you peruse that discussion and see what conclusions it has led to.
I'll look at it. My message to the mailing list followed an invitation to do so after I posted a bug report about weird casts.
int + uint64 -> float
I think the idea here is that an int can hold negative numbers, so you can't put it in a uint64 -- but you also can't put a uint64 into a signed int64. A float64 can hold the range of numbers of both an int64 and a uint64, so it is used, even though it can't hold the full precision of a uint64 (far from it!)
I understand the good intention. Yet, this does not follow the principle of least surprise. It is not what most other languages (possibly following C) would do and, most importantly, when dealing with integers one expects overflows and wraparounds, certainly not a loss of precision.
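To illustrate the expected behavior: when both operands share the uint64 dtype, NumPy arithmetic does wrap around, C-style. A minimal sketch (the overflow warning, if any, is suppressed with `np.errstate`):

```python
import numpy as np

top = np.iinfo(np.uint64).max      # 18446744073709551615
arr = np.array([top], dtype=np.uint64)

# Adding a same-dtype 1 wraps around to 0, exactly as C unsigned
# arithmetic would (NumPy may emit a RuntimeWarning; ignored here).
with np.errstate(over='ignore'):
    wrapped = arr + np.uint64(1)

assert wrapped.dtype == np.uint64  # no promotion to float
assert wrapped[0] == 0             # modular wraparound
```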
Another issue is that the promotion rule breaks indexing
a = np.uint64(1)
b = [0, 1, 2, 3, 4, 5]
b[a]      # -> 1   OK
b[a + 1]  # -> Error
I really would like to suggest changing this behavior.
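In the meantime, a sketch of a workaround: convert the scalar to a Python int before the arithmetic, so the index stays integral and never goes through the float promotion:

```python
import numpy as np

a = np.uint64(1)
b = [0, 1, 2, 3, 4, 5]

# Indexing with the uint64 scalar itself works...
assert b[a] == 1

# ...and converting to a Python int before adding keeps the
# result a valid list index:
assert b[int(a) + 1] == 2
```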
Thanks
Sergio
participants (2)
- Chris Barker - NOAA Federal
- Sergio Callegari