# [Numpy-discussion] float32 to float64 casting

Nathaniel Smith njs at pobox.com
Fri Nov 16 07:51:47 EST 2012

```
On Fri, Nov 16, 2012 at 6:37 AM, Charles R Harris
<charlesr.harris at gmail.com> wrote:
> On Thu, Nov 15, 2012 at 8:24 PM, Gökhan Sever <gokhansever at gmail.com> wrote:
>>
>> Hello,
>>
>> Could someone briefly explain why these two operations are casting my
>> float32 arrays to float64?
>>
>> I1 (np.arange(5, dtype='float32')).dtype
>> O1 dtype('float32')
>>
>> I2 (100000*np.arange(5, dtype='float32')).dtype
>> O2 dtype('float64')
>
>
> This one depends on the size of the multiplier and is first present in
> 1.6.0. I suspect it is a side effect of making the type conversion code
> sensitive to magnitude.

Right, this is the problem:

In : np.can_cast(10000, np.float32, "safe")
Out: True

In : np.can_cast(100000, np.float32, "safe")
Out: False

But... that said... this makes NO SENSE. 100000 is exactly
representable as a float32! can_cast is just wrong, yes?
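
# Sanity check (an addition, not in the original mail): float32 has a
# 24-bit significand, so every integer with absolute value up to 2**24
# is exactly representable; 100000 is comfortably inside that range.
import numpy as np
assert float(np.float32(100000)) == 100000.0      # exact round-trip
assert float(np.float32(2**24)) == 2**24          # still exact at 2**24
assert float(np.float32(2**24 + 1)) != 2**24 + 1  # first integer that is not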

https://en.wikipedia.org/wiki/Single-precision_floating-point_format
bin(100000) == 0b11000011010100000
Sign: 0
Exponent for an integer: start at 23 to turn the fractional part into
an integer, subtract 7 because the leading 1 bit gets shifted from
position 16 up to position 23, and add 127 to correct for the bias:
bin(23 - 7 + 127) == 0b10001111
Fraction = 100000, shifted so that the top bit lands in position 23
bin(100000 << 7) == 0b110000110101000000000000
Throw away the top bit and concatenate:

In : np.uint32(0b01000111110000110101000000000000).view(np.float32)
Out: 100000.0

Looks good to me. So... numpy just doesn't know how integer<->float
conversion works...?
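
# Cross-check of the bit pattern above using only the stdlib (an
# addition, not in the original mail): pack the 32 bits as an unsigned
# int and reinterpret them as an IEEE 754 single.
import struct
bits = 0b01000111110000110101000000000000
assert struct.unpack('<f', struct.pack('<I', bits))[0] == 100000.0
# ...and the reverse direction agrees:
assert struct.unpack('<I', struct.pack('<f', 100000.0))[0] == bits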

>> I3 (np.arange(5, dtype='float32')).dtype
>> O3 dtype('float32')
>>
>> I4 (1*np.arange(5, dtype='float32')[0]).dtype
>> O4 dtype('float64')
>
>
> This one probably depends on the fact that the element is a scalar, but
> doesn't look right. Scalars are promoted differently. Also holds in numpy
> 1.5.0 so is of old provenance.

Yeah, I missed at first that this is scalar * scalar; it's probably clearer to write:

In : (1 * np.float32(1)).dtype
Out: dtype('float64')

For this numpy just uses ordinary find-a-common-type rules and ignores
the values, and neither int32 nor float32 is a superset of the other,
so it goes for float64.
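
# The pure type-based rule can be inspected directly with
# np.promote_types (added in numpy 1.6; this example is an addition,
# not in the original mail). int32 values don't all fit in float32, so
# the common type is float64:
import numpy as np
assert np.promote_types(np.int32, np.float32) == np.float64
# ...whereas int16 does fit in float32's 24-bit significand:
assert np.promote_types(np.int16, np.float32) == np.float32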

It's a bit disconcerting that in this case numpy's find-a-common-type
rules don't match C's, though... (C's rule is: if your expression
involves floating point, find the widest floating-point type involved
and convert everything to that type. If your expression only involves
integers, then... well, then things get kind of bizarre. First you
upcast anything smaller than an int to an int. Then you find the
widest integer types involved. If one of them is signed and can
represent everything the other can, you use that; otherwise you cast
to unsigned (!!!) and use that type. So uint32 + int32 ->
uint32. This is 6.3.1.8 in C99. I'm not saying we should necessarily
follow the weirdo integer rules, but for floats it's a bit
surprising.)
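
# For contrast (an addition, not in the original mail): numpy's
# type-based rule disagrees with C's 6.3.1.8 for mixed signedness.
# Instead of going unsigned, numpy widens to a signed type that can
# hold both, and when no integer type is wide enough it falls back to
# float64:
import numpy as np
assert np.promote_types(np.uint32, np.int32) == np.int64
assert np.promote_types(np.uint64, np.int64) == np.float64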

-n

```