[Numpy-discussion] casting bug

Robert Kern robert.kern at gmail.com
Tue May 26 19:50:09 EDT 2009


2009/5/26 Charles سمير Doutriaux <doutriaux1 at llnl.gov>:
> Hi there,
>
> One of our users just found a bug in numpy that has to do with casting.
>
> Consider the attached example.
>
> The difference at the end should be 0 (zero) everywhere.
>
> But it's not by default.
>
> Casting the data to 'float64' at reading and assigning to the arrays works
> Defining the arrays "at creation time" as "float32" works
>
> But simply reading in the data (float32) and putting them into the default
> arrays (float64) leads to differences!

That is probably not a good characterization of the code. You are
computing a sum of squares of floats in two different ways. The data
are stored in the file with dtype=float32. You read them in one "row"
at a time (with the name "tmp", for future reference). You keep an
accumulator ("ex2", with dtype=float64) and store each row into an
array shaped like the whole data in the file ("a", with dtype=float64).

You calculate the sum of squares one way with "ex2 = ex2 + power(tmp,
2.0)". In this case, the square is computed in float32 (since "tmp" is
float32 and the scalar 2.0 does not upcast it), stored in a float32
intermediate result, *then* added to "ex2".
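
You can see the intermediate dtype directly (a minimal check, not
taken from your script; the array here just stands in for one row):

>>> import numpy as np
>>> tmp = np.ones(5, dtype=np.float32)  # stands in for one row of the file
>>> np.power(tmp, 2.0).dtype            # the Python float 2.0 does not upcast
dtype('float32')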

You calculate the sum of squares the other way by "numpy.sum(power(a,
2.0))". In this case, the square is computed in full float64
precision, then added up.
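
For what it's worth, here is a small self-contained sketch of the two
paths, with synthetic data standing in for your file. The names "tmp",
"ex2" and "a" follow your script; everything else (shapes, values, the
axis=0 comparison) is made up for illustration:

import numpy as np

np.random.seed(0)
data = (np.random.rand(20, 1000) * 1e4).astype(np.float32)  # stand-in for the float32 file

ex2 = np.zeros(data.shape[1], dtype=np.float64)  # float64 accumulator
a = np.zeros(data.shape, dtype=np.float64)       # float64 array shaped like the whole data
for i in range(data.shape[0]):
    tmp = data[i]                     # one float32 "row"
    ex2 = ex2 + np.power(tmp, 2.0)    # square goes through a float32 intermediate
    a[i] = tmp

full = np.sum(np.power(a, 2.0), axis=0)  # square computed in full float64 precision
print(np.abs(ex2 - full).max())          # nonzero: digits lost in the float32 square

The printed maximum difference is nonzero, and its absolute size grows
with the magnitude of the inputs.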

The difference between the two results comes entirely from using a
low-precision intermediate for the square. Your inputs are fairly
large, so you lose a good number of digits by squaring in float32
before the summation.
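
If you want the two to agree, promote each row to float64 before
squaring instead of relying on the default behaviour, e.g. (same
made-up names as in the sketch above):

ex2 = ex2 + np.power(tmp.astype(np.float64), 2.0)  # square in float64, no float32 intermediate

That matches what you observed: casting the data to float64 when
reading makes the difference go away.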

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco


