On 1/28/07, Fernando Perez wrote:
On 1/28/07, Charles R Harris wrote:
The problem goes away if I remove atlas (atlas3-sse2 for me). But that just introduces another problem: slowness.
So is this something to report to Clint Whaley? Or does it have to do with how numpy uses atlas?
Interesting, I wonder if ATLAS is resetting the FPU flags and changing the rounding mode? It is just the LSB of the mantissa that looks to be changing. Before reporting the problem it might be good to pin it down a bit more if possible.
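One way to pin down the "just the LSB" observation is to compare the raw bit patterns of the two results rather than their printed values. The following is only a sketch using the Python standard library; the helper names `ordered_bits` and `ulp_distance` are made up for illustration and are not part of numpy or ATLAS, and NaNs are not handled.

```python
import struct

def ordered_bits(x):
    """Map a double to an integer that increases monotonically with x.

    Hypothetical helper: reinterprets the IEEE 754 bit pattern and
    mirrors negative values so that -0.0 and +0.0 both map to 0.
    """
    (u,) = struct.unpack("<Q", struct.pack("<d", x))
    magnitude = u & 0x7FFFFFFFFFFFFFFF
    return -magnitude if u >> 63 else magnitude

def ulp_distance(a, b):
    """Count how many representable doubles separate a and b."""
    return abs(ordered_bits(a) - ordered_bits(b))

# Two values that differ only in the last bit of the mantissa.
print(ulp_distance(1.0, 0.9999999999999999))  # prints 1
print((1.0).hex(), (0.9999999999999999).hex())
```

A distance of 1 between the results with and without ATLAS would confirm that only the last mantissa bit differs; a larger distance would point to something other than a single rounding difference.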
Well, the fact that I don't see the problem on a Pentium III (with atlas-sse) but do see it on my desktop (atlas-sse2) should tell us something. The test code uses double arrays. SSE2 supports double precision, but strictly as 64-bit doubles. SSE is single-precision only, which means that for a double computation ATLAS can't use the SSE unit and the x87 FPU does the computation instead. The x87 FPU uses 80 bits internally for intermediate operations (even though it only returns a normal 64-bit double result), so it's fairly common to see this kind of last-bit difference.
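The effect of wider intermediates can be illustrated without any particular hardware. The sketch below is an assumption-laden stand-in: it does not emulate the 80-bit x87 format, but uses exact rational arithmetic from the standard library as the "wide" accumulator and rounds to a 64-bit double once at the end. The function names are hypothetical, and explicit loops are used so that the result does not depend on how the built-in `sum` is implemented.

```python
from fractions import Fraction

def dot_rounded_each_step(x, y):
    """Dot product where every multiply and add is rounded to a 64-bit double."""
    acc = 0.0
    for a, b in zip(x, y):
        acc += a * b
    return acc

def dot_wide_accumulator(x, y):
    """Dot product with exact intermediates, rounded to a double only once.

    Stand-in for a wider-than-double accumulator (illustration only,
    not a model of the 80-bit x87 registers).
    """
    acc = Fraction(0)
    for a, b in zip(x, y):
        acc += Fraction(a) * Fraction(b)
    return float(acc)

x = [0.1] * 10
y = [1.0] * 10
narrow = dot_rounded_each_step(x, y)
wide = dot_wide_accumulator(x, y)
print(narrow, wide, narrow == wide)
```

With these inputs the two results differ by exactly one unit in the last place, which is the same kind of discrepancy as the one discussed above, even though both computations are "correct" for their respective intermediate precision.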
Removing ATLAS SSE2 does fix the problem. But why does test1 pass when ATLAS SSE2 is present?