I get slightly different results when I repeat a calculation. In a long simulation the differences snowball and swamp the effects I am trying to measure. In the attached script there are three tests. In test1, I construct matrices x and y and then repeatedly calculate z = calc(x,y). The result z is the same every time. So this test passes. In test2, I construct matrices x and y each time before calculating z = calc(x,y). Sometimes z is slightly different---of the order of 1e-21 to 1e-18. But the x's test as equal and so do the y's. This test fails. (It doesn't fail on my friend's Windows box; I'm running Linux.) test3 is the same as test2 but I calculate z like this: z = calc(100*x, y) / (100 * 100). This test passes. What is going on? Here is some sample output:
import repeat
repeat.all()
4 z different 8.47032947254e-22
5 z different 8.47032947254e-22
7 z different 8.47032947254e-22
9 z different 8.47032947254e-22
10 z different 8.47032947254e-22
16 z different 8.47032947254e-22
24 z different 8.47032947254e-22
25 z different 8.47032947254e-22
26 z different 8.47032947254e-22
27 z different 8.47032947254e-22
30 z different 8.47032947254e-22
32 z different 8.47032947254e-22
34 z different 8.47032947254e-22
35 z different 8.47032947254e-22
36 z different 8.47032947254e-22
39 z different 8.47032947254e-22
40 z different 8.47032947254e-22
41 z different 8.47032947254e-22
45 z different 8.47032947254e-22
46 z different 8.47032947254e-22
50 z different 8.47032947254e-22
52 z different 8.47032947254e-22
53 z different 8.47032947254e-22
55 z different 8.47032947254e-22
56 z different 8.47032947254e-22
58 z different 8.47032947254e-22
66 z different 8.47032947254e-22
67 z different 8.47032947254e-22
71 z different 8.47032947254e-22
73 z different 8.47032947254e-22
74 z different 8.47032947254e-22
83 z different 8.47032947254e-22
86 z different 8.47032947254e-22
87 z different 8.47032947254e-22
88 z different 8.47032947254e-22
89 z different 8.47032947254e-22
90 z different 8.47032947254e-22
92 z different 8.47032947254e-22
test1: 0 differences
test2: 38 differences
test3: 0 differences

Repeated runs tend to give me the same number of differences in test2 for several runs. Then I get a new number of differences, which lasts for several runs...
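The attached script is not reproduced in the archive, so here is a minimal sketch of what test2 does. Both `calc` (a chain of matrix products, so BLAS gets exercised) and `make_xy` are hypothetical stand-ins invented for illustration; only the overall structure follows the description above:

```python
import numpy as np

def calc(x, y):
    # Hypothetical stand-in for the script's calculation; a chain of
    # matrix products so that BLAS (dgemm) does the work.
    return x.dot(y).dot(x.T)

def make_xy(n=10, seed=0):
    # Same seed on every call, so the inputs are bitwise identical each time.
    rs = np.random.RandomState(seed)
    return rs.rand(n, n), rs.rand(n, n)

def test2(reps=100):
    x0, y0 = make_xy()
    z0 = calc(x0, y0)
    ndiff = 0
    for i in range(reps):
        x, y = make_xy()                            # rebuilt before each calc
        assert (x == x0).all() and (y == y0).all()  # inputs compare equal...
        if (calc(x, y) != z0).any():                # ...yet z can differ in the LSB
            ndiff += 1
    return ndiff
```

On the affected setups described in this thread, a loop like this reported a nonzero count even though the inputs compared equal bit for bit.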
I can't reproduce this under Linux, r3510. Output is

test1: 0 differences
test2: 0 differences
test3: 0 differences

Does anyone else see this happening?

Regards
Stéfan

On Sat, Jan 27, 2007 at 12:25:18PM -0800, Keith Goodman wrote:
I get slightly different results when I repeat a calculation. In a long simulation the differences snowball and swamp the effects I am trying to measure.
In the attached script there are three tests.
In test1, I construct matrices x and y and then repeatedly calculate z = calc(x,y). The result z is the same every time. So this test passes.
In test2, I construct matrices x and y each time before calculating z = calc(x,y). Sometimes z is slightly different---of the order of 1e-21 to 1e-18. But the x's test as equal and so do the y's. This test fails. (It doesn't fail on my friend's Windows box; I'm running Linux.)
test3 is the same as test2 but I calculate z like this: z = calc(100*x, y) / (100 * 100). This test passes.
What is going on?
On 1/27/07, Stefan <stefan@sun.ac.za> wrote:
I can't reproduce this under Linux, r3510. Output is
test1: 0 differences
test2: 0 differences
test3: 0 differences
Does anyone else see this happening?
Yes,

test1: 0 differences
test2: 51 differences
test3: 0 differences

Oddly, the relative error is always the same:

98 z different 2.0494565872e-16
99 z different 2.0494565872e-16

which is nearly the same as the double precision eps, 2.2204460492503131e-16. The difference is due to the fact that the precision is defined relative to 1, while the error here occurs in a number somewhat larger than 1 (more mantissa bits set, but not yet 2). So this looks like an error in the LSB of the floating point number. Could be rounding, could be something not reset quite right. I'm thinking possibly hardware at this time, maybe compiler.

Linux fedora 2.6.19-1.2895.fc6 #1 SMP Wed Jan 10 19:28:18 EST 2007 i686 athlon i386 GNU/Linux

processor : 0
vendor_id : AuthenticAMD
cpu family : 15
model : 12
model name : AMD Athlon(tm) 64 Processor 2800+
stepping : 0
cpu MHz : 1808.786
cache size : 512 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx mmxext lm 3dnowext 3dnow up ts fid vid ttp
bogomips : 3618.83

Athlon 64 running 32 bit linux.

Chuck
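An always-identical relative error just under eps is exactly what a one-ULP (last-bit) difference looks like. A quick illustration (the value 1.0834 is chosen only as an example of a number between 1 and 2, not taken from the test script):

```python
import numpy as np

eps = np.finfo(np.float64).eps   # 2.220446049250313e-16, the spacing at 1.0
a = 1.0834                       # some value between 1 and 2
b = np.nextafter(a, 2.0)         # a plus one unit in the last place (ULP)
rel = (b - a) / a
# In [1, 2) the ULP spacing is exactly eps, so a one-ULP relative error
# is eps / a: slightly *below* eps, matching the observed
# 2.05e-16 < 2.22e-16 pattern.
print(rel)
```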
On Sat, Jan 27, 2007 at 03:11:58PM -0700, Charles R Harris wrote:
Does anyone else see this happening?
Yes,
test1: 0 differences
test2: 51 differences
test3: 0 differences
Oddly, the relative error is always the same:
98 z different 2.0494565872e-16
99 z different 2.0494565872e-16
Which is nearly the same as the double precision 2.2204460492503131e-16, the difference being due to the fact that the precision is defined relative to 1, while the error in the computation occurs in a number somewhat larger (more mantissa bits set, but not yet 2).
So this looks like an error in the LSB of the floating number. Could be rounding, could be something not reset quite right. I'm thinking possibly hardware at this time, maybe compiler.
Interesting! I don't see it on

Linux alpha 2.6.17-10-386 #2 Fri Oct 13 18:41:40 UTC 2006 i686 GNU/Linux
vendor_id : AuthenticAMD
model name : AMD Athlon(tm) XP 2400+
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr sse syscall mmxext 3dnowext 3dnow up ts

but I do see it on

Linux voyager 2.6.17-10-generic #2 SMP Fri Oct 13 18:45:35 UTC 2006 i686 GNU/Linux
processor : 0
vendor_id : GenuineIntel
model name : Genuine Intel(R) CPU T2300 @ 1.66GHz
processor : 1
vendor_id : GenuineIntel
model name : Genuine Intel(R) CPU T2300 @ 1.66GHz
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx constant_tsc pni monitor vmx est tm2 xtpr

Both machines are running Ubuntu Edgy, exact same software versions.

Cheers
Stéfan
On 1/27/07, Stefan van der Walt <stefan@sun.ac.za> wrote:
On Sat, Jan 27, 2007 at 03:11:58PM -0700, Charles R Harris wrote:
Does anyone else see this happening?
Yes,
test1: 0 differences
test2: 51 differences
test3: 0 differences
Oddly, the relative error is always the same:
98 z different 2.0494565872e-16
99 z different 2.0494565872e-16
Which is nearly the same as the double precision 2.2204460492503131e-16, the difference being due to the fact that the precision is defined relative to 1, while the error in the computation occurs in a number somewhat larger (more mantissa bits set, but not yet 2).

So this looks like an error in the LSB of the floating point number. Could be rounding, could be something not reset quite right. I'm thinking possibly hardware at this time, maybe compiler.
Interesting! I don't see it on
Linux alpha 2.6.17-10-386 #2 Fri Oct 13 18:41:40 UTC 2006 i686 GNU/Linux
vendor_id : AuthenticAMD
model name : AMD Athlon(tm) XP 2400+
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr sse syscall mmxext 3dnowext 3dnow up ts
but I do see it on
Linux voyager 2.6.17-10-generic #2 SMP Fri Oct 13 18:45:35 UTC 2006 i686 GNU/Linux
processor : 0
vendor_id : GenuineIntel
model name : Genuine Intel(R) CPU T2300 @ 1.66GHz
processor : 1
vendor_id : GenuineIntel
model name : Genuine Intel(R) CPU T2300 @ 1.66GHz
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx constant_tsc pni monitor vmx est tm2 xtpr
Both machines are running Ubuntu Edgy, exact same software versions.
Hmmm, and your problem machine is running SMP Linux. As is mine; Fedora uses an SMP kernel even on single processor machines these days. I think we could use more data here comparing:

OSX
Linux (single, SMP)
Windows

Chuck
On 1/27/07, Charles R Harris <charlesr.harris@gmail.com> wrote:
Hmmm, and your problem machine is running SMP Linux. As is mine; Fedora uses an SMP kernel even on single processor machines these days. I think we could use more data here comparing:

OSX
Linux (single, SMP)
Windows
OK, this is weird. I modified the repeat code a little to ease collecting of results, and all of a sudden the differences went away. If you look at the attached code, here's what happens for me:

a) If I have line 77 like this (commented out):

#print '-'*75

I get:

[...]
94 z different 8.47032947254e-22
95 z different 8.47032947254e-22
96 z different 8.47032947254e-22
98 z different 8.47032947254e-22
99 z different 8.47032947254e-22
Numpy version: 1.0.2.dev3521
test1: 0 differences
test2: 75 differences
test3: 0 differences

b) If I remove the comment char from that line, I get:

tlon[~/Desktop]> python repeat.py
---------------------------------------------------------------------------
Numpy version: 1.0.2.dev3521
test1: 0 differences
test2: 0 differences
test3: 0 differences

That's it. One comment char removed, and it's something that's done /after/ the tests are actually executed. That kind of 'I add a printf() call and the bug disappears' is unpleasantly reminiscent of lurking pointer errors in C code...

Cheers,
f
On 1/27/07, Fernando Perez <fperez.net@gmail.com> wrote:
On 1/27/07, Charles R Harris <charlesr.harris@gmail.com> wrote:
Hmmm, and your problem machine is running smp linux. As is mine; fedora uses smp even on single processor machines these days. I think we could use more data here comparing
OSX Linux (single, smp)
Sorry, I forgot to add:

tlon[~/Desktop]> uname -a
Linux tlon 2.6.17-10-generic #2 SMP Tue Dec 5 22:28:26 UTC 2006 i686 GNU/Linux
tlon[~/Desktop]> python -V
Python 2.4.4c1
tlon[~/Desktop]> cat /proc/cpuinfo
processor : 0
vendor_id : AuthenticAMD
cpu family : 15
model : 35
model name : AMD Athlon(tm) 64 X2 Dual Core Processor 4400+

This box runs up to date Ubuntu Edgy.

Cheers,
f
On 1/27/07, Fernando Perez <fperez.net@gmail.com> wrote:
On 1/27/07, Fernando Perez <fperez.net@gmail.com> wrote:
On 1/27/07, Charles R Harris <charlesr.harris@gmail.com> wrote:
Hmmm, and your problem machine is running smp linux. As is mine; fedora uses smp even on single processor machines these days. I think we could use more data here comparing
OSX Linux (single, smp)
Sorry, I forgot to add:
tlon[~/Desktop]> uname -a
Linux tlon 2.6.17-10-generic #2 SMP Tue Dec 5 22:28:26 UTC 2006 i686 GNU/Linux
tlon[~/Desktop]> python -V
Python 2.4.4c1
tlon[~/Desktop]> cat /proc/cpuinfo
processor : 0
vendor_id : AuthenticAMD
cpu family : 15
model : 35
model name : AMD Athlon(tm) 64 X2 Dual Core Processor 4400+
This box runs up to date Ubuntu Edgy.
Yep, I am thinking a race condition in the linux smp kernel. Chuck
On Sat, Jan 27, 2007 at 04:11:59PM -0700, Fernando Perez wrote:
OK, this is weird. I modified the repeat code a little to ease collecting of results, and all of a sudden the differences went away. If you look at the attached code, here's what happens for me:
a) If I have line 77 like this (commented out):
#print '-'*75
I get:
[...]
94 z different 8.47032947254e-22
95 z different 8.47032947254e-22
96 z different 8.47032947254e-22
98 z different 8.47032947254e-22
99 z different 8.47032947254e-22
Numpy version: 1.0.2.dev3521
test1: 0 differences
test2: 75 differences
test3: 0 differences
b) If I remove the comment char from that line, I get:
tlon[~/Desktop]> python repeat.py
---------------------------------------------------------------------------
Numpy version: 1.0.2.dev3521
test1: 0 differences
test2: 0 differences
test3: 0 differences
That's it. One comment char removed, and something that's done /after/ the tests are actually executed.
That kind of 'I add a printf() call and the bug disappears' is unpleasantly reminiscent of lurking pointer errors in C code...
I also ran the test under Valgrind, but no errors pop up. Very strange.

Cheers
Stéfan
Fernando Perez wrote:
That's it. One comment char removed, and something that's done /after/ the tests are actually executed.
That kind of 'I add a printf() call and the bug disappears' is unpleasantly reminiscent of lurking pointer errors in C code...
Heh. Fantastic. It might be worthwhile porting this code to C to see what happens. If we can definitively point the finger at the kernel, that would be great (less work for me!).

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
On 1/27/07, Robert Kern <robert.kern@gmail.com> wrote:
Fernando Perez wrote:
That's it. One comment char removed, and something that's done /after/ the tests are actually executed.
That kind of 'I add a printf() call and the bug disappears' is unpleasantly reminiscent of lurking pointer errors in C code...
Heh. Fantastic. It might be worthwhile porting this code to C to see what happens. If we can definitively point the finger at the kernel, that would be great (less work for me!).
It's definitely looking like something SMP related: on my laptop, with everything other than the hardware being identical (Linux distro, kernel, numpy build, etc), I can't make it fail no matter how I muck with it. I always get '0 differences'. The desktop is a dual-core AMD Athlon as indicated before, the laptop is an oldie Pentium III. They both run the same SMP-aware Ubuntu i686 kernel, since Ubuntu now ships a unified kernel, though obviously on the laptop the SMP code isn't active. Cheers, f
On 1/27/07, Fernando Perez <fperez.net@gmail.com> wrote:
It's definitely looking like something SMP related: on my laptop, with everything other than the hardware being identical (Linux distro, kernel, numpy build, etc), I can't make it fail no matter how I muck with it. I always get '0 differences'.
The desktop is a dual-core AMD Athlon as indicated before, the laptop is an oldie Pentium III. They both run the same SMP-aware Ubuntu i686 kernel, since Ubuntu now ships a unified kernel, though obviously on the laptop the SMP code isn't active.
After installing a kernel that is not smp aware, I still have the same problem.

---------------------------------------------------------------------------
Numpy version: 1.0.1
test1: 0 differences
test2: 55 differences
test3: 0 differences

$ uname -a
Linux kel 2.6.18-3-486 #1 Mon Dec 4 15:59:52 UTC 2006 i686 GNU/Linux
$ cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 15
model : 2
model name : Intel(R) Pentium(R) 4 CPU 2.80GHz
stepping : 9
cpu MHz : 2793.143
cache size : 512 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe cid xtpr
bogomips : 5589.65
On 1/27/07, Keith Goodman <kwgoodman@gmail.com> wrote:
On 1/27/07, Fernando Perez <fperez.net@gmail.com> wrote:
It's definitely looking like something SMP related: on my laptop, with everything other than the hardware being identical (Linux distro, kernel, numpy build, etc), I can't make it fail no matter how I muck with it. I always get '0 differences'.
The desktop is a dual-core AMD Athlon as indicated before, the laptop is an oldie Pentium III. They both run the same SMP-aware Ubuntu i686 kernel, since Ubuntu now ships a unified kernel, though obviously on the laptop the SMP code isn't active.
After installing a kernel that is not smp aware, I still have the same problem.
The problem goes away if I remove atlas (atlas3-sse2 for me). But that just introduces another problem: slowness. So is this something to report to Clint Whaley? Or does it have to do with how numpy uses atlas?
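When comparing installs with and without ATLAS like this, it helps to confirm which BLAS numpy is actually linked against. A quick check using numpy's standard build-config report (this just calls the public numpy API):

```python
import numpy as np

# Prints the BLAS/LAPACK libraries numpy was compiled against. With an
# ATLAS build you would expect atlas entries here; with the reference
# BLAS you won't see them.
np.show_config()
```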
On 1/28/07, Keith Goodman <kwgoodman@gmail.com> wrote:
The problem goes away if I remove atlas (atlas3-sse2 for me). But that just introduces another problem: slowness.
So is this something to report to Clint Whaley? Or does it have to do with how numpy uses atlas?
I was wondering if atlas-sse2 might be the problem, since my desktop is an sse2 machine, but my laptop uses only sse (old PentiumIII). Why don't you try putting in just atlas-sse and seeing what happens? Cheers, f
On 1/28/07, Keith Goodman <kwgoodman@gmail.com> wrote:
On 1/27/07, Fernando Perez <fperez.net@gmail.com> wrote:
It's definitely looking like something SMP related: on my laptop, with everything other than the hardware being identical (Linux distro, kernel, numpy build, etc), I can't make it fail no matter how I muck with it. I always get '0 differences'.
The desktop is a dual-core AMD Athlon as indicated before, the laptop is an oldie Pentium III. They both run the same SMP-aware Ubuntu i686 kernel, since Ubuntu now ships a unified kernel, though obviously on the laptop the SMP code isn't active.
After installing a kernel that is not smp aware, I still have the same problem.
The problem goes away if I remove atlas (atlas3-sse2 for me). But that just introduces another problem: slowness.
So is this something to report to Clint Whaley? Or does it have to do with how numpy uses atlas?
Interesting, I wonder if ATLAS is resetting the FPU flags and changing the rounding mode? It is just the LSB of the mantissa that looks to be changing. Before reporting the problem it might be good to pin it down a bit more if possible. How come your computation is so sensitive to these small effects? Chuck
On 1/28/07, Charles R Harris <charlesr.harris@gmail.com> wrote:
The problem goes away if I remove atlas (atlas3-sse2 for me). But that just introduces another problem: slowness.
So is this something to report to Clint Whaley? Or does it have to do with how numpy uses atlas?
Interesting, I wonder if ATLAS is resetting the FPU flags and changing the rounding mode? It is just the LSB of the mantissa that looks to be changing. Before reporting the problem it might be good to pin it down a bit more if possible.
Well, the fact that I don't see the problem on a PentiumIII (with atlas-sse) but I see it on my desktop (atlas-sse2) should tell us something.

The test code uses double arrays, and SSE2 has double precision support but it's purely 64-bit doubles. SSE is single-precision only, which means that for a double computation, ATLAS isn't used and the Intel FPU does the computation instead. Intel FPUs use 80 bits internally for intermediate operations (even though they only return a normal 64-bit double result), so it's fairly common to see this kind of thing.

You can test things by writing a little program in C that does the same operations, and use this little trick:

#include <fpu_control.h>

// Define DOUBLE to set the FPU in regular double-precision mode, disabling
// the internal 80-bit mode which Intel FPUs have.
//#define DOUBLE

// ... later in the code's main():
// set FPU control word for double precision
int cword = 4722;
_FPU_SETCW(cword);

This can show you if the problem is indeed caused by rounding differences between 64-bit and 80-bit mode.

Cheers,
f
On 1/28/07, Fernando Perez <fperez.net@gmail.com> wrote:
On 1/28/07, Charles R Harris <charlesr.harris@gmail.com> wrote:
The problem goes away if I remove atlas (atlas3-sse2 for me). But that just introduces another problem: slowness.
So is this something to report to Clint Whaley? Or does it have to do with how numpy uses atlas?
Interesting, I wonder if ATLAS is resetting the FPU flags and changing the rounding mode? It is just the LSB of the mantissa that looks to be changing. Before reporting the problem it might be good to pin it down a bit more if possible.
Well, the fact that I don't see the problem on a PentiumIII (with atlas-sse) but I see it on my desktop (atlas-sse2) should tell us something. The test code uses double arrays, and SSE2 has double precision support but it's purely 64-bit doubles. SSE is single-precision only, which means that for a double computation, ATLAS isn't used and the Intel FPU does the computation instead. Intel FPUs use 80 bits internally for intermediate operations (even though they only return a normal 64-bit double result), so it's fairly common to see this kind of thing.
Removing ATLAS SSE2 does fix the problem. But why does test1 pass when ATLAS SSE2 is present?
On 1/28/07, Keith Goodman <kwgoodman@gmail.com> wrote:
On 1/28/07, Fernando Perez <fperez.net@gmail.com> wrote:
On 1/28/07, Charles R Harris <charlesr.harris@gmail.com> wrote:

The problem goes away if I remove atlas (atlas3-sse2 for me). But that just introduces another problem: slowness.

So is this something to report to Clint Whaley? Or does it have to do with how numpy uses atlas?

Interesting, I wonder if ATLAS is resetting the FPU flags and changing the rounding mode? It is just the LSB of the mantissa that looks to be changing. Before reporting the problem it might be good to pin it down a bit more if possible.
Well, the fact that I don't see the problem on a PentiumIII (with atlas-sse) but I see it on my desktop (atlas-sse2) should tell us something. The test code uses double arrays, and SSE2 has double precision support but it's purely 64-bit doubles. SSE is single-precision only, which means that for a double computation, ATLAS isn't used and the Intel FPU does the computation instead. Intel FPUs use 80 bits internally for intermediate operations (even though they only return a normal 64-bit double result), so it's fairly common to see this kind of thing.
Removing ATLAS SSE2 does fix the problem. But why does test1 pass when ATLAS SSE2 is present?
It is strange, isn't it. I'm still thinking race condition, but where? I suppose even python could be involved someplace.

BTW, your algorithm sounds like a great way to expose small discrepancies. There's a great test for floating point errors lurking in there.

Chuck
On 1/28/07, Charles R Harris <charlesr.harris@gmail.com> wrote:
It is strange, isn't it. I'm still thinking race condition, but where? I suppose even python could be involved someplace.
BTW, your algorithm sounds like a great way to expose small discrepancies. There's a great test for floating point errors lurking in there.
Well, whoever can stop the butterfly from flapping its wings will be my hero.
On 1/28/07, Fernando Perez <fperez.net@gmail.com> wrote:
On 1/28/07, Charles R Harris <charlesr.harris@gmail.com> wrote:
The problem goes away if I remove atlas (atlas3-sse2 for me). But that just introduces another problem: slowness.
So is this something to report to Clint Whaley? Or does it have to do with how numpy uses atlas?
Interesting, I wonder if ATLAS is resetting the FPU flags and changing the rounding mode? It is just the LSB of the mantissa that looks to be changing. Before reporting the problem it might be good to pin it down a bit more if possible.
Well, the fact that I don't see the problem on a PentiumIII (with atlas-sse) but I see it on my desktop (atlas-sse2) should tell us something. The test code uses double arrays, and SSE2 has double precision support but it's purely 64-bit doubles. SSE is single-precision only, which means that for a double computation, ATLAS isn't used and the Intel FPU does the computation instead. Intel FPUs use 80 bits internally for intermediate operations (even though they only return a normal 64-bit double result), so it's fairly common to see this kind of thing.
But how come it isn't consistent and seems to depend on timing? That is what makes me think there is a race somewhere in doing something, like setting flags. I googled yesterday for floating point errors and didn't find anything that looked relevant. Maybe I should try again with the combination of atlas and sse2.

Chuck
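The order-of-operations point raised here is easy to demonstrate: floating-point addition is not associative, so anything that reorders work (threading, vectorization, compiler optimization) can move the last bit even with identical inputs. A two-line example:

```python
a, b, c = 0.1, 0.2, 0.3
left = (a + b) + c    # 0.6000000000000001
right = a + (b + c)   # 0.6
# Same three numbers, different grouping, different last bit.
print(left == right)  # False
```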
On 1/28/07, Charles R Harris <charlesr.harris@gmail.com> wrote:
But how come it isn't consistent and seems to depend on timing? That is what makes me think there is a race somewhere in doing something, like setting flags . I googled yesterday for floating point errors and didn't find anything that looked relevant. Maybe I should try again with the combination of atlas and sse2.
There could be more than one thing at work here, I honestly don't know. I'm just trying to throw familiar-sounding data bits at the wall, perhaps somebody will see a pattern in the blobs. It worked for Pollock :) Cheers, f
On 1/28/07, Fernando Perez <fperez.net@gmail.com> wrote:
[snip] The test code uses double arrays, and SSE2 has double precision support but it's purely 64-bit doubles. SSE is single-precision only, which means that for a double computation, ATLAS isn't used and the Intel FPU does the computation instead.
So since I use N.float64, ATLAS SSE won't help me?
On 1/28/07, Keith Goodman <kwgoodman@gmail.com> wrote:
On 1/28/07, Fernando Perez <fperez.net@gmail.com> wrote:
[snip] The test code uses double arrays, and SSE2 has double precision support but it's purely 64-bit doubles. SSE is single-precision only, which means that for a double computation, ATLAS isn't used and the Intel FPU does the computation instead.
So since I use N.float64, ATLAS SSE won't help me?
Well, the SSE part won't, but you're still better off with ATLAS than with the default reference BLAS implementation. I think even an ATLAS SSE has special code for double (not using any SSE-type engine) that's faster than the reference BLAS which is pure generic Fortran. Someone who knows the ATLAS internals please correct me if that's not the case. Cheers, f
On 1/28/07, Fernando Perez <fperez.net@gmail.com> wrote:
On 1/28/07, Keith Goodman <kwgoodman@gmail.com> wrote:
On 1/28/07, Fernando Perez <fperez.net@gmail.com> wrote:
[snip] The test code uses double arrays, and SSE2 has double precision support but it's purely 64-bit doubles. SSE is single-precision only, which means that for a double computation, ATLAS isn't used and the Intel FPU does the computation instead.
So since I use N.float64, ATLAS SSE won't help me?
Well, the SSE part won't, but you're still better off with ATLAS than with the default reference BLAS implementation. I think even an ATLAS SSE has special code for double (not using any SSE-type engine) that's faster than the reference BLAS which is pure generic Fortran. Someone who knows the ATLAS internals please correct me if that's not the case.
That makes sense. Unfortunately my simulation gives different results with and without ATLAS SSE even though the test script I made doesn't detect the difference.
On 1/28/07, Keith Goodman <kwgoodman@gmail.com> wrote:
On 1/28/07, Fernando Perez <fperez.net@gmail.com> wrote:
On 1/28/07, Keith Goodman <kwgoodman@gmail.com> wrote:
On 1/28/07, Fernando Perez <fperez.net@gmail.com> wrote:
[snip] The test code uses double arrays, and SSE2 has double precision support but it's purely 64-bit doubles. SSE is single-precision only, which means that for a double computation, ATLAS isn't used and the Intel FPU does the computation instead.
So since I use N.float64, ATLAS SSE won't help me?
Well, the SSE part won't, but you're still better off with ATLAS than with the default reference BLAS implementation. I think even an ATLAS SSE has special code for double (not using any SSE-type engine) that's faster than the reference BLAS which is pure generic Fortran. Someone who knows the ATLAS internals please correct me if that's not the case.
That makes sense.
Unfortunately my simulation gives different results with and without ATLAS SSE even though the test script I made doesn't detect the difference.
ATLAS BASE (no SSE or SSE2) also gives me different simulations results even though it passes the test script.
On 1/28/07, Keith Goodman <kwgoodman@gmail.com> wrote:
On 1/28/07, Keith Goodman <kwgoodman@gmail.com> wrote:
On 1/28/07, Fernando Perez <fperez.net@gmail.com> wrote:
On 1/28/07, Keith Goodman <kwgoodman@gmail.com> wrote:
On 1/28/07, Fernando Perez <fperez.net@gmail.com> wrote:
[snip] The test code uses double arrays, and SSE2 has double precision support but it's purely 64-bit doubles. SSE is single-precision only, which means that for a double computation, ATLAS isn't used and the Intel FPU does the computation instead.
So since I use N.float64, ATLAS SSE won't help me?
Well, the SSE part won't, but you're still better off with ATLAS than with the default reference BLAS implementation. I think even an ATLAS SSE has special code for double (not using any SSE-type engine) that's faster than the reference BLAS which is pure generic Fortran. Someone who knows the ATLAS internals please correct me if that's not the case.
That makes sense.
Unfortunately my simulation gives different results with and without ATLAS SSE even though the test script I made doesn't detect the difference.
ATLAS BASE (no SSE or SSE2) also gives me different simulations results even though it passes the test script.
Hmmm, I wonder if stuff could be done in different orders. That could affect rounding. Even optimization settings could, if someone wasn't careful to use parentheses to force the order of evaluation. This is all very interesting.

Chuck
On 1/28/07, Charles R Harris <charlesr.harris@gmail.com> wrote:
How come your computation is so sensitive to these small effects?
I don't know. The differences I am seeing are larger than in the test script---but still of the order of eps. Each time step of my simulation selects a maximum value and then uses that for the next time step. Depending on which of several items that are very close in value is chosen, the simulation can head in a new direction. I guess, as always, I need to randomly perturb my parameters and look at the distribution of test results to see if the effect I am trying to measure is significant. I had no idea it was this sensitive.
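The branching described here can be shown in a few lines: when candidates differ by one ULP, an LSB-level perturbation flips which index argmax selects, and every later time step inherits that choice. A minimal illustration (not the actual simulation code):

```python
import numpy as np

eps = np.finfo(np.float64).eps
v = np.array([1.0, 1.0 + eps])   # second entry larger by one ULP
w = np.array([1.0 + eps, 1.0])   # same two values, swapped

# A one-bit change in which value is largest flips the selection; a
# simulation that feeds the argmax result into the next time step then
# heads down a different trajectory from that point on.
print(v.argmax(), w.argmax())    # 1 0
```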
On 1/28/07, Keith Goodman <kwgoodman@gmail.com> wrote:
On 1/27/07, Fernando Perez <fperez.net@gmail.com> wrote:
It's definitely looking like something SMP related: on my laptop, with everything other than the hardware being identical (Linux distro, kernel, numpy build, etc), I can't make it fail no matter how I muck with it. I always get '0 differences'.
The desktop is a dual-core AMD Athlon as indicated before, the laptop is an oldie Pentium III. They both run the same SMP-aware Ubuntu i686 kernel, since Ubuntu now ships a unified kernel, though obviously on the laptop the SMP code isn't active.
After installing a kernel that is not smp aware, I still have the same problem.
The problem goes away if I remove atlas (atlas3-sse2 for me). But that just introduces another problem: slowness.
This problem may be related to this bug: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=279294 Chuck
On 2/1/07, Charles R Harris <charlesr.harris@gmail.com> wrote:
This problem may be related to this bug: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=279294
It says it is fixed in libc6 2.3.5. I'm on 2.3.6. But do you think it is something similar? A port to Octave of the test script works fine on the same system.
Keith Goodman wrote:
A port to Octave of the test script works fine on the same system.
Are you sure that your Octave port uses ATLAS to do the matrix product? Could you post your port?

--
Robert Kern
On 2/1/07, Robert Kern <robert.kern@gmail.com> wrote:
Keith Goodman wrote:
A port to Octave of the test script works fine on the same system.
Are you sure that your Octave port uses ATLAS to do the matrix product? Could you post your port?
Here's the port. Yes, Octave uses atlas for matrix multiplication. Maybe the problem is a race condition and due to timing the outcome is always the same in Octave...
On 2/1/07, Keith Goodman <kwgoodman@gmail.com> wrote:
On 2/1/07, Charles R Harris <charlesr.harris@gmail.com> wrote:
This problem may be related to this bug: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=279294
It says it is fixed in libc6 2.3.5. I'm on 2.3.6. But do you think it is something similar?
I do, I am suspicious that the roundoff mode flag is changing state. But this sort of bug is notoriously hard to track down. You did good isolating it to atlas and sse. Chuck
Hi,

I am curious why I do not see any mention of the compilers and versions that were used in this thread. Having just finally managed to get SciPy installed from scratch (but not with atlas), I could see that using different compilers, versions, or options (especially compiling done at different times) could be a factor.

Bruce

On 2/1/07, Charles R Harris <charlesr.harris@gmail.com> wrote:
On 2/1/07, Keith Goodman <kwgoodman@gmail.com> wrote:
On 2/1/07, Charles R Harris <charlesr.harris@gmail.com> wrote:
This problem may be related to this bug: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=279294
It says it is fixed in libc6 2.3.5. I'm on 2.3.6. But do you think it is something similar?
I do, I am suspicious that the roundoff mode flag is changing state. But this sort of bug is notoriously hard to track down. You did good isolating it to atlas and sse.
Chuck
_______________________________________________ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
On 2/2/07, Bruce Southey <bsouthey@gmail.com> wrote:
I am curious why I do not see any mention of the compilers and versions that were used in this thread. Having just finally managed to get SciPy installed from scratch (but not with atlas), I could see that using different compilers, versions, or options (especially compiling done at different times) could be a factor.
Yeah, good point. I installed 1.0.1 from binary from Debian sid. Maybe a chart of which configurations have the problem and which don't would help. If the problem is ATLAS I don't understand why test1 passes. Could the loading of the values be the problem and not the multiplication itself?
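One hypothesis for why rebuilding x and y matters (an assumption, not established in this thread): tuned BLAS libraries such as ATLAS can select different, equally valid code paths depending on the memory alignment of their input buffers, and freshly allocated arrays do not always land at the same alignment. A minimal sketch to observe the alignment of fresh numpy allocations:

```python
import numpy as np

# Hypothesis sketch: re-creating x and y (as test2 does) can place their
# buffers at different memory alignments; an alignment-sensitive BLAS
# kernel may then accumulate terms in a different order, flipping the
# last bit of the result.  Observe the alignment of fresh allocations:
offsets = set()
for _ in range(10):
    a = np.empty(1000)                # a new buffer each time, like test2
    offsets.add(a.ctypes.data % 16)   # byte offset within a 16-byte boundary
print(offsets)
```

If the printed set contains more than one offset, identical data can reach the BLAS at different alignments across rebuilds.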
On Sat, Jan 27, 2007 at 04:00:33PM -0700, Charles R Harris wrote:
Hmmm, and your problem machine is running smp linux. As is mine; fedora uses smp even on single processor machines these days. I think we could use more data here comparing
It runs fine on this Ubuntu/Edgy machine, though:

Linux genugtig 2.6.17-10-generic #2 SMP Tue Dec 5 21:16:35 UTC 2006 x86_64 GNU/Linux

processor : 0
vendor_id : AuthenticAMD
model name : AMD Athlon(tm) 64 X2 Dual Core Processor 4400+
processor : 1
vendor_id : AuthenticAMD
model name : AMD Athlon(tm) 64 X2 Dual Core Processor 4400+
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt lm 3dnowext 3dnow up pni lahf_lm cmp_legacy

Cheers
Stéfan
Charles R Harris wrote:
Hmmm, and your problem machine is running smp linux. As is mine; fedora uses smp even on single processor machines these days. I think we could use more data here comparing
OSX
[today]$ python repeat.py
---------------------------------------------------------------------------
Numpy version: 1.0.2.dev3520
test1: 0 differences
test2: 0 differences
test3: 0 differences
[today]$ uname -a
Darwin Sacrilege.local 8.8.2 Darwin Kernel Version 8.8.2: Thu Sep 28 20:43:26 PDT 2006; root:xnu-792.14.14.obj~1/RELEASE_I386 i386 i386

-- Robert Kern
On Sat, Jan 27, 2007 at 03:11:58PM -0700, Charles R Harris wrote:
So this looks like an error in the LSB of the floating number. Could be rounding, could be something not reset quite right. I'm thinking possibly hardware at this time, maybe compiler.
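Whether the culprit is a rounding-mode flag or a different code path, a change in the order of operations alone is enough to move the last bit, because floating-point addition is not associative. A self-contained illustration (not the actual ATLAS kernel, just the underlying effect):

```python
# Floating-point addition is not associative: summing the same three
# terms in a different order changes the result, because information is
# rounded away at different points.
x = 2.0 ** 53   # at 2**53 the spacing between adjacent doubles is 2.0
y = 1.0
z = -2.0 ** 53

print((x + y) + z)  # 0.0: the 1.0 is rounded away when added to 2**53 first
print((x + z) + y)  # 1.0: exact cancellation first, then the 1.0 survives
```

A BLAS kernel that blocks or vectorizes a dot product differently on two runs is effectively choosing between such orderings, and the results can legitimately differ in the last bit.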
Linux fedora 2.6.19-1.2895.fc6 #1 SMP Wed Jan 10 19:28:18 EST 2007 i686 athlon i386 GNU/Linux
processor : 0 vendor_id : AuthenticAMD
And just for the hell of it, with 4 CPUs :)

Linux dirac 2.6.17-10-generic #2 SMP Tue Dec 5 21:16:35 UTC 2006 x86_64 GNU/Linux

processor : 0
vendor_id : AuthenticAMD
model name : Dual Core AMD Opteron(tm) Processor 275
processor : 1
vendor_id : AuthenticAMD
model name : Dual Core AMD Opteron(tm) Processor 275
processor : 2
vendor_id : AuthenticAMD
model name : Dual Core AMD Opteron(tm) Processor 275
processor : 3
vendor_id : AuthenticAMD
model name : Dual Core AMD Opteron(tm) Processor 275
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt lm 3dnowext 3dnow up pni lahf_lm cmp_legacy

Works fine.

Cheers
Stéfan
On 1/27/07, Charles R Harris <charlesr.harris@gmail.com> wrote:
Oddly, the relative error is always the same:
98 z different 2.0494565872e-16 99 z different 2.0494565872e-16
Which is nearly the same as the double precision 2.2204460492503131e-16, the difference being due to the fact that the precision is defined relative to 1, and the errors in the computation are in a number that is relatively larger (more bits set, but not yet 2).
Fernando (Ubuntu) and I (Debian) get a different result (8.47032947254e-22) than you (Fedora) do (2.0494565872e-16).
On 1/29/07, Keith Goodman <kwgoodman@gmail.com> wrote:
On 1/27/07, Charles R Harris <charlesr.harris@gmail.com> wrote:
Oddly, the relative error is always the same:
98 z different 2.0494565872e-16 99 z different 2.0494565872e-16
Which is nearly the same as the double precision 2.2204460492503131e-16, the difference being due to the fact that the precision is defined relative to 1, and the errors in the computation are in a number that is relatively larger (more bits set, but not yet 2).
Fernando (Ubuntu) and I (Debian) get a different result (8.47032947254e-22) than you (Fedora) do (2.0494565872e-16).
That's odd, the LSB of the double precision mantissa is only about 2.2e-16, so you can't *get* differences as small as 8.4e-22 without about 70-bit mantissas. Hmmm, and extended double precision only has 63-bit mantissas. Are you sure you are computing the error correctly? Chuck
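For the record, the puzzle above has a resolution: an absolute difference of 8.47e-22 does not require a 70-bit mantissa, because it is a few ULPs of a result whose magnitude is about 1e-6, not about 1. The 2.2e-16 figure is the ULP *relative to 1*. Python 3.9+ can show this directly with math.ulp:

```python
import math

# One ULP is an absolute spacing that scales with the magnitude of the value.
print(math.ulp(1.0))    # 2.220446049250313e-16, i.e. 2**-52: the familiar eps
print(math.ulp(1e-6))   # 2**-72: near 1e-6, doubles are spaced ~2.1e-22 apart

# The difference reported in the thread is exactly four of those ULPs:
print(2.0 ** -70)       # 8.47032947254...e-22, matching the test output
```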
On 1/29/07, Charles R Harris <charlesr.harris@gmail.com> wrote:
That's odd, the LSB of the double precision mantissa is only about 2.2e-16, so you can't *get* differences as small as 8.4e-22 without about 70-bit mantissas. Hmmm, and extended double precision only has 63-bit mantissas. Are you sure you are computing the error correctly?
That is odd. 8.4e-22 is just the output of the test script: abs(z - z0).max(). That abs is from python.
On 1/29/07, Keith Goodman <kwgoodman@gmail.com> wrote:
On 1/29/07, Charles R Harris <charlesr.harris@gmail.com> wrote:
That's odd, the LSB of the double precision mantissa is only about 2.2e-16, so you can't *get* differences as small as 8.4e-22 without about 70-bit mantissas. Hmmm, and extended double precision only has 63-bit mantissas. Are you sure you are computing the error correctly?
That is odd.
8.4e-22 is just the output of the test script: abs(z - z0).max(). That abs is from python.
By playing around with x and y I can get all sorts of values for abs(z - z0).max(). I can get down to the e-23 range and to 2.2e-16. I've also seen e-18 and e-22.
On 1/29/07, Keith Goodman <kwgoodman@gmail.com> wrote:
On 1/29/07, Keith Goodman <kwgoodman@gmail.com> wrote:
On 1/29/07, Charles R Harris <charlesr.harris@gmail.com> wrote:
That's odd, the LSB of the double precision mantissa is only about 2.2e-16, so you can't *get* differences as small as 8.4e-22 without about 70-bit mantissas. Hmmm, and extended double precision only has 63-bit mantissas. Are you sure you are computing the error correctly?
That is odd.
8.4e-22 is just the output of the test script: abs(z - z0).max(). That abs is from python.
By playing around with x and y I can get all sorts of values for abs(z - z0).max(). I can get down to the e-23 range and to 2.2e-16. I've also seen e-18 and e-22.
Here is a setting for x and y that gives me a difference (using the unit test in this thread) of 4.54747e-13! That is huge---and a serious problem. I am sure I can get bigger.

# x data
x = M.zeros((3,3))
x[0,0] = 9.0030140479499
x[0,1] = 9.0026474226671
x[0,2] = -9.0011270502873
x[1,0] = 9.0228605377994
x[1,1] = 9.0033715311274
x[1,2] = -9.0082367491299
x[2,0] = 9.0044783987583
x[2,1] = 0.0027488028057
x[2,2] = -9.0036113393360

# y data
y = M.zeros((3,1))
y[0,0] = 10.00088539878978
y[1,0] = 0.00667193234012
y[2,0] = 0.00032472712345
On 1/29/07, Keith Goodman <kwgoodman@gmail.com> wrote:
On 1/29/07, Keith Goodman <kwgoodman@gmail.com> wrote:
On 1/29/07, Keith Goodman <kwgoodman@gmail.com> wrote:
On 1/29/07, Charles R Harris <charlesr.harris@gmail.com> wrote:
That's odd, the LSB of the double precision mantissa is only about 2.2e-16, so you can't *get* differences as small as 8.4e-22 without about 70-bit mantissas. Hmmm, and extended double precision only has 63-bit mantissas. Are you sure you are computing the error correctly?
That is odd.
8.4e-22 is just the output of the test script: abs(z - z0).max(). That abs is from python.
By playing around with x and y I can get all sorts of values for abs(z - z0).max(). I can get down to the e-23 range and to 2.2e-16. I've also seen e-18 and e-22.
Here is a setting for x and y that gives me a difference (using the unit test in this thread) of 4.54747e-13! That is huge---and a serious problem. I am sure I can get bigger.
# x data
x = M.zeros((3,3))
x[0,0] = 9.0030140479499
x[0,1] = 9.0026474226671
x[0,2] = -9.0011270502873
x[1,0] = 9.0228605377994
x[1,1] = 9.0033715311274
x[1,2] = -9.0082367491299
x[2,0] = 9.0044783987583
x[2,1] = 0.0027488028057
x[2,2] = -9.0036113393360

# y data
y = M.zeros((3,1))
y[0,0] = 10.00088539878978
y[1,0] = 0.00667193234012
y[2,0] = 0.00032472712345
OK. I guess I should be looking at the fractional difference instead of the absolute difference. The fractional difference is of order e-16.
Keith Goodman wrote:
> Here is a setting for x and y that gives me a difference (using the
> unit test in this thread) of 4.54747e-13! That is huge---and a serious
> problem. I am sure I can get bigger.

Check the size of z0. Only the relative difference abs((z-z0)/z0) is going to be about 1e-16. If you adjust the size of z0, the absolute difference will also change in size. In the original unittest that you wrote, z0 is about 1e-6, so 1e-22 corresponds to 1 ULP. With the data you give here, z0 is about 1e3, so 1e-13 also corresponds to 1 ULP. There is no (additional) problem here.

-- Robert Kern
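Robert's point can be checked directly: normalize each reported absolute difference by the approximate magnitude of z0, and both discrepancies collapse to roughly one part in 1e16, i.e. ordinary double-precision rounding rather than a growing error. A small sketch (the z0 magnitudes are the approximate ones quoted in the thread):

```python
# Normalizing by |z0| shows both reported differences are ~1 part in 1e16:
cases = [
    (8.47032947254e-22, 1e-6),  # original unit test: z0 is about 1e-6
    (4.54747e-13, 1e3),         # the "huge" case above: z0 is about 1e3
]
for absdiff, z0 in cases:
    print(absdiff / z0)  # ~8.5e-16 and ~4.5e-16: a few ULPs in relative terms
```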
On 1/27/07, Keith Goodman <kwgoodman@gmail.com> wrote:
I get slightly different results when I repeat a calculation. In a long simulation the differences snowball and swamp the effects I am trying to measure.
Here's a unit test for the problem. I am distributing it in hopes of raising awareness of the problem. (What color should I make the Repeatability Wristbands?) I am sure others are having this problem without even knowing it.
On a PPC MacOS X box I don't see an error. If I append

if __name__ == "__main__":
    run()

to your test code and then run it I get:

repeatability #1 ... ok
repeatability #2 ... ok
repeatability #3 ... ok
----------------------------------------------------------------------
Ran 3 tests in 0.156s

OK
On 1/29/07, Russell E. Owen <rowen@cesmail.net> wrote:
On a PPC MacOS X box I don't see an error. If I append if __name__ == "__main__": run() to your test code and then run it I get:
repeatability #1 ... ok repeatability #2 ... ok repeatability #3 ... ok
---------------------------------------------------------------------- Ran 3 tests in 0.156s
OK
So far no one has duplicated the problem on windows or mac. The problem has only been seen on linux with atlas3-sse2. (I get a similar problem with other versions of atlas.) Are you running atlas on your PPC mac? Perhaps atlas3-altivec?
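To answer the "which BLAS are you actually running" question definitively on any box, numpy can report how it was built. This uses numpy's public show_config; the exact fields printed vary by numpy version and build:

```python
import numpy

# Prints the BLAS/LAPACK libraries numpy was linked against; on an
# affected Debian/Ubuntu box this is where an ATLAS build (e.g. sse2)
# would show up.
numpy.show_config()
print(numpy.__version__)
```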
On a 64-bit Intel Core2 Duo running Debian unstable with atlas3 (there is no specific atlas3-sse2 for AMD64 Debian, although I think that it is included) everything checks out fine:

eiger:~$ uname -a
Linux eiger 2.6.18-3-amd64 #1 SMP Sun Dec 10 19:57:44 CET 2006 x86_64 GNU/Linux
eiger:~$ cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 15
model name : Intel(R) Core(TM)2 CPU 6700 @ 2.66GHz
stepping : 6
cpu MHz : 2660.009
cache size : 4096 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 2
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall lm constant_tsc pni monitor ds_cpl vmx est tm2 cx16 xtpr lahf_lm
bogomips : 5324.65
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:
(same for 2nd CPU)

Scott

On Monday 29 January 2007 16:10, Keith Goodman wrote:
On 1/29/07, Russell E. Owen <rowen@cesmail.net> wrote:
On a PPC MacOS X box I don't see an error. If I append if __name__ == "__main__": run() to your test code and then run it I get:
repeatability #1 ... ok repeatability #2 ... ok repeatability #3 ... ok
----------------------------------------------------------------------
Ran 3 tests in 0.156s
OK
So far no one has duplicated the problem on windows or mac. The problem has only been seen on linux with atlas3-sse2. (I get a similar problem with other versions of atlas.)
Are you running atlas on your PPC mac? Perhaps atlas3-altivec?
-- Scott M. Ransom Address: NRAO Phone: (434) 296-0320 520 Edgemont Rd. email: sransom@nrao.edu Charlottesville, VA 22903 USA GPG Fingerprint: 06A9 9553 78BE 16DB 407B FFCA 9BFA B6FF FFD3 2989
Keith Goodman wrote:
Here's a unit test for the problem. I am distributing it in hopes of raising awareness of the problem. (What color should I make the Repeatability Wristbands?)
I am sure others are having this problem without even knowing it.
Another datapoint using atlas3-base on Ubuntu AMD-64. Looking at the source package, I think it sets ISAEXT="sse2" for AMD-64 when building.

rkern@rkernx2:~$ python repeat_test.py
repeatability #1 ... ok
repeatability #2 ... ok
repeatability #3 ... ok
----------------------------------------------------------------------
Ran 3 tests in 0.043s

OK
rkern@rkernx2:~$ uname -a
Linux rkernx2 2.6.17-10-generic #2 SMP Fri Oct 13 15:34:39 UTC 2006 x86_64 GNU/Linux
rkern@rkernx2:~$ cat /proc/cpuinfo
processor : 0
vendor_id : AuthenticAMD
cpu family : 15
model : 43
model name : AMD Athlon(tm) 64 X2 Dual Core Processor 4200+
stepping : 1
cpu MHz : 2211.346
cache size : 512 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 2
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt lm 3dnowext 3dnow up pni lahf_lm cmp_legacy
bogomips : 4426.03
TLB size : 1024 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp

processor : 1
vendor_id : AuthenticAMD
cpu family : 15
model : 43
model name : AMD Athlon(tm) 64 X2 Dual Core Processor 4200+
stepping : 1
cpu MHz : 2211.346
cache size : 512 KB
physical id : 0
siblings : 2
core id : 1
cpu cores : 2
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt lm 3dnowext 3dnow up pni lahf_lm cmp_legacy
bogomips : 4423.03
TLB size : 1024 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp

-- Robert Kern
On 1/29/07, Robert Kern <robert.kern@gmail.com> wrote:
Keith Goodman wrote:
Here's a unit test for the problem. I am distributing it in hopes of raising awareness of the problem. (What color should I make the Repeatability Wristbands?)
I am sure others are having this problem without even knowing it.
Another datapoint using atlas3-base on Ubuntu AMD-64. Looking at the source package, I think it sets ISAEXT="sse2" for AMD-64 when building.
rkern@rkernx2:~$ python repeat_test.py repeatability #1 ... ok repeatability #2 ... ok repeatability #3 ... ok
---------------------------------------------------------------------- Ran 3 tests in 0.043s
OK
I ported the test to Octave, which like numpy uses ATLAS. On my machine (Debian etch, atlas3-sse2) I get the problem in numpy but not in Octave. Plus test1 always passes. So it is only when you reload x and y that the problem occurs. If you load x and y once (test1) and repeat the calculation, there is no problem. Do these two results point, however weakly, away from atlas?
On 1/27/07, Keith Goodman <kwgoodman@gmail.com> wrote:
I get slightly different results when I repeat a calculation. In a long simulation the differences snowball and swamp the effects I am trying to measure.
In the attached script there are three tests.
In test1, I construct matrices x and y and then repeatedly calculate z = calc(x,y). The result z is the same every time. So this test passes.
In test2, I construct matrices x and y each time before calculating z = calc(x,y). Sometimes z is slightly different---of the order of 1e-21 to 1e-18. But the x's test to be equal and so do the y's. This test fails. (It doesn't fail on my friend's windows box; I'm running linux.)
test3 is the same as test2 but I calculate z like this: z = calc(100*x ,y) / (100 * 100). This test passes.
What is going on?
Here is some sample output:
import repeat
repeat.all()
4 z different 8.47032947254e-22
5 z different 8.47032947254e-22
7 z different 8.47032947254e-22
9 z different 8.47032947254e-22
10 z different 8.47032947254e-22
16 z different 8.47032947254e-22
24 z different 8.47032947254e-22
25 z different 8.47032947254e-22
26 z different 8.47032947254e-22
27 z different 8.47032947254e-22
30 z different 8.47032947254e-22
32 z different 8.47032947254e-22
34 z different 8.47032947254e-22
35 z different 8.47032947254e-22
36 z different 8.47032947254e-22
39 z different 8.47032947254e-22
40 z different 8.47032947254e-22
41 z different 8.47032947254e-22
45 z different 8.47032947254e-22
46 z different 8.47032947254e-22
50 z different 8.47032947254e-22
52 z different 8.47032947254e-22
53 z different 8.47032947254e-22
55 z different 8.47032947254e-22
56 z different 8.47032947254e-22
58 z different 8.47032947254e-22
66 z different 8.47032947254e-22
67 z different 8.47032947254e-22
71 z different 8.47032947254e-22
73 z different 8.47032947254e-22
74 z different 8.47032947254e-22
83 z different 8.47032947254e-22
86 z different 8.47032947254e-22
87 z different 8.47032947254e-22
88 z different 8.47032947254e-22
89 z different 8.47032947254e-22
90 z different 8.47032947254e-22
92 z different 8.47032947254e-22
test1: 0 differences test2: 38 differences test3: 0 differences
Repeated runs tend to give me the same number of differences in test2 for several runs. Then I get a new number of differences which lasts for several runs...
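The attached script itself is not reproduced in the archive, but test2's structure can be sketched from the description above. Note that `calc` is never shown in the thread; the matrix product below is a stand-in (test3 only implies calc is quadratic in x), and the sizes and iteration count are guesses:

```python
import numpy as np

def calc(x, y):
    # Stand-in for the script's calc(); the real function is not shown in
    # the thread.  test3 (scaling x by 100 and dividing by 100*100) implies
    # the real calc is quadratic in x, which this stand-in is as well.
    return x.T @ (x @ y)

def make_xy(seed=0):
    # Rebuild "the same" x and y from scratch, as test2 does.
    rng = np.random.RandomState(seed)
    return rng.rand(3, 3), rng.rand(3, 1)

# test2: rebuild x and y before every call and compare z against the first z0
x, y = make_xy()
z0 = calc(x, y)
ndiff = 0
for i in range(100):
    x, y = make_xy()              # bitwise-equal to the original arrays...
    z = calc(x, y)                # ...yet z could differ on affected setups
    if np.abs(z - z0).max() > 0:
        ndiff += 1
print(ndiff, "differences")
```

On an unaffected machine this prints 0 differences; the thread's affected ATLAS builds would report a nonzero count.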
I built a new computer: Core 2 Duo 32-bit Debian etch with numpy 1.0.2.dev3546. The repeatability test still fails. In order to make my calculations repeatable I'll have to remove ATLAS. That really slows things down. Does anyone with Debian not have this problem?
On 2/15/07, Keith Goodman <kwgoodman@gmail.com> wrote:
I built a new computer: Core 2 Duo 32-bit Debian etch with numpy 1.0.2.dev3546. The repeatability test still fails. In order to make my calculations repeatable I'll have to remove ATLAS. That really slows things down.
Hey, I have no problem with atlas-base and atlas-sse! On my old debian box all versions of atlas fail the repeatability test.
On 2/15/07, Keith Goodman <kwgoodman@gmail.com> wrote:
On 2/15/07, Keith Goodman <kwgoodman@gmail.com> wrote:
I built a new computer: Core 2 Duo 32-bit Debian etch with numpy 1.0.2.dev3546. The repeatability test still fails. In order to make my calculations repeatable I'll have to remove ATLAS. That really slows things down.
Hey, I have no problem with atlas-base and atlas-sse! On my old debian box all versions of atlas fail the repeatability test.
You mean on the Core 2 Duo 32-bit only atlas-sse2 causes troubles? How does the speed compare, atlas-sse2 vs. atlas-sse (ignoring the repeatability problem)? -Sebastian Haase
On 2/15/07, Sebastian Haase <haase@msg.ucsf.edu> wrote:
On 2/15/07, Keith Goodman <kwgoodman@gmail.com> wrote:
On 2/15/07, Keith Goodman <kwgoodman@gmail.com> wrote:
I built a new computer: Core 2 Duo 32-bit Debian etch with numpy 1.0.2.dev3546. The repeatability test still fails. In order to make my calculations repeatable I'll have to remove ATLAS. That really slows things down.
Hey, I have no problem with atlas-base and atlas-sse! On my old debian box all versions of atlas fail the repeatability test.
You mean on the Core 2 Duo 32-bit only atlas-sse2 causes troubles? How does the speed compare, atlas-sse2 vs. atlas-sse (ignoring the repeatability problem)?
Yes. On my old computer (P4) all three (atlas-base, -sse, -sse2) cause problems. On my new computer only sse2 causes a problem. I only want to know about the speed difference (sse, sse2) if the difference is small.
On 2/16/07, Keith Goodman <kwgoodman@gmail.com> wrote:
On 2/15/07, Sebastian Haase <haase@msg.ucsf.edu> wrote:
On 2/15/07, Keith Goodman <kwgoodman@gmail.com> wrote:
On 2/15/07, Keith Goodman <kwgoodman@gmail.com> wrote:
I built a new computer: Core 2 Duo 32-bit Debian etch with numpy 1.0.2.dev3546. The repeatability test still fails. In order to make my calculations repeatable I'll have to remove ATLAS. That really slows things down.
Hey, I have no problem with atlas-base and atlas-sse! On my old debian box all versions of atlas fail the repeatability test.
You mean on the Core 2 Duo 32-bit only atlas-sse2 causes troubles? How does the speed compare, atlas-sse2 vs. atlas-sse (ignoring the repeatability problem)?
Yes. On my old computer (P4) all three (atlas-base, -sse, -sse2) cause problems. On my new computer only sse2 causes a problem.
I only want to know about the speed difference (sse, sse2) if the difference is small.
I was just wondering what, generally, the speed improvement from sse to sse2 is? Any tentative number would be fine... -S.
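No sse-vs-sse2 numbers appear in the thread, but the comparison is easy to make per machine: time the same matrix product under each installed BLAS. A rough timing sketch (the matrix size and repeat count are arbitrary choices, not from the thread):

```python
import timeit
import numpy as np

# Rough matrix-multiply throughput check; run this once under each BLAS
# build (e.g. atlas-sse vs atlas-sse2) and compare the wall-clock times.
n = 500
a = np.random.rand(n, n)
b = np.random.rand(n, n)
t = timeit.timeit(lambda: a @ b, number=10)
print("10 multiplies of %dx%d took %.3f s" % (n, n, t))
```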
participants (11)
- Bruce Southey
- Charles R Harris
- Fernando Perez
- Keith Goodman
- Robert Kern
- Russell E. Owen
- Scott Ransom
- Sebastian Haase
- Sebastian Haase
- Stefan
- Stefan van der Walt