Re: [Numpy-discussion] LAPACK/BLAS for Fedora core 4 i386 ?
Jochen Küpper wrote:
"Chris Barker" <Chris.Barker@noaa.gov> writes:
Do you mean this? : yes
OK, so I've done what I can.
To find out you might have to look at the LAPACK functions you are using and compare them against the ones provided by ATLAS.
Well, I'm testing using LinearAlgebra.solve_linear_equations() from Numeric (and numarray). The reason I'm wondering is that on my last machine, a 1 GHz PIII laptop running Gentoo, using the Gentoo-provided ATLAS/LAPACK, I got a 6-7 times speed-up over lapack_lite. On my new machine, a 2 GHz Pentium M laptop running FC4, I got about a 2 times speed-up, which is nice, but not nearly as impressive.

Another possible issue is that I used pre-compiled binaries from the ATLAS site, which are a bit old; maybe I should compile it myself. Any thoughts?

-Chris

--
Christopher Barker, Ph.D.
Oceanographer
NOAA/OR&R/HAZMAT         (206) 526-6959 voice
7600 Sand Point Way NE   (206) 526-6329 fax
Seattle, WA 98115        (206) 526-6317 main reception
Chris.Barker@noaa.gov
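For anyone wanting to reproduce this kind of comparison, here is a minimal timing sketch. It is not the script used in this thread: `make_system`, `naive_solve`, and `best_time` are hypothetical helper names, and the pure-Python solver is only a stand-in so the example runs without any LAPACK at all.

```python
import random
import time


def make_system(n, seed=0):
    """Build a random, diagonally dominant n x n system (hypothetical test data)."""
    rng = random.Random(seed)
    a = [[rng.random() for _ in range(n)] for _ in range(n)]
    for i in range(n):
        a[i][i] += n  # diagonal dominance keeps the system well conditioned
    b = [rng.random() for _ in range(n)]
    return a, b


def naive_solve(a, b):
    """Gaussian elimination with partial pivoting; a slow pure-Python baseline."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented copy, inputs untouched
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def best_time(solve, a, b, repeats=3):
    """Return the fastest wall-clock time over `repeats` calls to solve(a, b)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        solve(a, b)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    a, b = make_system(100)
    print("naive_solve: %.4f s" % best_time(naive_solve, a, b))
```

To compare two LAPACK builds, one would pass the library's own solver as `solve` (for example `numpy.linalg.solve` in a current NumPy install, assumed here as the modern counterpart of `solve_linear_equations`) and run the same script once against each build. The ratio of the two best times is the speed-up. Taking the minimum over several repeats reduces noise from other processes.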
On Friday 29 July 2005 12:22 pm, Chris Barker wrote:
To find out you might have to look at the LAPACK functions you are using and compare them against the ones provided by ATLAS.
Well, I'm testing using LinearAlgebra.solve_linear_equations() from Numeric (and numarray). The reason I'm wondering is that on my last machine, a 1ghz PIII laptop running Gentoo, using the Gentoo provided atlas/lapack, I got a 6-7 times speed up over lapack_lite.
On my new machine, a 2GHz Pentium M laptop running FC4, I got about a 2 times speed-up, which is nice, but not nearly as impressive.
Another possible issue is that I used pre-compiled binaries from the ATLAS site, which are a bit old; maybe I should compile it myself. Any thoughts?
Maybe I'm not understanding something, but isn't the point of ATLAS that the libraries are tuned at compile time for your specific setup?

Darren
Darren Dale wrote:
Maybe I'm not understanding something, but isn't the point of ATLAS that the libraries are tuned at compile time for your specific setup?
Yes, but the binaries I downloaded are for the P4 processor, so I'm not sure how much more specific I can get. I'm going to give it a try.

This does bring up a point, however. Why is LAPACK usually linked statically? I usually link statically so that I can move applications to a different system without worrying about what libraries are installed there. However, with LAPACK/BLAS, you want to use a library compiled for that particular machine, so it seems to make more sense to use dynamic ones. Besides, it would be easier to compare performance if all I had to do was drop a new *.so in place.

-Chris
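As a rough illustration of the dynamic-linking point, the sketch below asks the system which shared LAPACK/BLAS/ATLAS libraries it can locate. It relies only on `ctypes.util.find_library` from the Python standard library. The library names searched for are assumptions and vary between distributions, and `locate_shared_libs` is a hypothetical helper name.

```python
from ctypes.util import find_library


def locate_shared_libs(names=("lapack", "blas", "atlas")):
    """Map each library name to what the system search finds, or None."""
    # find_library returns a file name such as 'liblapack.so.3', or None
    # when no shared library of that name is found.
    return {name: find_library(name) for name in names}


if __name__ == "__main__":
    for name, found in locate_shared_libs().items():
        print(name, "->", found or "no shared library found")
```

This only reports what is installed as a shared library. An extension module that was linked statically against LAPACK will not pick up a replacement *.so, whatever this search finds.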
Hi,
yes, but the binaries I downloaded are for the P4 processor, so I'm not sure how much more specific I can get. I'm going to give it a try.
There is some tuning that is specific to the cache, which might change things a bit.

Is it possible your test code uses NaNs at some point? That will result in a huge slowdown on the P4 with the default ATLAS binaries:

http://www.mrc-cbu.cam.ac.uk/Imaging/Common/spm_intel_tune.shtml#nan_problem

Matthew
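One way to rule out the NaN explanation is to scan the benchmark inputs before timing them. This is a minimal sketch using only the standard library; `count_nans` is a hypothetical helper and assumes the matrix is a list of rows of floats.

```python
import math


def count_nans(matrix):
    """Count NaN entries in a matrix given as a list of rows of floats."""
    return sum(1 for row in matrix for value in row if math.isnan(value))


if __name__ == "__main__":
    test_matrix = [[1.0, 2.0], [float("nan"), 4.0]]
    print("NaN entries:", count_nans(test_matrix))
```

If the count is non-zero for the benchmark data, the timings may reflect the NaN slowdown rather than the quality of the ATLAS build.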
Chris Barker wrote:
Darren Dale wrote:
Maybe I'm not understanding something, but isn't the point of ATLAS that the libraries are tuned at compile time for your specific setup?
yes, but the binaries I downloaded are for the P4 processor, so I'm not sure how much more specific I can get. I'm going to give it a try.
Unfortunately, you have to get very specific to get the most out of ATLAS. The two things that come to mind are:
- cache size, which differs across various Pentium 4s/Xeons
- clock speed

The former means that you have to block operations on your matrices in the right way so that you keep data in each level of the memory hierarchy as long as possible. Clock speed matters because it introduces varying memory latencies so, for example, you might not be prefetching data at the right rate.

That's why I usually compile ATLAS without any built-in presets, even though it takes a very long time. However, I also try the prebuilt binaries to make sure that Clint (ATLAS's author) didn't turn some secret knob that I don't know about.

Piotr
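To make the blocking idea concrete, here is a toy sketch of a blocked (tiled) matrix multiply. It is not ATLAS's kernel: the function name `matmul_blocked` and the `block` parameter are hypothetical, and pure Python will not show a real cache effect. It only illustrates the loop structure whose tile size a tuned library has to match to the cache.

```python
def matmul_blocked(a, b, block=2):
    """Multiply matrices (lists of rows) one block x block tile at a time."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    # The three outer loops walk over tiles; the three inner loops work
    # inside one tile, so the same small pieces of a, b and c are reused
    # while they would still be resident in cache.
    for i0 in range(0, n, block):
        for k0 in range(0, k, block):
            for j0 in range(0, m, block):
                for i in range(i0, min(i0 + block, n)):
                    for kk in range(k0, min(k0 + block, k)):
                        aik = a[i][kk]
                        for j in range(j0, min(j0 + block, m)):
                            c[i][j] += aik * b[kk][j]
    return c


if __name__ == "__main__":
    print(matmul_blocked([[1, 2], [3, 4]], [[5, 6], [7, 8]], block=1))
```

The result is the same for any `block` value; only the order of memory accesses changes. A tile size chosen for one cache size can be a poor fit on a processor with a different one, which is why binaries built for a generic P4 may underperform a build tuned on the target machine.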
participants (4)
- Chris Barker
- Darren Dale
- Matthew Brett
- Piotr Luszczek