Chris Barker wrote:
Darren Dale wrote:
Maybe I don't understand something, but isn't the point of ATLAS that the libraries are tuned at compile time for your specific setup?
Yes, but the binaries I downloaded are for the P4 processor, so I'm not sure how much more specific I can get. I'm going to give it a try.
Unfortunately, you have to get very specific to get the most out of Atlas. The two things that come to mind are:

- cache size, which differs across various Pentium 4s/Xeons
- clock speed

The former means that you have to block operations on your matrices in the right way so that you keep data in each level of the memory hierarchy as long as possible. Clock speed matters because it introduces varying memory latencies so, for example, you might not be prefetching data at the right rate.

That's why I usually compile Atlas without any built-in presets, even though it takes a very long time. However, I also try the prebuilt binaries to make sure that Clint (Atlas' author) didn't turn some secret knob that I don't know about.

Piotr
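For anyone unfamiliar with the blocking mentioned above, here is a rough sketch of the idea in plain Python. It is only an illustration of the loop structure, not what Atlas actually does: Atlas generates tuned C and assembly kernels and searches for good block sizes at build time. The function names and the default block size of 64 are assumptions made up for this example, and pure Python will not show the cache benefit itself.

```python
def naive_matmul(a, b):
    """Plain triple loop: streams over whole rows of b for every row of a."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0] * p for _ in range(n)]
    for i in range(n):
        for k in range(m):
            aik = a[i][k]
            for j in range(p):
                c[i][j] += aik * b[k][j]
    return c


def blocked_matmul(a, b, block=64):
    """Same product, computed tile by tile.

    `block` is a hypothetical tuning knob: the goal is to pick it so that
    the tiles of a, b and c being worked on fit in cache together. The best
    value depends on the cache size of the machine, which is why a value
    tuned for one Pentium 4 may be wrong for another.
    """
    n, m, p = len(a), len(b), len(b[0])
    c = [[0] * p for _ in range(n)]
    for i0 in range(0, n, block):          # tile rows of a and c
        for k0 in range(0, m, block):      # tile the shared dimension
            for j0 in range(0, p, block):  # tile columns of b and c
                for i in range(i0, min(i0 + block, n)):
                    row_a, row_c = a[i], c[i]
                    for k in range(k0, min(k0 + block, m)):
                        aik, row_b = row_a[k], b[k]
                        for j in range(j0, min(j0 + block, p)):
                            row_c[j] += aik * row_b[j]
    return c
```

Both functions return the same matrix; the only difference is the order in which the elements are visited. Calling `blocked_matmul(a, b, block=32)` versus `block=128` is the kind of choice Atlas makes for you when it is compiled on the target machine.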