On Dec 10, 2007 11:04 PM, David Cournapeau <cournape@gmail.com> wrote:
On Dec 11, 2007 12:46 PM, Andrew Straw <strawman@astraw.com> wrote:
According to the QEMU website, QEMU does not (yet) emulate SSE on x86 target, so a Windows installation on a QEMU virtual machine may be a good way to build binaries free of these issues.
http://fabrice.bellard.free.fr/qemu/qemu-tech.html
I tried this, and it does not work (QEMU actually emulates SSE). I went further, and managed to disable SSE support in qemu...
But again, what's the point? It takes ages to compile (qemu without the hardware accelerator is slow, something like ten times slower), and you will end up with a really bad atlas, since atlas optimization is entirely based on runtime timers, which no longer make sense under emulation.
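To make the point about runtime timers concrete, here is a minimal sketch of timing-based tuning. It is not how ATLAS is implemented; it only illustrates the idea of picking a blocking parameter by measuring candidates on the build machine. The names `blocked_matmul` and `pick_block_size` and the candidate block sizes are hypothetical, chosen for illustration.

```python
import time


def blocked_matmul(a, b, n, block):
    """Multiply two n x n matrices (lists of lists) using square blocks."""
    c = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, block):
        for kk in range(0, n, block):
            for jj in range(0, n, block):
                # Multiply one block of a by one block of b.
                for i in range(ii, min(ii + block, n)):
                    row_a, row_c = a[i], c[i]
                    for k in range(kk, min(kk + block, n)):
                        aik = row_a[k]
                        row_b = b[k]
                        for j in range(jj, min(jj + block, n)):
                            row_c[j] += aik * row_b[j]
    return c


def pick_block_size(n=32, candidates=(4, 8, 16, 32), repeats=3):
    """Time each candidate block size and return (fastest, timings).

    The choice depends entirely on the timings measured on the machine
    running this function, which is the property under discussion.
    """
    a = [[float(i + j) for j in range(n)] for i in range(n)]
    b = [[float(i - j) for j in range(n)] for i in range(n)]
    timings = {}
    for block in candidates:
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            blocked_matmul(a, b, n, block)
            best = min(best, time.perf_counter() - start)
        timings[block] = best
    return min(timings, key=timings.get), timings


if __name__ == "__main__":
    best, timings = pick_block_size()
    print("fastest block size on this machine:", best)
```

Run inside an emulator, the measured timings reflect the emulator's cost model rather than the real chip's caches and pipelines, so the "fastest" choice need not be the fastest one on real hardware.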
I mean, really, what's the point of doing all this compared to using blas/lapack from netlib? In practice, is it really slower? For what? I know I don't care so much, and I am a heavy user of numpy.
For certain cases the difference can be pretty dramatic, but I think there's a simple, reasonable solution that is likely to work: ship TWO binaries of Numpy/Scipy each time:
1. {numpy,scipy}-reference: built with the reference blas from netlib, no atlas, period.
2. {}-atlas: built with whatever the developers have at the time, which will likely mean these days a core 2 duo with SSE2 support. What hardware it was built on should be indicated, so people can at least know this fact.
Just indicate that:
- The atlas version is likely faster, but fully unsupported and likely to crash older platforms, no refunds.
- If you *really* care about performance, you should build Atlas yourself or be 100% sure that you're using an Atlas built on the same chip you're using, so the build-time timing and blocking choices are actually meaningful.
That sounds like a reasonable bit of extra work (a lot easier than building a run-time dynamic atlas) with a true payoff in terms of stability.
No?
Cheers,
f
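For anyone wondering how a user might pick between the two proposed binaries, here is a rough sketch. It assumes the atlas build needs SSE2 (as suggested above for a core 2 duo build) and that the CPU flags are available as a list of strings. The helper names `choose_build` and `flags_from_cpuinfo` are hypothetical, and the parser assumes the Linux /proc/cpuinfo text format; on Windows the flags would have to come from somewhere else.

```python
def flags_from_cpuinfo(text):
    """Extract the CPU feature flags from /proc/cpuinfo-style text.

    Assumes the Linux x86 layout, where one line looks like
    'flags : fpu vme ... sse sse2'. Returns an empty list if absent.
    """
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return value.split()
    return []


def choose_build(cpu_flags, required=("sse2",)):
    """Return 'atlas' if every required flag is present, else 'reference'.

    The default requirement (SSE2) is an assumption about how the
    atlas binary was built, not something a real installer reports.
    """
    flags = {flag.lower() for flag in cpu_flags}
    if all(req in flags for req in required):
        return "atlas"
    return "reference"


if __name__ == "__main__":
    sample = "model name : Example CPU\nflags : fpu mmx sse sse2\n"
    print(choose_build(flags_from_cpuinfo(sample)))
```

Falling back to the reference build whenever a flag is missing matches the stability argument: the slow binary runs everywhere, the fast one only where its build assumptions hold.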