...
I once wrote a module that replaces the built-in transcendental functions of numpy with optimized versions from Intel's vector math library. If anyone is interested, I can publish it. In my experience it was of little use, since real-world problems are limited by memory bandwidth, so extending numexpr with optimized transcendental functions was the better solution. Afterwards I discovered that I could have saved myself the effort of the first approach: gcc is able to use optimized functions from Intel's short vector math library (SVML) or AMD's Core Math Library (ACML), see the docs for -mveclibabi. You just need to recompile numpy with the proper compiler arguments.
I'm interested. I'd like to try AMD rather than Intel, because AMD's library is easier to obtain. I'm running on an Intel machine; I hope that doesn't matter too much. What exactly do I need to do? I see that numpy/site.cfg has an MKL section. I'm assuming I should leave that alone and just adjust the gcc flags?