I just wanted to draw the attention of NumPy devs to Mark Florisson's GSoC work.
It is 'minivect', a tool for compiling array expressions. Conceptually, think of it as a backend that Cython, Theano, and numba could share, though currently only Cython uses it.
His M. Sc. thesis, "Techniques for Static and Dynamic Compilation of Array Expressions", is up here:
As you can see, his code even beats Intel Fortran for some array layouts and generally performs comparably to it. The benchmarks mostly cover two-operand operations, i.e. operations where NumPy's semantics would be fine.
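To make "compiling array expressions" concrete, here is a minimal plain-Python sketch. It is not minivect's API or its generated code. It contrasts NumPy-style evaluation, where each operation makes its own pass over the data and allocates a temporary, with the single fused loop an expression compiler can emit instead. The function names are hypothetical and chosen only for illustration.

```python
# Hypothetical illustration only: these helpers are not part of minivect,
# Cython, or NumPy. They contrast two strategies for evaluating a + b * c.

def eager_eval(a, b, c):
    """Evaluate a + b * c one operation at a time, NumPy-style.

    Each binary operation makes a full pass over the data and
    allocates a new temporary list for its result.
    """
    tmp = [y * z for y, z in zip(b, c)]      # first pass: temporary for b * c
    return [x + t for x, t in zip(a, tmp)]   # second pass: a + tmp


def fused_eval(a, b, c):
    """Evaluate a + b * c in a single fused loop.

    No intermediate array is created: each element of the inputs is
    read once and the whole expression is computed per element.
    """
    out = [0.0] * len(a)
    for i in range(len(a)):
        out[i] = a[i] + b[i] * c[i]
    return out


a = [1.0, 2.0, 3.0]
b = [4.0, 5.0, 6.0]
c = [7.0, 8.0, 9.0]
print(eager_eval(a, b, c))  # [29.0, 42.0, 57.0]
print(fused_eval(a, b, c))  # [29.0, 42.0, 57.0]
```

A single binary operation such as `a + b` has nothing to fuse, so both strategies do the same work there. The benefit of compiling the whole expression grows with the number of operands, because every extra eager operation adds another temporary and another pass over memory.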
IMO, if anybody ever wants to revamp NumPy's computation abilities and get that 2-3x speedup (e.g., by making it multi-threaded), this is a very good place to start.