On Friday 22 May 2009 11:42:56 Gregor Thalhammer wrote:

dmitrey wrote:

Hi all, has anyone already tried comparing an ordinary numpy ufunc with the corresponding one from CorePy? I mean, first of all, the project http://socghop.appspot.com/student_project/show/google/gsoc2009/python/t1 24024628235

It would be interesting to know what the speedup is for, e.g., vec ** 0.5 or (if it's possible, since it isn't a pure ufunc) numpy.dot(Matrix, vec). Or any other example.
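A baseline for the two operations mentioned above can be measured with plain NumPy before comparing against any accelerated implementation. The following is only a rough sketch: CorePy itself is not used, the array sizes and the helper name best_time are arbitrary choices made for illustration, and the numbers will vary with hardware and with the BLAS library NumPy is linked against.

```python
import timeit

import numpy as np

# Sizes are arbitrary assumptions chosen for illustration only.
rng = np.random.default_rng(0)
vec = rng.random(1_000_000)
matrix = rng.random((500, 500))
small_vec = rng.random(500)


def best_time(func, repeat=3, number=5):
    """Return the best per-call wall time in seconds (hypothetical helper)."""
    return min(timeit.repeat(func, repeat=repeat, number=number)) / number


results = {
    "vec ** 0.5": best_time(lambda: vec ** 0.5),
    "np.sqrt(vec)": best_time(lambda: np.sqrt(vec)),
    "np.dot(matrix, small_vec)": best_time(lambda: np.dot(matrix, small_vec)),
}

for name, seconds in results.items():
    print(f"{name:28s} {seconds * 1e3:8.3f} ms")
```

Running the same statements against a candidate replacement ufunc would give the speedup as a ratio of the two timings.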

I have no experience with the mentioned CorePy, but recently I was playing around with accelerated ufuncs using Intel's Math Kernel Library (MKL). These improvements are now part of the numexpr package http://code.google.com/p/numexpr/

Some remarks on possible speed improvements on recent Intel x86 processors.

1) Basic arithmetic ufuncs (add, sub, mul, ...) in standard numpy are fast (SSE is used) and their speed is limited by memory bandwidth.

2) The speed of many transcendental functions (exp, sin, cos, pow, ...) can be improved by _roughly_ a factor of five (single core) by using the MKL. Most of the improvement stems from using faster algorithms with a vectorized implementation. Note: the speed improvement depends on a _lot_ of other circumstances.

3) Improving performance by using multiple cores is much more difficult. A significant speedup is possible only for sufficiently large (>1e5) arrays. Where a speed gain is possible, the MKL uses several cores. Some experimentation showed that by adding a few OpenMP constructs you could get a similar speedup with numpy.

4) numpy.dot uses optimized implementations.
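Points 1) and 2) can be checked by comparing the cost per element of a basic arithmetic ufunc with that of a transcendental one. This is a minimal sketch under stated assumptions: the array size and the helper name per_element_ns are illustrative, numexpr is treated as optional, and whether an installed numexpr is actually linked against the MKL depends on how it was built, so the factor of five quoted above should not be expected in general.

```python
import timeit

import numpy as np

try:
    import numexpr as ne  # optional; may or may not be built against the MKL
except ImportError:
    ne = None


def per_element_ns(func, n, repeat=3, number=5):
    """Best observed time per array element in nanoseconds (hypothetical helper)."""
    best = min(timeit.repeat(func, repeat=repeat, number=number)) / number
    return best / n * 1e9


n = 1_000_000  # arbitrary size, large enough not to fit in typical caches
x = np.random.rand(n)
y = np.random.rand(n)

timings = {
    "numpy add": per_element_ns(lambda: x + y, n),    # expected memory-bound
    "numpy exp": per_element_ns(lambda: np.exp(x), n),  # expected CPU-bound
}
if ne is not None:
    # Pass the operand explicitly so the lookup does not depend on the caller's frame.
    timings["numexpr exp"] = per_element_ns(
        lambda: ne.evaluate("exp(x)", local_dict={"x": x}), n
    )

for name, ns in timings.items():
    print(f"{name:12s} {ns:7.2f} ns/element")
```

If exp costs several times more per element than add, the transcendental function is the one with room for improvement from a faster vectorized implementation.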

Good points, Gregor. However, I wouldn't say that improving performance by using multiple cores is *that* difficult, but rather that multiple cores can only be used efficiently *whenever* memory bandwidth is not a limitation. An example of this is the computation of transcendental functions, where, even with vectorized implementations, the computation speed is still CPU-bound in many cases. And you have seen very good speed-ups yourself in these cases with your implementation of numexpr/MKL :)

Cheers,

-- Francesc Alted
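The argument about CPU-bound functions and multiple cores can be tried out with a small threaded wrapper around a ufunc. This is only a sketch, not how numexpr or the MKL do it: it assumes a one-dimensional contiguous array, relies on NumPy releasing the GIL inside ufunc loops, and the function name threaded_unary and the thread count are hypothetical choices.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def threaded_unary(ufunc, x, n_threads=4):
    """Apply a unary ufunc to a 1-D array in n_threads contiguous chunks."""
    out = np.empty_like(x)
    # Chunk boundaries: bounds[0] == 0 and bounds[-1] == x.size.
    bounds = np.linspace(0, x.size, n_threads + 1, dtype=int)

    def work(i):
        lo, hi = bounds[i], bounds[i + 1]
        # Each thread writes into its own disjoint slice of the output.
        ufunc(x[lo:hi], out=out[lo:hi])

    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        list(pool.map(work, range(n_threads)))  # consume to surface exceptions
    return out


x = np.random.rand(2_000_000)
result = threaded_unary(np.exp, x)
```

Timing threaded_unary(np.exp, x) against threaded_unary(np.negative, x) on the same array should show whether the extra threads help more for the CPU-bound function than for the memory-bound one.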