On Wednesday 21 October 2009 14:27:46 David Cournapeau wrote:
This is because numpy is a package that works mainly with arrays in an element-wise way, and in this scenario the time to transfer data to the CPU dominates, by and large, over the time to perform the operations.
Is it general, or just for simple operations in numpy and ufuncs? I remember that for music software, SIMD used to matter a lot, even for simple bus mixing (which is basically an ax+by with a, b scalars and x, y the input arrays).
This is general, as long as the dataset has to be brought from memory to the CPU and the operations to be done are element-wise and simple (i.e. not transcendental). SIMD does matter in general when:

1) the dataset is already in cache
2) you have to perform costly operations (mainly transcendental)
3) a combination of the above

I don't know the case for music software, but if you say that ax+by is accelerated by SIMD, I'd say that case 1) is happening.
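A rough way to check this on a given machine is to time a cheap operation (ax+by) and a transcendental one on a dataset that should fit in cache and on one that should not. The following is only a sketch: the helper names (`ns_per_element`, `mix`, `transcendental`) and the array sizes are my own assumptions, and the sizes in particular have to be adapted to the cache sizes of the actual CPU.

```python
import timeit

import numpy as np


def mix(x, y, out):
    # Cheap element-wise operation: a*x + b*y with scalar a, b.
    # Note that numpy allocates temporaries for 0.5 * x and 0.25 * y.
    np.add(0.5 * x, 0.25 * y, out=out)


def transcendental(x, y, out):
    # Costly element-wise operation; y is unused, it only keeps the
    # signature identical to mix().
    np.sin(x, out=out)


def ns_per_element(func, n, repeat=5):
    """Best-of-`repeat` time per element (in ns) of func on n-element arrays."""
    rng = np.random.default_rng(0)
    x = rng.random(n)
    y = rng.random(n)
    out = np.empty_like(x)
    func(x, y, out)  # warm-up call
    # Process roughly the same number of elements for every array size.
    number = max(1, 2_000_000 // n)
    best = min(timeit.repeat(lambda: func(x, y, out),
                             number=number, repeat=repeat))
    return best / (number * n) * 1e9


if __name__ == "__main__":
    # Assumed sizes: 4096 doubles (32 KB per array) should fit in cache,
    # 4 million doubles (32 MB per array) most likely will not.
    for n in (4_096, 4_000_000):
        for func in (mix, transcendental):
            t = ns_per_element(func, n)
            print(f"n={n:>9}  {func.__name__:<15} {t:8.2f} ns/element")
```

If the argument above holds, the per-element time of `mix` should grow noticeably for the large arrays, while `transcendental` should change much less, because its cost is dominated by computation. The actual numbers depend heavily on the CPU, the memory subsystem and the numpy build.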
Do you have any interest in adding SIMD to some core numpy functions (the transcendental ones)? If so, I would try to go back to the problem of runtime SSE detection and loading of an optimized shared library in a cross-platform way. That's something which should be done at some point in numpy, and people requiring it would be a good incentive.
I don't personally have a lot of interest in implementing this for numpy. But in case anyone does, I find the following library very interesting:

http://gruntthepeon.free.fr/ssemath/

Perhaps there are other (free) implementations...

--
Francesc Alted