For some reason the list seems to occasionally drop my messages...

Francesc Alted wrote:

On Friday 22 May 2009 13:52:46, Andrew Friedley wrote:

I'm the student doing the project. I have a blog here, which contains some initial performance numbers for a couple of test ufuncs I did:

Another alternative we've talked about, and one I may (more and more likely) look into, is composing multiple operations together into a single ufunc. Again, the main idea is that memory accesses can be reduced or eliminated.
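To make the memory-access argument concrete: in NumPy an expression like `2*x + 1` runs as two separate ufunc calls. The first allocates a temporary array and the second reads it back, so the data crosses memory twice. A composed ufunc would do both operations in one loop. The sketch below shows only the loop structure, in pure Python with the standard `array` module. The names `unfused` and `fused` are hypothetical, and any real gain would come from doing the fused loop in compiled code, not from this Python version.

```python
from array import array


def unfused(x):
    """Evaluate 2*x + 1 as two passes, like two chained ufunc calls."""
    # Pass 1: materialize the temporary t = 2*x.
    t = array('d', (2.0 * v for v in x))
    # Pass 2: read the temporary back to compute t + 1.
    return array('d', (v + 1.0 for v in t))


def fused(x):
    """Evaluate 2*x + 1 in a single pass with no temporary array."""
    # Each input element is read once and each output element written once.
    return array('d', (2.0 * v + 1.0 for v in x))


if __name__ == "__main__":
    data = array('d', [0.0, 1.5, -2.0])
    print(list(fused(data)))
```

Both functions return the same values. The difference is that `fused` never builds the intermediate `t`, and that intermediate is exactly the memory traffic a composed ufunc is meant to avoid.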

IMHO, composing multiple operations together is the most promising avenue for leveraging current multicore systems.

Agreed -- our concern when scoping the project was to keep it small enough that I can complete it within the GSoC timeframe. If I have time, I'll definitely look into this over the summer; if not, later.

Another interesting approach is to implement costly operations (from the point of view of CPU resources) in a parallel way, namely transcendental functions like sin, cos or tan, but also others like sqrt or pow. If, in addition, you can combine this with vectorized versions of them (using the widespread SSE2 instruction set; see [1] for an example), you would surely be able to achieve really good results (at least Intel did with its VML library ;)
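Parallelizing an elementwise transcendental function usually means splitting the input into contiguous chunks, evaluating each chunk independently, and concatenating the results in order. The sketch below shows that chunking pattern with the standard library's `ThreadPoolExecutor`. The function names and the `n_workers` parameter are hypothetical. Note that in CPython the GIL prevents pure-Python `math.cos` calls from actually running in parallel on threads, so this shows the decomposition only. A real speedup would need the per-chunk loop in C (or separate processes).

```python
import math
from concurrent.futures import ThreadPoolExecutor


def _cos_chunk(xs):
    """Evaluate cos on one contiguous chunk (the per-worker kernel)."""
    return [math.cos(v) for v in xs]


def parallel_cos(xs, n_workers=4):
    """Apply cos elementwise by splitting xs into n_workers chunks."""
    n = len(xs)
    if n == 0:
        return []
    size = -(-n // n_workers)  # ceiling division: elements per chunk
    chunks = [xs[i:i + size] for i in range(0, n, size)]
    out = []
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # pool.map yields results in input order, so chunks reassemble correctly.
        for part in pool.map(_cos_chunk, chunks):
            out.extend(part)
    return out
```

Because the chunks are independent, the same split works whether the per-chunk kernel is scalar or SIMD-vectorized. That is why the parallel and vectorized approaches compose.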

I've seen that page before. Using another source [1], I came up with a quick-and-dirty cos ufunc. Performance is crazy good compared to NumPy (100x); see the latest post on my blog for a little more info. I'll look at the source myself when I get time again, but is NumPy using a Python-based cos function, a C implementation, or something else? As I wrote in my blog, the performance gain is almost too good to believe.

[1] http://www.devmaster.net/forums/showthread.php?t=5784

Andrew
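For reference, a well-known "fast cos" of the kind discussed on forums like [1] approximates sine with a parabola, y = (4/π)x − (4/π²)x|x| on [−π, π], then adds one refinement step with weight P = 0.225. Cosine is obtained as sin(x + π/2). This is offered as an illustration of that family of approximations. The message doesn't say this is exactly what the cos ufunc above uses, and the names `fast_sin`/`fast_cos` are hypothetical. The evaluation is a few multiplies and an `abs` per element with no branches, which suits SSE-style vectorization. The maximum absolute error is roughly 1e-3, which is far coarser than a C library `cos`, so accuracy is worth comparing alongside speed.

```python
import math

B = 4.0 / math.pi            # linear coefficient of the parabola
C = -4.0 / (math.pi ** 2)    # quadratic coefficient of the parabola
P = 0.225                    # weight of the refinement step


def fast_sin(x):
    """Approximate sin(x) with a refined parabola (max abs error ~1e-3)."""
    # Range reduction: wrap x into [-pi, pi).
    x = (x + math.pi) % (2.0 * math.pi) - math.pi
    # Parabola through (-pi, 0), (0, 0), (pi, 0) with peaks of +/-1 at +/-pi/2.
    y = B * x + C * x * abs(x)
    # Refinement: blend y with y*|y| to reduce the error.
    return P * (y * abs(y) - y) + y


def fast_cos(x):
    """Approximate cos(x) via the identity cos(x) = sin(x + pi/2)."""
    return fast_sin(x + math.pi / 2.0)


if __name__ == "__main__":
    grid = [i * 0.001 for i in range(-10000, 10001)]
    worst = max(abs(fast_cos(v) - math.cos(v)) for v in grid)
    print(f"max abs error on [-10, 10]: {worst:.5f}")
```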