On Mon, May 25, 2009 at 4:59 AM, Andrew Friedley <afriedle@indiana.edu> wrote:
For some reason the list seems to occasionally drop my messages...
Francesc Alted wrote:
On Friday 22 May 2009 13:52:46 Andrew Friedley wrote:
I'm the student doing the project. I have a blog here, which contains some initial performance numbers for a couple of test ufuncs I wrote:
Another alternative we've talked about, and which I am increasingly likely to look into, is composing multiple operations into a single ufunc. Again, the main idea is that memory accesses can be reduced or eliminated.
IMHO, composing multiple operations together is the most promising avenue for leveraging current multicore systems.
Agreed -- our concern when planning the project was to keep the scope reasonable so I can complete it in the GSoC timeframe. If I have time I'll definitely look into this over the summer; if not, later.
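To make the memory-traffic argument concrete, here is a minimal C sketch (purely illustrative, not NumPy code; the function names are made up): evaluating a*x + b as two separate elementwise passes writes out and re-reads an intermediate array, while a fused loop touches each element only once.

#include <stddef.h>

/* Two ufunc-style passes: tmp = a*x, then y = tmp + b.
   The intermediate array tmp is written and then re-read. */
void two_pass(const double *x, double a, double b,
              double *tmp, double *y, size_t n)
{
    for (size_t i = 0; i < n; i++) tmp[i] = a * x[i];
    for (size_t i = 0; i < n; i++) y[i] = tmp[i] + b;
}

/* Fused version: one pass, no intermediate array, so each element
   is read and written exactly once. */
void fused(const double *x, double a, double b, double *y, size_t n)
{
    for (size_t i = 0; i < n; i++) y[i] = a * x[i] + b;
}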
Another interesting approach is to implement costly operations (from the point of view of CPU resources; namely, transcendental functions like sin, cos or tan, but also others like sqrt or pow) in a parallel way. If, besides, you can combine this with vectorized versions of them (using the widespread SSE2 instruction set; see [1] for an example), then you should be able to achieve really good results for sure (at least Intel did with its VML library ;)
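For what it's worth, here is a rough sketch of what an SSE-vectorized inner loop can look like, using the _mm_sqrt_ps intrinsic from <xmmintrin.h> (sqrt is shown because sin/cos have no single SSE instruction and would need a vector routine such as the one in [1]). This is only an illustration under those assumptions, not code from NumPy or from the project:

#include <xmmintrin.h>
#include <math.h>
#include <stddef.h>

/* Single-precision sqrt over an array, four elements per iteration,
   with a scalar tail for the leftover elements. */
void vec_sqrt(const float *in, float *out, size_t n)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128 v = _mm_loadu_ps(in + i);        /* unaligned load  */
        _mm_storeu_ps(out + i, _mm_sqrt_ps(v)); /* vectorized sqrt */
    }
    for (; i < n; i++)
        out[i] = sqrtf(in[i]);                  /* scalar remainder */
}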
I've seen that page before. Using another source [1] I came up with a quick-and-dirty cos ufunc. Performance is crazy good compared to NumPy (around 100x); see the latest post on my blog for a little more info. I'll look at the source myself when I get time again, but is NumPy using a Python-based cos function, a C implementation, or something else? As I wrote in my blog, the performance gain is almost too good to believe.
NumPy uses the C library version. If the long double and float versions aren't available, the double version is used with type conversions, but that shouldn't account for a factor of 100x. Something else is going on.

Chuck
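For reference, "the C library version" amounts to an elementwise loop over libm's cos(), roughly like the sketch below (an illustration only, not NumPy's actual inner loop), so its speed is essentially that of the system cos():

#include <math.h>
#include <stddef.h>

/* Double-precision cos applied element by element via libm. */
void cos_loop(const double *in, double *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = cos(in[i]);
}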