For some reason the list seems to occasionally drop my messages...
Francesc Alted wrote:
> On Friday 22 May 2009 13:52:46 Andrew Friedley wrote:
>> I'm the student doing the project. I have a blog here, which contains
>> some initial performance numbers for a couple test ufuncs I did:
>>
>> http://numcorepy.blogspot.com
>>
>> Another alternative we've talked about, and one I am increasingly likely
>> to look into, is composing multiple operations together into a single ufunc.
>> Again, the main idea is that memory accesses can be reduced or eliminated.
>
> IMHO, composing multiple operations together is the most promising avenue for
> leveraging current multicore systems.
Agreed -- our concern when considering this for the project was to keep the
scope reasonable so I can complete it in the GSoC timeframe. If I have
time I'll definitely be looking into this over the summer; if not, later.
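To make the memory-access argument concrete, here is a minimal pure-Python
sketch of what composing two operations into one pass means. It is only an
illustration: it uses the standard-library array module rather than NumPy or
CorePy, and the function names (multiply_then_add, fused_multiply_add) are
made up for this example.

```python
from array import array

def multiply_then_add(a, b, c):
    """Two separate passes: a*b is stored in a temporary, then read back."""
    # Pass 1: writes a full-size temporary array to memory.
    tmp = array('d', (x * y for x, y in zip(a, b)))
    # Pass 2: reads the temporary again to add c.
    return array('d', (t + z for t, z in zip(tmp, c)))

def fused_multiply_add(a, b, c):
    """One pass: each input element is read once, no temporary is stored."""
    return array('d', (x * y + z for x, y, z in zip(a, b, c)))

a = array('d', [1.0, 2.0, 3.0])
b = array('d', [4.0, 5.0, 6.0])
c = array('d', [0.5, 0.5, 0.5])

# Both versions compute the same values; only the memory traffic differs.
assert multiply_then_add(a, b, c) == fused_multiply_add(a, b, c)
```

The fused version never materializes the intermediate a*b array, which is
the saving a composed ufunc would be after.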
> Another interesting approach is to implement costly operations (from the point
> of view of CPU resources), namely transcendental functions like sin, cos or
> tan, but also others like sqrt or pow, in a parallel way. If, in addition, you
> can combine this with vectorized versions of them (by using the widespread SSE2
> instruction set, see [1] for an example), then you would be able to achieve
> really good results for sure (at least Intel did with its VML library ;)
>
> [1] http://gruntthepeon.free.fr/ssemath/
I've seen that page before. Using another source [1] I came up with a
quick/dirty cos ufunc. Performance is crazy good compared to NumPy
(100x); see the latest post on my blog for a little more info. I'll
look at the source myself when I get time again, but is NumPy using a
Python-based cos function, a C implementation, or something else? As I
wrote in my blog, the performance gain is almost too good to believe.
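
As a rough illustration of why a hand-written cos can be cheap, below is a
scalar Python sketch of the general range-reduction-plus-polynomial technique
that fast math routines commonly use. This is an assumption-laden example,
not the code from the ufunc mentioned above and not NumPy's implementation:
the name cos_poly is hypothetical, and the coefficients are plain Taylor
terms, so accuracy is only about 1e-6 for moderate inputs.

```python
import math

def cos_poly(x):
    """Approximate cos(x) with range reduction and short polynomials."""
    half_pi = math.pi / 2
    # Range reduction: x = k*(pi/2) + r, with r in roughly [-pi/4, pi/4].
    k = round(x / half_pi)
    r = x - k * half_pi
    r2 = r * r
    # Truncated Taylor series for cos(r) and sin(r), in Horner form.
    c = 1 - r2 / 2 * (1 - r2 / 12 * (1 - r2 / 30 * (1 - r2 / 56)))
    s = r * (1 - r2 / 6 * (1 - r2 / 20 * (1 - r2 / 42)))
    # cos(r + k*pi/2) cycles through cos r, -sin r, -cos r, sin r.
    q = k % 4
    if q == 0:
        return c
    if q == 1:
        return -s
    if q == 2:
        return -c
    return s

# Compare against the C library cos on a few sample points.
for x in (0.0, 0.5, 1.0, 2.0, 3.0, -4.0):
    assert abs(cos_poly(x) - math.cos(x)) < 1e-6
```

The arithmetic is a handful of multiplies and adds per element, which is the
sort of work that maps well onto SIMD instructions; a production version
would use tuned coefficients and more careful range reduction for large
inputs.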