NumPy uses the C library version. If the long double and float versions aren't available, the double version is used with number conversions, but that shouldn't give a factor of 100x. Something else is going on.
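
One way to narrow this down is to time np.cos per dtype and compare it against a plain Python loop over math.cos. The sketch below is only a rough sanity check, not a careful benchmark. The helper names time_cos and time_python_cos, the array size, and the repeat counts are made up for illustration.

```python
import math
import timeit

import numpy as np


def time_cos(dtype, n=100_000, repeat=5):
    """Best time in seconds for one np.cos call on n elements of the given dtype."""
    x = np.linspace(0.0, 2.0 * np.pi, n).astype(dtype)
    timer = timeit.Timer(lambda: np.cos(x))
    # Each measurement runs 10 calls; keep the fastest and average per call.
    return min(timer.repeat(repeat=repeat, number=10)) / 10


def time_python_cos(n=100_000, repeat=3):
    """Best time in seconds for a pure-Python loop over math.cos, for scale."""
    xs = [2.0 * math.pi * i / n for i in range(n)]
    timer = timeit.Timer(lambda: [math.cos(v) for v in xs])
    return min(timer.repeat(repeat=repeat, number=1))


if __name__ == "__main__":
    for dtype in (np.float32, np.float64, np.longdouble):
        name = np.dtype(dtype).name
        print(f"np.cos {name:>10}: {time_cos(dtype) * 1e3:.3f} ms")
    print(f"math.cos loop    : {time_python_cos() * 1e3:.3f} ms")
```

If one dtype is far slower than the others, the conversion path is worth a closer look. If all of them are within a small factor of each other, the 100x gap more likely comes from how the comparison itself was set up.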

>> I'm the student doing the project. I have a blog here, which contains
>> some initial performance numbers for a couple test ufuncs I did:
>> http://numcorepy.blogspot.com
>>
>> Another alternative we've talked about, and I (more and more likely) may
>> look into is composing multiple operations together into a single ufunc.
>> Again the main idea being that memory accesses can be reduced/eliminated.

> IMHO, composing multiple operations together is the most promising venue for
> leveraging current multicore systems.

Agreed -- our concern when considering for the project was to keep the
scope reasonable so I can complete it in the GSoC timeframe. If I have
time I'll definitely be looking into this over the summer; if not later.

> Another interesting approach is to implement costly operations (from the point
> of view of CPU resources), namely, transcendental functions like sin, cos or
> tan, but also others like sqrt or pow) in a parallel way. If besides, you can
> combine this with vectorized versions of them (by using the well spread SSE2
> instruction set, see [1] for an example), then you would be able to achieve
> really good results for sure (at least Intel did with its VML library ;)
>
> [1] http://gruntthepeon.free.fr/ssemath/

I've seen that page before. Using another source [1] I came up with a
quick/dirty cos ufunc. Performance is crazy good compared to NumPy
(100x); see the latest post on my blog for a little more info. I'll
look at the source myself when I get time again, but is NumPy using a
Python-based cos function, a C implementation, or something else? As I
wrote in my blog, the performance gain is almost too good to believe.
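
To make the idea of composing multiple operations into a single pass a bit more concrete, here is a minimal NumPy-level sketch. It is not the project's implementation. It only illustrates the memory-access argument by evaluating cos(a * b + c) block by block with a small reused scratch buffer instead of full-size temporaries. The function names and the block size are assumptions, and the inputs are assumed to be 1-D float arrays of equal length and dtype.

```python
import numpy as np


def unfused(a, b, c):
    """Evaluate cos(a * b + c) step by step; each step writes a full-size temporary."""
    t1 = a * b
    t2 = t1 + c
    return np.cos(t2)


def fused_blocked(a, b, c, block=4096):
    """Evaluate cos(a * b + c) one block at a time, reusing a small scratch buffer."""
    out = np.empty_like(a)
    scratch = np.empty(block, dtype=a.dtype)
    for start in range(0, a.size, block):
        stop = min(start + block, a.size)
        s = scratch[: stop - start]  # view sized to the current block
        np.multiply(a[start:stop], b[start:stop], out=s)
        np.add(s, c[start:stop], out=s)
        np.cos(s, out=out[start:stop])  # write straight into the result
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a, b, c = (rng.random(1_000_000) for _ in range(3))
    print(np.allclose(unfused(a, b, c), fused_blocked(a, b, c)))
```

The blocked version still loops in Python, so any speedup depends on the block size and the cache. A real fused ufunc would do the same work in one compiled inner loop.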

