[Numpy-discussion] CUDA

Tue May 26 08:29:51 EDT 2009

>> The issue with OpenCL is that there will be some extensions for each
>> supported architecture, which means that the generic OpenCL will never
>> be very fast or more exactly near the optimum.
>
> what's the difference w/ OpenGL ?
> i.e. isn't the job of the "underlying" library to provide the best algorithm-
> freakingly-optimized-bare-to-the-metal-whatever-opcode, hidden away from the
> user's face ?

It's like OpenGL: you have to fall back to simpler functions if you
want to support every platform. If you target only one specific
platform, you can use custom, optimized functions.
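
To illustrate the fallback idea, here is a minimal Python sketch. All
names in it (generic_sum_of_squares, tuned_sum_of_squares, TUNED,
pick_implementation, "vendor_a") are hypothetical and belong to neither
OpenCL nor OpenGL; the "tuned" function is only a stand-in for a
platform-specific routine and is not actually faster here.

```python
def generic_sum_of_squares(values):
    """Portable fallback: works on every platform, tuned for none."""
    return sum(v * v for v in values)


def tuned_sum_of_squares(values):
    """Stand-in for a platform-specific routine (hypothetical).

    It returns the same result as the generic version; a real one
    would rely on features only one platform offers.
    """
    total = 0.0
    for v in values:
        total += v * v
    return total


# Hypothetical registry: platform name -> specialised implementation.
TUNED = {"vendor_a": tuned_sum_of_squares}


def pick_implementation(platform):
    """Use the specialised routine if one exists, else the generic one."""
    return TUNED.get(platform, generic_sum_of_squares)


if __name__ == "__main__":
    for platform in ("vendor_a", "something_else"):
        impl = pick_implementation(platform)
        print(platform, impl.__name__, impl([1, 2, 3]))
```

The point is only that code written against the generic path runs
everywhere, while the specialised path exists for a single platform.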

> OpenCL is just an API (modeled after the CUDA one AFAICT) so implementers can
> use whatever trick they want, right ?

Implementers can't know, for instance, how the data domain must be
split (1D, 2D, 3D, ...? What if the underlying tool doesn't provide all
of them?). OpenCL will have ways to say that some data must be stored
in local or shared memory (for the GPU), ... Some companies provide
ways to do this with pragmas in C and Fortran (e.g. CAPS), but even
with pragmas dedicated to CUDA, the generated code is not optimal. So
I don't think it is reasonable to expect the implementers to provide,
in the common API, the tools needed to write truly optimal code. You
will have to use additional, vendor-specific APIs, as you do for
state-of-the-art OpenGL.
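
As a rough illustration of the data-domain question, the Python sketch
below splits the same 4 x 6 domain either as a flat 1D range or as 2D
tiles. The helper names (split_1d, split_2d) and the block sizes are
assumptions made for this example; they are not OpenCL or CUDA calls.

```python
def split_1d(n, block):
    """Return (start, stop) ranges covering range(n) in chunks of `block`."""
    return [(start, min(start + block, n)) for start in range(0, n, block)]


def split_2d(rows, cols, tile_rows, tile_cols):
    """Return ((r0, r1), (c0, c1)) tiles covering a rows x cols domain."""
    return [
        (row_range, col_range)
        for row_range in split_1d(rows, tile_rows)
        for col_range in split_1d(cols, tile_cols)
    ]


if __name__ == "__main__":
    rows, cols = 4, 6
    # Same data, two decompositions: flattened 1D blocks vs. 2D tiles.
    print(split_1d(rows * cols, 6))
    print(split_2d(rows, cols, 2, 3))
```

Both calls produce four pieces of six elements each, but which elements
end up together differs, and only the person writing the algorithm
knows which grouping suits its memory access pattern.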

> my 2 euro-cents.

my 2 euro-cents ;)

Matthieu
-- 
Information System Engineer, Ph.D.
Website: http://matthieu-brucher.developpez.com/
Blogs: http://matt.eifelle.com and http://blog.developpez.com/?blog=92
LinkedIn: http://www.linkedin.com/in/matthieubrucher