The issue with OpenCL is that there will be extensions for each supported architecture, which means that generic OpenCL code will never be very fast, or, more precisely, never close to the optimum.
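For illustration, here is a minimal sketch (standard OpenCL 1.x host calls, error checking omitted) of how host code can query a device's extension string, which is where those architecture-specific capabilities show up:

/* Sketch: query the extension string of the first device of the first
 * platform, so the host code can pick vendor-specific fast paths when
 * they exist. Error handling is omitted for brevity. */
#include <stdio.h>
#include <CL/cl.h>

int main(void)
{
    cl_platform_id platform;
    cl_device_id device;
    char extensions[4096];

    clGetPlatformIDs(1, &platform, NULL);
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_DEFAULT, 1, &device, NULL);
    clGetDeviceInfo(device, CL_DEVICE_EXTENSIONS,
                    sizeof(extensions), extensions, NULL);

    /* The content differs per vendor, e.g. NVIDIA, AMD or CPU
     * implementations each advertise their own extensions. */
    printf("Supported extensions: %s\n", extensions);
    return 0;
}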
What's the difference with OpenGL? i.e., isn't it the job of the "underlying" library to provide the best freakishly-optimized-bare-to-the-metal-whatever-opcode algorithm, hidden away from the user's face?
It's like OpenGL: you have to fall back to simpler functions if you want to support every platform. If you target only one specific platform, you can use custom optimized functions.
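A rough sketch of that OpenGL-style fallback, assuming legacy GL with GLUT just to get a context; the extension name is a real one, but which "fast path" you take behind it is up to the application:

/* Check the extension string at run time and only take the optimized
 * path if the driver advertises it. strstr() is a simplistic check
 * (it can match a longer extension name), but shows the idea. */
#include <stdio.h>
#include <string.h>
#include <GL/glut.h>

int main(int argc, char **argv)
{
    /* A current GL context is needed before glGetString(GL_EXTENSIONS)
     * returns anything meaningful. */
    glutInit(&argc, argv);
    glutCreateWindow("ext-check");

    const char *ext = (const char *)glGetString(GL_EXTENSIONS);
    if (ext && strstr(ext, "GL_ARB_vertex_buffer_object")) {
        printf("fast path: vertex buffer objects available\n");
        /* upload vertex data once into driver-managed buffers ... */
    } else {
        printf("fallback: plain vertex arrays\n");
        /* resend vertex data every frame ... */
    }
    return 0;
}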
OpenCL is just an API (modeled after the CUDA one, AFAICT), so implementers can use whatever tricks they want, right?
Implementers can't know, for instance, how the data domain must be split (1D, 2D, 3D, ...? What if the underlying tool doesn't provide all of them?). OpenCL will have ways to tell that some data must be stored in local or shared memory (for the GPU), and so on. Some companies provide ways to do this with pragmas in C and Fortran (e.g. CAPS), but even with pragmas dedicated to CUDA, the generated code is not optimal. So I don't think it is reasonable to expect implementers to provide, in the common API, the tools needed to produce really optimal code. You will have to use an additional, manufacturer-specific API, just as you do for state-of-the-art OpenGL.
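To make that concrete, here is a minimal sketch (error checks omitted, sizes are illustrative guesses, OpenCL 1.x API) of the decisions OpenCL leaves to the caller rather than to the implementation: the work dimension, the global/local split of the data domain, and what gets staged through __local memory:

#include <stdio.h>
#include <stdlib.h>
#include <CL/cl.h>

/* Trivial kernel that stages data through __local memory. */
static const char *src =
    "__kernel void copy_tiled(__global const float *in,            \n"
    "                         __global float *out,                 \n"
    "                         __local float *tile)                 \n"
    "{                                                             \n"
    "    size_t lid = get_local_id(1) * get_local_size(0)          \n"
    "               + get_local_id(0);                             \n"
    "    size_t gid = get_global_id(1) * get_global_size(0)        \n"
    "               + get_global_id(0);                            \n"
    "    tile[lid] = in[gid];           /* stage through __local */\n"
    "    barrier(CLK_LOCAL_MEM_FENCE);                             \n"
    "    out[gid] = tile[lid];                                     \n"
    "}                                                             \n";

int main(void)
{
    enum { W = 256, H = 256 };
    size_t global[2] = {W, H};     /* 2D split of the data domain       */
    size_t local[2]  = {16, 16};   /* work-group shape: a per-GPU guess */
    size_t bytes = W * H * sizeof(float);
    float *host = malloc(bytes);
    for (size_t i = 0; i < W * H; ++i) host[i] = (float)i;

    cl_platform_id platform;
    cl_device_id device;
    clGetPlatformIDs(1, &platform, NULL);
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_DEFAULT, 1, &device, NULL);

    cl_context ctx = clCreateContext(NULL, 1, &device, NULL, NULL, NULL);
    cl_command_queue queue = clCreateCommandQueue(ctx, device, 0, NULL);

    cl_mem in  = clCreateBuffer(ctx, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
                                bytes, host, NULL);
    cl_mem out = clCreateBuffer(ctx, CL_MEM_WRITE_ONLY, bytes, NULL, NULL);

    cl_program prog = clCreateProgramWithSource(ctx, 1, &src, NULL, NULL);
    clBuildProgram(prog, 1, &device, NULL, NULL, NULL);
    cl_kernel kernel = clCreateKernel(prog, "copy_tiled", NULL);

    clSetKernelArg(kernel, 0, sizeof(cl_mem), &in);
    clSetKernelArg(kernel, 1, sizeof(cl_mem), &out);
    /* __local argument: size only, no host pointer */
    clSetKernelArg(kernel, 2, local[0] * local[1] * sizeof(float), NULL);

    clEnqueueNDRangeKernel(queue, kernel, 2 /* work_dim */, NULL,
                           global, local, 0, NULL, NULL);
    clEnqueueReadBuffer(queue, out, CL_TRUE, 0, bytes, host, 0, NULL, NULL);

    printf("out[42] = %f\n", host[42]);
    free(host);
    return 0;
}

Whether a 16x16 work-group and this tiling are any good depends entirely on the target architecture, which is exactly the point: a generic implementation cannot guess those parameters for you.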
my 2 euro-cents ;)

Matthieu
--
Information System Engineer, Ph.D.
Website: http://matthieu-brucher.developpez.com/
Blogs: http://matt.eifelle.com and http://blog.developpez.com/?blog=92
LinkedIn: http://www.linkedin.com/in/matthieubrucher