[Cython] OpenCL support

Mon Feb 6 00:12:47 CET 2012

On 5 February 2012 22:39, Dimitri Tcaciuc <dtcaciuc at gmail.com> wrote:
> Mark,
>
> Couple of thoughts based on some experience with OpenCL...
>
> 1. This may be going outside the proposed purpose, but some algorithms
> such as molecular simulations can benefit from a fairly large amount
> of constant data loaded at the beginning of the program and persisted
> in between invocations of a function. If I understand the proposal,
> the entire program would need to be within one `with` block, which
> would certainly limit the architecture. E.g.
>
>     # run.py
>     from cython_module import Evaluator
>
>     # Arrays are loaded into device memory here
>     x = Evaluator(params...)
>     for i in range(N):
>         # Calculations are performed with
>         # mostly data in the device memory
>         data_i = x.step()
>         ...

The point of the proposal is that the slices will actually stay on the
GPU as long as possible, until they absolutely need to be copied back
(e.g. when you go back to NumPy-land). You can do anything you want
in between (outside any parallel section), e.g. call other functions
that run on the CPU, call Python functions, whatever. When you
continue processing data that is still on the GPU, it will simply
continue from there.
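
To make that concrete, here is a minimal pure-Python sketch of the
bookkeeping I have in mind. It is not what the CEP specifies:
LazySlice, run_on_device and to_host are made-up names, and the
"device" is just a Python list standing in for GPU memory.

```python
class LazySlice:
    """Toy stand-in for a slice that remembers where its data lives."""

    def __init__(self, host_data):
        self.host = list(host_data)   # copy living in main memory
        self.device = None            # pretend GPU buffer, filled on demand
        self.location = "host"
        self.transfers = 0            # number of host<->device copies so far

    def run_on_device(self, kernel):
        # Copy to the device only if the data is not already there.
        if self.location != "device":
            self.device = list(self.host)
            self.location = "device"
            self.transfers += 1
        self.device = [kernel(v) for v in self.device]

    def to_host(self):
        # Copy back only when host code really needs the values.
        if self.location == "device":
            self.host = list(self.device)
            self.location = "host"
            self.transfers += 1
        return self.host


def cpu_side_work():
    # Anything that does not touch the slice; it forces no copy.
    return sum(range(10))


s = LazySlice([1.0, 2.0, 3.0])
s.run_on_device(lambda v: v * 2)   # first parallel section: one upload
cpu_side_work()                    # CPU code in between
s.run_on_device(lambda v: v + 1)   # continues on the device, no new copy
result = s.to_host()               # back in NumPy-land: one download
```

The two parallel sections share a single upload, and the only
download happens when the host asks for the values.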

But point taken, the compiler could think "oh but this is not too much
work, and I have more data in main memory than on the GPU, so let me
use the CPU and copy that constant data back". So perhaps pinning
should not just work for main memory; data could also be pinned on
the GPU. Then, if there is a "pinning conflict", Cython would raise an
exception.
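
A rough sketch of how such a conflict check could look, assuming each
operand carries an optional pin. PinningConflict and choose_location
are hypothetical names for illustration only; nothing like them
exists in Cython today.

```python
class PinningConflict(Exception):
    """Raised when operands of one operation are pinned to different places."""


def choose_location(operands, default="host"):
    """Pick where to run, given (name, pin) pairs.

    pin is "host", "device" or None (meaning: let the runtime decide).
    """
    pinned = {pin for _, pin in operands if pin is not None}
    if len(pinned) > 1:
        names = ", ".join(name for name, pin in operands if pin is not None)
        raise PinningConflict("conflicting pins for: " + names)
    # A single pin overrides any heuristic; with no pins, use the default.
    return pinned.pop() if pinned else default


# Constant data pinned on the GPU forces the work to happen there.
where = choose_location([("constants", "device"), ("state", None)])

# Operands pinned to different places cannot be reconciled.
try:
    choose_location([("constants", "device"), ("scratch", "host")])
    conflict_raised = False
except PinningConflict:
    conflict_raised = True
```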

> 2. AFAIK, given a device, OpenCL basically takes it over (which would
> be e.g. 8 cores on a 2 CPU x 4 core machine), so I'm not sure how the
> `num_cores` parameter would work here. There's the fission extension
> that allows you to selectively run on a portion of the device, but the
> idea is that you're still dedicating the entire device to your
> process, merely giving more organization to your processing tasks,
> where you have to specify the core numbers you want to use. I may very
> well be wrong here, bashing is welcome :)

Oh, yes. I think the num_threads clause could simply be ignored in
that context; it's only supposed to be an upper limit. Scheduling
hints like chunksize could also be ignored :)
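
In other words, something along these lines. This is only a sketch:
plan_workers and the backend names are invented for the example, and
available_units stands for however many cores or compute units the
backend reports.

```python
def plan_workers(backend, available_units, num_threads=None):
    """Decide how many workers to use for a parallel section."""
    if backend == "opencl":
        # The device is taken over as a whole, so the clause is ignored.
        return available_units
    if num_threads is None:
        return available_units
    if num_threads < 1:
        raise ValueError("num_threads must be positive")
    # On the CPU path num_threads is a cap, never a demand for more workers.
    return min(available_units, num_threads)


on_device = plan_workers("opencl", 8, num_threads=2)
capped = plan_workers("openmp", 8, num_threads=2)
not_raised = plan_workers("openmp", 4, num_threads=16)
```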

> 3. Does it make sense to make OpenCL more explicit? Heuristics and
> automatic switching between, say, CPU and GPU is great for e.g. Sage
> users, but maybe not so much if you know exactly what you're doing
> with your machine resources. E.g. just having a library with thin
> Cython-adapted wrappers would be awesome. I imagine this can be
> augmented by arrays having knowledge of device-side/client-side
> (which would go towards addressing issue 1 above)

Hm, there are several advantages to supporting this in the language.
One is that you can support parallel sections, and that your code can
transparently execute in parallel on whatever device the compiler and
runtime think will be best. Excluding the cython.parallel stuff, I
don't think there is enough room for a library; you might as well use
pyopencl directly in that case, right?

Not OpenCL per se, but part of that will also solve the NumPy
temporary problem, which we have numexpr for. But it would be more
convenient to express oneself natively in the programming language of
choice (Cython :).
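
For anyone not familiar with the temporary problem: an array
expression such as a + b * c, evaluated one operator at a time, first
builds a full-size intermediate for b * c. A rough pure-Python sketch
of the difference between that and the single fused loop a compiler
could generate (lists stand in for arrays, and both function names
are made up):

```python
def with_temporary(a, b, c):
    # Operator-at-a-time evaluation: b * c is materialised first.
    tmp = [bi * ci for bi, ci in zip(b, c)]   # full-size intermediate
    return [ai + ti for ai, ti in zip(a, tmp)]


def fused(a, b, c):
    # One pass over the data, no intermediate array.
    return [ai + bi * ci for ai, bi, ci in zip(a, b, c)]


a = [1.0, 2.0, 3.0]
b = [4.0, 5.0, 6.0]
c = [0.5, 0.5, 0.5]
same = with_temporary(a, b, c) == fused(a, b, c)
```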

> Cheers,
>
>
> Dimitri.
>
> On Sun, Feb 5, 2012 at 1:57 PM, mark florisson
> <markflorisson88 at gmail.com> wrote:
>> Hey,
>>
>> I created a CEP for opencl support: http://wiki.cython.org/enhancements/opencl
>> What do you think?
>>
>> Mark
>> _______________________________________________
>> cython-devel mailing list
>> cython-devel at python.org
>> http://mail.python.org/mailman/listinfo/cython-devel