[Cython] OpenMP support

Robert Bradshaw robertwb at math.washington.edu
Fri Mar 11 01:46:31 CET 2011


On Tue, Mar 8, 2011 at 11:16 AM, Francesc Alted <faltet at pytables.org> wrote:
> On Tuesday 08 March 2011 18:50:15 Stefan Behnel wrote:
>> mark florisson, 08.03.2011 18:00:
>> > What I meant was that the
>> > wrapper returned by the decorator would have to call the closure
>> > for every iteration, which introduces function call overhead.
>> >
>> >[...]
>> >
>> > I guess we just have to establish what we want to do: do we
>> > want to support code with Python objects (and exceptions etc), or
>> > just C code written in Cython?
>>
>> I like the approach that Sturla mentioned: using closures to
>> implement worker threads. I think that's very pythonic. You could do
>> something like this, for example:
>>
>>      def worker():
>>          for item in queue:
>>              with nogil:
>>                  do_stuff(item)
>>
>>      queue.extend(work_items)
>>      start_threads(worker, count)
>>
>> Note that the queue is only needed to tell the thread what to work
>> on. A lot of things can be shared over the closure. So the queue may
>> not even be required in many cases.
>
> I like this approach too.  I suppose that you will need to annotate the
> items so that they are not Python objects, no?  Something like:
>
>     def worker():
>         cdef int item  # tell that item is not a Python object!
>         for item in queue:
>             with nogil:
>                 do_stuff(item)
>
>     queue.extend(work_items)
>     start_threads(worker, count)

On a slightly higher level, are we just trying to use OpenMP from
Cython, or are we trying to build it into the language? If the former,
it may make sense to keep the API closer to the underlying C than one
might otherwise be tempted to, so that we can leverage the existing
documentation. A library with a more Pythonic interface could perhaps
be written on top of that. Alternatively, if we're building it into
Cython itself, I'd say it might be worth modeling it after the
multiprocessing module (though I understand it would be implemented
with threads), which I think is a decent enough model for managing
embarrassingly parallel operations. The above code is similar to that,
though I'd prefer the for loop implicit rather than as part of the
worker method (or at least as an argument). If we went this route,
what are the advantages of using OpenMP over, say, pthreads in the
background? (And could the latter be done with just a library + some
fancy GIL specifications?) One thing that's nice about OpenMP as
implemented in C is that the serial code looks almost exactly like the
parallel code; the code at http://wiki.cython.org/enhancements/openmp
has this property too.
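
To make the "implicit loop" idea concrete, here is a rough sketch in
plain Python. It uses the standard library's
multiprocessing.pool.ThreadPool, which exposes the multiprocessing API
but is backed by threads. The names parallel_map, do_stuff and
work_items are placeholders for illustration only; this is not a
proposed Cython API, and in Cython the body of do_stuff would
presumably run inside a nogil block.

```python
from multiprocessing.pool import ThreadPool


def do_stuff(item):
    # Placeholder for the per-item work (hypothetical).
    return item * item


def parallel_map(func, items, count=4):
    # The loop over items is implicit: the pool distributes the items
    # among `count` worker threads and collects results in input order.
    with ThreadPool(count) as pool:
        return pool.map(func, items)


work_items = list(range(10))
results = parallel_map(do_stuff, work_items)
```

Note that the serial version is just map(do_stuff, work_items), so the
serial and parallel code stay nearly identical here as well.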

Also, I like the idea of the invoking thread being able to hold the
GIL, with the "sharing" threads doing the appropriate locking among
themselves when needed (if possible), e.g. for exception raising.
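
As a loose analogy only, the following plain-Python sketch shows worker
threads coordinating through their own lock so that the first exception
is recorded once and re-raised by the invoking thread. The helper
run_workers is hypothetical, and ordinary Python threads do not model
how the GIL would actually be handled in generated C code.

```python
import threading


def run_workers(func, items, count=4):
    lock = threading.Lock()   # shared only among the worker threads
    first_error = []          # holds at most one exception
    pending = list(items)

    def worker():
        while True:
            with lock:
                # Stop if another worker failed or no work is left.
                if first_error or not pending:
                    return
                item = pending.pop()
            try:
                func(item)
            except Exception as exc:
                with lock:
                    if not first_error:
                        first_error.append(exc)
                return

    threads = [threading.Thread(target=worker) for _ in range(count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # The invoking thread raises on behalf of the workers.
    if first_error:
        raise first_error[0]
```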

Another thought: there might be other use cases for being able to
emit generic pragma statements. How far would that get us?

- Robert

