[Cython] cython.parallel tasks, single, master, critical, barriers

Robert Bradshaw robertwb at math.washington.edu
Fri Oct 14 20:31:16 CEST 2011


On Wed, Oct 12, 2011 at 7:55 AM, mark florisson
<markflorisson88 at gmail.com> wrote:
>>> I ultimately feel things like that are more important than 100% coverage of
>>> the OpenMP standard. Of course, OpenMP is the lower-hanging fruit.
>>
>> +1 Prange handles the (coarse-grained) SIMD case nicely, and a
>> task/futures model based on closures would, I think, flesh this out to
>> the next level of generality (and complexity).
>
> Futures are definitely nice. I suppose I really like "inline
> futures", i.e. OpenMP tasks. I realize that futures may look more
> pythonic. However, as mentioned previously, I also see issues with
> that. When you submit a task then you expect a future object, which
> you might want to pass around. But we don't have the GIL for that. I
> personally feel that futures are something best provided by a
> library (such as concurrent.futures in Python 3.2), and inline tasks
> by a language. It also means I would have to write an entire function
> or closure for perhaps only a few lines of code.
>
> I might also want to submit other functions that are not closures, or
> I might want to reuse my closures that are used for tasks and for
> something else. So what if my tasks contain more parallel constructs?
> e.g. what if I have a task closure that I return from my function that
> generates more tasks itself? Would you just execute them sequentially
> outside of the parallel construct, or would you simply disallow that?
> Also, do you restrict future "objects" to only the parallel section?
>
> Another problem is that you can only wait on tasks of your direct
> children. So what if I get access to my parent's future object
> (assuming you allow tasks to generate tasks), and then want the result
> of my parent?
> Or what if I store these future objects in an array or list and access
> them arbitrarily? You will only know at runtime which task to wait on,
> and OpenMP only has a static, lexical taskwait.
>
> I suppose my point is that without either a drastic rewrite (e.g.,
> using pthreads instead of OpenMP) or quite a few constraints, I am
> unsure how futures would work here. Perhaps you guys have some
> concrete syntax and semantics proposals?
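For reference, the library-level futures model mentioned above can be sketched with the standard concurrent.futures module (Python 3.2+); the work function here is a hypothetical stand-in for a nogil Cython kernel, not an existing API.

```python
from concurrent.futures import ThreadPoolExecutor

def work(n):
    # hypothetical stand-in for a nogil compute kernel
    return n * n

with ThreadPoolExecutor(max_workers=4) as pool:
    # Futures can be stored in a list and waited on in arbitrary order,
    # unlike OpenMP's lexically scoped taskwait.
    futures = [pool.submit(work, n) for n in range(8)]
    results = [f.result() for f in futures]
```

Note that submitting a task returns a future object that can be passed around freely, which is exactly the part that is awkward without the GIL.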

It feels to me that OpenMP tasks took a different model of parallelism
and forced it into the OpenMP model/constraints, so it would be even
more difficult to fit them into a nice pythonic interface.
Perhaps to make progress on this front we need to have a concrete
example to look at. I'm also wondering if the standard threading
module (perhaps with overlay support) used with nogil functions would
be sufficient--locking is required for handling the queues, etc. so
the fact that the GIL is involved is not a big deal. It is possible
that this won't scale down to work units that small, but the overhead
should be minimal once your work unit is of sufficient size (which is
probably quite small), and it's already implemented and well
documented/used.
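The threading-plus-queue approach suggested above might look like the following sketch; the worker body is an illustrative stand-in for a nogil function, and the names are mine, not an existing Cython API.

```python
import queue
import threading

def worker(q, results, lock):
    # Pull work items off a shared queue until a None sentinel arrives.
    while True:
        item = q.get()
        if item is None:
            q.task_done()
            break
        r = item * item          # stand-in for a nogil kernel
        with lock:               # queue/result handling needs a lock anyway,
            results.append(r)    # so touching the GIL here costs little extra
        q.task_done()

q = queue.Queue()
results = []
lock = threading.Lock()
threads = [threading.Thread(target=worker, args=(q, results, lock))
           for _ in range(4)]
for t in threads:
    t.start()
for n in range(8):
    q.put(n)
q.join()                         # wait for all work units to finish
for _ in threads:
    q.put(None)                  # one sentinel per worker
for t in threads:
    t.join()
```

The per-item overhead is a queue operation and a lock acquisition, which is why this should amortize well once work units reach even a modest size.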

As for critical and barrier, the notion of a critical block as a with
statement is very useful. Creating/naming locks (rather than being
implicit on the file/line number) is more powerful, but is a larger
burden on the user and more difficult to support with the OpenMP
backend. barrier, if supported, should be a function call, not a
context manager. This is not as pressing as the tasks case, but a
concrete example showing how it flows would be useful here as well.
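In plain Python, the critical-block-as-with-statement idea corresponds to a named lock used as a context manager; a proposed cython.parallel version would presumably compile to "#pragma omp critical", but this lock-based analogy shows the shape.

```python
import threading

counter_lock = threading.Lock()  # an explicitly named lock
counter = 0

def bump(n_times):
    global counter
    for _ in range(n_times):
        with counter_lock:       # the "critical" section
            counter += 1

threads = [threading.Thread(target=bump, args=(10000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# With the lock, all 40000 increments survive; without it,
# concurrent read-modify-write could lose updates.
```

Naming the lock explicitly is what makes this more powerful than an OpenMP critical section keyed implicitly on file/line number: two different with blocks can share one lock.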

As for single, I see doing this manually does require boilerplate
locking, so what about

if cython.parallel.once():  # will return True once per thread group.
    ...

we could implement this via our own locking/checking/flushing to allow
it to occur in arbitrary expressions, e.g.

special_worker = cython.parallel.once()
if special_worker:
   ...
[common code]
if special_worker:   # single wouldn't work here
   ...
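A possible pure-Python sketch of that once() behavior: the first caller in a group gets True, everyone else gets False. A real Cython implementation would use a per-group flag plus a flush/barrier rather than a Python lock; this only illustrates the locking/checking idea, and the Once name is mine.

```python
import threading

class Once:
    """First caller gets True; all later callers get False."""
    def __init__(self):
        self._lock = threading.Lock()
        self._done = False

    def __call__(self):
        with self._lock:         # check-and-set must be atomic
            if self._done:
                return False
            self._done = True
            return True

once = Once()
flags = []
flags_lock = threading.Lock()

def run():
    special_worker = once()      # usable in arbitrary expressions
    with flags_lock:
        flags.append(special_worker)

threads = [threading.Thread(target=run) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# Exactly one thread observes True, so it can act as the
# "special worker" in several places, unlike OpenMP single.
```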


- Robert

