[Cython] cython.parallel tasks, single, master, critical, barriers

Dag Sverre Seljebotn d.s.seljebotn at astro.uio.no
Wed Oct 12 10:36:16 CEST 2011

On 10/12/2011 09:55 AM, Robert Bradshaw wrote:
> On Sun, Oct 9, 2011 at 5:57 AM, Dag Sverre Seljebotn
> <d.s.seljebotn at astro.uio.no>  wrote:
>> On 10/09/2011 02:18 PM, Dag Sverre Seljebotn wrote:
>>> On 10/09/2011 02:11 PM, mark florisson wrote:
>>>> Hey,
>>>> So far people have been enthusiastic about the cython.parallel features,
>>>> I think we should introduce some new features.
> Excellent. I think this is going to become a killer feature like
> buffer support.
>>>> I propose the following,
>>> Great!!
>>> I only have time for a very short feedback now, perhaps more will follow.
>>>> assume parallel has been imported from cython:
>>>> with parallel.master():
>>>> this is executed in the master thread in a parallel (non-prange)
>>>> section
>>>> with parallel.single():
>>>> same as master, except any thread may do the execution
>>>> An optional keyword argument 'nowait' specifies whether there will be a
>>>> barrier at the end. The default is to wait.
>> I like
>> if parallel.is_master():
>>     ...
>> explicit_barrier_somehow() # see below
>> better as a Pythonization. One could easily support is_master to be used in
>> other contexts as well, simply by assigning a status flag in the master
>> block.
> +1, the if statement feels a lot more natural.
>> Using an if-test flows much better with Python I feel, but that naturally
>> leads to making the barrier explicit. But I like the barrier always being
>> explicit, rather than having it as a predicate on all the different
>> constructs like in OpenMP....
>> I'm less sure about single, since making it a function indicates one could
>> use it in other contexts and the whole thing becomes too magic (since it's
>> tied to the position of invocation). I'm tempted to suggest
>> for _ in prange(1):
>>     ...
>> as our syntax for single.

Just to be clear: My point was that the above implements single 
behaviour even now, without any extra effort.

> The idea here is that you want a block of code executed once,
> presumably by the first thread that gets here? I think this could also
> be handled by a if statement, perhaps "if parallel.first()" or
> something like that. Is there anything special about this construct
> that couldn't simply be done by flushing/checking a variable?

Good point. I think one problem with OpenMP is that it has too many
primitives for similar things.

I'm -1 on single -- either using a for loop or flag+flush is more to 
type, but more readable to people who don't know cython.parallel (look: 
Python even makes "self." explicit -- the bias in language design is 
clearly on readability rather than writability).
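
To make the flag option concrete, here is a minimal sketch in plain Python
threading rather than cython.parallel. OnceFlag and run_team are made-up
names for illustration only, and the lock plays the role of the flush. The
point is that the flag is an ordinary object, so nothing is tied to where
in the source the test happens to be written.

```python
import threading


class OnceFlag:
    """Hypothetical helper: lets exactly one thread claim a block of work."""

    def __init__(self):
        self._lock = threading.Lock()
        self._claimed = False

    def claim(self):
        # True for the first caller only. The lock makes the check-and-set
        # atomic and makes the updated flag visible to the other threads.
        with self._lock:
            if self._claimed:
                return False
            self._claimed = True
            return True


def run_team(num_threads=4):
    once = OnceFlag()
    executed_by = []

    def worker(tid):
        if once.claim():
            # The "single" block: runs in whichever thread arrives first.
            executed_by.append(tid)

    threads = [threading.Thread(target=worker, args=(tid,))
               for tid in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return executed_by
```

Because the flag is passed around explicitly, moving the claim() test inside
or outside another if-statement changes nothing about which flag is used.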

I thought of "if is_first()" as well, but my problem is again that it 
binds to the location of the call.

if foo:
     if parallel.is_first():

can not be refactored to:

if parallel.is_first():
     if foo:

which I think is highly confusing for people who didn't write the code 
and don't know the details of cython.parallel. (Unlike is_master(), 
which works the same either way).

I think we should aim for something that's as easy to read as possible 
for Python users with no cython.parallel knowledge.

>>>> with parallel.task():
>>>> create a task to be executed by some thread in the team
>>>> once a thread takes up the task it shall only be executed by that
>>>> thread and no other thread (so the task will be tied to the thread)
>>>> C variables will be firstprivate
>>>> Python objects will be shared
>>>> parallel.taskwait() # wait on any direct descendent tasks to finish
>>> Regarding tasks, I think this is mapping OpenMP too close to Python.
>>> Closures are excellent for the notion of a task, so I think something
>>> based on the futures API would work better. I realize that makes the
>>> mapping to OpenMP and implementation a bit more difficult, but I think
>>> it is worth it in the long run.
> It's almost as if you're reading my thoughts. There are much more
> natural task APIs, e.g. futures or the way the Python
> threading/multiprocessing does things.
>>>> with parallel.critical():
>>>> this section of code is mutually exclusive with other critical sections
>>>> optional keyword argument 'name' specifies a name for the critical
>>>> section,
>>>> which means all sections with that name will exclude each other,
>>>> but not
>>>> critical sections with different names
>>>> Note: all threads that encounter the section will execute it, just
>>>> not at the same time
>> Yes, this works well as a with-statement...
>> ..except that it is slightly magic in that it binds to call position (unlike
>> anything in Python). I.e. this would be more "correct", or at least
>> Pythonic:
>> with parallel.critical(__file__, __line__):
>>     ...

Mark: I stand corrected on this point. +1 on your critical proposal.

> This feels a lot like a lock, which of course fits well with the with
> statement.
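
The lock analogy can be sketched with the standard library. This is only an
illustration of the semantics under my own assumptions, not the proposed
implementation: critical() and demo() are hypothetical names, and each
section name simply maps to one threading.Lock.

```python
import threading
from collections import defaultdict

# One lock per section name; the guard protects the registry itself.
_registry_guard = threading.Lock()
_named_locks = defaultdict(threading.Lock)


def critical(name="default"):
    """Hypothetical stand-in for a named critical section."""
    with _registry_guard:
        return _named_locks[name]


def demo(num_threads=8, iterations=1000):
    counter = {"value": 0}

    def worker():
        for _ in range(iterations):
            # Sections sharing the name "counter" exclude each other;
            # a section with a different name would use a different lock.
            with critical("counter"):
                counter["value"] += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter["value"]
```

Every thread that reaches the section executes it, just never at the same
time as another thread in a section of the same name.
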
>>>> with parallel.barrier():
>>>> all threads wait until everyone has reached the barrier
>>>> either no one or everyone should encounter the barrier
>>>> shared variables are flushed
>> I have problems with requiring a noop with block...
>> I'd much rather write
>> parallel.barrier()
>> However, that ties a function call to the place of invocation, and suggests
>> that one could do
>> if rand() > .5:
>>     barrier()
>> else:
>>     i += 3
>>     barrier()
>> and have the same barrier in each case. Again,
>> barrier(__file__, __line__)
>> gets us purity at the cost of practicality. Another way is the pthreads
>> approach (although one may have to use pthread rather than OpenMP to get it,
>> unless there are named barriers?):
>> barrier_a = parallel.barrier()
>> barrier_b = parallel.barrier()
>> with parallel:
>>     barrier_a.wait()
>>     if rand() > .5:
>>         barrier_b.wait()
>>     else:
>>         i += 3
>>         barrier_b.wait()
>> I'm really not sure here.
> I agree, the barrier doesn't seem like it belongs in a context. For
> example, it's ambiguous whether the block is supposed to precede or
> follow the barrier. I like the named barrier idea, but if that's not
> feasible we could perhaps use control flow to disallow conditionally
> calling barriers (or that every path calls the barrier (an equal
> number of times?)).

It is always an option to go beyond OpenMP. Pthread barriers are a lot 
more powerful in this way, and with pthread and Windows covered I think 
we should be good...

IIUC, you can't have different paths calling the same barrier the same
number of times, since it's merely

#pragma omp barrier

and a separate barrier statement gets another counter. Which is why I
think it is not powerful enough and we should use pthreads.
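
For illustration, here is how barrier objects behave in plain Python, where
threading.Barrier stands in for a pthread barrier. This is a sketch, not a
proposal for the Cython syntax: run_team is a made-up name, and I use the
thread id's parity instead of rand() so the outcome is deterministic.

```python
import threading


def run_team(num_threads=4):
    # Barriers are ordinary objects, so identity does not depend on
    # where in the source wait() is called.
    barrier_a = threading.Barrier(num_threads)
    barrier_b = threading.Barrier(num_threads)
    increments = [0] * num_threads
    passed = []

    def worker(tid):
        barrier_a.wait()
        if tid % 2 == 0:
            barrier_b.wait()
        else:
            increments[tid] += 3
            # Same barrier object as in the other branch.
            barrier_b.wait()
        passed.append(tid)

    threads = [threading.Thread(target=worker, args=(tid,))
               for tid in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return increments, sorted(passed)
```

Both branches meet at barrier_b even though the two wait() calls sit on
different source lines, which is exactly what separate barrier statements
do not give you.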

> +1. I like the idea of providing more parallelism constructs, but
> rather than risk fixating on OpenMP's model, perhaps we should look at
> the problem we're trying to solve (e.g., what can't one do well now)
> and create (or more likely borrow) the right Pythonic API to do it.

Also, quick and flexible message-passing between threads/processes
through channels is becoming an increasingly popular concept. Go even
has a separate syntax for channel communication, and zeromq is becoming
popular for distributed work.

This is a problem Cython may need to solve, since one currently has
to use very low-level C to do it quickly (either zeromq or pthreads in
most cases -- I guess an OpenMP critical section would help in
implementing a queue, though).

I wouldn't resist a builtin "channel" type in Cython (since we don't 
have full templating/generics, it would be the only way of sending typed 
data conveniently?).
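
As a rough picture of what I mean by a channel, here is the pattern in plain
Python, with queue.Queue from the standard library playing the channel. It
says nothing about how a typed Cython channel would look or perform; the
names producer, consumer, run and the _DONE sentinel are only illustrative.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the stream


def producer(channel, items):
    for item in items:
        channel.put(item)      # blocks while the channel is full
    channel.put(_DONE)


def consumer(channel, results):
    while True:
        item = channel.get()   # blocks until something arrives
        if item is _DONE:
            break
        results.append(item * item)


def run(items):
    channel = queue.Queue(maxsize=2)  # small bounded buffer
    results = []
    threads = [
        threading.Thread(target=producer, args=(channel, items)),
        threading.Thread(target=consumer, args=(channel, results)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

The two threads share no state other than the channel, so no explicit
critical section or barrier appears in user code.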

I ultimately feel things like that are more important than 100% coverage
of the OpenMP standard. Of course, OpenMP is much lower-hanging fruit.

Dag Sverre

