[Cython] cython.parallel tasks, single, master, critical, barriers

Wed Oct 12 16:55:44 CEST 2011

On 12 October 2011 10:08, Robert Bradshaw <robertwb at math.washington.edu> wrote:
> On Wed, Oct 12, 2011 at 1:36 AM, Dag Sverre Seljebotn
> <d.s.seljebotn at astro.uio.no> wrote:
>> On 10/12/2011 09:55 AM, Robert Bradshaw wrote:
>>>> I'm less sure about single, since making it a function indicates one could
>>>> use it in other contexts and the whole thing becomes too magic (since it's
>>>> tied to the position of invocation). I'm tempted to suggest
>>>>
>>>> for _ in prange(1):
>>>>    ...
>>>>
>>>> as our syntax for single.
>>
>> Just to be clear: My point was that the above implements single behaviour
>> even now, without any extra effort.
>>
>>>
>>> The idea here is that you want a block of code executed once,
>>> presumably by the first thread that gets here? I think this could also
>>> be handled by a if statement, perhaps "if parallel.first()" or
>>> something like that. Is there anything special about this construct
>>> that couldn't simply be done by flushing/checking a variable?
>>
>> Good point. I think there's a problem with OpenMP that it has too many
>> primitives for similar things.
>>
>> I'm -1 on single -- either using a for loop or flag+flush is more to type,
>> but more readable to people who don't know cython.parallel (look: Python
>> even makes "self." explicit -- the bias in language design is clearly on
>> readability rather than writability).
>>
>> I thought of "if is_first()" as well, but my problem is again that it binds
>> to the location of the call.
>>
>> if foo:
>>    if parallel.is_first():
>>        ...
>> else:
>>    if parallel.is_first():
>>        ...
>>
>> can not be refactored to:
>>
>> if parallel.is_first():
>>    if foo:
>>        ...
>>    else:
>>        ...
>>
>> which I think is highly confusing for people who didn't write the code and
>> don't know the details of cython.parallel. (Unlike is_master(), which works
>> the same either way).
>>
>> I think we should aim for something that's as easy to read as possible for
>> Python users with no cython.parallel knowledge.
>
> Exactly. This is what's so beautiful about prange.
>
>>>>>> with parallel.barrier():
>>>>>> all threads wait until everyone has reached the barrier
>>>>>> either no one or everyone should encounter the barrier
>>>>>> shared variables are flushed
>>>>
>>>> I have problems with requiring a noop with block...
>>>>
>>>> I'd much rather write
>>>>
>>>> parallel.barrier()
>>>>
>>>> However, that ties a function call to the place of invocation, and suggests
>>>> that one could do
>>>>
>>>> if rand() > .5:
>>>>    barrier()
>>>> else:
>>>>    i += 3
>>>>    barrier()
>>>>
>>>> and have the same barrier in each case. Again,
>>>>
>>>> barrier(__file__, __line__)
>>>>
>>>> gets us purity at the cost of practicality. Another way is the pthreads
>>>> approach (although one may have to use pthreads rather than OpenMP to get it,
>>>> unless there are named barriers?):
>>>>
>>>> barrier_a = parallel.barrier()
>>>> barrier_b = parallel.barrier()
>>>> with parallel:
>>>>    barrier_a.wait()
>>>>    if rand() > .5:
>>>>        barrier_b.wait()
>>>>    else:
>>>>        i += 3
>>>>        barrier_b.wait()
>>>>
>>>>
>>>> I'm really not sure here.
>>>
>>> I agree, the barrier doesn't seem like it belongs in a context. For
>>> example, it's ambiguous whether the block is supposed to precede or
>>> succeed the barrier. I like the named barrier idea, but if that's not
>>> feasible we could perhaps use control flow to disallow conditionally
>>> calling barriers (or that every path calls the barrier (an equal
>>> number of times?)).
>>
>> It is always an option to go beyond OpenMP. Pthread barriers are a lot more
>> powerful in this way, and with pthread and Windows covered I think we should
>> be good...
>>
>> IIUC, you can't have different paths calling the barrier the same number of
>> times, it's merely
>>
>> #pragma omp barrier
>>
>> and a separate barrier statement gets another counter.
>
> Makes sense, but this greatly restricts where we could use the OpenMP version.
>
>> Which is why I think
>> it is not powerful enough and we should use pthreads.
>>
>>> +1. I like the idea of providing more parallelism constructs, but
>>> rather than risk fixating on OpenMP's model, perhaps we should look at
>>> the problem we're trying to solve (e.g., what can't one do well now)
>>> and create (or more likely borrow) the right Pythonic API to do it.
>>
>> Also, quick and flexible message-passing between threads/processes through
>> channels is becoming an increasingly popular concept. Go even has a separate
>> syntax for channel communication, and zeromq is becoming popular for
>> distributed work.
>>
>> This is a problem Cython may need to solve here, since one currently has to
>> use very low-level C to do it quickly (either zeromq or pthreads in most
>> cases -- I guess, an OpenMP critical section would help in implementing a
>> queue though).
>>
>> I wouldn't resist a builtin "channel" type in Cython (since we don't have
>> full templating/generics, it would be the only way of sending typed data
>> conveniently?).
>
> zeromq seems to be a nice level of abstraction--we could probably get
> far with a zeromq "overlay" module that didn't require the GIL. Or is
> the C API easy enough to use if we could provide convenient mechanisms
> to initialize the tasks/threads. I think perhaps the communication
> model could be solved by a library more easily than the threading
> model.
>
>> I ultimately feel things like that are more important than 100% coverage of
>> the OpenMP standard. Of course, OpenMP is a lot lower-hanging fruit.
>
> +1 Prange handles the (coarse-grained) SIMD case nicely, and a
> task/futures model based on closures would I think flesh this out to
> the next level of generality (and complexity).

Futures are definitely nice. I suppose I really like "inline
futures", i.e. OpenMP tasks. I realize that futures may look more
Pythonic. However, as mentioned previously, I also see issues with
that. When you submit a task you expect a future object back, which
you might want to pass around. But we don't have the GIL for that. I
personally feel that futures are something that should be provided by a
library (such as concurrent.futures in Python 3.2), and inline tasks
by the language. Futures also mean I have to write an entire function or
closure for perhaps only a few lines of code.
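
To illustrate the library approach, here is a minimal sketch using the
standard-library concurrent.futures module. The names scale and data are
made up for this example; the point is only that even a one-line
computation has to be wrapped in a function before it can be submitted,
and that submit() hands back a Python object, which requires the GIL.

```python
from concurrent.futures import ThreadPoolExecutor


def scale(x, factor):
    # The whole "task" is one expression, yet it needs its own function.
    return x * factor


data = [1, 2, 3, 4]  # hypothetical input

with ThreadPoolExecutor(max_workers=2) as executor:
    # submit() returns a Future object that can be stored and passed around.
    futures = [executor.submit(scale, x, 10) for x in data]
    # result() blocks until the corresponding task has finished.
    results = [f.result() for f in futures]

print(results)
```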

I might also want to submit other functions that are not closures, or
I might want to reuse my closures that are used for tasks and for
something else. So what if my tasks contain more parallel constructs?
e.g. what if I have a task closure that I return from my function that
generates more tasks itself? Would you just execute them sequentially
outside of the parallel construct, or would you simply disallow that?
Also, do you restrict future "objects" to only the parallel section?
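
As a rough illustration of the scenario in question, this is what a task
that generates more tasks looks like with concurrent.futures (plain
Python, with the GIL). The functions parent and child are hypothetical.
The child futures escape the parent task and are waited on elsewhere,
which is the behaviour a nogil design would have to either support or
forbid.

```python
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=4)


def child(n):
    return n + 1


def parent(n):
    # The parent spawns two child tasks and returns their futures
    # without waiting for them, so the futures outlive the parent task.
    return [executor.submit(child, n), executor.submit(child, n * 10)]


# The caller receives futures it did not create itself.
child_futures = executor.submit(parent, 1).result()
values = [f.result() for f in child_futures]
executor.shutdown()

print(values)
```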

Another problem is that you can only wait on tasks of your direct
children. So what if I get access to my parent's future object
(assuming you allow tasks to generate tasks), and then want the result
of my parent?
Or what if I store these future objects in an array or list and access
them arbitrarily? You will only know at runtime which task to wait on,
and OpenMP only has a static, lexical taskwait.
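
A small sketch of that last case, again with concurrent.futures and a
made-up square function: the futures live in a list and the one to wait
on is chosen at runtime, so no lexically placed wait could name it in
advance.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def square(n):
    return n * n


with ThreadPoolExecutor(max_workers=4) as executor:
    futures = [executor.submit(square, n) for n in range(8)]
    # Which task we wait on is only decided here, at runtime.
    index = random.randrange(len(futures))
    value = futures[index].result()  # blocks on that one task only

print(index, value)
```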

I suppose my point is that without either a drastic rewrite (e.g., using
pthreads instead of OpenMP) or quite a few constraints, I am unsure
how futures would work here. Perhaps you guys have some concrete
syntax and semantics proposals?

> - Robert