[Cython] cython.parallel tasks, single, master, critical, barriers
markflorisson88 at gmail.com
Sun Oct 9 15:39:45 CEST 2011
On 9 October 2011 14:30, mark florisson <markflorisson88 at gmail.com> wrote:
> On 9 October 2011 13:57, Dag Sverre Seljebotn
> <d.s.seljebotn at astro.uio.no> wrote:
>> On 10/09/2011 02:18 PM, Dag Sverre Seljebotn wrote:
>>> On 10/09/2011 02:11 PM, mark florisson wrote:
>>>> So far people have been enthusiastic about the cython.parallel features, so
>>>> I think we should introduce some new features. I propose the following,
>>> I only have time for a very short feedback now, perhaps more will follow.
>>>> assume parallel has been imported from cython:
>>>> with parallel.master():
>>>>     this is executed in the master thread in a parallel (non-prange)
>>>> with parallel.single():
>>>>     same as master, except any thread may do the execution
>>>> An optional keyword argument 'nowait' specifies whether there will be a
>>>> barrier at the end. The default is to wait.
>> I like
>> if parallel.is_master():
>>     explicit_barrier_somehow() # see below
>> better as a Pythonization. One could easily support is_master to be used in
>> other contexts as well, simply by assigning a status flag in the master
>> Using an if-test flows much better with Python I feel, but that naturally
>> leads to making the barrier explicit. But I like the barrier always being
>> explicit, rather than having it as a predicate on all the different
>> constructs like in OpenMP....
> Hmm, that might mean you also want the barrier for a prange in a
> parallel to be explicit. I like the 'if' test though, although it
> wouldn't make sense for 'single'.
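
A rough pure-Python sketch of the "if test plus explicit barrier" idea, using the standard threading module rather than anything in cython.parallel. The names run_team and is_master are made up for illustration, and thread index 0 is simply assumed to be the master:

```python
import threading

def run_team(num_threads=4):
    """Run a small thread team; thread 0 plays the role of the master."""
    barrier = threading.Barrier(num_threads)  # explicit barrier shared by the team
    shared = {}                               # state set up by the master only
    seen = [None] * num_threads               # what each thread observes afterwards

    def is_master(thread_index):
        # Hypothetical stand-in for parallel.is_master(): a plain predicate.
        return thread_index == 0

    def worker(thread_index):
        if is_master(thread_index):
            shared["value"] = 42              # only the master executes this
        barrier.wait()                        # explicit: every thread waits here
        seen[thread_index] = shared["value"]  # safe, the master is done by now

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return seen
```

Without the barrier.wait() line the other threads could read shared before the master has written it, which is why the if-test form pushes you towards an explicit barrier.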
>> I'm less sure about single, since making it a function indicates one could
>> use it in other contexts and the whole thing becomes too magic (since it's
>> tied to the position of invocation). I'm tempted to suggest
>> for _ in prange(1):
>> as our syntax for single.
> I think that syntax is absolutely terrible :) Perhaps single is not so
> important and one can just use master instead (or, if really needed,
> master + a task with the actual work).
>>>> with parallel.task():
>>>>     create a task to be executed by some thread in the team
>>>>     once a thread takes up the task it shall only be executed by that
>>>>     thread and no other thread (so the task will be tied to the thread)
>>>>     C variables will be firstprivate
>>>>     Python objects will be shared
>>>> parallel.taskwait() # wait on any direct descendant tasks to finish
>>> Regarding tasks, I think this is mapping OpenMP too close to Python.
>>> Closures are excellent for the notion of a task, so I think something
>>> based on the futures API would work better. I realize that makes the
>>> mapping to OpenMP and implementation a bit more difficult, but I think
>>> it is worth it in the long run.
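
For comparison, this is roughly what closure-based tasks look like with the existing futures API in plain Python (concurrent.futures). It only illustrates the style being suggested, not a proposed cython.parallel interface; sum_of_squares and make_task are hypothetical names:

```python
from concurrent.futures import ThreadPoolExecutor, wait

def sum_of_squares(values, num_threads=4):
    """Create one task per value and wait for all of them to finish."""
    def make_task(x):
        # Each call gets its own x, so every task works on a private copy
        # of the value it was created with (loosely like firstprivate).
        def task():
            return x * x
        return task

    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        futures = [pool.submit(make_task(v)) for v in values]
        wait(futures)  # plays the role of taskwait for these tasks
        return sum(f.result() for f in futures)
```

Here the closure carries the task's data, and waiting is expressed on the returned futures instead of on the position in the code.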
>>>> with parallel.critical():
>>>>     this section of code is mutually exclusive with other critical sections
>>>>     optional keyword argument 'name' specifies a name for the critical
>>>>     section, which means all sections with that name will exclude each
>>>>     other, but not critical sections with different names
>>>>     Note: all threads that encounter the section will execute it, just
>>>>     not at the same time
>> Yes, this works well as a with-statement...
>> ..except that it is slightly magic in that it binds to call position (unlike
>> anything in Python). I.e. this would be more "correct", or at least
>> with parallel.critical(__file__, __line__):
> I'm not entirely sure what you mean here. Critical is really about the
> block contained within, not about a position in a file. Not all
> threads have to encounter the critical region, and not specifying a
> name means you exclude with *all other* unnamed critical sections (not
> just this one).
>>>> with parallel.barrier():
>>>>     all threads wait until everyone has reached the barrier
>>>>     either no one or everyone should encounter the barrier
>>>>     shared variables are flushed
>> I have problems with requiring a noop with block...
>> I'd much rather write
> In OpenMP it doesn't have any associated code, but we could
> give it those semantics: apply the barrier at the end of the block of
> code. The con is that the barrier is at the top while it only affects
> leaving the block. You would write:
> with parallel.barrier():
> if rand() > .5:
> # the barrier is here
>> However, that ties a function call to the place of invocation, and suggests
>> that one could do
>> if rand() > .5:
>> i += 3
>> and have the same barrier in each case. Again,
>> barrier(__file__, __line__)
>> gets us purity at the cost of practicality.
> In this case (unlike the critical construct), yes. I think a warning
> in the docs stating that either all or none of the threads must
> encounter the barrier should suffice.
>> Another way is the pthreads
>> approach (although one may have to use pthread rather than OpenMP to get it,
>> unless there are named barriers?):
>> barrier_a = parallel.barrier()
>> barrier_b = parallel.barrier()
>> with parallel:
>> if rand() > .5:
>> i += 3
>> I'm really not sure here.
> I think we should really just say to the user: "don't do this". There
> are no named barriers, implementing this wouldn't be easy at all (in
> fact, I'm not sure you can specify sane semantics for this if you have
> more branches and some do not contain the same barrier). The block
> structure for barriers would help here, as blocks are inconvenient to
> if C:
> with barrier(): ...
> with barrier(): ...
> is just not nice to write, you would instead write
> with barrier():
> if C:
This would also allow one to write
with barrier(), master():
Basically it's up to the user to use it sensibly. Usually you want a
barrier to ensure that you have a well-defined state set by some code.
One could (correctly) only put the last line of such code in the with
block, but it would make more sense to put all associated code in it.
If there isn't really any associated code, you could just put 'pass'
in the block.
Does that make sense? I haven't even convinced myself of it yet.
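
As a rough illustration of "the barrier applies when leaving the block", here is a pure-Python sketch built on threading.Barrier. The helper names barrier_block and run_phases are invented for this example and nothing here is an existing cython.parallel API:

```python
import threading
from contextlib import contextmanager

@contextmanager
def barrier_block(barrier):
    """Run the block, then wait at the barrier on the way out."""
    try:
        yield
    finally:
        barrier.wait()  # the barrier takes effect at the end of the block

def run_phases(num_threads=4):
    barrier = threading.Barrier(num_threads)
    partial = [0] * num_threads     # state each thread contributes to
    totals = [None] * num_threads   # what each thread sees after the barrier

    def worker(i):
        with barrier_block(barrier):
            partial[i] = i + 1      # the associated code that sets the state
        totals[i] = sum(partial)    # well-defined: everyone has left the block

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return totals
```

With this shape, a block containing only 'pass' behaves like a bare barrier, and every thread still has to enter the block or none of them should.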
>>>> Unfortunately, gcc again manages to horribly break master and single
>>>> constructs in loops (versions 4.2 through 4.6), so I suppose I'll
>>>> first file a bug report. Other (better) compilers like Portland (and I'm
>>>> sure Intel) work fine. I suppose a warning in the documentation will
>>>> suffice there.
>>>> If we at some point implement vector/SIMD operations we could also try
>>>> out the Fortran openmp workshare construct.
>>> I'm starting to teach myself OpenCL as part of a course. It's very neat
>>> for some kinds of parallelism. What I'm saying is that at least in the
>>> case of SIMD, we should not lock ourselves to Fortran+OpenMP thinking
>>> too early, but also look forward to coming architectures (e.g., AMD's
>>> GPU-and-CPU on same die design).
>>> Dag Sverre
>>> cython-devel mailing list
>>> cython-devel at python.org
Of course, a 'with barrier():' means you can apply it anywhere:
lots of code
single line of code
But the trick for readable programs would be to find the section of code that is