Hey,<div><br></div><div>So far people have been enthusiastic about the cython.parallel features, so I think we should introduce some new ones. I propose the following (assume parallel has been imported from cython):</div><div>

<br></div><div>with parallel.master():</div><div>    this is executed only by the master thread in a parallel (non-prange) section</div><div><br></div><div>with parallel.single():</div><div>    same as master, except that any one thread may do the execution</div>

<div><br></div><div>An optional keyword argument &#39;nowait&#39; specifies whether there will be a barrier at the end. The default is to wait.</div><div><br></div><div>with parallel.task():</div><div>    create a task to be executed by some thread in the team</div>

<div>    once a thread takes up the task, it is executed only by that thread and no other (so the task is tied to the thread)</div><div><br></div><div>    C variables will be firstprivate</div><div>    Python objects will be shared</div>
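<div><br></div><div>To make the firstprivate/shared distinction concrete, here is a rough pure-Python emulation using a thread pool. It is only a sketch of the intended semantics, not the proposed Cython syntax: task_body and run_tasks are made-up names, and passing the loop variable as an argument stands in for a firstprivate C variable.</div>

```python
from concurrent.futures import ThreadPoolExecutor, wait


def task_body(snapshot, shared_log):
    # `snapshot` is this task's own copy (firstprivate-like);
    # `shared_log` is the single object every task sees (shared).
    shared_log.append(snapshot * snapshot)  # list.append is atomic in CPython


def run_tasks(n=4):
    shared_log = []  # plays the role of a shared Python object
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = []
        i = 0
        while i < n:
            # Passing i as an argument copies its current value into the task,
            # so later changes to i do not affect tasks already created.
            futures.append(pool.submit(task_body, i, shared_log))
            i += 1
        wait(futures)  # block until every submitted task has finished
    return sorted(shared_log)
```

<div>Each submitted call runs to completion on whichever worker thread picks it up, which loosely mirrors a tied task.</div>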

<div><br></div><div>parallel.taskwait() # wait for all direct descendant tasks to finish</div><div><br></div><div>with parallel.critical():</div><div>    this section of code is mutually exclusive with other critical sections</div>

<div>    </div><div>    optional keyword argument &#39;name&#39; specifies a name for the critical section, </div><div>    which means all sections with that name will exclude each other, but not</div><div>    critical sections with different names</div>

<div><br></div><div>    Note: all threads that encounter the section will execute it, just not at the same time</div><div><br></div><div>with parallel.barrier():</div><div>    all threads wait until everyone has reached the barrier</div>

<div>    either no thread or every thread should encounter the barrier</div><div>    shared variables are flushed</div><div><br></div><div>Unfortunately, gcc again manages to horribly break master and single constructs in loops (versions 4.2 through 4.6), so I suppose I&#39;ll first file a bug report. Other (better) compilers like Portland (and I&#39;m sure Intel) work fine. I suppose a warning in the documentation will suffice there.</div>
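<div><br></div><div>For anyone unsure about the named critical section and barrier semantics described above, here is a rough pure-Python emulation with the threading module. It is an illustration under assumptions, not an implementation of the proposal: critical and run_team are hypothetical names, and one lock per section name stands in for what the compiler would generate.</div>

```python
import threading
from collections import defaultdict
from contextlib import contextmanager

# Hypothetical registry: one lock per critical-section name (None = unnamed).
_registry_lock = threading.Lock()
_critical_locks = defaultdict(threading.Lock)


@contextmanager
def critical(name=None):
    """Sections sharing a name exclude each other; different names do not."""
    with _registry_lock:
        lock = _critical_locks[name]
    with lock:
        yield


def run_team(num_threads=4, iterations=500):
    counter = {"value": 0}
    seen_after_barrier = []
    barrier = threading.Barrier(num_threads)

    def worker():
        for _ in range(iterations):
            # Every thread executes this section, just never at the same time.
            with critical(name="counter"):
                counter["value"] += 1
        # All threads wait here until the whole team has arrived.
        barrier.wait()
        with critical(name="log"):
            seen_after_barrier.append(counter["value"])

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter["value"], seen_after_barrier
```

<div>Because every thread passes the barrier only after all increments are done, each one observes the same final count afterwards.</div>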

<div><br></div><div>If we implement vector/SIMD operations at some point, we could also try out the Fortran OpenMP workshare construct.</div><div><br></div><div>What do you guys think?</div><div><br></div><div>Mark</div>