[Cython] CEP: prange for parallel loops

Dag Sverre Seljebotn d.s.seljebotn at astro.uio.no
Mon Apr 4 13:53:16 CEST 2011


On 04/04/2011 01:23 PM, Stefan Behnel wrote:
> Dag Sverre Seljebotn, 04.04.2011 12:17:
>> CEP up at http://wiki.cython.org/enhancements/prange
>
> """
> Variable handling
>
> Rather than explicit declaration of shared/private variables we rely 
> on conventions:
>
>     * Thread-shared: Variables that are only read and not written in 
> the loop body are shared across threads. Variables that are only used 
> in the else block are considered shared as well.
>
>     * Thread-private: Variables that are assigned to in the loop body 
> are thread-private. Obviously, the iteration counter is thread-private 
> as well.
>
>     * Reduction: Variables that are only used on the LHS of an inplace 
> operator, such as s above, are marked as targets for reduction. If the 
> variable is also used in other ways (LHS of assignment or in an 
> expression) it does instead turn into a thread-private variable. Note: 
> This means that if one, e.g., inserts printf(... s) above, s is turned 
> into a thread-local variable. OTOH, there is simply no way to 
> correctly emulate the effect printf(... s) would have in a sequential 
> loop, so such code must be discouraged anyway.
> """
>
> What about simply (ab-)using Python semantics and creating a new inner 
> scope for the prange loop body? That would basically make the loop 
> behave like a closure function, but with the looping header at the 
> 'right' place rather than after the closure.

I'm not quite sure what the concrete changes to the CEP this would lead 
to (assuming you mean this as a proposal for alternative semantics, and 
not an implementation detail).
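
For concreteness, here is one possible reading of the closure idea in plain 
Python. It is only a sketch: nothing in it runs in parallel, and the names 
"run_serial" and "body" are made up for illustration and are not part of 
the CEP.

```python
def run_serial(x, alpha):
    tmp = -1.0  # outer variable that happens to share a name with the body's local

    def body(i):
        # Assigned inside the closure, so this "tmp" is a new local of body():
        # the analogue of a thread-private variable.
        tmp = alpha * i
        # x and alpha are only read, so they resolve to the enclosing scope:
        # the analogue of thread-shared variables.
        return x[i] * tmp

    results = [body(i) for i in range(len(x))]
    # The outer tmp was never touched by the loop body.
    return results, tmp
```

Under this reading, anything assigned in the body is invisible after the 
loop unless it is explicitly passed out, which is where reductions and 
lastprivate become awkward.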

How would we treat reduction variables? They need to be supported, and 
there's nothing in Python semantics to support them; they are a rather 
special case everywhere. I suppose we could keep the reduction clause 
above, or use the "nonlocal" keyword in the loop body...
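
To illustrate why reductions are a special case, here is a rough Python 
emulation of what a "+" reduction has to do: every thread accumulates into 
its own private copy, and the copies are combined once after the loop. The 
function name "parallel_sum", the "nthreads" parameter and the static 
chunking are assumptions made for this sketch, not something the CEP 
specifies.

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_sum(x, alpha, nthreads=4):
    n = len(x)
    # Static chunking: worker t handles indices bounds[t] .. bounds[t + 1] - 1.
    bounds = [n * t // nthreads for t in range(nthreads + 1)]

    def worker(t):
        s_local = 0.0  # private copy, initialised to the identity of "+"
        for i in range(bounds[t], bounds[t + 1]):
            s_local += x[i] * alpha * i
        return s_local

    with ThreadPoolExecutor(max_workers=nthreads) as pool:
        partials = list(pool.map(worker, range(nthreads)))

    # Combine the private copies once, after the loop has finished.
    s = 0.0
    for p in partials:
        s += p
    return s
```

Inside worker() only the partial sum exists, so reading s in the middle of 
the loop has no sensible meaning. That matches the CEP's remark about 
printf(... s).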

Also there's the else:-block, although we could make that part of the 
scope. And the "lastprivate" functionality, although that could be 
dropped without much loss.
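
For reference, "lastprivate" in the OpenMP sense means that after the loop 
the variable holds the value it was given in the sequentially last 
iteration. A minimal Python emulation, with the hypothetical name 
"loop_with_lastprivate" and the same assumed static chunking as before, 
could look like this:

```python
from concurrent.futures import ThreadPoolExecutor


def loop_with_lastprivate(n, alpha, nthreads=4):
    # Worker t handles indices bounds[t] .. bounds[t + 1] - 1.
    bounds = [n * t // nthreads for t in range(nthreads + 1)]

    def worker(t):
        last_i, tmp = None, None  # tmp is private to this worker
        for i in range(bounds[t], bounds[t + 1]):
            tmp = alpha * i
            last_i = i
        return last_i, tmp

    with ThreadPoolExecutor(max_workers=nthreads) as pool:
        results = list(pool.map(worker, range(nthreads)))

    # Copy out the value from whichever worker ran iteration n - 1.
    for last_i, tmp in results:
        if last_i == n - 1:
            return tmp
    return None  # n == 0: the loop body never ran
```

Dropping lastprivate would mean this copy-out step disappears and tmp 
simply ceases to exist once the loop ends.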

>
> Also, in the example, the local variable declaration of "tmp" outside 
> of the loop looks somewhat misplaced, although it's precedented by 
> comprehensions (which also have their own local scope in Cython).

Well, depending on the decision on lastprivate, the declaration would 
need to be outside; I really like the idea of moving "cdef" into the 
loop body, and am prepared to drop lastprivate for this.

Being explicit about thread-local variables does make things a lot safer 
to use.

(One problem is that when switching between serial and parallel, one 
needs to move variable declarations. But that only happens once, and one 
can use "nthreads=1" to disable parallel after that.)

An example would then be:

def f(np.ndarray[double] x, double alpha):
     cdef double s = 0, globtmp
     with nogil:
         for i in prange(x.shape[0]):
             cdef double tmp # thread-private
             tmp = alpha * i # alpha available from the enclosing scope
             s += x[i] * tmp # still automatic reduction for inplace operators
             # printf(...s) -> now leads to error, since s is not declared thread-private but is read
         else:
             # tmp still available here...looks a bit strange, but useful
             s += tmp * 10
             globtmp = tmp # we save tmp for later
         # tmp not available here, globtmp is
     return s

Or, we just drop support for the else block on these loops.

Dag Sverre

