[Cython] prange CEP updated

Mon Apr 18 15:57:30 CEST 2011

On 18 April 2011 13:06, mark florisson <markflorisson88 at gmail.com> wrote:
> On 16 April 2011 18:42, Dag Sverre Seljebotn <d.s.seljebotn at astro.uio.no> wrote:
>> (Moving discussion from http://markflorisson.wordpress.com/, where Mark
>> said:)
>
> Ok, sure, it was just an issue I was wondering about at that moment,
> but it's a tricky issue, so thanks.
>
>> """
>> Started a new branch https://github.com/markflorisson88/cython/tree/openmp .
>>
>> Now the question is whether sharing attributes should be propagated
>> outwards. e.g. if you do
>>
>> for i in prange(m):
>>    for j in prange(n):
>>        sum += i * j
>>
>> then ‘sum’ is a reduction for the inner parallel loop, but not for the outer
>> one. So the user would currently have to rewrite this to
>>
>> for i in prange(m):
>>    for j in prange(n):
>>        sum += i * j
>>    sum += 0
>>
>> which seems a bit silly. Of course, we could just disable nested
>> parallelism, or tell the users to use a prange and a ‘for from’ in such
>> cases.
>> """
>>
>> Dag: Interesting. The first one is definitely the behaviour we want, as long
>> as it doesn't cause unintended consequences.
>>
>> I don't really think it will -- the important thing is that the order
>> of loop iteration evaluation must be unimportant. And that is still true
>> (for the outer loop, as well as for the inner) in your first example.
>>
>> Question: When you have nested pranges, what will happen is that two nested
>> OpenMP parallel blocks are used, right? And do you know if there is complete
>> freedom/"reentrancy" in that variables that are thread-private in an outer
>> parallel block can be shared in an inner one, and vice versa?
>
> An implementation may or may not support it, and if it is supported
> the behaviour can be configured through omp_set_nested(). So we should
> consider the case where it is supported and enabled.
>
> If you have a lastprivate or reduction, then after the loop these are
> (reduced and) assigned to the original variable. So if that happens
> inside a parallel construct which does not declare the variable
> private to the construct, you actually have a race. So e.g. the nested
> prange currently races in the outer parallel range.
>
>> If so I'd think that this algorithm should work and feel natural:
>>
>>  - In each prange, for the purposes of variable private/shared/reduction
>> inference, consider all internal "prange" just as if they had been "range";
>> no special treatment.
>>
>>  - Recurse to children pranges.
>
> Right, that is most natural. Algorithmically, reductions and
> lastprivates (as those can have races if placed in inner parallel
> constructs) propagate outwards towards the outermost parallel block,
> or up to the first parallel with block, or up to the first construct
> that already determined the sharing attribute.
>
> e.g.
>
> with parallel:
>     with parallel:
>        for i in prange(n):
>            for j in prange(n):
>                sum += i * j
>     # sum is well-defined here
> # sum is undefined here
>
> Here 'sum' is a reduction for the two innermost loops. 'sum' is not
> private for the inner parallel with block, as a prange in a parallel
> with block is a worksharing loop that binds to that parallel with
> block. However, the outermost parallel with block declares sum (and i
> and j) private, so after that block all those variables become
> undefined.
>
> However, in the outermost parallel with block, sum will have to be
> initialized to 0 before anything else, or be declared firstprivate,
> otherwise 'sum' is undefined to begin with. Do you think declaring it
> firstprivate would be the way to go, or should we make it private and
> issue a warning or perhaps even an error?
>
>> DS

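For reference, the outward propagation discussed in the quoted text
can be sketched at the OpenMP level roughly as below. This is only an
illustration in plain C, not the code Cython generates; the function
name nested_sum is made up for the example. 'sum' is declared a
reduction on both loops, so the inner loop reduces into the outer
thread's private copy instead of racing on a shared variable. Built
without OpenMP support (e.g. gcc without -fopenmp) the pragmas are
ignored and the loops run serially with the same result.

```c
#include <assert.h>

/* Hypothetical helper: sum of i*j for 0 <= i < m, 0 <= j < n. */
long nested_sum(int m, int n)
{
    long sum = 0;

    assert(m >= 0 && n >= 0);

    /* Outer loop: each thread gets a private 'sum', combined at the end. */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < m; i++) {
        /* Inner loop: reduces into the enclosing thread's private 'sum'.
           Whether this actually runs in parallel depends on nested
           parallelism being supported and enabled. */
        #pragma omp parallel for reduction(+:sum)
        for (int j = 0; j < n; j++) {
            sum += (long)i * j;
        }
    }
    return sum;
}
```

Calling nested_sum(4, 5) gives 60 regardless of the number of threads,
since the iteration order does not matter for the reduction.
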
Everything seems to be working, although now the user has to be
careful with nested parallel blocks as variables can be private there
(and not firstprivate), i.e., the user has to do initialization at the
right place (e.g. in the outermost parallel block that determines it to
be private). I'm thinking of adding a warning, as the C compiler does.
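
To illustrate the private versus firstprivate distinction, here is a
minimal OpenMP sketch in plain C (again not Cython output; the names
sum_with_offset, base and scratch are invented for the example).
'base' is firstprivate, so every thread starts with the caller's
value, whereas 'scratch' is private and therefore undefined on entry
to the parallel block until it is assigned there.

```c
#include <assert.h>

/* Hypothetical helper: sum of (2*base + i) for 0 <= i < n. */
long sum_with_offset(int n, long base)
{
    long total = 0;
    long scratch;  /* private below: undefined on entry to the block */

    assert(n >= 0);

    #pragma omp parallel firstprivate(base) private(scratch) reduction(+:total)
    {
        /* A private variable must be initialized inside the block. */
        scratch = 2 * base;

        /* Worksharing loop that binds to the enclosing parallel block. */
        #pragma omp for
        for (int i = 0; i < n; i++) {
            total += scratch + i;
        }
    }
    return total;
}
```

For example, sum_with_offset(4, 10) returns 86. Moving the assignment
to scratch in front of the parallel block would leave each thread's
copy uninitialized when OpenMP is enabled, which is the situation the
warning would be meant to catch.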

Two issues remain:

1) explicit declarations of firstprivates

Do we still want those?

2) buffer auxiliary vars

When unpacking numpy buffers and using typed numpy arrays, can
reassignment or updates of a buffer-related variable ever occur in
nogil code sections? I'm thinking this is not possible and therefore
all buffer variables may be shared in parallel (for) sections?