[Cython] prange CEP updated

Dag Sverre Seljebotn d.s.seljebotn at astro.uio.no
Mon Apr 18 16:41:47 CEST 2011


Excellent! Sounds great! (I won't have my laptop for a few days, so I can't take a look yet, but I will later.)

You're right about (the current) buffers and the GIL. A test case written explicitly for them would be good.

Firstprivate etc.: I think it'd be nice myself, but it is probably better to take a break from it at this point, so that we can think more about it and not do anything rash; perhaps open a specific thread on them and ask for more general input. Perhaps you want to take a break or task-switch to something else (fused types?) until I can get around to reviewing and merging what you have so far? You'll know best what works for you, though. If you decide to implement explicit threadprivate variables because you're in the flow, I certainly won't object.


-- 
Sent from my Android phone with K-9 Mail. Please excuse my brevity.

mark florisson <markflorisson88 at gmail.com> wrote:

On 18 April 2011 13:06, mark florisson <markflorisson88 at gmail.com> wrote:
> On 16 April 2011 18:42, Dag Sverre Seljebotn <d.s.seljebotn at astro.uio.no> wrote:
>> (Moving discussion from http://markflorisson.wordpress.com/, where Mark
>> said:)
>
> Ok, sure, it was just an issue I was wondering about at that moment,
> but it's a tricky issue, so thanks.
>
>> """
>> Started a new branch https://github.com/markflorisson88/cython/tree/openmp .
>>
>> Now the question is whether sharing attributes should be propagated
>> outwards. e.g. if you do
>>
>> for i in prange(m):
>>     for j in prange(n):
>>         sum += i * j
>>
>> then ‘sum’ is a reduction for the inner parallel loop, but not for the outer
>> one. So the user would currently have to rewrite this to
>>
>> for i in prange(m):
>>     for j in prange(n):
>>         sum += i * j
>>     sum += 0
>>
>> which seems a bit silly. Of course, we could just disable nested
>> parallelism, or tell the users to use a prange and a ‘for from’ in such
>> cases.
>> """
>>
>> Dag: Interesting. The first one is definitely the behaviour we want, as long
>> as it doesn't cause unintended consequences.
>>
>> I don't really think it will -- the important thing is that the order
>> of loop iteration evaluation must be unimportant. And that is still true
>> (for the outer loop, as well as for the inner) in your first example.
>>
>> Question: When you have nested pranges, what will happen is that two nested
>> OpenMP parallel blocks are used, right? And do you know if there is complete
>> freedom/"reentrancy" in that variables that are thread-private in an outer
>> parallel block can be shared in an inner one, and vice versa?
>
> An implementation may or may not support it, and if it is supported
> the behaviour can be configured through omp_set_nested(). So we should
> consider the case where it is supported and enabled.
>
> If you have a lastprivate or reduction, then after the loop these are
> (reduced and) assigned to the original variable. So if that happens
> inside a parallel construct which does not declare the variable
> private to the construct, you actually have a race. So e.g. the nested
> prange currently races in the outer parallel range.
>
>> If so I'd think that this algorithm should work and feel natural:
>>
>>  - In each prange, for the purposes of variable private/shared/reduction
>> inference, consider all internal "prange" just as if they had been "range";
>> no special treatment.
>>
>>  - Recurse to children pranges.
>
> Right, that is most natural. Algorithmically, reductions and
> lastprivates (as those can have races if placed in inner parallel
> constructs) propagate outwards towards the outermost parallel block,
> or up to the first parallel with block, or up to the first construct
> that already determined the sharing attribute.
>
> e.g.
>
> with parallel:
>     with parallel:
>         for i in prange(n):
>             for j in prange(n):
>                 sum += i * j
>     # sum is well-defined here
> # sum is undefined here
>
> Here 'sum' is a reduction for the two innermost loops. 'sum' is not
> private for the inner parallel with block, as a prange in a parallel
> with block is a worksharing loop that binds to that parallel with
> block. However, the outermost parallel with block declares sum (and i
> and j) private, so after that block all those variables become
> undefined.
>
> However, in the outermost parallel with block, sum will have to be
> initialized to 0 before anything else, or be declared firstprivate,
> otherwise 'sum' is undefined to begin with. Do you think declaring it
> firstprivate would be the way to go, or should we make it private and
> issue a warning or perhaps even an error?
>
>> DS
>> _____________________________________________
>> cython-devel mailing list
>> cython-devel at python.org
>> http://mail.python.org/mailman/listinfo/cython-devel
>>
>

Everything seems to be working, although now the user has to be careful
with nested parallel blocks as variables can be private there (and not
firstprivate), i.e., the user has to do initialization at the right
place (e.g. in the outermost parallel block that determines it private).
I'm thinking of adding a warning, as the C compiler does.

Two issues are remaining:

1) explicit declarations of firstprivates

Do we still want those?

2) buffer auxiliary vars

When unpacking numpy buffers and using typed numpy arrays, can
reassignment or updates of a buffer-related variable ever occur in
nogil code sections? I'm thinking this is not possible and therefore
all buffer variables may be shared in parallel (for) sections?
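
The outward propagation rule described in the quoted message (a reduction travels outwards through the enclosing pranges, stops at the first enclosing parallel block, and is private in any parallel block beyond that) can be sketched in plain Python. This is only an illustration under stated assumptions, not Cython's implementation: the function name infer_sharing and the labels "parallel" and "prange" are made up here, lastprivates and constructs that already fixed the attribute are not modelled, and every prange is assumed to sit inside all of the parallel blocks.

```python
def infer_sharing(nest):
    """Return one sharing attribute per construct, outermost first.

    `nest` lists construct kinds from outermost to innermost, using the
    hypothetical labels "parallel" (a `with parallel:` block) and
    "prange". The variable is assumed to be updated in place
    (e.g. `sum += i * j`) in the innermost prange.
    """
    attrs = [None] * len(nest)
    bound = False  # True once the first enclosing parallel block is passed

    # Walk from the innermost construct outwards.
    for level in range(len(nest) - 1, -1, -1):
        kind = nest[level]
        if kind == "prange" and not bound:
            # Enclosing pranges all treat the variable as a reduction.
            attrs[level] = "reduction"
        elif kind == "parallel" and not bound:
            # The pranges bind to this block, so the variable is shared in it.
            attrs[level] = "shared"
            bound = True
        elif kind == "parallel":
            # Parallel blocks further out make the variable private.
            attrs[level] = "private"
        else:
            # Assumption of this sketch: no prange outside a parallel block.
            raise ValueError("nest shape not modelled: %r" % (nest,))
    return attrs
```

For the nest from the quoted example, infer_sharing(["parallel", "parallel", "prange", "prange"]) yields private, shared, reduction, reduction, matching the description that 'sum' is a reduction for the two innermost loops and private only in the outermost block.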
