[Cython] prange CEP updated

Robert Bradshaw robertwb at math.washington.edu
Tue Apr 26 19:59:45 CEST 2011

On Tue, Apr 26, 2011 at 7:25 AM, mark florisson
<markflorisson88 at gmail.com> wrote:
> On 21 April 2011 20:13, Dag Sverre Seljebotn <d.s.seljebotn at astro.uio.no> wrote:
>> On 04/21/2011 10:37 AM, Robert Bradshaw wrote:
>>> On Mon, Apr 18, 2011 at 7:51 AM, mark florisson
>>> <markflorisson88 at gmail.com>  wrote:
>>>> On 18 April 2011 16:41, Dag Sverre Seljebotn <d.s.seljebotn at astro.uio.no> wrote:
>>>>> Excellent! Sounds great! (As I won't have my laptop for some days I
>>>>> can't have a look yet, but I will later.)
>>>>> You're right about (the current) buffers and the GIL. A testcase
>>>>> explicitly for them would be good.
>>>>> Firstprivate etc.: I think it'd be nice myself, but it is probably
>>>>> better to take a break from it at this point so that we can think more
>>>>> about that and not do anything rash; perhaps open up a specific thread
>>>>> on them and ask for more general input. Perhaps you want to take a
>>>>> break or task-switch to something else (fused types?) until I can get
>>>>> around to review and merge what you have so far? You'll know best what
>>>>> works for you though. If you decide to implement explicit threadprivate
>>>>> variables because you've got the flow I certainly won't object myself.
>>>> Ok, cool, I'll move on :) I already included a test with a prange and
>>>> a numpy buffer with indexing.
>>> Wow, you're just plowing away at this. Very cool.
>>> +1 to disallowing nested prange, that seems to get really messy with
>>> little benefit.
>>> In terms of the CEP, I'm still unconvinced that firstprivate is not
>>> safe to infer, but let's leave the initial values undefined rather than
>>> specifying them to be NaNs (we can do that as an implementation if you
>>> want), which will give us flexibility to change later once we've had a
>>> chance to play around with it.
>> I don't see any technical issues with inferring firstprivate, the question
>> is whether we want to. I suggest not inferring it in order to make this
>> safer: One should be able to just try to change a loop from "range" to
>> "prange", and either a) have things fail very hard, or b) just work
>> correctly and be able to trust the results.
>> Note that when I suggest using NaN, it is as initial values for EACH
>> ITERATION, not per-thread initialization. It is not about "firstprivate" or
>> not, but about disabling thread-private variables entirely in favor of
>> "per-iteration" variables.
>> I believe that by talking about "readonly" and "per-iteration" variables,
>> rather than "thread-shared" and "thread-private" variables, this can be used
>> much more safely and with virtually no knowledge of the details of
>> threading. Again, what's in my mind are scientific programmers with (too)
>> little training.
>> In the end it's a matter of taste and what is most convenient to more users.
>> But I believe the case of needing real thread-private variables that
>> preserve per-thread values across iterations (and thus can also possibly
>> benefit from firstprivate) is used seldom enough that an explicit
>> declaration is OK, in particular when it buys us so much in safety in the
>> common case.
>> To be very precise,
>> cdef double x, z
>> for i in prange(n):
>>    x = f(x)
>>    z = f(i)
>>    ...
>> goes to
>> cdef double x, z
>> for i in prange(n):
>>    x = z = nan
>>    x = f(x)
>>    z = f(i)
>>    ...
>> and we leave it to the C compiler to (trivially) optimize away "z = nan".
>> And, yes, it is a stopgap solution until we've got control flow analysis so
>> that we can outright disallow such uses of x (without threadprivate
>> declaration, which also gives firstprivate behaviour).
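The per-iteration reset described above can be imitated in plain Python to see why it makes a loop-carried dependency fail loudly. This is only a sequential sketch, not Cython's prange implementation: `f` and `run_loop` are hypothetical names introduced for illustration, and the starting value 0.0 for `x` is an assumption.

```python
import math


def f(v):
    # Hypothetical stand-in for the f() used in the quoted example.
    return v + 1.0


def run_loop(n, reset_each_iteration):
    """Run the loop body 'x = f(x)' n times and record x after each pass.

    With reset_each_iteration=True, x is overwritten with NaN at the top
    of every iteration, mimicking the proposed "per-iteration" variables.
    """
    x = 0.0  # assumed starting value, not taken from the thread
    results = []
    for i in range(n):
        if reset_each_iteration:
            x = math.nan  # the value from the previous iteration is discarded
        x = f(x)          # reads x before assigning it: a carried dependency
        results.append(x)
    return results


sequential = run_loop(3, reset_each_iteration=False)
per_iteration = run_loop(3, reset_each_iteration=True)
```

Under these assumptions the ordinary loop accumulates a value across iterations, while the reset variant yields NaN everywhere, so code that silently relied on the previous iteration's `x` is easy to spot.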
> Ah, I see, sure, that sounds sensible. I'm currently working on fused
> types, so when I finish that up I'll return to that.
>>> The "cdef threadlocal(int) foo" declaration syntax feels odd to me...
>>> We also probably want some way of explicitly marking a variable as
>>> shared and still be able to assign to/flush/sync it. Perhaps the
>>> parallel context could be used for these declarations, i.e.
>>>     with parallel(threadlocal=a, shared=(b,c)):
>>>         ...
>>> which would be considered an "expert" usecase.
>> I'm not set on the syntax for threadlocal variables; although your proposal
>> feels funny/very unpythonic, almost like a C macro. For some inspiration,
>> here's the Python solution (with no obvious place to put the type):
>> import threading
>> mydata = threading.local()
>> mydata.myvar = ... # value is threadprivate
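A runnable version of the `threading.local()` idiom quoted above might look as follows. The `worker` function, the `seen` dictionary, and the thread tags are hypothetical names added for illustration; only `threading.local` and `threading.Thread` come from the standard library.

```python
import threading

mydata = threading.local()
mydata.myvar = "main"  # set in the main thread only


def worker(tag, seen):
    # Attributes set by another thread are not visible in this one.
    seen[tag + "_had_value"] = hasattr(mydata, "myvar")
    mydata.myvar = tag           # private to the current thread
    seen[tag] = mydata.myvar


seen = {}
threads = [
    threading.Thread(target=worker, args=("t%d" % k, seen)) for k in range(3)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

main_value = mydata.myvar  # unchanged by the worker threads
```

Each thread sees its own `myvar`, and the main thread's value survives untouched, which is the behaviour a typed thread-local declaration in Cython would need to mirror.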
>>> For all the discussion of threadsavailable/threadid, the most common
>>> usecase I see is for allocating a large shared buffer and partitioning
>>> it. This seems better handled by allocating separate thread-local
>>> buffers, no? I still like the context idea, but everything in a
>>> parallel block before and after the loop(s) also seems like a natural
>>> place to put any setup/teardown code (though the context has the
>>> advantage that __exit__ is always called, even if exceptions are
>>> raised, which makes cleanup a lot easier to handle).
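The cleanup guarantee mentioned above (`__exit__` runs even when an exception is raised) can be shown with an ordinary Python context manager. This is a sketch of the general idea only: `ScratchBuffers`, its parameters, and the `released` flag are hypothetical and are not part of any proposed cython.parallel API.

```python
class ScratchBuffers:
    """Hypothetical context that hands out one scratch buffer per thread."""

    def __init__(self, nthreads, size):
        self.nthreads = nthreads
        self.size = size
        self.buffers = None
        self.released = False

    def __enter__(self):
        # Setup: allocate separate buffers instead of partitioning one.
        self.buffers = [bytearray(self.size) for _ in range(self.nthreads)]
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Teardown: always runs, whether or not the body raised.
        self.buffers = None
        self.released = True
        return False  # do not swallow the exception


ctx = ScratchBuffers(nthreads=4, size=16)
caught = None
try:
    with ctx as bufs:
        allocated = len(bufs.buffers)
        raise ValueError("failure inside the block")
except ValueError as exc:
    caught = exc
```

Setup and teardown code placed directly before and after the loop in a parallel block would need an explicit try/finally to give the same guarantee.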
>> I'd *really* like to have try/finally available in cython.parallel block for
>> this, although I realize that may have to wait for a while. A big part of
>> our discussions at the workshop were about how to handle exceptions; I guess
>> there'll be a "phase 2" of this where break/continue/raise are dealt with.
> I'll leave that until I finish fused types and the typed memory views.
> Before I'd start on that I'd first review the with gil block and
> ensure the tests pass in all python versions, and perhaps that should
> be merged first before I pull it into the parallel branch? Otherwise
> you're kind of forced to review both branches.

Yes, that makes sense.

- Robert
