[Cython] prange CEP updated

Thu Apr 14 20:55:41 CEST 2011

On 04/14/2011 08:39 PM, mark florisson wrote:
> On 14 April 2011 20:29, Dag Sverre Seljebotn<d.s.seljebotn at astro.uio.no>  wrote:
>> On 04/13/2011 11:13 PM, mark florisson wrote:
>>>
>>> Although there is omp_get_max_threads():
>>>
>>> "The omp_get_max_threads routine returns an upper bound on the number
>>> of threads that could be used to form a new team if a parallel region
>>> without a num_threads clause were encountered after execution returns
>>> from this routine."
>>>
>>> So we could have threadsavailable() evaluate to that if encountered
>>> outside a parallel region. Inside, it would evaluate to
>>> omp_get_num_threads(). At worst, people would over-allocate a bit.
>>
>> Well, over-allocating could well mean 1 GB, which could well mean getting an
>> unnecessary MemoryError (or, like in my case, if I'm not careful to set
>> ulimit, getting a SIGKILL sent to you 2 minutes after the fact by the
>> cluster patrol process...)
>
> The upper bound is not "however many threads you think you can start",
> but rather "how many threads are considered useful for your machine".
> So if you use omp_set_num_threads(), it will return the value you set
> there. Otherwise, if you have e.g. a quadcore, it will return 4. The
> spec says:
>
> "Note – The return value of the omp_get_max_threads routine can be
> used to dynamically allocate sufficient storage for all threads in the
> team formed at the subsequent active parallel region."
>
> So this sounds like a viable option.

What would happen here: We have 8 cores. Some code has an OpenMP
parallel section limited to 2 threads (maxthreads=2), and inside that
section another function is called.

That called function uses threadsavailable(), and has a parallel block 
that wants as many threads as it can get.

I don't know the details as well as you do, but my uninformed guess is
that a race is quite possible here: omp_get_max_threads() returns 7 in
both callers, and whichever caller reaches its parallel block first
gets the 7 threads. The other caller has then allocated storage for 7
threads but has only 1 thread running.
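
A rough way to check this scenario, as a sketch only: it assumes a C
compiler with OpenMP enabled (e.g. -fopenmp) and falls back to serial
stand-ins otherwise. The names callee() and run_scenario() are made up
for illustration. Each of the two outer threads records how many slots
it would allocate based on omp_get_max_threads() and how many threads
its inner parallel block actually got; whether the two numbers differ
depends on the runtime and its settings.

```c
#include <assert.h>
#include <stdlib.h>

#ifdef _OPENMP
#include <omp.h>
#else
/* Serial stand-ins so the sketch also builds without OpenMP enabled. */
static int omp_get_max_threads(void) { return 1; }
static int omp_get_num_threads(void) { return 1; }
static int omp_get_thread_num(void) { return 0; }
static void omp_set_max_active_levels(int n) { (void)n; }
#endif

/* Hypothetical callee: sizes its scratch space from omp_get_max_threads(),
 * then reports how many threads its own parallel block really got. */
static void callee(int *budget, int *actual)
{
    int team = 0;

    *budget = omp_get_max_threads();
    double *scratch = calloc((size_t)*budget, sizeof *scratch);
    assert(scratch != NULL);

    #pragma omp parallel shared(team, scratch)
    {
        /* One thread records the team size; single ends with a barrier. */
        #pragma omp single
        team = omp_get_num_threads();

        int me = omp_get_thread_num();
        scratch[me] = 1.0;  /* each running thread touches only its own slot */
    }

    *actual = team;
    free(scratch);
}

/* Outer section limited to 2 threads; each outer thread calls callee(). */
static void run_scenario(int budget[2], int actual[2])
{
    omp_set_max_active_levels(2);  /* allow the inner regions to fork */

    #pragma omp parallel num_threads(2)
    {
        int me = omp_get_thread_num();
        callee(&budget[me], &actual[me]);
    }
}
```

Calling run_scenario() with two zero-initialised arrays and printing
budget[i] next to actual[i] shows, for each outer thread, how much of
the allocated storage would go unused.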

BTW, I'm not sure what the difference is between my original idea and
omp_get_max_threads(). In the absence of races like the one above,
wouldn't my original idea of entering a parallel section (with the same
scheduling parameters) just to see how many threads we get work as well?
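
For concreteness, the "enter a parallel section just to count" idea
could look roughly like the sketch below. probe_team_size() is a
made-up name, not anything in Cython or OpenMP, and the serial stand-in
is only there so it builds without OpenMP. As far as I understand, the
next parallel region is not guaranteed to get the same team size when
dynamic adjustment of threads is enabled.

```c
#include <assert.h>

#ifdef _OPENMP
#include <omp.h>
#else
/* Serial stand-in so the sketch also builds without OpenMP enabled. */
static int omp_get_num_threads(void) { return 1; }
#endif

/* Hypothetical helper: open a throwaway parallel region and report
 * how many threads the runtime actually put in the team. */
static int probe_team_size(void)
{
    int team = 0;

    #pragma omp parallel shared(team)
    {
        /* Exactly one thread writes; single ends with an implicit barrier. */
        #pragma omp single
        team = omp_get_num_threads();
    }

    assert(team >= 1);  /* a team always has at least one thread */
    return team;
}
```

A caller would use the result to size its per-thread storage before
entering the real parallel block.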

DS