[Cython] prange CEP updated
Dag Sverre Seljebotn
d.s.seljebotn at astro.uio.no
Thu Apr 14 20:55:41 CEST 2011
On 04/14/2011 08:39 PM, mark florisson wrote:
> On 14 April 2011 20:29, Dag Sverre Seljebotn <d.s.seljebotn at astro.uio.no> wrote:
>> On 04/13/2011 11:13 PM, mark florisson wrote:
>>> Although there is omp_get_max_threads():
>>> "The omp_get_max_threads routine returns an upper bound on the number
>>> of threads that could be used to form a new team if a parallel region
>>> without a num_threads clause were encountered after execution returns
>>> from this routine."
>>> So we could have threadsavailable() evaluate to that if encountered
>>> outside a parallel region. Inside, it would evaluate to
>>> omp_get_num_threads(). At worst, people would over-allocate a bit.
>> Well, over-allocating could well mean 1 GB, which could well mean getting an
>> unnecessary MemoryError (or, like in my case, if I'm not careful to set
>> ulimit, getting a SIGKILL sent to you 2 minutes after the fact by the
>> cluster patrol process...)
> The upper bound is not "however many threads you think you can start",
> but rather "how many threads are considered useful for your machine".
> So if you use omp_set_num_threads(), it will return the value you set
> there. Otherwise, if you have e.g. a quadcore, it will return 4. The
> spec says:
> "Note – The return value of the omp_get_max_threads routine can be
> used to dynamically allocate sufficient storage for all threads in the
> team formed at the subsequent active parallel region."
> So this sounds like a viable option.
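The allocation pattern that the spec note describes might look roughly like the sketch below. It is only an illustration: the helper name sum_with_per_thread_slots and its out-parameters are made up for this example, and the serial fallbacks exist only so the code still builds when OpenMP is not enabled.

```c
#include <assert.h>
#include <stdlib.h>
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallbacks so the sketch still builds without OpenMP support. */
static int omp_get_max_threads(void) { return 1; }
static int omp_get_num_threads(void) { return 1; }
static int omp_get_thread_num(void) { return 0; }
#endif

/* Hypothetical helper: sums 0..n-1 with one accumulator slot per thread.
   Reports how many slots were allocated and how large the team really was. */
static long sum_with_per_thread_slots(int n, int *slots_out, int *team_out)
{
    /* Upper bound on the team size, queried outside the parallel region. */
    int slots = omp_get_max_threads();
    long *partial = calloc((size_t)slots, sizeof *partial);
    long total = 0;
    int team = 0;

    if (partial == NULL)
        return -1;

    #pragma omp parallel
    {
        int tid = omp_get_thread_num();
        assert(tid < slots);  /* every thread must have a slot */

        #pragma omp single
        team = omp_get_num_threads();  /* actual team size inside the region */

        #pragma omp for
        for (int i = 0; i < n; i++)
            partial[tid] += i;
    }

    /* Unused slots stay zero, so summing all of them is safe. */
    for (int t = 0; t < slots; t++)
        total += partial[t];

    free(partial);
    *slots_out = slots;
    *team_out = team;
    return total;
}
```

If the team turns out smaller than the number of slots, the extra slots are simply the over-allocation discussed above.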
What would happen here: We have 8 cores. Some code has an OpenMP
parallel section with maxthreads=2, and inside the section another
function is called.
That called function uses threadsavailable(), and has a parallel block
that wants as many threads as it can get.
I don't know the details as well as you do, but my uninformed guess is
that a race would be quite possible in this case: omp_get_max_threads
would return 7 in each case, and then the first one to reach its
parallel block would get all 7 threads. The other thread would then have
allocated storage for 7 threads but have only 1 thread running.
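A rough way to observe this scenario is to record, for each outer thread, what omp_get_max_threads() reported and how many threads the inner region actually got. The sketch below does only that. The names team_report, inner_work and run_nested are invented for the example, and what numbers come out depends entirely on the OpenMP runtime and its settings.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallbacks so the sketch still builds without OpenMP support. */
static int omp_get_max_threads(void) { return 1; }
static int omp_get_num_threads(void) { return 1; }
static int omp_get_thread_num(void) { return 0; }
static void omp_set_max_active_levels(int levels) { (void)levels; }
#endif

/* Hypothetical record: threads planned for vs. threads actually obtained. */
typedef struct { int planned; int got; } team_report;

/* Stands in for the called function that wants as many threads as it can get. */
static team_report inner_work(void)
{
    team_report r;
    r.planned = omp_get_max_threads();  /* the value storage would be sized by */
    r.got = 0;

    #pragma omp parallel
    {
        #pragma omp single
        r.got = omp_get_num_threads();  /* size of the inner team */
    }
    return r;
}

/* Outer region limited to 2 threads; each one calls inner_work(). */
static void run_nested(team_report reports[2])
{
    reports[0] = reports[1] = (team_report){0, 0};
    omp_set_max_active_levels(2);  /* allow one level of nested parallelism */

    #pragma omp parallel num_threads(2)
    {
        int tid = omp_get_thread_num();
        assert(tid < 2);
        reports[tid] = inner_work();  /* each outer thread fills its own entry */
    }
}
```

Any entry where got is smaller than planned corresponds to storage that was allocated but never used by a running thread.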
BTW, I'm not sure what the difference is between my original idea and
omp_get_max_threads. In the absence of races like the one above, wouldn't
my original idea of entering a parallel section (with the same scheduling
parameters) just to see how many threads we get work as well?
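The probing idea could be sketched as follows. The function name probe_team_size is made up for illustration, and nothing here guarantees that a later parallel region gets the same number of threads as the probe did, which is exactly the caveat about races.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallback so the sketch still builds without OpenMP support. */
static int omp_get_num_threads(void) { return 1; }
#endif

/* Hypothetical probe: enter an empty parallel region only to see how
   many threads the runtime hands out under the current settings. */
static int probe_team_size(void)
{
    int team = 1;

    #pragma omp parallel
    {
        #pragma omp single
        team = omp_get_num_threads();
    }

    assert(team >= 1);
    return team;
}
```

A caller would size its per-thread storage from the returned value and then open the real parallel region with the same parameters.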