[Cython] OpenMP support

Fri Mar 11 13:15:46 CET 2011

On 03/11/2011 12:37 PM, Stefan Behnel wrote:
> Dag Sverre Seljebotn, 11.03.2011 08:56:
>> Basically, I'm +1 to anything that can make me
>> pretend the GIL doesn't exist, even if it comes with a 2x performance hit,
>> because that will make me write parallel code (which I can't be bothered
>> to do in Cython currently), and I have 4 cores on the laptop I use for
>> debugging, so I'd still get a 2x speedup.
>>
>> Perhaps the long-term solution is something like an "autogil" mode, where
>> Cython automatically releases the GIL on blocks where it can (such as a
>> typed for-loop), and acquires it back when needed (such as an
>> exception-raising if-block within said for-loop).
>
> I assume you mean this to become a decorator or other option written 
> into the code.
>
>
>> And when doing multi-threading, GIL-requiring calls are dispatched to a
>> master GIL-holding thread (which would not be a worker thread, i.e. on 4
>> cores you'd have 4 workers + 1 GIL-holding support thread). So the advice
>> for speeding up code is simply "make sure your code is all typed", just
>> like before, but people can follow that advice without even having to
>> learn about the GIL.
>
> The GIL does not only protect the interpreter core. It also protects C 
> level data structures in user code and keeps threaded code from 
> running amok. Releasing and acquiring it doesn't come for free either, 
> so besides likely breaking code that was not specifically written to 
> be reentrant, releasing it automatically may also introduce a 
> performance penalty for many users.

The intention was that the GIL would only be acquired in exceptional 
circumstances (which don't matter for overall performance) or during 
debugging (where, again, I don't care about performance). But I agree the 
idea needs more thought about the possible pitfalls.

>
> I'm very happy the GIL exists, and I'm against anything that tries to 
> disable it automatically. Threading is an extremely dangerous 
> programming model. The GIL has its gotchas, too, but it still 
> simplifies it quite a bit. Actually, threading is so complex and easy 
> to get wrong, that any threaded code should always be written 
> specifically to support threading. Explicitly acquiring and releasing 
> the GIL is really just a minor issue on that path.

I guess the point is that OpenMP takes that "extremely dangerous 
programming model" and makes it tractable, at least for a class of 
trivial problems (not necessarily SIMD, but almost).

BTW, threading is often used simply because of how array data is laid 
out in memory. A typical use case is that every thread writes to different, 
non-overlapping blocks of the same array (and reads from the same input 
arrays, which are not changed). Then you move on to step B, which does the 
same, but perhaps blocks the arrays in a different way between threads. 
Then step C blocks the data in yet another way, etc. But at each step 
it's just "input arrays, non-overlapping blocks in output arrays", 
global parameters, and local loop counters.
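
As a rough illustration of that pattern, here is a minimal sketch in plain 
Python (not Cython, and not OpenMP). The names block_bounds, square_block 
and run_step are made up for this example, and squaring each element is 
just a stand-in for a real computation. Under the GIL this gives no 
speedup; it only shows the data layout: a shared read-only input, and one 
output array where each thread owns its own block.

```python
from concurrent.futures import ThreadPoolExecutor


def block_bounds(n, n_blocks):
    """Split range(n) into contiguous, non-overlapping (start, stop) pairs."""
    base, extra = divmod(n, n_blocks)
    bounds = []
    start = 0
    for i in range(n_blocks):
        # The first `extra` blocks get one additional element each.
        stop = start + base + (1 if i < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def square_block(src, dst, start, stop):
    """Read from the shared input, write only to dst[start:stop]."""
    for i in range(start, stop):
        dst[i] = src[i] * src[i]


def run_step(src, n_threads=4):
    """One "step": every thread fills its own block of a shared output."""
    dst = [0] * len(src)
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        futures = [
            pool.submit(square_block, src, dst, start, stop)
            for start, stop in block_bounds(len(src), n_threads)
        ]
        for future in futures:
            future.result()  # re-raise any exception from a worker
    return dst


result = run_step(list(range(10)))
```

No locking is needed here because no two threads ever write to the same 
index. A later step would simply call block_bounds again with a different 
blocking.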

(One doesn't need to use threads for this; there was another thread on 
multiprocessing + shared memory arrays.)

Just saying that not all use of threads is "extremely dangerous", and 
OpenMP exists explicitly to dumb threading down for those cases.

Dag Sverre