[Python-Dev] Removing the GIL (Me, not you!)

Thu Sep 13 18:29:15 CEST 2007

On Sep 13, 2007, at 9:25 PM, Martin v. Löwis wrote:

>>> "Since we are guaranteeing that synchronized code is running on a single
>>> core, it is the equivalent of a lock at the cost of a context switch."
>>>
>>> This is precisely what a lock costs today: a context switch.
>>>
>>
>> Really? Wouldn't we save some memory allocation overhead (since in my
>> design, the "lock" is really just a simple kernel instruction as opposed
>> to a full-blown object)
>
> The GIL is a single variable, not larger than 50 Bytes or so. Locking
> it requires no memory at all in user-space, and might require 8 bytes
> or so per waiting thread in kernel-space.
>
>> thereby lowering lock overhead
>
> Why do you think "lock overhead" is related to memory consumption?

Well, it can be one (or both) of two things: 1) memory consumption, or
2) the cost of acquiring and releasing the locks (which you said is the
same as a context switch).

Since we've also identified (according to GvR's post:
http://www.artima.com/weblogs/viewpost.jsp?thread=214235) that the
slowdown was 2x in a single-threaded application (which couldn't be
due to lock contention), it must be due to lock overhead, unless the
programming was otherwise faulty or there is something else about
locks that I don't know about (Martin?). Hence I'm assuming that we
need to reduce lock overhead. If acquiring and releasing locks (one
part of lock overhead) costs a single context switch (and I don't
doubt you here), then the only remaining thing to optimize is the
memory operations related to lock objects.
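A minimal sketch, assuming Python 3, of the single-threaded cost being discussed: it times a plain counter increment against the same increment wrapped in an uncontended `threading.Lock`. Only one thread runs, so the lock is never contended and any gap is acquire/release overhead alone. The helper `time_increments` is hypothetical; this is not how CPython's refcounts or Greg Stein's patch work, and absolute timings depend entirely on the machine and interpreter version.

```python
import threading
import time


def time_increments(n: int) -> dict:
    """Time n counter increments with and without an uncontended lock.

    Only one thread runs, so the lock is never contended: any extra time
    in the locked loop is acquire/release overhead, not waiting.
    """
    lock = threading.Lock()

    plain = 0
    start = time.perf_counter()
    for _ in range(n):
        plain += 1
    plain_secs = time.perf_counter() - start

    locked = 0
    start = time.perf_counter()
    for _ in range(n):
        with lock:  # acquire + release on every single increment
            locked += 1
    locked_secs = time.perf_counter() - start

    return {
        "plain": plain,
        "locked": locked,
        "plain_secs": plain_secs,
        "locked_secs": locked_secs,
    }


if __name__ == "__main__":
    result = time_increments(1_000_000)
    print(f"plain:  {result['plain_secs']:.3f}s")
    print(f"locked: {result['locked_secs']:.3f}s")
```

Since no thread ever waits here, the difference between the two timings reflects only per-operation acquire/release cost, the kind of single-threaded overhead that contention cannot explain.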

>
>> Since we're using an asynch message queue for the synch-server, it
>> sounds like a standard lock-free algorithm.
>
> You lost me here. What are you trying to achieve? It's not the lock
> that people complain about, but that Python runs serially most
> of the time.

http://en.wikipedia.org/wiki/Lock-free_and_wait-free_algorithms#The_lock-free_approach

Specifically, I'm trying to achieve the approach that uses a "deposit
request".
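As described in this thread, the "synch-server" idea is that threads never mutate shared state directly: they deposit a request on a queue, and one server thread applies the requests in order. Below is a minimal Python sketch of that pattern, assuming refcount updates are the requests; `RefcountServer` and `worker` are hypothetical names. Note that `queue.Queue` is itself implemented with a lock, so this illustrates the request-deposit pattern, not a genuinely lock-free algorithm.

```python
import queue
import threading
from collections import defaultdict


class RefcountServer:
    """Hypothetical 'synch-server': one thread owns all refcounts.

    Worker threads deposit ("incref" | "decref", key) requests on a queue;
    only the server thread ever mutates the counts, so the counts need no
    per-object lock. (queue.Queue itself uses a lock internally.)
    """

    _STOP = object()  # sentinel telling the server thread to exit

    def __init__(self) -> None:
        self.counts = defaultdict(int)
        self._requests = queue.Queue()
        self._thread = threading.Thread(target=self._serve)
        self._thread.start()

    def _serve(self) -> None:
        while True:
            req = self._requests.get()
            if req is self._STOP:
                break
            op, key = req
            self.counts[key] += 1 if op == "incref" else -1

    def deposit(self, op: str, key: str) -> None:
        # Asynchronous: the caller returns without waiting for the update.
        self._requests.put((op, key))

    def shutdown(self) -> None:
        # The queue is FIFO, so every request deposited before the stop
        # sentinel is applied before the server thread exits.
        self._requests.put(self._STOP)
        self._thread.join()


def worker(server: RefcountServer, key: str, n: int) -> None:
    for _ in range(n):
        server.deposit("incref", key)
        server.deposit("decref", key)
    server.deposit("incref", key)  # net effect: +1 per worker


if __name__ == "__main__":
    server = RefcountServer()
    threads = [threading.Thread(target=worker, args=(server, "obj", 1000))
               for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    server.shutdown()
    print(server.counts["obj"])  # → 4
```

The trade-off is that requesters no longer contend on the object itself, but every update now costs a queue operation, and a requester cannot see the resulting count synchronously.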

>> I think I neglected to mention that the locking would still need to be
>> more fine-grained: perhaps only do the context switch around refcounts
>> (and the other places where the GIL is critical).
>
> I think this is the point where I need to say "good luck implementing
> it".

I don't mean to be unhelpful. It's just that this discussion started
because people (not me, although I would definitely benefit) showed
interest in removing the GIL.

>> Well, my interpretation of the current problem is that removing the GIL
>> has not been productive because of problems with lock contention on
>> multi-core machines.
>
> My guess is that this interpretation is wrong. It was reported that
> there was a slowdown by a factor of 2 in a single-threaded application.
> That can't be due to lock contention.

I agree with your point, Martin (see my analysis above). Regarding
lock contention: I'm guessing that if single-threaded applications
are so badly affected, then the cumulative overhead on multithreaded
applications will be even worse. So we need to reduce the overhead.
But since all Python code runs under the GIL, which is a pretty
coarse lock, we have to make the new locking more fine-grained (which
is what I think the original patch by Greg Stein did). I'm also
guessing that if you do that, then every refcount update will have to
acquire a lock, which happens *very* frequently (and I think from your
earlier responses you concur). That means any time multiple threads
access the same object, each access does an incref/decref under the
same lock, so the threads contend for it: e.g. if you access a global
variable inside a for-loop from multiple threads.
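To illustrate the global-variable case, here is a minimal sketch, assuming Python 3.9 or later, one fine-grained lock per object, and a model in which every read of the global does an incref and a decref. The names `shared_refcount`, `shared_lock`, `locked_add`, `use_global`, and `run` are hypothetical; CPython's actual refcount updates under the GIL take no per-object lock. A non-blocking `acquire` first detects when another thread already holds the lock, which is counted as contention.

```python
import threading

# Hypothetical model of one shared global object: its refcount is
# guarded by a single per-object lock (fine-grained locking).
shared_refcount = 1
shared_lock = threading.Lock()
contended = 0  # times a thread found the lock already held


def locked_add(delta: int) -> None:
    """Adjust the shared refcount under its lock, counting contention."""
    global shared_refcount, contended
    if not shared_lock.acquire(blocking=False):
        shared_lock.acquire()  # another thread held it: wait our turn
        contended += 1         # safe: we hold shared_lock at this point
    try:
        shared_refcount += delta
    finally:
        shared_lock.release()


def use_global(iterations: int) -> None:
    # Each loop iteration "reads the global": incref, use, decref.
    for _ in range(iterations):
        locked_add(+1)
        locked_add(-1)


def run(num_threads: int, iterations: int) -> tuple[int, int]:
    threads = [threading.Thread(target=use_global, args=(iterations,))
               for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared_refcount, contended


if __name__ == "__main__":
    refcount, waits = run(num_threads=4, iterations=100_000)
    print(f"final refcount: {refcount}, contended acquires: {waits}")
```

The final refcount returns to 1, while the contended-acquire count (which varies between runs, and may be small because this model itself runs under the GIL) shows threads waiting on one lock merely to read the same object.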

>> If we can somehow guarantee all GC operations (which is why the GIL is
>> needed in the first place)
>
> No, unless we disagree on what a "GC operation" is.

Ok. Other people know more about the specifics of the GIL than I do.  
However, the main issue with removing the GIL seems to be the  
reference counting algorithm. That is what I was alluding to. In any  
case, it is not relevant for the rest of the discussion.

regards,
Prateek