[concurrency] Inside the Python GIL

Fri Jun 12 17:52:37 CEST 2009

The slides make perfect sense. As Dave says, the open question is what
to do about it. If someone can write a relatively simple patch to
improve the behavior, with a test to make sure it stays improved, I
think it would have a very good chance of getting accepted into
CPython. A complex patch would have less chance because of Jesse's
answer. :) Unladen Swallow
(http://code.google.com/p/unladen-swallow/source/browse/tests/perf.py)
would accept a benchmark that just measures the problem, even without
a suggestion to improve it.
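
For anyone who wants a starting point, here is a rough sketch of what
such a benchmark might look like. It is not taken from perf.py, and
the function names (count, time_sequential, time_threaded) are made up
for illustration. It assumes a current Python 3 with
time.perf_counter, and it only compares CPU-bound work run back to
back against the same work split across threads.

```python
import threading
import time


def count(n):
    # Pure-Python CPU-bound loop; it never blocks, so the running
    # thread only gives up the GIL when the interpreter forces it to.
    while n > 0:
        n -= 1


def time_sequential(n, repeats=2):
    # Run the workload `repeats` times in the calling thread.
    start = time.perf_counter()
    for _ in range(repeats):
        count(n)
    return time.perf_counter() - start


def time_threaded(n, num_threads=2):
    # Run the same total amount of work, one workload per thread.
    threads = [threading.Thread(target=count, args=(n,))
               for _ in range(num_threads)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    n = 2_000_000
    seq = time_sequential(n)
    thr = time_threaded(n)
    print("sequential: %.3fs  threaded: %.3fs  ratio: %.2f"
          % (seq, thr, thr / seq))
```

A ratio well above 1.0 on a multicore machine would be the symptom
from the slides. The numbers depend heavily on the interpreter
version, the OS, and the core count, so treat any single run as
anecdotal.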

Here's a discussion that may illustrate why fixing this is tough:

 * On a multicore machine, a waiting thread has to do some amount of
work to wake up. A reasonable ballpark is ~1us. It makes sense to let
the foreground thread continue making progress while the background
thread is waking up, especially since the OS may not choose to wake up
a Python thread first. So we let the foreground thread re-acquire the
GIL immediately after releasing it, in the hope that it can get a
couple more checks in before the background thread actually wakes up.
BUT, we don't really want to let it continue running after the waiting
thread does wake up, so perhaps we should have the waiting thread set
a flag when it does wake up which forces the foreground thread to
sleep asap. Then the waiting thread has to wait for the GIL again, but
we DON'T want it to hand control back to the OS or we would have
wasted that waking-up time. So maybe we have it spin-wait. But what
happens if the OS has actually descheduled the foreground thread in
favor of another process? Then the spinning thread wastes lots of
time. I don't know of any OSes that give us a way to do something when
a thread gets descheduled. They don't even let another thread check
whether a given thread is currently running.

 * On a single core, any time the foreground thread spends executing
after signaling a waiting thread is time the waiting thread can't use
to wake up. So it makes sense to force a context switch to a
particular waiting thread. This is actually pretty easy: instead of a
GIL, we have a binary semaphore per thread that gets upped to instruct
a particular thread to run, and then the previously-running thread
immediately waits on its own semaphore. The issue here is just the
time it takes to switch threads: ~1us. The GIL checks currently
happen every 100 ticks (a tick is roughly one opcode), which means
that in arithmetic-heavy code those checks occur on the order of every
microsecond too. You don't want to spend half of your time switching
threads. On the other hand, as Dave pointed out, sometimes even 100
ticks isn't soon enough. I think we could solve this by checking the
elapsed time on each "check" rather than unconditionally switching
threads, but we might want to do something to give I/O-bound threads
higher priority.
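
To make the single-core idea concrete, here is a toy model of the
per-thread semaphore handoff with an elapsed-time test on each check.
It is written in plain Python on top of the threading module, so it
only illustrates the control flow; it is not how CPython's C-level GIL
is implemented, and every name in it (HandoffScheduler, check,
min_slice, worker, run) is hypothetical. Round-robin choice of the
next thread is also just an assumption to keep the sketch short; it
does nothing about I/O-bound priority.

```python
import threading
import time


class HandoffScheduler:
    """Toy model: one binary semaphore per thread, no shared lock."""

    def __init__(self, num_threads, min_slice):
        # A thread may run only after its own semaphore is upped.
        self.sems = [threading.Semaphore(0) for _ in range(num_threads)]
        self.done = [False] * num_threads
        self.min_slice = min_slice          # seconds before a switch
        self.slice_start = time.monotonic()

    def start(self, first=0):
        # Hand the single "run token" to the first thread.
        self.slice_start = time.monotonic()
        self.sems[first].release()

    def wait_turn(self, me):
        self.sems[me].acquire()

    def _next_active(self, me):
        # Round-robin over threads that have not finished yet.
        n = len(self.sems)
        for step in range(1, n):
            cand = (me + step) % n
            if not self.done[cand]:
                return cand
        return None

    def check(self, me):
        # The periodic "check": switch only if the slice is used up.
        if time.monotonic() - self.slice_start < self.min_slice:
            return
        nxt = self._next_active(me)
        self.slice_start = time.monotonic()
        if nxt is None:
            return                          # nobody else left to run
        self.sems[nxt].release()            # tell that thread to run
        self.sems[me].acquire()             # and wait on our own

    def finish(self, me):
        # Pass the token on for good when this thread exits.
        self.done[me] = True
        nxt = self._next_active(me)
        if nxt is not None:
            self.slice_start = time.monotonic()
            self.sems[nxt].release()


def worker(sched, me, iterations, order):
    sched.wait_turn(me)
    for _ in range(iterations):
        order.append(me)                    # stand-in for real work
        sched.check(me)
    sched.finish(me)


def run(num_threads, iterations, min_slice):
    sched = HandoffScheduler(num_threads, min_slice)
    order = []
    threads = [threading.Thread(target=worker,
                                args=(sched, i, iterations, order))
               for i in range(num_threads)]
    for t in threads:
        t.start()
    sched.start()
    for t in threads:
        t.join()
    return order
```

With min_slice=0 every check forces a switch, which is the
unconditional behavior described above; with a large min_slice a
thread keeps running until its time is up. Only the thread holding
the token touches the scheduler state, which is why the sketch gets
away without an extra lock.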

Anyway, I'm not likely to work on this any time soon, but I'm happy to
review any patches someone else produces. :)

On Fri, Jun 12, 2009 at 8:16 AM, Pete <pfein at pobox.com> wrote:
> I didn't attend last night's UG, but I saw Dave give a version of this talk
> about a month ago.  I'll second Carl's opinion - this talk is of critical
> importance to anyone using threads in Python.
>
> Begin forwarded message:
>
>> From: Carl Karsten <carl at personnelware.com>
>> Date: June 12, 2009 10:51:33 AM EDT
>> To: The Chicago Python Users Group <chicago at python.org>
>> Subject: Re: [Chicago] Posted : Video
>>
>> * David Beazley: mind-blowing presentation about how the Python GIL
>> actually works and why it's even worse than most people even imagine.
>> http://blip.tv/file/2232410   http://www.dabeaz.com/python/GIL.pdf
>