[Python-Dev] Making the GIL faster & lighter on Windows
phillip.sitbon+python-dev at gmail.com
Tue May 26 21:48:49 CEST 2009
I'm new to the list, but I've been embedding Python and working very
closely with the core sources for many years now. I discovered Python
a long time ago when I needed to embed a scripting language and found
the PHP sources... unreadable ;)
Anyway, I'd like to ask something that may have been asked already, so
I apologize if this has been covered.
Instead of removing the GIL, has anyone thought of making it more
lightweight? The current situation for Windows is that the
single-thread case is decently fast (via interlocked operations), but
it drops to using an event object in the case of contention (see …).
Now, I don't have any specific evidence aside from my experience in
Windows multithreaded programming, but event objects are often
considered the slowest synchronization mechanism available. So, what
are the alternatives? Mutexes or critical sections. Semaphores too, if
you want to get fancy, but I digress.
Because mutexes have the capability of inter-process locking, which we
don't need, critical sections fit the bill as a lightweight locking
mechanism. They work in a way similar to how the Python GIL is
handled: first, attempt an interlocked operation, and if another
thread owns the lock, wait on a kernel object. They are known to be …
There are some catches with using a critical section instead of the …
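The two-phase pattern described above (an interlocked operation first, a kernel wait only under contention) is sometimes called a "benaphore". Below is a minimal portable sketch, assuming C11 atomics stand in for the Win32 Interlocked* functions and a POSIX unnamed semaphore stands in for the kernel object. It is not the thread_nt.h code and not how CRITICAL_SECTION is implemented internally; `fast_lock` and the demo helpers are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>
#include <stdatomic.h>

typedef struct {
    atomic_int count;   /* threads holding or waiting for the lock */
    sem_t wakeup;       /* slow path: contending threads sleep here */
} fast_lock;

static void fast_lock_init(fast_lock *l) {
    atomic_init(&l->count, 0);
    sem_init(&l->wakeup, 0, 0);   /* process-private, initial value 0 */
}

static void fast_lock_acquire(fast_lock *l) {
    /* Fast path: if the counter was 0, the lock was free and is now ours
       without any system call. */
    if (atomic_fetch_add(&l->count, 1) > 0) {
        /* Slow path: someone holds it; sleep until release() hands over. */
        while (sem_wait(&l->wakeup) != 0)
            ;   /* retry if interrupted by a signal */
    }
}

/* Non-blocking variant (the "wait parameter is zero" case). */
static int fast_lock_try_acquire(fast_lock *l) {
    int expected = 0;
    return atomic_compare_exchange_strong(&l->count, &expected, 1);
}

static void fast_lock_release(fast_lock *l) {
    /* If the counter was above 1, at least one thread is waiting (or about
       to wait): wake exactly one of them. */
    if (atomic_fetch_sub(&l->count, 1) > 1)
        sem_post(&l->wakeup);
}

/* Demo: several threads bump a shared counter under the lock. */
static fast_lock demo_lock;
static long demo_total;
static int demo_iters;

static void *demo_worker(void *arg) {
    (void)arg;
    for (int i = 0; i < demo_iters; i++) {
        fast_lock_acquire(&demo_lock);
        demo_total++;
        fast_lock_release(&demo_lock);
    }
    return NULL;
}

static long run_demo(int nthreads, int iters) {
    pthread_t tids[16];
    assert(nthreads <= 16);
    fast_lock_init(&demo_lock);
    demo_total = 0;
    demo_iters = iters;
    for (int i = 0; i < nthreads; i++)
        pthread_create(&tids[i], NULL, demo_worker, NULL);
    for (int i = 0; i < nthreads; i++)
        pthread_join(tids[i], NULL);
    sem_destroy(&demo_lock.wakeup);   /* no waiters remain after the joins */
    return demo_total;
}
```

An uncontended acquire/release pair touches only the atomic counter; the semaphore is used only when a second thread arrives while the lock is held, which is the same trade-off the critical section makes.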
1. It is recursive, while the current GIL setup is not. Would it break
Python to support (or deal with) recursive behavior at the GIL level?
Note that we can still disallow recursion and fail, because we can tell
whether the current thread already owns the lock; however, the return value
of the lock function is usually only checked when the wait parameter is zero
(meaning "don't block, just try to acquire"). The biggest problem I
see here is how mixing the PyGILState_* API with multiple interpreters
will behave: when PyGILState_Ensure() is called while the GIL is held
for a thread state under an interpreter other than the main
interpreter, it tries to re-lock the GIL. This would normally cause a
deadlock, but the best we could do with a critical section is have the
call fail and/or increment a recursion counter. If preserving the current
behavior is absolutely necessary, I guess it would be pretty easy to force a
deadlock. Personally, I would prefer a Py_FatalError or something like …
2. Backwards incompatibility: TryEnterCriticalSection isn't available
before NT4, so Windows 95 support would break. Microsoft no longer supports
Windows 95 and doesn't even list it among the supported OSes for their API
functions anymore, so... non-issue? Some of the CRITICAL_SECTION structure
is exposed to us, so I bet it would be easy to implement the function manually.
3. ?? - I'm sure there are other issues that deserve a look.
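Item 1 says recursion can still be refused because the owner is known. Below is a minimal sketch of that idea, assuming a POSIX recursive mutex as a stand-in for the (always recursive) CRITICAL_SECTION and a thread-local "I hold the GIL" flag in place of the section's owner field; it tracks ownership itself rather than reading CRITICAL_SECTION internals. `gil_acquire`, `gil_release` and the demo helpers are hypothetical; the 1/0 return is meant to mirror the success flag of `PyThread_acquire_lock`.

```c
#define _XOPEN_SOURCE 700   /* for PTHREAD_MUTEX_RECURSIVE under strict modes */
#include <assert.h>
#include <pthread.h>

static pthread_mutex_t gil_mutex;
/* Per-thread flag: does the calling thread currently own the GIL? */
static _Thread_local int gil_held_here = 0;

static void gil_init(void) {
    pthread_mutexattr_t attr;
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
    pthread_mutex_init(&gil_mutex, &attr);
    pthread_mutexattr_destroy(&attr);
}

/* waitflag != 0 blocks, waitflag == 0 only tries. Returns 1 if the lock
   was acquired, 0 otherwise. A second acquire by the owner is refused
   instead of silently nesting, so the lock behaves non-recursively. */
static int gil_acquire(int waitflag) {
    if (gil_held_here)
        return 0;   /* re-entry detected; a real build might Py_FatalError */
    int rc = waitflag ? pthread_mutex_lock(&gil_mutex)
                      : pthread_mutex_trylock(&gil_mutex);
    if (rc != 0)
        return 0;
    gil_held_here = 1;
    return 1;
}

static void gil_release(void) {
    assert(gil_held_here);   /* releasing a GIL we don't own is a bug */
    gil_held_here = 0;
    pthread_mutex_unlock(&gil_mutex);
}

/* Demo helpers: try to take the GIL (non-blocking) from another thread. */
static void *try_from_other_thread(void *out) {
    *(int *)out = gil_acquire(0);
    if (*(int *)out)
        gil_release();
    return NULL;
}

static int other_thread_can_acquire(void) {
    pthread_t t;
    int got = -1;
    pthread_create(&t, NULL, try_from_other_thread, &got);
    pthread_join(t, NULL);
    return got;
}
```

Because callers usually ignore the return value of a blocking acquire, quietly returning 0 in the `waitflag != 0` re-entry case is the risky spot; that is where a hard failure such as Py_FatalError would make sense.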
I've given this a shot already while doing some concurrency testing
with my ISAPI extension (PyISAPIe). First of all, nothing looks broken
yet. I'm using my modified python26.dll to run all of my Python code
and trying to find anywhere it could possibly break. For multiple
concurrent requests against a single multithreaded ISAPI handler
process, I see a statistically significant speed increase depending on
how much Python code is executed. With more Python code executed (e.g.
a Django page), the speedup was about 2x. I haven't tested with varied
values for _Py_CheckInterval aside from finding a sweet spot for my
specific purposes, but using 100 (the default) would likely make the
performance difference more noticeable. A spin mutex also does well,
but the results vary a lot more.
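Why the contended path matters for these numbers: in the 2.x interpreter, the running thread periodically releases and re-acquires the GIL, with _Py_CheckInterval controlling how often, so with several busy threads each handoff is a contended acquire. Below is a simplified, hypothetical model of that loop, assuming a pthread mutex as the GIL and a single global tick counter; it counts handoffs rather than timing anything, and it is not the ceval.c code. `run_interpreters` and the globals are illustrative names.

```c
#include <pthread.h>

static pthread_mutex_t gil = PTHREAD_MUTEX_INITIALIZER;
static int check_interval;   /* stands in for _Py_CheckInterval */
static int ticker;           /* counts down to the next handoff; GIL-protected */
static long handoffs;        /* GIL-protected */
static long work_done;       /* GIL-protected */
static long per_thread_work; /* set before threads start, read-only after */

static void *interpreter_thread(void *arg) {
    (void)arg;
    pthread_mutex_lock(&gil);
    for (long i = 0; i < per_thread_work; i++) {
        work_done++;                      /* "execute one bytecode" */
        if (--ticker == 0) {
            ticker = check_interval;
            handoffs++;
            pthread_mutex_unlock(&gil);   /* give waiting threads a chance */
            pthread_mutex_lock(&gil);     /* then compete for it again */
        }
    }
    pthread_mutex_unlock(&gil);
    return NULL;
}

static long run_interpreters(int nthreads, long work, int interval) {
    pthread_t tids[16];
    if (nthreads > 16)
        nthreads = 16;
    check_interval = ticker = interval;
    handoffs = work_done = 0;
    per_thread_work = work;
    for (int i = 0; i < nthreads; i++)
        pthread_create(&tids[i], NULL, interpreter_thread, NULL);
    for (int i = 0; i < nthreads; i++)
        pthread_join(tids[i], NULL);
    return work_done;
}
```

With 4 threads doing 10,000 "instructions" each, an interval of 100 gives 400 handoffs and an interval of 1000 gives 40: a smaller interval means more lock traffic, and so more weight on how fast the contended path is.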
Just as a disclaimer, my tests were nowhere near scientific, but if
anyone needs convincing I can come up with some actual measurements. I
think at this point most of you are wondering more about what it would …
Hopefully I haven't wasted anyone's time; I just wanted to share what
I see as a possibly substantial improvement to Python's core. Let me
know if you're interested in a patch to use for your own testing.