[concurrency] Inside the Python GIL
Jesse Noller
jnoller at gmail.com
Fri Jun 12 18:18:12 CEST 2009
On Fri, Jun 12, 2009 at 12:08 PM, Jeremy Hylton<jeremy at alum.mit.edu> wrote:
> On Fri, Jun 12, 2009 at 11:45 AM, Jesse Noller<jnoller at gmail.com> wrote:
>> Really? Is this the worst thing ever? How many of us building heavily
>> threaded I/O bound applications are truly hampered by this? Yes; this
>> sucks for CPU bound applications, that's been known since the earth
>> cooled.
>
> I'm not sure I understand how to distinguish between I/O bound threads
> and CPU bound threads. If you've got a relatively simple
> multi-threaded application like an HTTP fetcher with a thread pool
> fetching a lot of urls, you're probably going to end up having more
> than one thread with input to process at any instant. There's a ton
> of Python code that executes when that happens. You've got a urllib
> addinfourl wrapper, a httplib HTTPResponse (with read & _safe_read)
> and a socket _fileobject. Heaven help you if you are using readline.
> So I could imagine even this trivial I/O bound program having lots of
> CPU contention.
>
> Jeremy
>
Speaking as someone who does have lots of apps doing heavily threaded
URL fetching (puts, gets, deletes) - the GIL ends up not bothering me,
and threading does speed things up (but not as much as I'd like). I tend
to push heavier data parsing off via multiprocessing, and stick to just
threads for the GET/PUT/POSTs.
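
A rough sketch of that split, for illustration only (this is not the
code from the apps mentioned above). It assumes a current Python 3 with
concurrent.futures; fetch_body, count_words, fetch_all and parse_all are
hypothetical names, and count_words is just a stand-in for heavier
parsing:

```python
from concurrent.futures import ThreadPoolExecutor
from multiprocessing import Pool
from urllib.request import urlopen


def fetch_body(url, timeout=10):
    """I/O-bound step: the GIL is released while blocked on the socket."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read()


def count_words(body):
    """CPU-bound step (hypothetical stand-in for heavier data parsing)."""
    return len(body.split())


def fetch_all(urls, fetch=fetch_body, workers=8):
    """Fetch every URL on a thread pool; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fetch, urls))


def parse_all(bodies, workers=4):
    """Parse the bodies in separate processes, each with its own GIL."""
    with Pool(processes=workers) as pool:
        return pool.map(count_words, bodies)
```

Something like parse_all(fetch_all(urls)) would tie the two together;
on platforms that spawn rather than fork, that call needs to sit under
an `if __name__ == "__main__":` guard.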
I had one benchmark in PEP 371 which did url fetching
(http://www.python.org/dev/peps/pep-0371/):

    cmd: python run_benchmarks.py url_get.py
    Importing url_get
    Starting tests ...

    non_threaded (1 iters)   0.124774 seconds
    threaded (1 threads)     0.120478 seconds
    processes (1 procs)      0.121404 seconds

    non_threaded (2 iters)   0.239574 seconds
    threaded (2 threads)     0.146138 seconds
    processes (2 procs)      0.138366 seconds

    non_threaded (4 iters)   0.479159 seconds
    threaded (4 threads)     0.200985 seconds
    processes (4 procs)      0.188847 seconds

    non_threaded (8 iters)   0.960621 seconds
    threaded (8 threads)     0.659298 seconds
    processes (8 procs)      0.298625 seconds

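
For anyone wanting to try a comparison of this shape without a network,
here is a minimal sketch. It is not the PEP 371 run_benchmarks.py or
url_get.py code; it assumes a current Python 3, and fake_fetch,
run_sequential, run_threaded and compare are hypothetical names. The
fetch is simulated with time.sleep, which releases the GIL much as a
blocking socket read does, so it models only the waiting and none of
the Python-level response handling:

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_fetch(delay=0.05):
    """Simulated I/O wait; time.sleep releases the GIL while blocked."""
    time.sleep(delay)
    return delay


def run_sequential(n, task=fake_fetch):
    """Run the task n times back to back and return elapsed seconds."""
    start = time.perf_counter()
    for _ in range(n):
        task()
    return time.perf_counter() - start


def run_threaded(n, task=fake_fetch):
    """Run the task once on each of n threads and return elapsed seconds."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n) as pool:
        futures = [pool.submit(task) for _ in range(n)]
        for future in futures:
            future.result()  # re-raise any worker exception
    return time.perf_counter() - start


def compare(counts=(1, 2, 4, 8), task=fake_fetch):
    """Return (n, sequential_seconds, threaded_seconds) for each count."""
    return [(n, run_sequential(n, task), run_threaded(n, task))
            for n in counts]
```

Swapping a real fetch or a CPU-heavy function in for fake_fetch is the
quickest way to see which side of the I/O-bound versus CPU-bound line a
given workload falls on.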
For heavy http handling though, I rapidly move to using pycurl, rather
than httplib, which of course brings a C module into play and allows
me to sidestep some of the issues even more.
Note that I'm not advocating/saying "things are fine as is" - I'm a
pretty squeaky wheel when it comes to making this space (threads, the
GIL, etc) better. Right now, my biggest thing to watch is
unladen-swallow in this regard, as I don't see a lot of movement for
this in core today.
However, that being said, I think people get hung up on the GIL before
even knowing whether it affects their application, and are too quick to
discount Python threads as a whole before figuring it out for
themselves.
jesse