[concurrency] Inside the Python GIL

Jesse Noller jnoller at gmail.com
Fri Jun 12 18:18:12 CEST 2009


On Fri, Jun 12, 2009 at 12:08 PM, Jeremy Hylton <jeremy at alum.mit.edu> wrote:
> On Fri, Jun 12, 2009 at 11:45 AM, Jesse Noller <jnoller at gmail.com> wrote:
>> Really? Is this the worst thing ever? How many of us building heavily
>> threaded I/O bound applications are truly hampered by this? Yes, this
>> sucks for CPU bound applications; that's been known since the earth
>> cooled.
>
> I'm not sure I understand how to distinguish between I/O bound threads
> and CPU bound threads.  If you've got a relatively simple
> multi-threaded application like an HTTP fetcher with a thread pool
> fetching a lot of URLs, you're probably going to end up having more
> than one thread with input to process at any instant.  There's a ton
> of Python code that executes when that happens.  You've got a urllib
> addinfourl wrapper, an httplib HTTPResponse (with read & _safe_read)
> and a socket _fileobject.  Heaven help you if you are using readline.
> So I could imagine even this trivial I/O bound program having lots of
> CPU contention.
>
> Jeremy
>
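One way to answer Jeremy's question empirically is to compare each thread's wall-clock time with its own CPU time. The sketch below is a minimal illustration and is not code from this thread. `handle_response`, `timed_worker`, and `profile` are hypothetical names. `time.sleep` stands in for a blocking socket read, which releases the GIL. The header-splitting loop stands in for the pure-Python wrapper code Jeremy lists, which needs the GIL to run. It assumes Python 3.7+ for `time.thread_time()`.

```python
import threading
import time


def handle_response(lines):
    """Hypothetical response handler: a blocking wait, then pure-Python parsing.

    The sleep stands in for a socket read (it releases the GIL); the loop
    stands in for wrapper/readline-style code that runs as Python bytecode
    and therefore needs the GIL.
    """
    time.sleep(0.02)
    headers = {}
    for line in lines:
        key, _, value = line.partition(":")
        headers[key.strip().lower()] = value.strip()
    return headers


def timed_worker(lines, results, idx):
    """Store (wall seconds, this thread's CPU seconds) for one handler call."""
    wall0, cpu0 = time.perf_counter(), time.thread_time()
    handle_response(lines)
    results[idx] = (time.perf_counter() - wall0, time.thread_time() - cpu0)


def profile(n_threads=4, n_lines=50_000):
    """Run n_threads handlers concurrently and return their (wall, cpu) pairs."""
    lines = [f"X-Header-{i}: value {i}" for i in range(n_lines)]
    results = [None] * n_threads
    threads = [threading.Thread(target=timed_worker, args=(lines, results, i))
               for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Printing each `(wall, cpu)` pair from `profile()` gives a rough picture. A thread whose CPU seconds make up a large share of its wall time is doing real Python work, and with several such threads the GIL serializes that work. The remaining wall time mixes the simulated wait, time spent waiting to reacquire the GIL, and ordinary scheduling, so treat the split as an estimate.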

Speaking as someone who does have lots of apps doing heavily threaded
URL fetching (PUTs, GETs, DELETEs): the GIL ends up not bothering me,
and threading does speed things up (though not as much as I'd like). I
tend to push heavier data parsing off to multiprocessing, and stick to
plain threads for the GETs/PUTs/POSTs.
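The split Jesse describes, with threads for the network calls and separate processes for the parsing, can be sketched with `concurrent.futures`. That module arrived in Python 3.2, after this post, and its `ProcessPoolExecutor` is built on multiprocessing. This is a minimal sketch under stated assumptions:

- `fetch` and `parse` are hypothetical placeholders. The first is a sleep simulating network latency; the second is a pure-Python character count simulating parsing.
- `fetch_then_parse` is an illustrative helper, not an API from any library.

```python
import concurrent.futures as cf
import time


def fetch(url):
    """Hypothetical stand-in for an HTTP GET.

    time.sleep releases the GIL, much like a blocking socket read would.
    """
    time.sleep(0.05)
    return f"<html>{url}</html>" * 1000


def parse(body):
    """Hypothetical CPU-heavy parsing step: pure-Python work that holds the GIL."""
    return sum(1 for ch in body if ch == "<")


def fetch_then_parse(urls, io_workers=8, cpu_executor=None):
    """Fetch all URLs with a thread pool, then parse the bodies elsewhere.

    By default parsing runs in a ProcessPoolExecutor, i.e. in separate
    interpreters with their own GILs. Results come back in URL order.
    """
    owns_executor = cpu_executor is None
    if owns_executor:
        cpu_executor = cf.ProcessPoolExecutor()
    try:
        # Threads are enough here: the workers spend most of their time blocked.
        with cf.ThreadPoolExecutor(max_workers=io_workers) as io_pool:
            bodies = list(io_pool.map(fetch, urls))
        # The CPU-bound step goes to the other executor.
        return list(cpu_executor.map(parse, bodies))
    finally:
        if owns_executor:
            cpu_executor.shutdown()
```

Called with no `cpu_executor`, parsing runs in worker processes. In that case `fetch` and `parse` must be module-level functions that can be pickled. On platforms that spawn processes, the call should also sit under an `if __name__ == "__main__":` guard. Passing a `ThreadPoolExecutor` instead keeps everything in one process. That is handy for testing but puts the parsing back under the GIL.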

I included one URL-fetching benchmark in PEP 371
(http://www.python.org/dev/peps/pep-0371/):

        cmd: python run_benchmarks.py url_get.py
        Importing url_get
        Starting tests ...
        non_threaded (1 iters)  0.124774 seconds
        threaded (1 threads)    0.120478 seconds
        processes (1 procs)     0.121404 seconds

        non_threaded (2 iters)  0.239574 seconds
        threaded (2 threads)    0.146138 seconds
        processes (2 procs)     0.138366 seconds

        non_threaded (4 iters)  0.479159 seconds
        threaded (4 threads)    0.200985 seconds
        processes (4 procs)     0.188847 seconds

        non_threaded (8 iters)  0.960621 seconds
        threaded (8 threads)    0.659298 seconds
        processes (8 procs)     0.298625 seconds
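For readers who want to reproduce the shape of this table on their own workload, a harness along these lines times the same task three ways: sequentially, in N threads, and in N processes. It is a hypothetical sketch, not the `run_benchmarks.py` or `url_get.py` from PEP 371. `fake_url_get` only sleeps to stand in for a fetch, and the runner names merely mirror the labels above.

```python
import multiprocessing
import threading
import time


def fake_url_get():
    """Hypothetical stand-in for one URL fetch: ~50 ms of blocking wait."""
    time.sleep(0.05)


def non_threaded(task, n):
    """Run the task n times in a row in the current thread."""
    for _ in range(n):
        task()


def _run_all(workers):
    """Start every worker, then wait for all of them to finish."""
    for w in workers:
        w.start()
    for w in workers:
        w.join()


def threaded(task, n):
    """Run the task once in each of n threads."""
    _run_all([threading.Thread(target=task) for _ in range(n)])


def processes(task, n):
    """Run the task once in each of n processes.

    The task must be a module-level function so it can be pickled under
    the "spawn" start method; call this from an `if __name__ == "__main__":` block.
    """
    _run_all([multiprocessing.Process(target=task) for _ in range(n)])


def bench(runner, task, n):
    """Return the wall-clock seconds taken by runner(task, n)."""
    start = time.perf_counter()
    runner(task, n)
    return time.perf_counter() - start
```

Calling `bench(runner, fake_url_get, n)` for each runner and for n in 1, 2, 4, 8 prints comparable rows. Make those calls from inside an `if __name__ == "__main__":` block. The stand-in task only sleeps and never runs Python code while waiting, so the threaded runs should not show the slowdown seen in the 8-thread row above. That gap presumably reflects Python-level fetching and parsing contending for the GIL, which a real task would reintroduce.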

For heavy HTTP handling, though, I quickly move to pycurl rather than
httplib; that brings a C module into play and lets me sidestep even
more of these issues.

Note that I'm not advocating "things are fine as is"; I'm a pretty
squeaky wheel when it comes to making this space (threads, the GIL,
etc.) better. Right now, the project I'm watching most closely in this
regard is Unladen Swallow, as I don't see much movement on this in
core today.

That being said, I think people get hung up on the GIL before they
even know whether it affects their application, and are too quick to
dismiss Python threads as a whole before figuring it out for
themselves.

jesse


More information about the concurrency-sig mailing list