[pypy-dev] Thinking about the GIL

Laura Creighton lac at openend.se
Mon Mar 14 19:13:34 CET 2011


Robert Hancock hosted a BoF at PyCon about concurrency and multiprocessing.
I went there looking to find out how other people were doing things,
especially looking for information about how other languages handled
things.  It would be nice to kill the GIL, if only we knew of a
brilliant way to do this.

Unfortunately, I was one year too late for this discussion.  This is
what Robert Hancock, David Beazley, Peter Portante and others discussed at
PyCon _last year_.  So I asked Robert Hancock for the notes he took then.
(I continue after this forwarded message)

------- Forwarded Message

Return-Path: hancock.robert at gmail.com
Delivery-Date: Mon Mar 14 17:09:51 2011
Subject: Re: please send me the notes you took last year
To: Laura Creighton <lac at openend.se>

These are the books that I mentioned:
Machine Learning: An Algorithmic Perspective
http://www.amazon.com/gp/product/1420067184
I found this more approachable than the Bishop, and a number of
examples are in Python.

Introduction to Data Mining
http://www.amazon.com/gp/product/0321321367
I've only started this, but it goes nicely with David Mease's Google Tech
Talk series. http://www.youtube.com/watch?v=zRsMEl6PHhM

1.  Make all IO non-blocking and mediate the processes like greenlets.  This
does not allow you to take advantage of the OS-level thread scheduler, which
is far more sophisticated than greenlets.  See the Linux kernel specifications
for the details of the multi-level feedback queue.

2. Construct a multi-level feedback queue within Python.  This is
extraordinarily complicated to implement.  Why duplicate what
already exists?

3. Do we need to maintain compatibility with being able to call out to C
functions?  The primary complaint about the GIL is that it does not
efficiently handle CPU-bound processes on multi-core machines.  Running
sequential processes in threads on multiple cores can actually slow down
the processes.

4. Who has already solved this problem as part of the language?
    -   Erlang (No one knew the nitty gritty details.)

    -   Go - based on Tony Hoare's CSP and the work done on Plan 9 at Bell
Labs.  Uses the system scheduler and creates its own mini-threads (4k).
Need to investigate the source code online.  Goroutines do not have OS
thread affinity; they can be multiplexed over multiple threads.

    -   Java - Early on, Java used green threads (its counterpart to
greenlets), but now uses system threads.  The JVM punts to the OS.
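
The complaint in point 3 can be checked with a small timing experiment.
This is a minimal sketch, not a benchmark: the names countdown,
run_sequential and run_threaded are made up for illustration, and the
numbers depend heavily on the interpreter, its version, and the machine.
On an interpreter with a GIL, the threaded run of a pure-Python CPU-bound
loop is typically no faster than the sequential run, and may be slower.

```python
import threading
import time


def countdown(n):
    """Pure-Python CPU-bound loop; it never blocks on I/O."""
    while n > 0:
        n -= 1


def run_sequential(n, jobs=2):
    """Run the jobs one after another and return the elapsed seconds."""
    start = time.perf_counter()
    for _ in range(jobs):
        countdown(n)
    return time.perf_counter() - start


def run_threaded(n, jobs=2):
    """Run the same jobs in separate threads and return the elapsed seconds."""
    threads = [threading.Thread(target=countdown, args=(n,))
               for _ in range(jobs)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    n = 2 * 10**6
    print("sequential: %.2fs" % run_sequential(n))
    print("threaded:   %.2fs" % run_threaded(n))
```

Both functions do the same total amount of work, so any difference between
the two timings comes from how the interpreter schedules the threads.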

Conclusions
--------------
1.  Do not reinvent the wheel!  Many people have worked decades on this
problem.  Leverage their expertise.

2.  Coroutines are frequently better than threads, but do not scale, and each
coroutine must be restarted in the thread where it was spawned.  See
greenlet.c.  Greenlets are also chained and have mutual dependencies.
The order of execution is arbitrary, with no method for priorities.

3.  Investigate if there is an alternative to the current method of calling
external C objects.

4. Dave did a POC on priorities:
http://dabeaz.blogspot.com/2010/02/revisiting-thread-priorities-and-new.html

5.  Everyone agreed that some type of priority mechanism is a good idea, but
wanted to see what Unladen Swallow does. (As of March 2011 Google is no
longer actively developing this project.)
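
To make the priority idea concrete, here is a toy multi-level feedback
queue for generator-based tasks.  It is only a sketch under stated
assumptions: FeedbackScheduler and worker are hypothetical names, plain
generators stand in for greenlets, and nothing here reflects how
greenlet.c or the Linux scheduler is actually implemented.  A task that
uses up its whole time slice is demoted one level, and lower levels get
longer slices.

```python
from collections import deque


class FeedbackScheduler:
    """Toy multi-level feedback queue; level 0 is the highest priority."""

    def __init__(self, quanta=(1, 2, 4)):
        # quanta[i] is the number of steps a task may run at level i.
        self.quanta = quanta
        self.levels = [deque() for _ in quanta]

    def spawn(self, name, task, level=0):
        """Queue a generator-based task at the given priority level."""
        self.levels[level].append((name, task))

    def run(self):
        """Run every task to completion; return the names in step order."""
        trace = []
        while True:
            # Pick the highest-priority level that still has work.
            for level, queue in enumerate(self.levels):
                if queue:
                    break
            else:
                return trace
            name, task = queue.popleft()
            finished = False
            for _ in range(self.quanta[level]):
                try:
                    next(task)          # run the task up to its next yield
                    trace.append(name)
                except StopIteration:
                    finished = True
                    break
            if not finished:
                # The task used its full slice, so demote it one level.
                lower = min(level + 1, len(self.levels) - 1)
                self.levels[lower].append((name, task))


def worker(steps):
    """Hypothetical task: yields control back to the scheduler each step."""
    for _ in range(steps):
        yield


if __name__ == "__main__":
    sched = FeedbackScheduler()
    sched.spawn("long", worker(5))
    sched.spawn("short", worker(1))
    print(sched.run())
```

With these two tasks, the short one gets its turn right after the long
task's first step instead of waiting for the long task to finish.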


References
-----------
Dave Beazley - GIL Wars
Dave Beazley - Yieldable Threads http://www.dabeaz.com/blog.html
Linux Kernel http://goo.gl/RkxVs
Erlang
Go - golang.org
CSP - Tony Hoare http://www.usingcsp.com/cspbook.pdf

I spoke with Peter Portante yesterday, and he would be very interested in
participating even though he has very little free time.  Peter works at HP
and worked on their OS threading model.  Also, see his PyCon 2010 talk on
non-blocking IO and the 2011 talk on coroutines.

Let me know if you have any questions.

Bob Hancock

Blog - www.bobhancock.org
Twitter - bob_hancock and nycgtug
------- End of Forwarded Message

And, indeed, Peter Portante is very interested in thinking about doing
without the GIL.  He's already sent me this:

Date:    Sun, 13 Mar 2011 16:42:00 -0400
To:      Laura Creighton <lac at openend.se>
From:    Peter Portante <peter.a.portante at gmail.com>
Subject: Re: [pypy-dev] possibly of use for our documentation

Return-Path: peter.a.portante at gmail.com
Delivery-Date: Sun Mar 13 21:42:21 2011
Hi Laura,

Just left PyCon and heard talk about PyPy removing the GIL.

I worked on Tru64 UNIX's thread library for 8 years. If there is anything I
can do to help with this effort, please let me know.

Thanks,

-peter
-------------------------

Note: I have never promised anybody anything.  This was a 'please
educate me' appeal.  But Bob Hancock is coming back this afternoon
to talk with us.

Anybody got any questions they want to make sure I ask him?

Laura


