[Python-ideas] [Python-Dev] A bit about the GIL
Trent Nelson
trent at snakebite.org
Wed Apr 3 11:49:43 CEST 2013
On Tue, Apr 02, 2013 at 05:40:07PM -0700, Alfredo Solano Martínez wrote:
> > (There are two aspects to the work; the parallel stuff, which is the
> > changes to the interpreter to allow multiple threads to run CPython
> > internals concurrently, and the async stuff, which will be heavily
> > tied to the best IO multiplexing option on the underlying platform
> > (IOCP on AIX, event ports on Solaris, kqueue on *BSD, epoll on
> > Linux, poll on everything else). The parallel stuff is pretty
> > platform agnostic, which is nice. (Aside from the thread/register
> > trick; but it appears as though most contemporary ISAs have some
> > way of doing the same thing.))
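(To make the platform-specific bit a little more concrete: conceptually
it's nothing fancier than selecting the native facility at build time.
A throwaway sketch of the idea, not code from my branch:

    /* Illustrative only: pick the platform's multiplexer at compile
     * time.  (Windows would take the IOCP branch.) */
    #include <stdio.h>

    #if defined(_WIN32)
    #  include <windows.h>
    #  define MULTIPLEXER "IOCP"
    #elif defined(__linux__)
    #  include <sys/epoll.h>
    #  define MULTIPLEXER "epoll"
    #elif defined(__APPLE__) || defined(__FreeBSD__) || \
          defined(__NetBSD__) || defined(__OpenBSD__)
    #  include <sys/event.h>
    #  define MULTIPLEXER "kqueue"
    #elif defined(__sun)
    #  include <port.h>
    #  define MULTIPLEXER "event ports"
    #else
    #  include <poll.h>
    #  define MULTIPLEXER "poll"
    #endif

    int main(void)
    {
        printf("async backend: %s\n", MULTIPLEXER);
        return 0;
    }

The interesting, and far less portable, part is papering over the
difference between completion-based APIs like IOCP and readiness-based
ones like epoll/kqueue; the sketch above obviously doesn't touch that.)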
>
> That's a lot of things to do. Do you have a work breakdown structure or
> are you still putting the pieces together?
Work breakdown structure? That's far too organized ;-) I have an
end goal in mind and I'm just slowly hacking my way towards it (at
least for the Windows work).
> > The "no refcounting and nuke everything when done" aspect has
> > worked surprisingly well. Shared-nothing code executing in a
> > parallel thread absolutely flies. Mallocs are basically free,
> > frees are no-ops, no reference counting and no garbage
> > collection; everything gets released in a single call when we're
> > done.
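(For anyone curious about the mechanics: it's morally equivalent to a
classic region/arena allocator; allocation is a pointer bump inside a
per-context block, "free" is a no-op, and tearing down the context
releases the lot in one go. A rough sketch of the idea, not the actual
PyParallel allocator:

    /* Rough sketch only: a per-context bump allocator in the spirit
     * of "mallocs are basically free, frees are no-ops, everything
     * is released in a single call". */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    typedef struct {
        char  *base;    /* block of memory backing this context */
        size_t size;    /* total bytes available                 */
        size_t used;    /* bump pointer: bytes handed out so far */
    } context_t;

    static context_t *ctx_create(size_t size)
    {
        context_t *ctx = malloc(sizeof(context_t));
        ctx->base = malloc(size);
        ctx->size = size;
        ctx->used = 0;
        return ctx;
    }

    /* "malloc": advance the bump pointer; no headers, no free lists. */
    static void *ctx_alloc(context_t *ctx, size_t n)
    {
        void *p;
        n = (n + 15) & ~(size_t)15;   /* keep 16-byte alignment */
        if (ctx->used + n > ctx->size)
            return NULL;              /* real code would chain a new block */
        p = ctx->base + ctx->used;
        ctx->used += n;
        return p;
    }

    /* "free": deliberately a no-op. */
    static void ctx_free(context_t *ctx, void *p) { (void)ctx; (void)p; }

    /* Everything allocated in the context goes away at once. */
    static void ctx_destroy(context_t *ctx)
    {
        free(ctx->base);
        free(ctx);
    }

    int main(void)
    {
        context_t *ctx = ctx_create(1 << 20);
        char *s = ctx_alloc(ctx, 64);
        strcpy(s, "allocated inside the context");
        printf("%s\n", s);
        ctx_free(ctx, s);    /* no-op */
        ctx_destroy(ctx);    /* the single release at the end */
        return 0;
    }

That's the whole trick: nothing allocated in the context outlives it,
so there's no reference counting or collection to do within it.)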
>
> Glad to hear it; it's hard to make things simple. Actually, I have to
> say the GPU analogy is very good, with all but the main core acting as
> vector processors (and thus providing a sort of programmable pipeline
> for it) while the main core becomes the CPU. I would definitely go
> for that in future slides.
The GPU analogy seemed like a good idea when I was writing the PEP,
but the implementation has taken a slightly different path. There
is far less emphasis on the notion of vectorized/SIMD-style work; in
fact, I haven't implemented any of the 'parallel' type functions yet
(like a parallel map/reduce, or equivalents to the parallel stuff
exposed by multiprocessing).
That's all stuff to tackle down the track.
> In the case of the GPUs the copying of data from memory to card is
> usually a bottleneck, is there a big hit in performance here too?
Well, as the current implementation doesn't really have anything
that reflects the GPU/vector analogy from that draft PEP, no, not
really ;-)
(I should probably clarify again that the PEP I cited was hacked out
in a weekend before I started a lick of coding. The requirements
section is definitely useful, as it spells out the constraints that
drove my design decisions, but all of the sections that allude
to implementation details (like binding a thread to each core via
thread affinity, not having access to globals, introducing new op-
codes to achieve the parallel functionality) don't necessarily map
to how I've implemented things now. Once I've finished the work on
Windows I'll do an updated PEP.)
Trent.