[Python-ideas] [Python-Dev] A bit about the GIL
Trent Nelson
trent at snakebite.org
Mon Apr 1 19:46:20 CEST 2013
On Sun, Mar 31, 2013 at 04:14:11PM -0700, Alfredo Solano Martínez wrote:
> Hi,
>
> I know this may be tiresome by now, so feel free to ignore, but I'd like to
> share with the list an idea about the GIL, more specifically the reference
> counting mechanism.
I've been making pretty good progress with my pyparallel work. See
the initial slides here:
http://speakerdeck.com/trent/parallelizing-the-python-interpreter-an-alternate-approach-to-async
And follow-up thread starting here:
http://mail.python.org/pipermail/python-dev/2013-March/124690.html
I've since set up a separate mailing list for it here:
http://lists.snakebite.net/mailman/listinfo/pyparallel/
And switched to bitbucket.org for the primary repo (I still commit
to hg.python.org/sandbox/trent too, though):
https://bitbucket.org/tpn/pyparallel
TL;DR version:
I've come up with a way to exploit multiple cores without
impeding the performance of "single-threaded execution",
and also coupled it with a host of async facilities that
allow you to write Twisted/Tulip style protocols but have
callbacks automatically execute across all cores.
(In the case of client/server facilities, one neat feature
is that it'll automatically switch between sync and async
socket methods based on the number of concurrent clients
and available cores; there is a non-negligible overhead to
doing async I/O versus blocking I/O. If you have 64 cores
and only 32 clients, there's no reason not to attempt sync
sends and recvs; that maximizes throughput. As soon as the
client count exceeds the available cores, it'll automatically
do async sends/recvs for everything, improving concurrency
(at the expense of throughput). The chargen example in the
slides shows this in action.)
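A rough sketch of that heuristic in plain Python (this is not
pyparallel's actual code; active_clients and queue_for_async_send
are made-up names for illustration):

    # Illustrative sketch of the sync/async switch described above.
    # active_clients is a hypothetical counter the server maintains.
    import multiprocessing

    NCPU = multiprocessing.cpu_count()

    def send(sock, data, active_clients):
        if active_clients <= NCPU:
            # Fewer clients than cores: blocking I/O has less
            # overhead, so a straight sendall maximizes throughput.
            sock.sendall(data)
        else:
            # Oversubscribed: use non-blocking sends so a slow
            # client can't stall a core (better concurrency, at
            # some cost in per-client throughput).
            sock.setblocking(False)
            try:
                sent = sock.send(data)
                remainder = data[sent:]
            except BlockingIOError:
                remainder = data
            if remainder:
                # Hypothetical helper: hand the rest off to the
                # async machinery to finish later.
                queue_for_async_send(sock, remainder)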
> Simply put, make the reference counter a sharded one. That is, separate it
> into several subcounters, in this case one for each thread.
>
> The logic would then be something like this:
> - when increasing the refcount, a thread writes only to its own subcounter,
> creating one first if necessary.
> - similarly, when decreasing the refcount, there is no need to access other
> subcounters until that subcounter reaches zero.
> - when a subcounter gets to zero, delete it, and read the other subcounters
> to check if it was the last one.
> - delete the object only if there are no more subcounters.
>
> Contention could then be reduced to a minimum, since a thread only needs to
> read other subcounters when its own reaches zero or wants the total value.
> Depending on the implementation it might help with false sharing too, as
> subcounters may or may not be in the same cache-line.
>
> Unfortunately, a crude test of mine already shows severe performance
> degradation, and that is without rwlocks. I've used a basic linked
> list, and changed the INCREF/DECREF macros into functions to
> accommodate the extra logic, so it may not be the best approach (too
> many dereferences).
>
> Does this make sense to anyone?
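For concreteness, here's roughly what I understand that scheme
to be, sketched in Python (a real implementation would of course
live in the C-level INCREF/DECREF machinery):

    # Python-level sketch of the sharded refcount proposal above.
    import threading

    class ShardedRefCount:
        def __init__(self):
            self.shards = {}              # thread id -> subcounter
            self.lock = threading.Lock()  # guards shard create/delete

        def incref(self):
            tid = threading.get_ident()
            if tid not in self.shards:
                with self.lock:           # creating a new subcounter
                    self.shards.setdefault(tid, 0)
            self.shards[tid] += 1         # only this thread writes it

        def decref(self):
            # Assumes (as in the proposal) a thread only decrefs
            # objects it has previously increfed.
            tid = threading.get_ident()
            self.shards[tid] -= 1
            if self.shards[tid] == 0:
                with self.lock:           # deleting a subcounter
                    del self.shards[tid]
                    if not self.shards:   # that was the last one
                        self.dealloc()

        def dealloc(self):
            pass                          # free the object here

Even in this form you can see where the costs creep in: an extra
mapping per object, an extra indirection on every incref/decref,
and locking around shard creation/deletion, which lines up with
the degradation you measured.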
My personal opinion (extremely biased, now that I've gained some
momentum with pyparallel) is that trying to solve the free
threading problem is the wrong approach. By "free threading" I
mean allowing existing Python code written to use
threading.Thread() to execute concurrently across all cores.
The only way that could ever be achieved is by introducing
fine-grained locking or STM-type facilities, both of which
seriously impede single-threaded performance.
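A trivial way to see the cost (a quick illustration, not a
rigorous benchmark): even a completely uncontended lock makes
every operation pay for synchronization, which is what
fine-grained locking would do to every INCREF/DECREF.

    # Compare a plain increment with a lock-protected one; the
    # lock is never contended, yet every call still pays for it.
    import threading
    import timeit

    lock = threading.Lock()
    counter = 0

    def plain_inc():
        global counter
        counter += 1

    def locked_inc():
        global counter
        with lock:
            counter += 1

    print(timeit.timeit(plain_inc, number=1000000))   # baseline
    print(timeit.timeit(locked_inc, number=1000000))  # + lock cost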
You might find this interesting, too:
http://hg.python.org/sandbox/trent.peps/file/6de5ed566af9/pep-async.txt
I wrote that the weekend before I started actually coding on
pyparallel. I think the only person who's seen/read it is Guido
(and maybe Antoine). It goes into detail on the rationale for
the design decisions I took (especially regarding avoiding
fine-grained locks/STM).
Trent.