[Python-ideas] [Python-Dev] A bit about the GIL
Trent Nelson
trent at snakebite.org
Mon Apr 1 19:46:20 CEST 2013
On Sun, Mar 31, 2013 at 04:14:11PM -0700, Alfredo Solano Martínez wrote:
> Hi,
>
> I know this may be tiresome by now, so feel free to ignore, but I'd like to
> share with the list an idea about the GIL, more specifically the reference
> counting mechanism.
I've been making pretty good progress with my pyparallel work. See
the initial slides here:
http://speakerdeck.com/trent/parallelizing-the-python-interpreter-an-alternate-approach-to-async
And follow-up thread starting here:
http://mail.python.org/pipermail/python-dev/2013-March/124690.html
I've since set up a separate mailing list for it here:
http://lists.snakebite.net/mailman/listinfo/pyparallel/
And switched to bitbucket.org for the primary repo (I still commit
to hg.python.org/sandbox/trent too, though):
https://bitbucket.org/tpn/pyparallel
TL;DR version:
I've come up with a way to exploit multiple cores without
impeding the performance of "single-threaded execution",
and also coupled it with a host of async facilities that
allow you to write Twisted/Tulip style protocols but have
callbacks automatically execute across all cores.
(In the case of client/server facilities, one neat feature
is that it'll automatically switch between sync and async
socket methods based on the number of concurrent clients
and available cores; there is a non-negligible overhead to
doing async I/O versus blocking I/O. If you have 64 cores
and only 32 clients, there's no reason not to attempt sync
sends and recvs; that maximizes throughput. As soon as the
client count exceeds the available cores, it'll automatically
do async sends/recvs for everything, improving concurrency
(at the expense of throughput). The chargen example in the
slides shows this in action.)
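A rough sketch of that heuristic in plain Python (this is not
pyparallel's actual code; active_clients and queue_for_async_send
are made-up names for illustration):

    # Illustrative sketch of the sync/async switch described above.
    # active_clients is a hypothetical counter the server maintains.
    import multiprocessing

    NCPU = multiprocessing.cpu_count()

    def send(sock, data, active_clients):
        if active_clients <= NCPU:
            # Fewer clients than cores: blocking I/O has less
            # overhead, so a straight sendall maximizes throughput.
            sock.sendall(data)
        else:
            # Oversubscribed: use non-blocking sends so a slow
            # client can't stall a core (better concurrency, at
            # some cost in per-client throughput).
            sock.setblocking(False)
            try:
                sent = sock.send(data)
                remainder = data[sent:]
            except BlockingIOError:
                remainder = data
            if remainder:
                # Hypothetical helper: hand the rest off to the
                # async machinery to finish later.
                queue_for_async_send(sock, remainder)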
> Simply put, make the reference counter a sharded one. That is, separate it
> into several subcounters, in this case one for each thread.
>
> The logic would then be something like this:
> - when increasing the refcount, a thread writes only to its own subcounter,
> creating one first if necessary.
> - similarly, when decreasing the refcount, there is no need to access other
> subcounters until that subcounter reaches zero.
> - when a subcounter gets to zero, delete it, and read the other subcounters
> to check if it was the last one.
> - delete the object only if there are no more subcounters.
>
> Contention could then be reduced to a minimum, since a thread only needs to
> read other subcounters when its own reaches zero or wants the total value.
> Depending on the implementation it might help with false sharing too, as
> subcounters may or may not be in the same cache-line.
>
> Unfortunately, a crude test of mine already shows severe performance
> degradation, and that is without rwlocks. I've used a basic linked
> list, and changed the INCREF/DECREF macros into functions to
> accommodate the extra logic, so it may not be the best approach (too
> many dereferences).
>
> Does this make sense to anyone?
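For concreteness, here's roughly what I understand that scheme
to be, sketched in Python (a real implementation would of course
live in the C-level INCREF/DECREF machinery):

    # Python-level sketch of the sharded refcount proposal above.
    import threading

    class ShardedRefCount:
        def __init__(self):
            self.shards = {}              # thread id -> subcounter
            self.lock = threading.Lock()  # guards shard create/delete

        def incref(self):
            tid = threading.get_ident()
            if tid not in self.shards:
                with self.lock:           # creating a new subcounter
                    self.shards.setdefault(tid, 0)
            self.shards[tid] += 1         # only this thread writes it

        def decref(self):
            # Assumes (as in the proposal) a thread only decrefs
            # objects it has previously increfed.
            tid = threading.get_ident()
            self.shards[tid] -= 1
            if self.shards[tid] == 0:
                with self.lock:           # deleting a subcounter
                    del self.shards[tid]
                    if not self.shards:   # that was the last one
                        self.dealloc()

        def dealloc(self):
            pass                          # free the object here

Even in this form you can see where the costs creep in: an extra
mapping per object, an extra indirection on every incref/decref,
and locking around shard creation/deletion, which lines up with
the degradation you measured.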
My personal opinion (extremely biased, now that I've gained some
momentum with pyparallel) is that trying to solve the free
threading problem is the wrong approach. By "free threading" I
mean allowing existing Python code written to use
threading.Thread() to execute concurrently across all cores.
The only way that could ever be achieved is by introducing
fine-grained locking or STM-type facilities, both of which
seriously impede single-threaded performance.
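A trivial way to see the cost (a quick illustration, not a
rigorous benchmark): even a completely uncontended lock makes
every operation pay for synchronization, which is what
fine-grained locking would do to every INCREF/DECREF.

    # Compare a plain increment with a lock-protected one; the
    # lock is never contended, yet every call still pays for it.
    import threading
    import timeit

    lock = threading.Lock()
    counter = 0

    def plain_inc():
        global counter
        counter += 1

    def locked_inc():
        global counter
        with lock:
            counter += 1

    print(timeit.timeit(plain_inc, number=1000000))   # baseline
    print(timeit.timeit(locked_inc, number=1000000))  # + lock cost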
You might find this interesting, too:
http://hg.python.org/sandbox/trent.peps/file/6de5ed566af9/pep-async.txt
I wrote that the weekend before I started actually coding on
pyparallel. I think the only person who's seen/read it is Guido
(and maybe Antoine). It goes into detail on the rationale for
the design decisions I took (especially regarding avoiding
fine-grained locks/STM).
Trent.