[Python-ideas] [Python-Dev] A bit about the GIL
Trent Nelson
trent at snakebite.org
Wed Apr 3 11:49:43 CEST 2013
On Tue, Apr 02, 2013 at 05:40:07PM -0700, Alfredo Solano Martínez wrote:
> > (There are two aspects to the work; the parallel stuff, which is the
> > changes to the interpreter to allow multiple threads to run CPython
> > internals concurrently, and the async stuff, which will be heavily
> > tied to the best IO multiplexing option on the underlying platform
> > (IOCP on AIX, event ports on Solaris, kqueue on *BSD, epoll on
> > Linux, poll on everything else). The parallel stuff is pretty
> > platform agnostic, which is nice. (Aside from the thread/register
> > trick; but it appears as though most contemporary ISAs have some
> > way of doing the same thing.))
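(To make the platform-specific bit a little more concrete: conceptually
it's nothing fancier than selecting the native facility at build time.
A throwaway sketch of the idea, not code from my branch:

    /* Illustrative only: pick the platform's multiplexer at compile
     * time.  (Windows would take the IOCP branch.) */
    #include <stdio.h>

    #if defined(_WIN32)
    #  include <windows.h>
    #  define MULTIPLEXER "IOCP"
    #elif defined(__linux__)
    #  include <sys/epoll.h>
    #  define MULTIPLEXER "epoll"
    #elif defined(__APPLE__) || defined(__FreeBSD__) || \
          defined(__NetBSD__) || defined(__OpenBSD__)
    #  include <sys/event.h>
    #  define MULTIPLEXER "kqueue"
    #elif defined(__sun)
    #  include <port.h>
    #  define MULTIPLEXER "event ports"
    #else
    #  include <poll.h>
    #  define MULTIPLEXER "poll"
    #endif

    int main(void)
    {
        printf("async backend: %s\n", MULTIPLEXER);
        return 0;
    }

The interesting, and far less portable, part is papering over the
difference between completion-based APIs like IOCP and readiness-based
ones like epoll/kqueue; the sketch above obviously doesn't touch that.)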
>
> That's a lot of things to do. Do you have a work breakdown structure or
> are you still putting the pieces together?
Work breakdown structure? That's far too organized ;-) I have an
end goal in mind and I'm just slowly hacking my way towards it (at
least for the Windows work).
> > The "no refcounting and nuke everything when done" aspect has
> > worked surprisingly well. Shared-nothing code executing in a
> > parallel thread absolutely flies. Mallocs are basically free,
> > frees are no-ops, no reference counting and no garbage
> > collection; everything gets released in a single call when we're
> > done.
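(For anyone curious about the mechanics: it's morally equivalent to a
classic region/arena allocator; allocation is a pointer bump inside a
per-context block, "free" is a no-op, and tearing down the context
releases the lot in one go. A rough sketch of the idea, not the actual
PyParallel allocator:

    /* Rough sketch only: a per-context bump allocator in the spirit
     * of "mallocs are basically free, frees are no-ops, everything
     * is released in a single call". */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    typedef struct {
        char  *base;    /* block of memory backing this context */
        size_t size;    /* total bytes available                 */
        size_t used;    /* bump pointer: bytes handed out so far */
    } context_t;

    static context_t *ctx_create(size_t size)
    {
        context_t *ctx = malloc(sizeof(context_t));
        ctx->base = malloc(size);
        ctx->size = size;
        ctx->used = 0;
        return ctx;
    }

    /* "malloc": advance the bump pointer; no headers, no free lists. */
    static void *ctx_alloc(context_t *ctx, size_t n)
    {
        void *p;
        n = (n + 15) & ~(size_t)15;   /* keep 16-byte alignment */
        if (ctx->used + n > ctx->size)
            return NULL;              /* real code would chain a new block */
        p = ctx->base + ctx->used;
        ctx->used += n;
        return p;
    }

    /* "free": deliberately a no-op. */
    static void ctx_free(context_t *ctx, void *p) { (void)ctx; (void)p; }

    /* Everything allocated in the context goes away at once. */
    static void ctx_destroy(context_t *ctx)
    {
        free(ctx->base);
        free(ctx);
    }

    int main(void)
    {
        context_t *ctx = ctx_create(1 << 20);
        char *s = ctx_alloc(ctx, 64);
        strcpy(s, "allocated inside the context");
        printf("%s\n", s);
        ctx_free(ctx, s);    /* no-op */
        ctx_destroy(ctx);    /* the single release at the end */
        return 0;
    }

That's the whole trick: nothing allocated in the context outlives it,
so there's no reference counting or collection to do within it.)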
>
> Glad to hear it; it's hard to make things simple. Actually, I have to
> say the GPU analogy is very good, with all but the main core acting as
> vector processors (and thus providing a sort of programmable pipeline
> for it) while the main core becomes the CPU. I would definitely go
> for that in future slides.
The GPU analogy seemed like a good idea when I was writing the PEP,
but the implementation has taken a slightly different path. There
is far less emphasis on the notion of vectorized/SIMD-style work; in
fact, I haven't implemented any of the 'parallel' type functions yet
(like a parallel map/reduce, or equivalents to the parallel stuff
exposed by multiprocessing).
That's all stuff to tackle down the track.
> In the case of the GPUs the copying of data from memory to card is
> usually a bottleneck, is there a big hit in performance here too?
Well, as the current implementation doesn't really have anything
that reflects the GPU/vector analogy from that draft PEP, no, not
really ;-)
(I should probably clarify again that the PEP I cited was hacked out
in a weekend before I started a lick of coding. The requirements
section is definitely useful, as it spells out the constraints that
drove my design decisions, but all of the sections that allude
to implementation details (like binding a thread to each core via
thread affinity, not having access to globals, introducing new op-
codes to achieve the parallel functionality) don't necessarily map
to how I've implemented things now. Once I've finished the work on
Windows I'll do an updated PEP.)
Trent.