[Python-3000] Kill GIL?

Mon Sep 18 08:29:44 CEST 2006

On 9/17/06, Ivan Krstić <krstic at solarsail.hcs.harvard.edu> wrote:
> Andre Meyer wrote:
> > As a heavy user of multi-threading in Python and following the current
> > discussions about Python on multi-processor systems on the python-list I
> > wonder what the plans are for improving MP performance in Py3k.
>
> I have four aborted e-mails in my 'Drafts' folder that are asking the
> same question; each time, I decided that the almost inevitably ensuing
> "threads suck!" flamewar just isn't worth it. Now that someone else has
> taken the plunge...
>
> At present, the Python approach to multi-processing sounds a bit like
> "let's stick our collective hands in the sand and pretend there's no
> problem". In particular, one oft-parroted argument says that it's not
> worth changing or optimizing the language for the few people who can
> afford SMP hardware. In the meantime, dual-core laptops are becoming the
> standard, with Intel predicting quad-core will become mainstream in the
> next few years, and the number of server orders for single-core, UP
> machines is plummeting.
>
> From this, it's obvious to me that we need to do *something* to
> introduce stronger multi-processing support. Our current abilities are
> rather bad: we offer no microthreads, which is making elegant
> concurrency primitives such as Erlang's, ported to Python by the
> Candygram project [0], unnecessarily expensive. Instead, we only offer
> heavy threads that each allocate a full-size stack, and there's no
> actual ability to parallelize thread execution across CPUs. There's also
> no way to simply fork and coordinate between the forked processes,
> depending on the nature of the problem being solved, since there's no
> shared memory primitive in the stdlib (this because shared memory
> semantics are notoriously different across platforms). On top of it all,
> any adopted solution needs to be implementable across all the major
> Python interpreters, which makes finding a solution that much harder.

Candygram is heavyweight by trade-off, not because it has to be.
Candygram could absolutely be implemented efficiently in current
Python if a Twisted-like style were used. An API that exploits Python
2.5's with blocks and enhanced iterators would make it less verbose
than a traditional Twisted app and potentially easier to learn.
Stackless or greenlets could be used for an even lighter-weight API,
though not as portably.
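
To make the "enhanced iterators" point concrete, here is a minimal
sketch of a generator acting as a tiny message-receiving process. It is
not Candygram's API; the names (counter, proc, replies) are made up for
illustration, and it uses Python 3 spelling (next(proc) rather than
proc.next()).

```python
def counter():
    # A tiny "process": each yield waits for the next message.
    total = 0
    while True:
        msg = yield total
        if msg == "stop":
            return
        total += msg

proc = counter()
next(proc)  # prime the generator so it is waiting at the first yield
replies = [proc.send(n) for n in (1, 2, 3)]  # deliver three messages
```

Each send() resumes the generator with a message and returns whatever
it yields back, so no thread or extra stack is involved.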

> The way I see it, we have several options:
>
> * Bite the bullet; write and support a stdlib SHM primitive that works
> wherever possible, and simply doesn't work on completely broken
> platforms (I understand Windows falls into this category). Utilize it in
> a lightweight fork-and-coordinate wrapper provided in the stdlib.

I really don't think that's the right approach. If we're going to
bother supporting distributed processing, we might as well support it
in a portable way that can scale across machines.

> * Bite the mortar shell, and remove the GIL.

This really isn't even an option, because we're not throwing away the
current CPython implementation, and the C API would have to change
quite a bit to remove the GIL.

> * Introduce microthreads, declare that Python endorses Erlang's
> no-sharing approach to concurrency, and incorporate something like
> candygram into the stdlib.

We have cooperatively scheduled microthreads with ugly syntax (yield),
or more platform-specific and much less debuggable microthreads with
Stackless or greenlets.
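
For reference, the yield flavor can be sketched as a round-robin
scheduler over plain generators. This is only an illustration under
assumed names (worker, run, log); a real framework would also handle
I/O and blocking.

```python
from collections import deque

def worker(name, steps, log):
    # Each "yield" hands control back to the scheduler.
    for i in range(steps):
        log.append((name, i))
        yield

def run(tasks):
    # Round-robin scheduler: resume each generator in turn until all finish.
    queue = deque(tasks)
    while queue:
        task = queue.popleft()
        try:
            next(task)
        except StopIteration:
            continue  # this microthread is done; drop it
        queue.append(task)

log = []
run([worker("a", 2, log), worker("b", 3, log)])
```

The two workers interleave one step at a time, and switching only
happens where a worker explicitly yields.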

The missing part is the async message passing API and the libraries to
go with it. Erlang uses something a lot like pickle for this, but
Erlang only has about 8 types that are all immutable (IIRC: function,
binary, list, tuple, pid, atom, integer, float). Communication between
Erlang nodes requires a cookie (shared secret), which skirts around
security issues. You can definitely kill an Erlang node if you have
its cookie by flooding the atom table (atoms are like interned
strings), but that's not considered to be a problem in most deployment
scenarios.
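
A rough Python analogue of that serialization step, assuming messages
are restricted to tuples of immutable values (the message contents here
are invented for illustration):

```python
import pickle

# Hypothetical message: a tuple of immutable values, loosely like an Erlang term.
message = ("log", 42, 3.5, ("nested", b"raw bytes"))

wire = pickle.dumps(message)   # bytes that could be sent to another process
received = pickle.loads(wire)  # the receiver rebuilds an equal copy
```

Note that unpickling data from an untrusted peer can execute arbitrary
code, so some form of authentication (something like the cookie) would
matter even more here than it does in Erlang.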

> * Introduce a fork-and-coordinate wrapper in the stdlib, and declare
> that we're simply not going to support the use case that requires
> sharing (as opposed to merely passing) objects between processes.

What use case *requires* sharing? In a message passing system, usage
of shared memory is an optimization that you shouldn't care much about
as a user. Also, sockets are generally very fast over loopback.
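
As a sketch of message passing over loopback, assuming pickle as the
wire format and a 4-byte length prefix for framing (both choices are
mine, and the helper names are hypothetical). Both ends live in one
process here just to keep the example self-contained.

```python
import pickle
import socket
import struct

def send_msg(sock, obj):
    # Frame the pickled object with a 4-byte big-endian length prefix.
    data = pickle.dumps(obj)
    sock.sendall(struct.pack("!I", len(data)) + data)

def recv_exact(sock, n):
    # recv() may return fewer bytes than requested, so loop until we have n.
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed connection")
        buf += chunk
    return buf

def recv_msg(sock):
    (length,) = struct.unpack("!I", recv_exact(sock, 4))
    return pickle.loads(recv_exact(sock, length))

# Listen on an OS-assigned loopback port, then connect to it.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)
client = socket.create_connection(server.getsockname())
conn, _ = server.accept()

send_msg(client, ("hello", 1))
reply = recv_msg(conn)

for s in (client, conn, server):
    s.close()
```

The user-facing API is just send_msg and recv_msg; whether the bytes
travel over a socket or through shared memory is hidden behind it.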

IIRC, Erlang only shares memory for binaries > 64 bytes long passed
between processes on the same node (same OS pid, but not necessarily
the same pthread in an SMP build). HiPE might do some more aggressive
communication optimizations... but I think the general idea is that
sending a really big message to another process is probably the wrong
thing to do anyway.

-bob