[Python-3000] Kill GIL?

Mon Sep 18 09:38:38 CEST 2006

Bob Ippolito wrote:
> Candygram is heavyweight by trade-off, not because it has to be.
> Candygram could absolutely be implemented efficiently in current
> Python if a Twisted-like style was used. 

Specifically?

>> * Bite the bullet; write and support a stdlib SHM primitive that works [..]
>> a lightweight fork-and-coordinate wrapper provided in the stdlib.
> 
> I really don't think that's the right approach. If we're going to
> bother supporting distributed processing, we might as well support it
> in a portable way that can scale across machines.

Fork-and-coordinate is a specialized case of distribute-and-coordinate.
Other d-a-c mechanisms can be provided, including those that utilize
some form of RPC as a transport. SHM is orthogonal to all of this.
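
To make "fork-and-coordinate" concrete, here is a minimal sketch of
what such a wrapper might look like. It is POSIX-only (it relies on
os.fork), the name fork_and_coordinate is hypothetical, and it assumes
every work item returns an integer that fits in 64 bits. It is not a
proposed stdlib API.

```python
import os
import struct


def fork_and_coordinate(chunks, work):
    """Fork one child per chunk; collect each child's integer result over a pipe."""
    children = []
    for chunk in chunks:
        read_fd, write_fd = os.pipe()
        pid = os.fork()
        if pid == 0:
            # Child: compute, send the result to the parent, then exit
            # immediately so no parent cleanup code runs in the child.
            status = 1
            try:
                os.close(read_fd)
                os.write(write_fd, struct.pack("!q", work(chunk)))
                status = 0
            finally:
                os._exit(status)
        # Parent: keep only the read end of this child's pipe.
        os.close(write_fd)
        children.append((pid, read_fd))

    results = []
    for pid, read_fd in children:
        with os.fdopen(read_fd, "rb") as pipe:
            payload = pipe.read()  # returns at EOF, i.e. once the child exits
        os.waitpid(pid, 0)
        # A child that failed sends nothing, so unpack raises struct.error.
        results.append(struct.unpack("!q", payload)[0])
    return results
```

Something like fork_and_coordinate([[1, 2, 3], [4, 5]], sum) would hand
one list to each child and gather the sums in order. Swapping the pipe
for a socket is where an RPC-style transport would slot in.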

Note that scaling across machines is only equivalent to scaling across
CPUs in the simple case; in more complicated cases, there's a lot of
glue involved that grid frameworks like BOINC provide. If we end up
shipping any cross-machine abilities in the stdlib, we'd have to make
it clear that we're not attempting to provide a grid framework,
just the plumbing that someone could use to build one.

>> * Bite the mortar shell, and remove the GIL.
> 
> This really isn't even an option because we're not throwing away the
> current C Python implementation. The C API would have to change quite
> a bit for that.

Hence 'mortar shell'. It can be done, but I think Guido's been pretty
clear on it not happening anytime soon.

> We have cooperatively scheduled microthreads with ugly syntax (yield),
> or more platform-specific and much less debuggable microthreads with
> stackless or greenlets.

Right. This is why I'm not sure we want to be recommending either as
*the* Python way to do concurrency.
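
For reference, the yield-based style under discussion looks roughly
like the sketch below: each task is a generator, and a trivial
round-robin scheduler resumes them in turn. The names worker and
run_round_robin are made up for illustration, and the code uses the
next() builtin rather than the generator's .next() method.

```python
from collections import deque


def worker(name, steps, log):
    """A cooperative task: do one unit of work, then yield control."""
    for i in range(steps):
        log.append((name, i))
        yield  # hand control back to the scheduler


def run_round_robin(tasks):
    """Resume each generator in turn until all of them are exhausted."""
    queue = deque(tasks)
    while queue:
        task = queue.popleft()
        try:
            next(task)
        except StopIteration:
            continue  # task finished; do not reschedule it
        queue.append(task)
```

Every task has to yield voluntarily; one that blocks or loops without
yielding stalls all the others, which is part of why the syntax feels
awkward for general concurrency.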

> What use case *requires* sharing? 

Strictly speaking, it's always avoidable. But in setup-heavy systems,
avoiding SHM is a massive and costly pain. Consider web applications;
ideally, you can preload one copy of all of your translations, database
information, and other static information, into RAM -- and have worker
threads do reads from this table as they're processing individual
requests. Without SHM, you'd have to either duplicate the static set in
memory for each CPU, or make individual requests for each desired piece
of information to the master process that keeps the static set in RAM.
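
A minimal sketch of the shared-table case with threads, assuming a
made-up TRANSLATIONS table and a request modeled as a (language, key)
pair; none of these names come from a real framework:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical static data, loaded once at startup and never mutated.
TRANSLATIONS = {
    "en": {"greeting": "Hello"},
    "de": {"greeting": "Hallo"},
}


def handle_request(request):
    """Serve one request by reading from the single in-memory table."""
    lang, key = request
    return TRANSLATIONS[lang][key]


def serve(requests, workers=4):
    """Fan requests out to worker threads that all share TRANSLATIONS."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(handle_request, requests))
```

All workers read the same copy of the table. With separate processes
and no SHM, each one would need its own copy or a round trip to the
process that holds it.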

I've seen a number of computationally bound systems that require an
authoritative copy of a (large) dataset in RAM, and are OK with paying
the cost of a read waiting on a lock during a write. Since writes only
happen at the completion of complex calculations, they generally want
read-mostly locking like that provided by brlocks in the Linux kernel.
All of this is workable without SHM, but some of it gets really unwieldy.
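
The read-mostly locking pattern can be sketched with a simple
reader-preferring readers-writer lock. This is not how brlocks are
implemented in the kernel; it is only an illustration of the access
pattern, and the class name ReadWriteLock is hypothetical.

```python
import threading


class ReadWriteLock:
    """Many concurrent readers, or one exclusive writer."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0
        self._writing = False

    def acquire_read(self):
        with self._cond:
            # Readers only wait while a write is in progress.
            while self._writing:
                self._cond.wait()
            self._readers += 1

    def release_read(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()  # wake a waiting writer

    def acquire_write(self):
        with self._cond:
            # A writer waits for exclusive access.
            while self._writing or self._readers:
                self._cond.wait()
            self._writing = True

    def release_write(self):
        with self._cond:
            self._writing = False
            self._cond.notify_all()  # wake waiting readers and writers
```

Reads are cheap and never block each other; a write blocks everyone,
which is acceptable when writes are rare. Note that a steady stream of
readers can starve a writer in this version.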

-- 
Ivan Krstić <krstic at solarsail.hcs.harvard.edu> | GPG: 0x147C722D