[Tim]
For purposes of computational parallelism ... the global interpreter lock renders Python useless except for prototyping, so there's not much point digging into the hundreds of higher-level parallelism models that have been developed.
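[To put a number on that claim, here is a small timing sketch; the function names and iteration count are invented for the example. A CPU-bound pure-Python loop run in two threads takes about as long as running it twice sequentially, because only one thread executes Python bytecode at a time.]

    import threading
    import time

    def crunch(n=3_000_000):
        # CPU-bound pure-Python work; the GIL is held for every bytecode.
        total = 0
        for i in range(n):
            total += i * i
        return total

    def timed(label, fn):
        start = time.time()
        fn()
        print(label, round(time.time() - start, 2), "seconds")

    def sequential():
        crunch()
        crunch()

    def two_threads():
        threads = [threading.Thread(target=crunch) for _ in range(2)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()

    timed("sequential x2:", sequential)
    # Typically about the same wall-clock time (or worse): the GIL serializes
    # the bytecode, so a second core never gets to help.
    timed("two threads:  ", two_threads)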
[Aahz]
Well, maybe. I'm still hoping to prove you at least partly wrong one of these years. ;-)
[Tim]
WRT higher-level parallelism models, you already have, in a small way, by your good championing of the Queue module. Queue-based approaches are a step above the morass of low-level home-grown locking protocols people routinely screw up; it's almost *hard* to screw up a Queue-based approach. The GIL issue is distinct, and it plainly stops computational parallelism from doing any good so long as we're talking about Python code.
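[For readers who haven't seen the style in question, a minimal sketch of a Queue-based design; the worker count and the squaring "work" are made up for illustration, and the module is spelled queue in Python 3, Queue in the Python 2 line under discussion.]

    import queue      # spelled Queue in the Python 2 line under discussion
    import threading

    tasks = queue.Queue()
    results = queue.Queue()

    def worker():
        # No explicit locks anywhere: the Queue does all the synchronization.
        while True:
            item = tasks.get()
            if item is None:                  # sentinel: no more work
                return
            results.put((item, item * item))  # stand-in for real work

    threads = [threading.Thread(target=worker) for _ in range(4)]
    for t in threads:
        t.start()

    for i in range(20):
        tasks.put(i)
    for _ in threads:
        tasks.put(None)                       # one sentinel per worker

    for t in threads:
        t.join()

    print(sorted(results.get() for _ in range(20)))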
[Aahz]
(The long-term plan for my BCD module is to turn it into a C extension that releases the GIL.
[Tim]
Well, that's not Python code. It's also unclear whether it will actually help: Py_BEGIN_ALLOW_THREADS and Py_END_ALLOW_THREADS aren't free, and a typical BCD calculation may be so cheap that it's a net loss to release and reacquire the GIL across one. Effective use of fine-grained parallelism usually requires something cheaper to build on, like very lightweight critical sections mediating otherwise free-running threads.
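[Nothing short of writing an extension can show the C macros themselves, but here is a rough Python-level analogue of the overhead argument; the lock, loop count, and function names are invented. Paying a lock acquire/release per trivial operation costs far more than the operation itself, while one acquire/release around a batch is nearly free per item. Py_BEGIN_ALLOW_THREADS / Py_END_ALLOW_THREADS has the same flavor of per-call cost.]

    import threading
    import time

    lock = threading.Lock()
    N = 1_000_000

    def fine_grained():
        # Acquire/release around each "cheap calculation" -- the analogue of
        # releasing and reacquiring the GIL across one tiny BCD operation.
        total = 0
        for i in range(N):
            with lock:
                total += i
        return total

    def coarse_grained():
        # Same work, one acquire/release around the whole batch.
        total = 0
        with lock:
            for i in range(N):
                total += i
        return total

    for fn in (fine_grained, coarse_grained):
        start = time.time()
        fn()
        print(fn.__name__, round(time.time() - start, 3), "seconds")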
[Aahz]
If that's successful, I'll start working on ways to have Numeric release the GIL.)
[Tim]
I expect that's more promising, because matrix ops are much coarser-grained, but it's also much harder to do safely: BCD objects are immutable (IIRC), so a routine crunching one doesn't have to worry about another thread mutating it midstream if the GIL is released. A routine crunching a Numeric array probably does have to worry about that.
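[A toy illustration of the mutation hazard, using a plain Python list as a stand-in for a Numeric array -- nothing here is Numeric's API, and the sizes are invented. A reader walking shared data in a Python-level loop can observe a half-updated mixture while a writer rewrites it element by element; a C routine that released the GIL around its crunching would face the same kind of midstream change.]

    import threading

    data = [0] * 200_000
    observed = []

    def reader():
        # Python-level loop, so thread switches can happen mid-walk.
        total = 0
        for x in data:
            total += x
        observed.append(total)

    def writer():
        for i in range(len(data)):
            data[i] = 1      # element-by-element rewrite, no lock

    threads = [threading.Thread(target=reader), threading.Thread(target=writer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # Often lands strictly between 0 (all old values) and 200000 (all new
    # values): the reader saw a partially mutated object.
    print(observed[0])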