[Chicago] Kickstarter Fund to get rid of the GIL

Tal Liron tal.liron at threecrickets.com
Sun Jul 24 22:46:43 CEST 2011

There is an alternative: Jython, which is Python on the JVM, and has no 
GIL. It's real, it works, and has a very open community. If you want to 
do high-concurrency in Python, it's the way to go. (And it has other 
advantages and disadvantages, of course.)
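To make the GIL point concrete, here is a minimal sketch (mine, not from the thread) of CPU-bound work split across threads. On CPython the GIL serializes pure-Python bytecode, so the threads finish in roughly serial time; on a GIL-free runtime like Jython they can actually run on separate cores. The answer is the same either way; only the wall-clock time differs.

```python
# CPU-bound work spread across threads. Correct everywhere;
# only parallel on runtimes without a GIL (e.g. Jython).
from concurrent.futures import ThreadPoolExecutor

def count_primes(lo, hi):
    """Naive trial-division prime count -- deliberately CPU-bound."""
    count = 0
    for n in range(lo, hi):
        if n >= 2 and all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count

def threaded_count(limit, workers=4):
    step = limit // workers
    ranges = [(i * step, (i + 1) * step) for i in range(workers)]
    ranges[-1] = (ranges[-1][0], limit)  # cover any remainder
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(lambda r: count_primes(*r), ranges))

print(threaded_count(10_000))
```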

I am always a bit frightened by community attempts to create new virtual 
machines for favorite languages in order to solve problem X. This shows 
a huge underestimation of what it means to create a robust, reliable, 
performant generic platform. Consider how many really reliable 
versions of the C standard library are out there -- and how many decades 
they took to mature, even with thousands of expert eyes poring over the 
code and testing it. And that is without duck typing (or ANY typing), 
data integrity, scoping (+call/cc), tail recursion, or any of the 
other huge (and exciting) challenges required to run a dynamic language 
like Python.

So, it's almost amusing to see projects like Rubinius or Parrot come to 
be. Really? This is the best use of our time and effort? I'm equally 
impressed by the ballsiness of Erlang to create a new virtual machine 
from scratch.

But those are rather unique histories. CPython has its own unique 
history. Not many people realize this, but Python is several years older 
than Java, and the JVM would take another decade before reaching 
prominence. JavaScript engines (which ran in web browsers only) at the 
time were terrible, and Perl was entirely interpreted (no VM). So, in 
fact, CPython was written when there was no really good platform for 
dynamic languages. It wasn't a matter of hubris ("not invented here") to 
build a VM from scratch; there was simply no choice.

Right now, though, there are many good choices. People like Rich Hickey 
(Clojure) and Martin Odersky (Scala) have it right in targeting the JVM, 
although both projects are also exploring .NET/Mono. If Python were 
invented today, I imagine it also would start with "Jython," instead of 
trying to reinvent the wheel (well, reinvent a whole damn car fleet, 
really, in terms of the work required).

One caveat: I think there is room for "meta-VM" projects like PyPy and 
LLVM. These represent real progress in architecture, whereas "yet 
another dynamic VM" does not.


On 07/24/2011 02:56 PM, Jason Rexilius wrote:

> I also have to quote:
> "rather that, for problems for which shared-memory concurrency is 
> appropriate (read: the valid cases to complain about the GIL), message 
> passing will not be, because of the marshal/unmarshal overhead (plus 
> data size/locality ones)."
> I have to say this is some of the best discussion in quite a while. 
> Dave's passionate response is great, as are the others'. I think the 
> rudeness, or not, is kinda beside the point.
> There is a valid point to be made about marshal/unmarshal overhead in 
> situations where data-manipulation concurrency AND _user expectation_ 
> or environmental constraints apply.  I think that's why people have 
> some grounds to be unhappy with the GIL concept (for me it's a concept) 
> in certain circumstances. Tal is dead on in that "scalability" means 
> different things.
> Oddly, I'm more engaged in this as an abstract comp sci question than 
> a specific python question.  The problem set applies across languages.
> The question I would raise is: given that an engineer understands 
> the problem he is facing, are both tools in the toolbox?  Is 
> there an alternative to the GIL for the use cases where it is not the 
> ideal solution?
> BTW, I will stand up for IPC as one of the tools in the toolbox to 
> deal with scale/volume/speed/concurrency problems.
> On 7/24/11 1:58 PM, Tal Liron wrote:
>> I would say that there's truth in both approaches. "Scalability" means
>> different things at different levels of scale. A web example: the
>> architecture of Twitter or Facebook is nothing like the architecture of
>> even a large Django site. It's not even the same problem field.
>> A good threading model can be extremely efficient at certain scales. For
>> data structures that are mostly read, not written, synchronization is
>> not a performance issue, and you get the best throughput possible in
>> multicore situations. The truly best scalability would be achieved by a
>> combined approach: threading on a single node, message passing between
>> nodes. Programming for that, though, is a nightmare (unless you have a
>> programming language that makes both approaches transparent), and so
>> usually at the large scale the latter approach is chosen. One
>> significant challenge is to make sure that operations that MIGHT use the
>> same data structures are actually performed on the same node, so that
>> threading would be put to use.
>> So, what Dave said applies very well to threading, too: "you still need
>> to know what you're doing and how to decompose your application to 
>> use it."
>> Doing concurrency right is hard. Doing message passing right is hard.
>> Functional (persistent data structure) languages are hard, too. Good
>> thing we're all such awesome geniuses, bursting with experience and a
>> desire to learn.
>> -Tal
>> On 07/23/2011 01:40 PM, David Beazley wrote:
>>>> "high performance just create multi processes that message" very
>>>> rarely have
>>>> I heard IPC and high performance in the same sentence.
>>>> Alex
>>> Your youth and inexperience is the only reason you would make a statement
>>> that ignorant. Go hang out with some people doing Python and
>>> supercomputing for a while and report back---you will find that almost
>>> every significant application is based on message passing (e.g., MPI). This
>>> is because message passing has proven itself to be about the only sane
>>> way of scaling applications up to run across thousands to tens of
>>> thousands of CPU cores.
>>> I speak from some experience as I was writing such software for large
>>> Crays, Connection Machines, and other systems when I first discovered
>>> Python back in 1996. As early as 1995, our group had done performance
>>> experiments comparing threads vs. message passing on some
>>> multiprocessor SMP systems and found that threads just didn't scale or
>>> perform as well as message passing even on machines with as few as 4
>>> CPUs. This was all highly optimized C code for numerics (i.e., no
>>> Python or GIL).
>>> That said, in order to code with message passing, you still need to
>>> know what you're doing and how to decompose your application to use it.
>>> Cheers,
>>> Dave
>>> _______________________________________________
>>> Chicago mailing list
>>> Chicago at python.org
>>> http://mail.python.org/mailman/listinfo/chicago
