[Chicago] Kickstarter Fund to get rid of the GIL

Joshua Herman zitterbewegung at gmail.com
Sun Jul 24 23:32:13 CEST 2011


At least Erlang works for those use cases. I wasn't aware that Jython
was that powerful; I will have to play with it.

On Sun, Jul 24, 2011 at 3:46 PM, Tal Liron <tal.liron at threecrickets.com> wrote:
> There is an alternative: Jython, which is Python on the JVM, and has no GIL.
> It's real, it works, and has a very open community. If you want to do
> high-concurrency in Python, it's the way to go. (And it has other advantages
> and disadvantages, of course.)
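
A minimal sketch (my own toy example, not Tal's) of where the lack of a GIL
matters: pure-Python, CPU-bound work split across threads. On CPython the GIL
serializes these threads; on Jython they can actually run on separate cores.

    import threading

    def count_down(n):
        # Pure-Python busy loop: CPU-bound, so CPython's GIL never lets two of
        # these run at the same time; Jython has no such lock.
        while n > 0:
            n -= 1

    # Four CPU-bound threads: near-linear speedup on Jython with 4 cores,
    # no speedup on CPython.
    threads = [threading.Thread(target=count_down, args=(10**7,)) for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()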
>
>
> I am always a bit frightened by community attempts to create new virtual
> machines for favorite languages in order to solve problem X. This shows a
> huge underestimation of what it means to create a robust, reliable,
> performant, generic platform. Consider how many really reliable versions of
> the C standard library are out there -- and how many decades they took to
> mature, even with thousands of expert eyes poring over the code and testing
> it. And that is without duck typing (or ANY typing), data integrity, scoping
> (+call/cc), tail recursion, or any of the other huge (and exciting)
> challenges required to run a dynamic language like Python.
>
>
> So, it's almost amusing to see projects like Rubinius or Parrot come to be.
> Really? This is the best use of our time and effort? I'm equally impressed
> by the ballsiness of Erlang to create a new virtual machine from scratch.
>
>
> But those are rather unique histories. CPython has its own unique history.
> Not many people realize this, but Python is about 6 years older than Java,
> and the JVM would take another decade before reaching prominence. JavaScript
> engines (running in web browsers only) at the time were terrible, and Perl
> was entirely interpreted (no VM). So, in fact, CPython was written at a time
> when there was no really good platform for dynamic languages. It wasn't a
> matter of hubris ("not invented here") to build a VM from scratch; there was
> simply no choice.
>
>
> Right now, though, there are many good choices. People like Rich Hickey
> (Clojure) and Martin Odersky (Scala) have it right in targeting the JVM,
> although both projects are also exploring .NET/Mono. If Python were invented
> today, I imagine it also would start with "Jython," instead of trying to
> reinvent the wheel (well, reinvent a whole damn car fleet, really, in terms
> of the work required).
>
>
> One caveat: I think there is room for "meta-VM" projects like PyPy and LLVM.
> These signify a real progress in architecture, whereas "yet another dynamic
> VM" does not.
>
>
> -Tal
>
>
> On 07/24/2011 02:56 PM, Jason Rexilius wrote:
>
>> I also have to quote:
>>
>> "rather that, for problems for which shared-memory concurrency is
>> appropriate (read: the valid cases to complain about the GIL), message
>> passing will not be, because of the marshal/unmarshal overhead (plus data
>> size/locality ones)."
>>
>>
>> I have to say this is some of the best discussion in quite a while. Dave's
>> passionate response is great, as are the others. I think the rudeness, or
>> not, is kinda beside the point.
>>
>> There is a valid point to be made about marshal/unmarshal overhead in
>> situations where data-manipulation concurrency AND _user expectation_ or
>> environmental constraints apply.  I think that's why people have some
>> grounds to be unhappy with the GIL concept (for me it's a concept) in certain
>> circumstances. Tal is dead on in that "scalability" means different things.
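
A rough way to see that overhead (a toy measurement of mine, with arbitrary
sizes): time the pickle round trip that every cross-process message pays,
versus handing a reference to a thread in the same address space.

    import pickle
    import time

    # Something biggish to ship across a process boundary.
    data = {i: list(range(10)) for i in range(100000)}

    start = time.time()
    blob = pickle.dumps(data)      # marshal
    copy = pickle.loads(blob)      # unmarshal
    print("pickle round trip: %.3fs, %d bytes" % (time.time() - start, len(blob)))

    start = time.time()
    alias = data                   # an in-process "hand-off" is just a reference
    print("shared reference:  %.6fs" % (time.time() - start))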
>>
>> Oddly, I'm more engaged in this as an abstract comp sci question than a
>> specific python question.  The problem set applies across languages.
>>
>> The question I would raise is whether, given that an engineer understands the
>> problem he is facing, both tools are in the toolbox.  Is there an
>> alternative to the GIL for the use cases where it is not the ideal solution?
>>
>> BTW, I will stand up for IPC as one of the tools in the toolbox to deal
>> with scale/volume/speed/concurrency problems.
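
For concreteness (my sketch, not Jason's): the stdlib multiprocessing module is
one such tool -- message passing over queues between worker processes, each
with its own interpreter and its own GIL. The worker function and counts below
are made up.

    import multiprocessing

    def worker(inbox, outbox):
        # Each worker is a separate process; items travel over OS pipes and are
        # pickled/unpickled on the way (the IPC cost discussed above).
        for n in iter(inbox.get, None):          # None is the shutdown sentinel
            outbox.put(n * n)

    if __name__ == '__main__':
        inbox, outbox = multiprocessing.Queue(), multiprocessing.Queue()
        procs = [multiprocessing.Process(target=worker, args=(inbox, outbox))
                 for _ in range(4)]
        for p in procs:
            p.start()
        for n in range(20):
            inbox.put(n)
        for _ in procs:
            inbox.put(None)                      # one sentinel per worker
        results = [outbox.get() for _ in range(20)]
        for p in procs:
            p.join()
        print(sorted(results))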
>>
>>
>> On 7/24/11 1:58 PM, Tal Liron wrote:
>>>
>>> I would say that there's truth in both approaches. "Scalability" means
>>> different things at different levels of scale. A web example: the
>>> architecture of Twitter or Facebook is nothing like the architecture of
>>> even a large Django site. It's not even the same problem field.
>>>
>>>
>>> A good threading model can be extremely efficient at certain scales. For
>>> data structures that are mostly read, not written, synchronization is
>>> not a performance issue, and you get the best throughput possible in
>>> multicore situations. The truly best scalability would be achieved by a
>>> combined approach: threading on a single node, message passing between
>>> nodes. Programming for that, though, is a nightmare (unless you have a
>>> programming language that makes both approaches transparent), and so
>>> usually, at large scale, the latter approach is chosen. One
>>> significant challenge is to make sure that operations that MIGHT use the
>>> same data structures are actually performed on the same node, so that
>>> threading can be put to use.
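
A toy version of that mostly-read case (the data, thread counts, and the
assumption that single-key dict updates are atomic on the runtime in use are
all mine): many threads read a shared in-memory table directly, and only the
rare writer synchronizes.

    import threading

    table = {i: i * i for i in range(1000)}      # shared, mostly-read structure
    write_lock = threading.Lock()

    def reader(results, idx):
        # Reads touch the shared table directly: no copying, no marshalling.
        results[idx] = sum(table[i] for i in range(1000))

    def writer():
        with write_lock:                         # the rare write synchronizes
            table[0] = -1

    results = [None] * 8
    threads = [threading.Thread(target=reader, args=(results, i)) for i in range(8)]
    threads.append(threading.Thread(target=writer))
    for t in threads:
        t.start()
    for t in threads:
        t.join()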
>>>
>>>
>>> So, what Dave said applies very well to threading, too: "you still need
>>> to know what you're doing and how to decompose your application to use
>>> it."
>>>
>>>
>>> Doing concurrency right is hard. Doing message passing right is hard.
>>> Functional (persistent data structure) languages are hard, too. Good
>>> thing we're all such awesome geniuses, bursting with experience and a
>>> desire to learn.
>>>
>>>
>>> -Tal
>>>
>>>
>>> On 07/23/2011 01:40 PM, David Beazley wrote:
>>>
>>>>> "high performance just create multi processes that message" very
>>>>> rarely have
>>>>> I heard IPC and high performance in the same sentence.
>>>>>
>>>>> Alex
>>>>>
>>>> Your youth and inexperience are the only reasons you would make a statement
>>>> that ignorant. Go hang out with some people doing Python and
>>>> supercomputing for a while and report back---you will find that almost
>>>> every significant application is based on message passing (e.g., MPI). This
>>>> is because message passing has proven itself to be about the only sane
>>>> way of scaling applications up to run across thousands to tens of
>>>> thousands of CPU cores.
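
For anyone who hasn't seen it, here is roughly what MPI-style message passing
looks like from Python, using mpi4py (one common binding; I'm not claiming it
is what Dave's group used). Each rank is a separate process; the only way data
moves is by explicit messages.

    # Run with, e.g.: mpirun -n 4 python ring.py
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    size = comm.Get_size()

    # Pass a token around a ring of processes -- no shared memory anywhere.
    if rank == 0:
        comm.send(0, dest=(rank + 1) % size)
        token = comm.recv(source=size - 1)
        print("token made the round trip: %d" % token)
    else:
        token = comm.recv(source=rank - 1)
        comm.send(token + 1, dest=(rank + 1) % size)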
>>>>
>>>> I speak from some experience as I was writing such software for large
>>>> Crays, Connection Machines, and other systems when I first discovered
>>>> Python back in 1996. As early as 1995, our group had done performance
>>>> experiments comparing threads vs. message passing on some
>>>> multiprocessor SMP systems and found that threads just didn't scale or
>>>> perform as well as message passing even on machines with as few as 4
>>>> CPUs. This was all highly optimized C code for numerics (i.e., no
>>>> Python or GIL).
>>>>
>>>> That said, in order to code with message passing, you still need to
>>>> know what you're doing and how to decompose your application to use it.
>>>>
>>>> Cheers,
>>>> Dave
>>>>
>
> _______________________________________________
> Chicago mailing list
> Chicago at python.org
> http://mail.python.org/mailman/listinfo/chicago
>

