[Chicago] Kickstarter Fund to get rid of the GIL

Brian Herman brianherman at gmail.com
Mon Jul 25 05:13:22 CEST 2011


+1 for PYPY


Arigatou gozaimasu,
(Thank you very much)
Brian Herman

brianjherman.com
brianherman at acm.org








On Sun, Jul 24, 2011 at 7:57 PM, Alex Gaynor <alex.gaynor at gmail.com> wrote:

>
>
> On Sun, Jul 24, 2011 at 5:38 PM, Tal Liron <tal.liron at threecrickets.com> wrote:
>
>> JVM 7 will have some neat features, but they haven't been stabilized yet,
>> and at this point it's mostly experimentation. Fact is, even though JVM 6
>> has been out for a few years already, many deployments still stick to JVM 5.
>> It does the job, and "upgrades" have their costs, money and otherwise. I
>> choose JVM for my project not because of speed, but because of the maturity
>> of the platform, which includes administration tools, monitoring, security,
>> and several best-in-class 3rd party libraries. It's nice to know that
>> performance is very high up there if I really need it (in which case I just
>> "drop down" to Java, rather than use a dynamic JVM language).
>>
>>
>> The whole Jython codebase could use some help... it's even messier than
>> CPython's, if that's possible. There's a lot of room for optimization, even
>> before igniting JVM 7 shortcuts, though it will surely be at the cost of
>> regressions and stability. Luckily, there's a decent test suite, which makes
>> it easy to experiment. The Jython community would LOVE help, and it
>> doesn't have to be just in terms of coding. Their recent big project was to
>> move the whole codebase from Subversion to Mercurial. Another big item on
>> the todo list is to get up to date with Python 3. (Jython = Python 2.5
>> formally, though it has quite a few 2.6 additions.)
>>
>>
>> Jython also has some nice collaboration with JRuby, including people who
>> work on both projects. But what would make me happier is if there were
>> real code sharing, allowing for a dynamic core that would work well for both
>> projects.
>>
>>
>> Anyway. I guess I'm always confused by what people mean by "faster." What
>> are you trying to code for, exactly? Where is your bottleneck? What is your
>> funding? It's more likely (although not necessarily so) that what you really
>> are looking for is "scalability," for which sheer computational performance
>> is likely not the real issue. If money is coming, getting more expensive,
>> faster machines may do the trick better than any JVM 7 optimization.
>>
>>
>> If you just want a command line tool that starts fast, the JVM is *not* where
>> you want to go. It has notoriously slow startup, because of exactly those
>> mechanisms that make it perform so well as it runs.
>>
>>
>> Another way to look at "faster" is as a way to save money. Weird, huh? But
>> consider Facebook's HipHop project. (Sorry that all of my examples are from
>> the web arena; it's where I mostly work these days.) The issue was not that
>> PHP was "slow," it was that when you have 1,000 machines running at 90% CPU,
>> a faster PHP runtime means that you can use 800 machines, instead, for the
>> same workload. A few orders of magnitude forward, and savings can be
>> enormous.
>>
>>
>> If you have a project with 1,000 machines running at 90% CPU, please hire
>> me! It may be very worthwhile for you to create a more performant Python
>> runtime (JVM-based or not), and I'd love to be paid to do that. :) And it
>> would also make a lot of irrational Python speed freaks happy.
>>
>>
>> -Tal
>>
>>
> <minor derail>
> No offense, but if you want a more performant Python runtime, it's here
> today: http://speed.pypy.org/, no need to start from scratch.
> </minor derail>
>
> Alex
>
>
>>
>>
>> On 07/24/2011 06:18 PM, John Stoner wrote:
>>
>>> Jython's not bad. I've used it a lot, and it plays well with lots of Java
>>> APIs. Pretty slick, actually. I hear Java 1.7 has some new dynamic features
>>> at the JVM level. I always imagined Jython would run a lot faster if it took
>>> advantage of them. Tal, do you know if there's any work on that? Googling
>>> around a bit I'm not seeing much.
>>>
>>> On Sun, Jul 24, 2011 at 4:32 PM, Joshua Herman <zitterbewegung at gmail.com> wrote:
>>>
>>>    At least Erlang works for the use cases. I wasn't aware that Jython
>>>    was that powerful; I will have to play with it.
>>>
>>>    On Sun, Jul 24, 2011 at 3:46 PM, Tal Liron
>>>    <tal.liron at threecrickets.com> wrote:
>>>    > There is an alternative: Jython, which is Python on the JVM, and has
>>>    > no GIL. It's real, it works, and has a very open community. If you
>>>    > want to do high-concurrency in Python, it's the way to go. (And it
>>>    > has other advantages and disadvantages, of course.)
>>>    >
>>>    >
>>>    > I am always a bit frightened by community attempts to create new
>>>    > virtual machines for favorite languages in order to solve problem X.
>>>    > This shows a huge underestimation of what it means to create a
>>>    > robust, reliable, performant generic platform. Consider how many
>>>    > really reliable versions of the C standard library are out there --
>>>    > and how many decades they took to mature, even with thousands of
>>>    > expert eyes poring over the code and testing it. And this is without
>>>    > duck typing (or ANY typing), data integrity, scoping (+call/cc), tail
>>>    > recursion, or any of the other huge (and exciting) challenges
>>>    > required to run a dynamic language like Python.
>>>    >
>>>    >
>>>    > So, it's almost amusing to see projects like Rubinius or Parrot come
>>>    > to be. Really? This is the best use of our time and effort? I'm
>>>    > equally impressed by the ballsiness of Erlang to create a new virtual
>>>    > machine from scratch.
>>>    >
>>>    >
>>>    > But those are rather unique histories. CPython has its own unique
>>>    > history. Not many people realize this, but Python is about 6 years
>>>    > older than Java, and the JVM would take another decade before
>>>    > reaching prominence. JavaScript engines (running in web browsers
>>>    > only) at the time were terrible, and Perl was entirely interpreted
>>>    > (no VM). So, in fact, CPython was written when there was no really
>>>    > good platform for dynamic languages. It wasn't a matter of hubris
>>>    > ("not invented here") to build a VM from scratch; there was simply no
>>>    > choice.
>>>    >
>>>    >
>>>    > Right now, though, there are many good choices. People like Rich
>>>    > Hickey (Clojure) and Martin Odersky (Scala) have it right in
>>>    > targeting the JVM, although both projects are also exploring
>>>    > .NET/Mono. If Python were invented today, I imagine it also would
>>>    > start with "Jython," instead of trying to reinvent the wheel (well,
>>>    > reinvent a whole damn car fleet, really, in terms of the work
>>>    > required).
>>>    >
>>>    >
>>>    > One caveat: I think there is room for "meta-VM" projects like PyPy
>>>    > and LLVM. These signify real progress in architecture, whereas "yet
>>>    > another dynamic VM" does not.
>>>    >
>>>    >
>>>    > -Tal
>>>    >
>>>    >
>>>    > On 07/24/2011 02:56 PM, Jason Rexilius wrote:
>>>    >
>>>    >> I also have to quote:
>>>    >>
>>>    >> "rather that, for problems for which shared-memory concurrency is
>>>    >> appropriate (read: the valid cases to complain about the GIL),
>>>    >> message passing will not be, because of the marshal/unmarshal
>>>    >> overhead (plus data size/locality ones)."
>>>    >>
>>>    >>
>>>    >> I have to say this is some of the best discussion in quite a while.
>>>    >> Dave's passionate response is great, as are the others. I think the
>>>    >> rudeness, or not, is kinda beside the point.
>>>    >>
>>>    >> There is a valid point to be made about marshal/unmarshal overhead
>>>    >> in situations where data-manipulation-concurrency AND _user
>>>    >> expectation_ or environmental constraints apply.  I think that's why
>>>    >> people have some grounds to be unhappy with the GIL concept (for me
>>>    >> it's a concept) in certain circumstances. Tal is dead on in that
>>>    >> "scalability" means different things.
>>>    >>
>>>    >> Oddly, I'm more engaged in this as an abstract comp sci question
>>>    >> than a specific python question.  The problem set applies across
>>>    >> languages.
>>>    >>
>>>    >> The question I would raise is whether, given that an engineer
>>>    >> understands the problem he is facing, both tools are in the toolbox.
>>>    >> Is there an alternative to the GIL for the use-cases where it is not
>>>    >> the ideal solution?
>>>    >>
>>>    >> BTW, I will stand up for IPC as one of the tools in the toolbox to
>>>    >> deal with scale/volume/speed/concurrency problems.
>>>    >>
>>>    >>
>>>    >> On 7/24/11 1:58 PM, Tal Liron wrote:
>>>    >>>
>>>    >>> I would say that there's truth in both approaches. "Scalability"
>>>    >>> means different things at different levels of scale. A web example:
>>>    >>> the architecture of Twitter or Facebook is nothing like the
>>>    >>> architecture of even a large Django site. It's not even the same
>>>    >>> problem field.
>>>    >>>
>>>    >>>
>>>    >>> A good threading model can be extremely efficient at certain scales.
>>>    >>> For data structures that are mostly read, not written,
>>>    >>> synchronization is not a performance issue, and you get the best
>>>    >>> throughput possible in multicore situations. The truly best
>>>    >>> scalability would be achieved by a combined approach: threading on a
>>>    >>> single node, message passing between nodes. Programming for that,
>>>    >>> though, is a nightmare (unless you had a programming language that
>>>    >>> makes both approaches transparent) and so usually at the large scale
>>>    >>> the latter approach is chosen. One significant challenge is to make
>>>    >>> sure that operations that MIGHT use the same data structures are
>>>    >>> actually performed on the same node, so that threading would be put
>>>    >>> to use.
>>>    >>>
>>>    >>>
>>>    >>> So, what Dave said applies very well to threading, too: "you still
>>>    >>> need to know what you're doing and how to decompose your application
>>>    >>> to use it."
>>>    >>>
>>>    >>>
>>>    >>> Doing concurrency right is hard. Doing message passing right is
>>>    >>> hard. Functional (persistent data structure) languages are hard,
>>>    >>> too. Good thing we're all such awesome geniuses, bursting with
>>>    >>> experience and a desire to learn.
>>>    >>>
>>>    >>>
>>>    >>> -Tal
>>>    >>>
>>>    >>>
>>>    >>> On 07/23/2011 01:40 PM, David Beazley wrote:
>>>    >>>
>>>    >>>>> "high performance just create multi processes that message" very
>>>    >>>>> rarely have I heard IPC and high performance in the same sentence.
>>>    >>>>>
>>>    >>>>> Alex
>>>    >>>>>
>>>    >>>> Your youth and inexperience is the only reason you would make a
>>>    >>>> statement that ignorant. Go hang out with some people doing Python
>>>    >>>> and supercomputing for a while and report back---you will find that
>>>    >>>> almost every significant application is based on message passing
>>>    >>>> (e.g., MPI). This is because message passing has proven itself to be
>>>    >>>> about the only sane way of scaling applications up to run across
>>>    >>>> thousands to tens of thousands of CPU cores.
>>>    >>>>
>>>    >>>> I speak from some experience as I was writing such software for
>>>    >>>> large Crays, Connection Machines, and other systems when I first
>>>    >>>> discovered Python back in 1996. As early as 1995, our group had done
>>>    >>>> performance experiments comparing threads vs. message passing on
>>>    >>>> some multiprocessor SMP systems and found that threads just didn't
>>>    >>>> scale or perform as well as message passing even on machines with as
>>>    >>>> few as 4 CPUs. This was all highly optimized C code for numerics
>>>    >>>> (i.e., no Python or GIL).
>>>    >>>>
>>>    >>>> That said, in order to code with message passing, you still need to
>>>    >>>> know what you're doing and how to decompose your application to use
>>>    >>>> it.
>>>    >>>>
>>>    >>>> Cheers,
>>>    >>>> Dave
>>>    >>>>
>>>    >>>> _______________________________________________
>>>    >>>> Chicago mailing list
>>>    >>>> Chicago at python.org
>>>    >>>> http://mail.python.org/mailman/listinfo/chicago
>>>
>>>
>>>
>>>
>>> --
>>> blogs:
>>> http://johnstoner.wordpress.com/
>>> 'In knowledge is power; in  wisdom, humility.'
>>>
>>>
>>
>
>
>
> --
> "I disapprove of what you say, but I will defend to the death your right to
> say it." -- Evelyn Beatrice Hall (summarizing Voltaire)
> "The people's good is the highest law." -- Cicero
>
>