[Chicago] Kickstarter Fund to get rid of the GIL
Tal Liron
tal.liron at threecrickets.com
Mon Jul 25 02:38:13 CEST 2011
JVM 7 will have some neat features, but they haven't been stabilized
yet, and at this point it's mostly experimentation. Fact is, even though
JVM 6 has been out for a few years already, many deployments still stick
to JVM 5. It does the job, and "upgrades" have their costs, money and
otherwise. I choose JVM for my project not because of speed, but because
of the maturity of the platform, which includes administration tools,
monitoring, security, and several best-in-class 3rd party libraries.
It's nice to know that performance is very high up there if I really
need it (in which case I just "drop down" to Java, rather than use a
dynamic JVM language).
The whole Jython codebase could use some help... it's even messier than
CPython's, if that's possible. There's a lot of room for optimization,
even before igniting JVM 7 shortcuts, though it will surely be at the
cost of regressions and stability. Luckily, there's a decent test suite,
which makes it easy to experiment. The Jython community would LOVE
help, and it doesn't have to be just in terms of coding. Their recent
big project was to move the whole codebase from Subversion to Mercurial.
Another big item on the todo list is to get up to date with Python 3.
(Jython = Python 2.5 formally, though it has quite a few 2.6 additions.)
Jython also has some nice collaboration with JRuby, including people who
work on both projects. But what would make me happier is if there were
real code sharing, allowing for a dynamic core that would work well for
both projects.
Anyway. I guess I'm always confused by what people mean by "faster."
What are you trying to code for, exactly? Where is your bottleneck? What
is your funding? More likely (though not necessarily), what you are
really looking for is "scalability," for which sheer computational
performance is probably not the real issue. If money is
coming, getting more expensive, faster machines may do the trick better
than any JVM 7 optimization.
If you just want a command line tool that starts fast, JVM is *not*
where you want to go. It has notoriously slow startup, because of exactly
those mechanisms that make it perform so well once it is running.
Another way to look at "faster" is as a way to save money. Weird, huh?
But consider Facebook's HipHop project. (Sorry that all of my examples
are from the web arena; it's where I mostly work these days.) The issue
was not that PHP was "slow," it was that when you have 1,000 machines
running at 90% CPU, a faster PHP runtime means that you can use 800
machines, instead, for the same workload. Scale that up by a few orders
of magnitude, and the savings can be enormous.
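
A back-of-the-envelope sketch of that arithmetic, in Python. It assumes
the workload is purely CPU-bound, divides evenly across machines, and
that per-machine utilization stays the same; real clusters are messier.
The function name is made up for illustration and is not from HipHop or
any other project.

```python
import math


def machines_needed(current_machines, speedup):
    """Estimate machines required for the same workload after a speedup.

    Assumption: the work is CPU-bound and scales linearly, so a runtime
    that is `speedup` times faster needs 1/speedup as many machines.
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    # Round up: you cannot rent a fraction of a machine.
    return math.ceil(current_machines / speedup)


# Going from 1,000 machines to 800 corresponds to a 1.25x faster runtime.
print(machines_needed(1000, 1.25))

# The same 1.25x speedup two orders of magnitude up frees 20,000 machines.
print(100000 - machines_needed(100000, 1.25))
```

Under these assumptions the saving is always the same fraction of the
fleet (20% for a 1.25x speedup), so the absolute number of machines saved
grows with the size of the deployment.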
If you have a project with 1,000 machines running at 90% CPU, please
hire me! It may be very worthwhile for you to create a more performant
Python runtime (JVM-based or not), and I'd love to be paid to do that.
:) And it would also make a lot of irrational Python speed freaks happy.
-Tal
On 07/24/2011 06:18 PM, John Stoner wrote:
> Jython's not bad. I've used it a lot, and it plays well with lots of
> Java APIs. Pretty slick, actually. I hear Java 1.7 has some new
> dynamic features at the JVM level. I always imagined Jython would run
> a lot faster if it took advantage of them. Tal, do you know if there's
> any work on that? Googling around a bit I'm not seeing much.
>
> On Sun, Jul 24, 2011 at 4:32 PM, Joshua Herman
> <zitterbewegung at gmail.com> wrote:
>
> At least Erlang works for the use cases. I wasn't aware that Jython
> was that powerful; I will have to play with it.
>
> On Sun, Jul 24, 2011 at 3:46 PM, Tal Liron
> <tal.liron at threecrickets.com> wrote:
> > There is an alternative: Jython, which is Python on the JVM, and
> has no GIL.
> > It's real, it works, and has a very open community. If you want
> to do
> > high-concurrency in Python, it's the way to go. (And it has
> other advantages
> > and disadvantages, of course.)
> >
> >
> > I am always a bit frightened by community attempts to create new
> virtual
> > machines for favorite languages in order to solve problem X.
> This shows a
> > huge under-estimation of what it means to create a robust, reliable,
> > performant generic platform. Consider how many really reliable
> versions of
> > the C standard library are out there -- and how many decades they
> took to
> > mature, even with thousands of expert eyes poring over the code
> and testing
> > it. And this is without duck typing (or ANY typing), data
> integrity, scoping
> > (+call/cc), tail recursion, or any of the other huge (and
> exciting)
> > challenges required to run a dynamic language like Python.
> >
> >
> > So, it's almost amusing to see projects like Rubinius or Parrot
> come to be.
> > Really? This is the best use of our time and effort? I'm equally
> impressed
> > by the ballsiness of Erlang to create a new virtual machine from
> scratch.
> >
> >
> > But those are rather unique histories. CPython has its own
> unique history.
> > Not many people realize this, but Python is about 6 years older
> than Java,
> > and the JVM would take another decade before reaching
> prominence. JavaScript
> > engines (running in web browsers only) at the time were
> terrible, and Perl
> > was entirely interpreted (no VM). So, in fact, CPython was
> written when
> > there was no really good platform for dynamic languages. It
> wasn't a matter
> > of hubris ("not invented here") to build a VM from scratch;
> there was simply
> > no choice.
> >
> >
> > Right now, though, there are many good choices. People like Rich
> Hickey
> > (Clojure) and Martin Odersky (Scala) have it right in targeting
> the JVM,
> > although both projects are also exploring .NET/Mono. If Python
> were invented
> > today, I imagine it also would start with "Jython," instead of
> trying to
> > reinvent the wheel (well, reinvent a whole damn car fleet,
> really, in terms
> > of the work required).
> >
> >
> > One caveat: I think there is room for "meta-VM" projects like
> PyPy and LLVM.
> > These signify real progress in architecture, whereas "yet
> another dynamic
> > VM" does not.
> >
> >
> > -Tal
> >
> >
> > On 07/24/2011 02:56 PM, Jason Rexilius wrote:
> >
> >> I also have to quote:
> >>
> >> "rather that, for problems for which shared-memory concurrency is
> >> appropriate (read: the valid cases to complain about the GIL),
> message
> >> passing will not be, because of the marshal/unmarshal overhead
> (plus data
> >> size/locality ones)."
> >>
> >>
> >> I have to say this is some of the best discussion in quite a
> while. Dave's
> >> passionate response is great as well as others. I think the
> rudeness, or
> >> not, is kinda beside the point.
> >>
> >> There is a valid point to be made about marshal/unmarshal
> overhead in
> >> situations where data-manipulation-concurrency AND _user
> expectation_ or
> >> environmental constraints apply. I think that's why people
> have some
> >> grounds to be unhappy with the GIL concept (for me it's a
> concept) in certain
> >> circumstances. Tal is dead on in that "scalability" means
> different things.
> >>
> >> Oddly, I'm more engaged in this as an abstract comp sci
> question than a
> >> specific python question. The problem set applies across
> languages.
> >>
> >> The question I would raise is if, given that an engineer
> understands the
> >> problem he is facing, are there both tools in the toolbox? Is
> there an
> >> alternative to GIL for the use-cases where it is not the ideal
> solution?
> >>
> >> BTW, I will stand up for IPC as one of the tools in the toolbox
> to deal
> >> with scale/volume/speed/concurrency problems.
> >>
> >>
> >> On 7/24/11 1:58 PM, Tal Liron wrote:
> >>>
> >>> I would say that there's truth in both approaches.
> "Scalability" means
> >>> different things at different levels of scale. A web example: the
> >>> architecture of Twitter or Facebook is nothing like the
> architecture of
> >>> even a large Django site. It's not even the same problem field.
> >>>
> >>>
> >>> A good threading model can be extremely efficient at certain
> scales. For
> >>> data structures that are mostly read, not written,
> synchronization is
> >>> not a performance issue, and you get the best throughput
> possible in
> >>> multicore situations. The truly best scalability would be
> achieved by a
> >>> combined approach: threading on a single node, message passing
> between
> >>> nodes. Programming for that, though, is a nightmare (unless
> you had a
> >>> programming language that makes both approaches transparent)
> and so
> >>> usually at the large scale the latter approach is chosen. One
> >>> significant challenge is to make sure that operations that
> MIGHT use the
> >>> same data structures are actually performed on the same node,
> so that
> >>> threading would be put to use.
> >>>
> >>>
> >>> So, what Dave said applies very well to threading, too: "you
> still need
> >>> to know what you're doing and how to decompose your
> application to use
> >>> it."
> >>>
> >>>
> >>> Doing concurrency right is hard. Doing message passing right
> is hard.
> >>> Functional (persistent data structure) languages are hard,
> too. Good
> >>> thing we're all such awesome geniuses, bursting with
> experience and a
> >>> desire to learn.
> >>>
> >>>
> >>> -Tal
> >>>
> >>>
> >>> On 07/23/2011 01:40 PM, David Beazley wrote:
> >>>
> >>>>> "high performance just create multi processes that message" very
> >>>>> rarely have
> >>>>> I heard IPC and high performance in the same sentence.
> >>>>>
> >>>>> Alex
> >>>>>
> >>>> Your youth and inexperience is the only reason you would make a
> statement
> >>>> that ignorant. Go hang out with some people doing Python and
> >>>> supercomputing for a while and report back---you will find
> that almost every
> >>>> significant application is based on message passing (e.g.,
> MPI). This
> >>>> is because message passing has proven itself to be about the
> only sane
> >>>> way of scaling applications up to run across thousands to tens of
> >>>> thousands of CPU cores.
> >>>>
> >>>> I speak from some experience as I was writing such software
> for large
> >>>> Crays, Connection Machines, and other systems when I first
> discovered
> >>>> Python back in 1996. As early as 1995, our group had done
> performance
> >>>> experiments comparing threads vs. message passing on some
> >>>> multiprocessor SMP systems and found that threads just didn't
> scale or
> >>>> perform as well as message passing even on machines with as
> few as 4
> >>>> CPUs. This was all highly optimized C code for numerics (i.e., no
> >>>> Python or GIL).
> >>>>
> >>>> That said, in order to code with message passing, you still
> need to
> >>>> know what you're doing and how to decompose your application
> to use it.
> >>>>
> >>>> Cheers,
> >>>> Dave
> >>>>
> >>>> _______________________________________________
> >>>> Chicago mailing list
> >>>> Chicago at python.org <mailto:Chicago at python.org>
> >>>> http://mail.python.org/mailman/listinfo/chicago
>
>
>
>
> --
> blogs:
> http://johnstoner.wordpress.com/
> 'In knowledge is power; in wisdom, humility.'
>
>