[pypy-dev] Question on the future of RPython

Tue Sep 28 21:49:08 CEST 2010

On Tue, 2010-09-28 at 11:55 +1000, William Leslie wrote:
> On 28 September 2010 10:43, Terrence Cole
> <list-sink at trainedmonkeystudios.org> wrote:
> > On Tue, 2010-09-28 at 01:57 +0200, Jacob Hallén wrote:
> >> Monday 27 September 2010 you wrote:
> >> > On Sun, 2010-09-26 at 23:57 -0700, Saravanan Shanmugham wrote:
> >> > > Well, I am happy to see that my interest in a general purpose RPython
> >> > > is not as isolated as I was led to believe :-))
> >> > > Thx,
> >> >
> >> > What I wrote has apparently been widely misunderstood, so let me explain
> >> > what I mean in more detail.  What I want is _not_ RPython and it is
> >> > _not_ Shedskin.  What I want is not a compiler at all.  What I want is a
> >> > visual tool, for example, a plugin to an IDE.  This tool would perform
> >> > static analysis on a piece of python code.  Instead of generating code
> >> > with this information, it would mark up the python code in the text
> >> > display with colors, weights, etc in order to show properties from the
> >> > static analysis.  This would be something like semantic highlighting, as
> >> > opposed to syntax highlighting.
> >> >
> >> > I think it possible that this information would, if created and
> >> > presented in the correct way, represent the sort of optimizations that
> >> > pypy-c-jit -- a full python implementation, not a language subset --
> >> > would likely perform on the code if run.  Given this sort of feedback,
> >> > it would be much easier for a python coder to write code that works well
> >> > with the jit: for example, moving a declaration inside a loop to avoid
> >> > boxing, based on the information presented.
> >> >
> >> > Ideally, such a tool would perform instantaneous syntax highlighting
> >> > while editing and do full parsing and analysis in the background to
> >> > update the semantic highlighting as frequently as possible.  Obviously,
> >> > detailed static analysis will provide far more information than it would
> >> > be possible to display on the code at once, so I see this gui as having
> >> > several modes -- like predator vision -- that show different information
> >> > from the analysis.  Naturally, what those modes are will depend strongly
> >> > on the details of how pypy-c-jit works internally, what sort of
> >> > information can be sanely collected through static analysis, and,
> >> > naturally, user testing.
> >> >
> >> > I was somewhat baffled at first as to how what I wrote before was
> >> > interpreted as interest in a static python.  I think the disconnect here
> >> > is the assumption on many people's part that a static language will
> >> > always be faster than a dynamic one.  Given the existing tools that
> >> > provide basically no feedback from the compiler / interpreter / jitter,
> >> > this is inevitably true at the moment.  I foresee a future, however,
> >> > where better tools let us use the full power of a dynamic python AND let
> >> > us tighten up our code for speed to get the full advantages of jit
> >> > compilation as well.  I believe that in the end, this combination will
> >> > prove superior to any fully static compiler.
> >>
> >> The JIT works because it has more information at runtime than what is
> >> available at compile time. If the information was available at compile time we
> >> could do the optimizations then and not have to invoke the extra complexity
> >> required by the JIT. Examples of the extra information include things like
> >> knowing that introspection will not be used in the current evaluation of a
> >> loop, specific argument types will be used in calls and that some arguments
> >> will be known to be constant over part of the program execution. Knowing
> >> these bits allows you to optimize away large chunks of the code that
> >> otherwise would have been executed.
> >>
> >> Static analysis assumes that none of the above mentioned possibilities can
> >> actually take place. It is impossible to make such assumptions at compile time
> >> in a dynamic language. Therefore PyPy is a bad match for people wanting to
> >> statically compile subsets of Python. Applying the JIT to RPython code
> >
> > Yes, that idea is just dumb.  It's also not what I suggested at all.  I
> > can see now that what I said would be easy to misinterpret, but on
> > re-reading it, it clearly doesn't say what you think it does.
> 
> It does make /some/ sense, I think. From the perspective of the JIT,
> operating at interp-level, 

I think this is a disconnect.  Applying a jit to a non-interpreted
language -- Jacob here seems to think I was talking about a static,
compiled subset of python -- makes little sense.  Static analysis to
provide help to an interpreter does, as you say, make some sense, and
not to just me.  Brett Cannon applied static type analysis to the
CPython interpreter for his PhD thesis [1], looking for a speed boost by
removing some typing abstraction.  Unfortunately, it was not
spectacularly helpful for CPython.  I think for pypy-jit, however, it
has much greater potential because of the possibility of full unboxing.
Given past results however, it's not the first place I'd go looking for
speedups.  Others may have better ideas in this area than I do though.

> the app-level python program *is the
> biggest part of* the "stuff you don't know about until runtime". That
> is, you don't know the program source at translation time, and most of
> the information the JIT is supposed to find are app-level constructs
> (eg app-level loops).

This is one of the reasons that I had to pull together my own parsing
(largely borrowed from pypy, actually) and analysis infrastructure,
rather than just using pypy's off-the-shelf.  Even without pypy's neat
analysis code, the fact that it ditches character-level info when making
an ast means you can't apply highlighting with it without groping about
half-blindly in the source.
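
A minimal sketch of the character-level point, assuming Python 3's
standard tokenize module rather than the parser described above (the
helper name token_spans is hypothetical).  Unlike an AST, the token
stream keeps comments and exact (row, column) spans, which is what a
highlighter needs:

```python
import io
import tokenize


def token_spans(source):
    """Return (type_name, text, start, end) for every token in source.

    start and end are (row, column) pairs; rows are 1-based and
    columns are 0-based, as reported by the tokenize module.
    """
    readline = io.StringIO(source).readline
    return [
        (tokenize.tok_name[tok.type], tok.string, tok.start, tok.end)
        for tok in tokenize.generate_tokens(readline)
    ]


# Illustrative input only; the comment survives tokenizing, whereas
# ast.parse() would drop it entirely.
source = "x = 1  # answer\n"
spans = token_spans(source)
comment_spans = [s for s in spans if s[0] == "COMMENT"]
```

Each span can be mapped straight onto an editor buffer, so a color
can be applied without re-searching the source text.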

> Of course any such analysis will fall flat in certain cases, like
> eval(raw_input(...)). But you should still be able to gather enough
> information for most fairly hygienic code.

Given the choice between the status quo and an extremely slow eval, but
much faster python overall, I think most people would pick the second.
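
A rough sketch of how an analyzer might at least locate the places
where it has to give up, assuming Python 3's standard ast module (the
names DYNAMIC_CALLS and find_dynamic_calls are hypothetical).  It only
catches direct calls by name, so an alias such as "e = eval; e(...)"
slips through:

```python
import ast

# Assumption: these builtins are the ones treated as opaque to analysis.
DYNAMIC_CALLS = {"eval", "exec"}


def find_dynamic_calls(source):
    """Return sorted (line, name) pairs for direct eval/exec calls."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id in DYNAMIC_CALLS
        ):
            hits.append((node.lineno, node.func.id))
    return sorted(hits)


# Illustrative input only; the source is parsed, never executed.
example = "x = eval(input())\ny = len('abc')\n"
dynamic_hits = find_dynamic_calls(example)
```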

> What sort of analyses did you have in mind?

As this is a side project, for the moment I am focusing on simple stuff,
mostly things I need/want for work.  In the short term these include
Python3 linting (which is almost working) and static type analysis.  The
second will be particularly interesting because we have (at work)
annotated most of our interfaces with type data, so this will probably
net much more specific and helpful data than it would in many projects.
I am also, specifically, as I mentioned to Paolo yesterday, trying to
find out how much of our code could be fully unboxed, given that we have
extensive type contracts at our interfaces.  If the answer is "most of
it", then it may make sense for us to build something like Jaegermonkey
for python someday.
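
A minimal sketch of the "how much could be unboxed" question, under
loud assumptions: the type contracts are written as Python 3 function
annotations, and only int, float and bool count as unboxable.  Neither
assumption comes from the codebase mentioned above, and the names
UNBOXABLE and fully_typed_functions are hypothetical:

```python
import ast

# Assumption: annotation names treated as candidates for unboxing.
UNBOXABLE = {"int", "float", "bool"}


def fully_typed_functions(source):
    """Return (typed_names, total) for the functions found in source.

    A function counts as typed only if it has no *args/**kwargs and
    every parameter and the return value carry an UNBOXABLE annotation.
    """
    typed, total = [], 0
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.FunctionDef):
            continue
        total += 1
        spec = node.args
        if spec.vararg is not None or spec.kwarg is not None:
            continue
        params = spec.posonlyargs + spec.args + spec.kwonlyargs
        annotations = [p.annotation for p in params] + [node.returns]
        if all(
            isinstance(a, ast.Name) and a.id in UNBOXABLE
            for a in annotations
        ):
            typed.append(node.name)
    return typed, total


# Illustrative input only.
example = (
    "def add(a: int, b: int) -> int:\n"
    "    return a + b\n"
    "\n"
    "def greet(name):\n"
    "    return 'hi ' + name\n"
)
typed_names, total_functions = fully_typed_functions(example)
```

The ratio of typed to total functions is only a crude upper bound: a
fully annotated signature says nothing about what the body does with
its values.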

> >>  is not
> >> workable, because the JIT is optimized to remove bits of generated assembler
> >> code that never shows up in the compilation of RPython code.
> >>
> >> These are very basic first principle concepts, and it is a mystery to me why
> >> people can't work them out for themselves.
> >
> > You are quite right that static analysis will be able to do little to
> > help an optimal jit.  However, I doubt that in the near term pypy's jit
> > will cover all the dark corners of python equally well -- C has been
> > around for 38 years and it's still got room for optimization.
> 
> There are some undesirable things about static analysis, but it can
> sure be useful from optimisation, security and reliability
> perspectives. 

Brendan Eich agrees [2].  This is heartening, because javascript has
much in common with python.

I agree too, for that matter, but that's probably a lot less
heartening :-).

> There's also code browsing, too; IDEs require a
> different (fuzzier) parser, 

Reason number two that I have to maintain a separate parser/analyzer.

> but the question of 'what types does this
> object probably have' makes more sense with a little dependent region
> analysis. Optimising when you can be fairly confident of the types
> involved could be useful. That doesn't really sound like pypy at that
> point, though.

Given that I want to work with Python3 anyway (and that I'd never be
able to beat pypy's performance before it supports Python3), I'm
focusing mostly on a tool to help make reliable and correct code.  

However, performance is always in the back of my mind these days.  It
seems from this thread that I won't be able to do much in that regard
with my current approach, unfortunately.  Maybe by the time I can focus
on it, pypy will support python3 and I can work on providing real-time
jit feedback.

-Terrence

[1] http://www.ocf.berkeley.edu/~bac/thesis.pdf
[2] http://brendaneich.com/2010/08/static-analysis-ftw/