[pypy-dev] Question on the future of RPython

Terrence Cole list-sink at trainedmonkeystudios.org
Tue Sep 28 02:03:08 CEST 2010


On Tue, 2010-09-28 at 00:52 +0200, Paolo Giarrusso wrote:
> On Mon, Sep 27, 2010 at 21:58, Leonardo Santagada <santagada at gmail.com> wrote:
> > On Mon, Sep 27, 2010 at 4:44 PM, Terrence Cole
> > <list-sink at trainedmonkeystudios.org> wrote:
> >> On Sun, 2010-09-26 at 23:57 -0700, Saravanan Shanmugham wrote:
> >>> Well, I am happy to see that my interest in a general-purpose RPython is not
> >>> as isolated as I was led to believe :-))
> >>> Thx,
> >>
> >> What I wrote has apparently been widely misunderstood, so let me explain
> >> what I mean in more detail.  What I want is _not_ RPython and it is
> >> _not_ Shedskin.  What I want is not a compiler at all.  What I want is a
> >> visual tool, for example, a plugin to an IDE.  This tool would perform
> >> static analysis on a piece of python code.  Instead of generating code
> >> with this information, it would mark up the python code in the text
> >> display with colors, weights, etc., in order to show properties from the
> >> static analysis.  This would be something like semantic highlighting, as
> >> opposed to syntax highlighting.
> >>
> >> I think it possible that this information would, if created and
> >> presented in the correct way, represent the sort of optimizations that
> >> pypy-c-jit -- a full python implementation, not a language subset --
> >> would likely perform on the code if run.  Given this sort of feedback,
> >> it would be much easier for a python coder to write code that works well
> >> with the jit: for example, moving a declaration inside a loop to avoid
> >> boxing, based on the information presented.
> >>
> >> Ideally, such a tool would perform instantaneous syntax highlighting
> >> while editing and do full parsing and analysis in the background to
> >> update the semantic highlighting as frequently as possible.  Obviously,
> >> detailed static analysis will provide far more information than it would
> >> be possible to display on the code at once, so I see this gui as having
> >> several modes -- like predator vision -- that show different information
> >> from the analysis.  Naturally, what those modes are will depend strongly
> >> on the details of how pypy-c-jit works internally, what sort of
> >> information can be sanely collected through static analysis, and,
> >> naturally, user testing.
> >>
> >> I was somewhat baffled at first as to how what I wrote before was
> >> interpreted as interest in a static python.  I think the disconnect here
> >> is the assumption on many people's part that a static language will
> >> always be faster than a dynamic one.  Given the existing tools that
> >> provide basically no feedback from the compiler / interpreter / jitter,
> >> this is inevitably true at the moment.  I foresee a future, however,
> >> where better tools let us use the full power of a dynamic python AND let
> >> us tighten up our code for speed to get the full advantages of jit
> >> compilation as well.  I believe that in the end, this combination will
> >> prove superior to any fully static compiler.
> >
> > This all looks interesting, and if you can plug that into emacs or
> > textmate I would be really happy, but it is not what I want. I would
> > settle for a tool that generates, at runtime, information about what the
> > jit is doing in a simple text format (json, yaml or something even
> > simpler?) and a tool to visualize this so you can optimize python
> > programs to run on pypy easily. The biggest difference is that just
> > collecting this info from the JIT appears to be much, much easier than
> > somehow implementing a static processor for python code that does some
> > form of analysis.
> 
> Have you looked at what the Azul Java VM supports for Java, in
> particular RTPM (Real Time Performance Monitoring)?

Briefly, but it's not open source, and it's a Java thing, so it didn't
pique my interest significantly.

> Academic accounts are available, and from Cliff Click's presentations,
> it seems to be a production-quality solution for this (for Java),
> which could give interesting ideas. Azul's business is centered
> exclusively on Java optimization at the JVM level, so while not
> so famous, they are quite relevant.
> 
> See slide 28 of: www.azulsystems.com/events/vee_2009/2009_VEE.pdf for
> some more details.
> See also wiki.jvmlangsummit.com/pdf/36_Click_fastbcs.pdf, and the
> account about JRuby's slowness (caused by unreliable performance
> analysis tools).
> 
> Given that a JIT can beat static compilation only through forms of
> profile-directed optimization, I also believe that the interesting
> information should be obtained through logs from the JIT. A static
> analyser can't do anything better than a static compiler - not
> reliably, at least.

I'd be pursuing the jit logging approach much more aggressively if I
cared at all about Python2 anymore.  All of the source I care about
analyzing is in Python3.  However, considering the rate I'm going, pypy
will doubtless support python3 by the time I get a halfway decent
static analyzer working anyway, so it's probably worth considering.
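
For what it's worth, here is roughly what I mean by the logging
approach, as a sketch I haven't verified: it assumes PyPy's PYPYLOG
facility, and the category names and "{jit-log-opt-loop" section
markers are from memory, so they may differ between versions
(my_script.py is just a placeholder).

import os
import subprocess

# Run a script under PyPy with JIT logging enabled, then post-process
# the log.  PYPYLOG takes "categories:filename"; jit-log-opt and
# jit-summary are the categories I believe are relevant here.
env = dict(os.environ, PYPYLOG="jit-log-opt,jit-summary:jitlog.txt")
subprocess.call(["pypy", "my_script.py"], env=env)

# Crude first pass: count how many optimized loop traces were emitted.
# (Assumes sections are delimited by "{jit-log-opt-loop" markers.)
with open("jitlog.txt") as f:
    loops = sum(1 for line in f if "{jit-log-opt-loop" in line)
print("optimized loop traces:", loops)

From there it would be a small step to dump per-loop details as
json/yaml, which is what Leonardo described.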

> _However_, static semantic highlighting might still be interesting:
> while it does not help understanding profile-directed optimizations
> done by the JIT, it might help understanding the consequences of the
> execution model of the language itself, where it has a weird impact on
> performance.
> E.g., for CPython, it might be very useful simply to highlight usages
> of global variables, which require a dict lookup, as "bad", especially
> in tight loops. OTOH, that kind of optimization should be done by a
> JIT like PyPy, not by the programmer.
> I believe that CALL_LIKELY_BUILTIN and hidden classes already allow
> PyPy to fix the problem without changing the source code.
> 
> The question then is: which kinds of constructs are unexpectedly slow
> in Python, even with a good JIT?

Precisely.  I'd love a good answer to that question.
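
To make the global-lookup example above concrete, this is the classic
manual workaround -- exactly the kind of rewrite a JIT like PyPy should
make unnecessary (norms_global/norms_local are made-up names):

import math

def norms_global(values):
    # Each iteration looks up `math` in the module's globals dict and
    # then `sqrt` as an attribute on the module object.
    return [math.sqrt(v) for v in values]

def norms_local(values, _sqrt=math.sqrt):
    # The lookup is hoisted out of the loop by binding sqrt to a local
    # (here, a default argument), so the loop body does no dict lookups
    # for it at all.
    return [_sqrt(v) for v in values]

A semantic highlighter could mark the math.sqrt inside the first loop
and leave the second alone, without timing anything.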

In addition to jitting, although it would not technically be python
anymore, I see a place for something like SPUR or Jaegermonkey --
combined compilation and jitting.  Naturally, the performance of such a
beast over a jit alone would be dependent on how much boxing the
compiler could remove.  My goal for this work is about half geared
towards answering that single question, just so I'll know if I should
stop dreaming about python eventually having performance parity with C/C++.
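
By "boxing" I just mean that every Python number lives on the heap as a
full object; a rough way to see the cost (sizes are CPython- and
platform-specific):

import sys
import array

nums = list(range(1000))         # 1000 boxed int objects, one heap object each
packed = array.array('l', nums)  # the same values stored as raw machine words

print(sys.getsizeof(1))                     # one boxed int: a few dozen bytes
print(sys.getsizeof(packed) / len(packed))  # roughly one machine word per element

A compiler or jit that keeps those values unboxed avoids both the memory
overhead and the constant allocation and indirection.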

I tend to think that having a solid (if never perfect) static analyzer
for python could help in many areas.  I had thought that helping coders
help the jit out would be a good first use, but as you say, there will
be problems with that.  Regardless, my hope is that a library for static
analysis of python will be more generally useful than my own
hare-brained schemes.

In any case, I'm working on this in the form of a code editor first
because, regardless of what the answer to the previous question is, I
know from experience that highlighting for python like what
SourceInsight does for C++ will be extremely useful. 
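
To give a flavour of the kind of information I want the editor to
surface, here is a deliberately naive sketch built on the stdlib ast
module.  It flags loads of names inside loops that aren't bound locally
-- the global/builtin-in-a-tight-loop case from above -- and it ignores
nested scopes, comprehensions, closures, and so on; a real analyzer
needs proper scope and flow analysis.  (hot_loop is just a made-up
example.)

import ast

CODE = """
def hot_loop(items):
    total = 0
    for x in items:
        total += len(x)    # `len` is a global/builtin lookup inside the loop
    return total
"""

class LoopGlobalFinder(ast.NodeVisitor):
    """Report Name loads inside for/while loops that aren't bound locally."""

    def visit_FunctionDef(self, node):
        # Names bound in this function: parameters plus anything stored to.
        local_names = {a.arg for a in node.args.args}
        local_names |= {n.id for n in ast.walk(node)
                        if isinstance(n, ast.Name) and isinstance(n.ctx, ast.Store)}
        for loop in ast.walk(node):
            if isinstance(loop, (ast.For, ast.While)):
                for name in ast.walk(loop):
                    if (isinstance(name, ast.Name)
                            and isinstance(name.ctx, ast.Load)
                            and name.id not in local_names):
                        print("line %d: global/builtin %r used inside a loop"
                              % (name.lineno, name.id))
        self.generic_visit(node)

LoopGlobalFinder().visit(ast.parse(CODE))

An editor plugin would attach that kind of result to the buffer as
highlighting instead of printing it.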


Thank you for the kind feedback; your comments are much appreciated.
-Terrence

> Best regards




