[pypy-dev] PyPy + LLVM
Chris Lattner
sabre at nondot.org
Sun Mar 18 00:25:26 CET 2007
On Wed, 14 Mar 2007, Armin Rigo wrote:
> Hi Chris,
Hey Armin, I'm cc'ing pypy-dev, because this is suitable for a broader
audience. I'm sorry for the delay in my reply; I have been travelling
recently.
> After reading your slides from yesterday's presentation, it occurred to
> me that you might have a wrong idea about what PyPy is. I'm
Sorry about that. The third section of my talk was only loosely based on
PyPy. Some of it was "inspired" by PyPy, some of it was "unification"
ideas that I have been thinking about for a while, some is based on my own
experience with static analysis, and some is just wishful thinking. I'm
sorry if it came across as "here is what PyPy is doing"; that is not what
I intended.
> double-guessing here, so of course I may be completely off - please bear
> with me. What PyPy is definitely not, is a Python compiler. In order
> to get to a common ground of discussion, may I suggest the following
> references:
Right. If I had to do a layman's summary, I would say that PyPy provides
infrastructure for building interpreters in [R]Python. This
infrastructure makes it much easier than starting from scratch, e.g. by
providing reusable components for language runtimes (like GCs).
My talk (available here:
http://llvm.org/pubs/2007-03-12-BossaLLVMIntro.html ) clearly doesn't
describe PyPy as it exists today, but I think it describes a place that
PyPy could get to if the community desired it (in other words, I think
the strengths of PyPy and of its community both play well into this). The
members of the PyPy project clearly have experience with type inference,
clearly understand dynamic language issues, and clearly understand that
the semantics of these languages differ widely but also share a lot of
commonality. If the PyPy community isn't interested, I will approach
others, eventually falling back to doing it directly in LLVM if needed.
For the record, I have read Brett Cannon's thesis. I don't think his
results are applicable here, for a number of reasons. First, his work was
in the context of an interpreter that used type information for
optimization. The approach I'm describing uses type information to give
substantially more information to a static compiler. The result is that
the generated code would be dramatically more efficient, and without the
overhead of an interpreter loop, the gains are far more significant than
in his evaluation. Also, significantly more type information would be
available to the type propagator if you used more aggressive techniques
than he did.
To me, the much more significant issue with Python (and Ruby, and others)
for integer operations is that they automatically promote to big integers
on demand. In practice, this means that the code generated by a static
compiler would be optimized for the small case, but would then fall back
gracefully in the large case. If you're familiar with x86 asm, I'd expect
code for a series of adds to look like this:
add EAX, EBX
jo recover_1
add EAX, 17
jo recover_2
add EAX, ECX
jo recover_3
The idea of this is that (in the common case where the app deals with
small integers) you get *extremely* fast code with easily
predictable branches. The recovery code could, for example, package up
the registers into real "integer" objects, and then call back into the
interpreter to handle the hard case (note that this is very similar to the
fast paths in the standard Python interpreter, which eventually falls back
to calling PyNumber_Add).
Floating point code, if type inference is successful, does not need this
sort of recovery code (unlike integer code). Other languages (like
Javascript) don't have this issue, but they have others (e.g. there are no
integers, only floats :).
Of course I'm biased, but I think that this sort of code could easily and
naturally be generated and optimized by LLVM, if you chose to target it.
This would let LLVM do the range analysis required to eliminate the
branches (when possible), handle other low-level optimization, codegen,
etc. You would get extremely high performance code and a very
retargetable compiler.
Has python even been executed with a sustained performance of two
instructions per add? :)
-Chris
--
http://nondot.org/sabre/
http://llvm.org/