The first JITing pypy-c!
Hi all,

Two days ago, we got our first JITing version of pypy-c!

This was mostly thanks to Arre, who compiled a version even though we knew it would segfault due to missing support for C structures with non-int-sized fields. Writing to a 1-byte-sized Bool field would overwrite the next 3 bytes of memory with zeroes... Nevertheless, the result managed to successfully run one function (and indeed segfault on many other functions). Our first JIT-run function! A recursive factorial.

Today the field size problem is fixed. Playing around seems to show that it's harder to provoke a segfault now. The generated machine code is completely incredible, in size and complexity, but the following example runs. It could even be said to run faster with the JIT (8.2 seconds versus 11.4 seconds) but that's unfair, as all normal optimizations are turned off in this example (a regular pypy-c runs this example in 2.8 seconds). It still shows that our JIT already gives an improvement over completely-unoptimized C code, which is already some kind of success!

    def f(n):
        while n > 0:
            n -= 2
        return n

Many other examples give an UnboundLocalError, due probably to some minor bug somewhere either in the JIT transformation or in the back-end (along the lines of a == compiled as a !=).

If you want to try for yourself:

- check out or switch to the branch http://codespeak.net/svn/pypy/branch/jit-real-world

- run "translate.py /path/to/pypy/jit/goal/targetjit.py" (that's the usual translate.py from translator/goal)

- you get uncompiled C sources for now; copy it safely away from /tmp/usession-yourname/testing_1/ before it gets deleted, and compile it ("make" or "make debug").

- run "PYPYJITLOG=log ./testing_1"

      import pypyjit; pypyjit.enable(f.func_code)
      f(7000000)    # see f above

- the above PYPYJITLOG env var causes a file called 'log' to be produced, containing the generated assembler code. It can be viewed in a flowgraph-like fashion with pypy/jit/codegen/i386/viewcode.py.

Don't ask me yet to describe the result, nor where the while loop is :-) If you are familiar with i386 assembler, you'll laugh at the obviously bad code, too. That's where your help would be appreciated! Making the backend produce more reasonable code, starting with some basic register allocation, is a mostly-independent project. The PPC backend, btw, has already got this kind of techniques (I wonder what speed-ups we get on PPC from a jitting pypy-c).

Have fun,

Armin
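For anyone repeating the timing comparison from the message above, here is a minimal sketch of a harness around the same f. The helper names maybe_enable_jit and timed are made up for this illustration. The only JIT call used is the pypyjit.enable(code) call quoted in the message; the sketch assumes it may be missing or may behave differently on other builds, and falls back to plain execution in that case.

```python
import time


def f(n):
    # The loop from the message: subtract 2 until n is no longer positive.
    while n > 0:
        n -= 2
    return n


def maybe_enable_jit(func):
    """Ask the JIT to compile func; return True only if the request went through.

    Hypothetical helper: it relies solely on pypyjit.enable(code) as quoted
    in the message and treats every failure as "no JIT available".
    """
    try:
        import pypyjit
    except ImportError:
        return False  # not running on a pypy-c that ships the pypyjit module
    enable = getattr(pypyjit, "enable", None)
    if enable is None:
        return False  # assumption: this build does not expose enable()
    # The message uses f.func_code (Python 2 spelling); __code__ is the newer name.
    code = getattr(func, "func_code", None) or func.__code__
    try:
        enable(code)
    except Exception:
        return False  # assumption: enable() may take different arguments elsewhere
    return True


def timed(func, arg):
    # Return (result, elapsed wall-clock seconds) for a single call.
    start = time.time()
    result = func(arg)
    return result, time.time() - start


if __name__ == "__main__":
    jitted = maybe_enable_jit(f)
    result, seconds = timed(f, 7000000)
    print("jit requested: %s, result: %d, time: %.2fs" % (jitted, result, seconds))
```

Running the script once on the JIT build and once on a regular build gives the kind of comparison quoted in the message; the numbers will of course depend on the machine and the build.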
Armin Rigo <arigo@tunes.org> writes:
Hi all,
Two days ago, we got our first JITing version of pypy-c!
This was mostly thanks to Arre, who compiled a version even though we knew it would segfault due to missing support for C structures with non-int-sized fields. Writing to a 1-byte-sized Bool field would overwrite the next 3 bytes of memory with zeroes... Nevertheless, the result managed to successfully run one function (and indeed segfault on many other functions). Our first JIT-run function! A recursive factorial.
Hooray!
- the above PYPYJITLOG env var causes a file called 'log' to be produced, containing the generated assembler code. It can be viewed in a flowgraph-like fashion with pypy/jit/codegen/i386/viewcode.py. Don't ask me yet to describe the result, nor where the while loop is :-) If you are familiar with i386 assembler, you'll laugh at the obviously bad code, too. That's where your help would be appreciated! Making the backend produce more reasonable code, starting with some basic register allocation, is a mostly-independent project. The PPC backend, btw, has already got this kind of techniques (I wonder what speed-ups we get on PPC from a jitting pypy-c).
Well, I guess so far it just plain doesn't work, but also the PPC register allocation is pretty horrible, especially around function calls. We've talked about the backend change needed to make this sensible (giving the list of live values to the block-ending builder methods), I guess it just has to be done at some point...

Cheers,
mwh

--
  You can lead an idiot to knowledge but you cannot make him think. You can, however, rectally insert the information, printed on stone tablets, using a sharpened poker.
        -- Nicolai -- http://home.xnet.com/~raven/Sysadmin/ASR.Quotes.html
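The field-size bug quoted earlier in this message (a store to a 1-byte Bool field zeroing the next 3 bytes) can be pictured with a small sketch. The 4-byte layout and the function names below are invented for the illustration; this is not the back-end's actual code, only a model of an int-sized store landing on a byte-sized field on a little-endian machine such as i386.

```python
import struct


def fresh_struct():
    # Hypothetical layout: a 1-byte Bool at offset 0, followed by three
    # unrelated 1-byte fields holding recognizable values.
    return bytearray([0x00, 0xAA, 0xBB, 0xCC])


def store_bool_as_int(buf, value):
    # Buggy store: writes a 4-byte little-endian int at the Bool's offset,
    # so the three neighbouring bytes are overwritten with zeroes.
    struct.pack_into("<i", buf, 0, int(value))


def store_bool_as_byte(buf, value):
    # Correct store: writes exactly one byte and leaves the neighbours alone.
    struct.pack_into("<B", buf, 0, int(value))


buggy = fresh_struct()
store_bool_as_int(buggy, True)

fixed = fresh_struct()
store_bool_as_byte(fixed, True)

print(list(buggy))  # [1, 0, 0, 0]
print(list(fixed))  # [1, 170, 187, 204]
```

Whether the clobbered neighbours cause a crash depends on what happens to live there, which fits the report that one function ran while many others segfaulted.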
Wow - that's awesome! Congratulations to all involved.

Cheers,
Richard

On Fri, 8 Dec 2006, Armin Rigo wrote:
Hi all,
Two days ago, we got our first JITing version of pypy-c!
This was mostly thanks to Arre, who compiled a version even though we knew it would segfault due to missing support for C structures with non-int-sized fields. Writing to a 1-byte-sized Bool field would overwrite the next 3 bytes of memory with zeroes... Nevertheless, the result managed to successfully run one function (and indeed segfault on many other functions). Our first JIT-run function! A recursive factorial.
Today the field size problem is fixed. Playing around seems to show that it's harder to provoke a segfault now. The generated machine code is completely incredible, in size and complexity, but the following example runs. It could even be said to run faster with the JIT (8.2 seconds versus 11.4 seconds) but that's unfair, as all normal optimizations are turned off in this example (a regular pypy-c runs this example in 2.8 seconds). It still shows that our JIT already gives an improvement over completely-unoptimized C code, which is already some kind of success!
    def f(n):
        while n > 0:
            n -= 2
        return n
Many other examples give an UnboundLocalError, due probably to some minor bug somewhere either in the JIT transformation or in the back-end (along the lines of a == compiled as a !=).
If you want to try for yourself:
- check out or switch to the branch http://codespeak.net/svn/pypy/branch/jit-real-world
- run "translate.py /path/to/pypy/jit/goal/targetjit.py" (that's the usual translate.py from translator/goal)
- you get uncompiled C sources for now; copy it safely away from /tmp/usession-yourname/testing_1/ before it gets deleted, and compile it ("make" or "make debug").
- run "PYPYJITLOG=log ./testing_1"
      import pypyjit; pypyjit.enable(f.func_code)
      f(7000000)    # see f above
- the above PYPYJITLOG env var causes a file called 'log' to be produced, containing the generated assembler code. It can be viewed in a flowgraph-like fashion with pypy/jit/codegen/i386/viewcode.py. Don't ask me yet to describe the result, nor where the while loop is :-) If you are familiar with i386 assembler, you'll laugh at the obviously bad code, too. That's where your help would be appreciated! Making the backend produce more reasonable code, starting with some basic register allocation, is a mostly-independent project. The PPC backend, btw, has already got this kind of techniques (I wonder what speed-ups we get on PPC from a jitting pypy-c).
Have fun,
Armin
_______________________________________________
pypy-dev@codespeak.net
http://codespeak.net/mailman/listinfo/pypy-dev
Re-hi all,

On Fri, Dec 08, 2006 at 03:19:50AM +0100, Armin Rigo wrote:
Many other examples give an UnboundLocalError
Fixed now. Jitting execution seems to work as well as the normal one. We tried various things, including stuff not supported by Psyco (generators, nested scopes...), with success. Now it's only a matter of making it produce useful code, as opposed to just slow, big and incredible code...
If you want to try for yourself:
- check out or switch to the branch http://codespeak.net/svn/pypy/branch/jit-real-world
The branch is now merged into the trunk! The rest of the instructions still apply.

(The merging was a bit of fun, as usual; if someone thinks that he might have lost any changes, don't hesitate to blame us before checking in detail :-)

A bientot,

Armin.
Hi,

On Sat, 9 Dec 2006, Armin Rigo wrote:
Re-hi all,
On Fri, Dec 08, 2006 at 03:19:50AM +0100, Armin Rigo wrote:
Fixed now. Jitting execution seems to work as well as the normal one. We tried various things, including stuff not supported by Psyco (generators, nested scopes...), with success.
Very cool! But do recursive interpreters work now? :-)

Cheers,
Richard
Hi Richard,

On Thu, Dec 14, 2006 at 01:22:15AM +0000, Richard Emslie wrote:
Fixed now. Jitting execution seems to work as well as the normal one. We tried various things, including stuff not supported by Psyco (generators, nested scopes...), with success.
Very cool! But do recursive interpreters work now? :-)
Yes, everything works :-) Except the bytecode trace hook. The machine code is really terrible from every point of view, but you can JIT whatever piece of Python code you like - generators, nested scopes, class: statement bodies, all these cases where Psyco gives up. That's the point of the approach, really :-)

We have a minor detail to solve before it can be tested on larger examples, though - it exhausts the 32-bit address space far too early (without actually consuming much of the reserved pages) and then we get a MemoryError. It's a back-end problem; Arre started working on that today.

A bientot,

Armin.
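For readers who want concrete test inputs, here is a small sketch of the three kinds of code named in the message above: a generator, a nested scope, and a class statement body. The names countdown, make_adder and Table are invented for the illustration; nothing here is specific to the JIT, it is just ordinary Python that could be tried under it.

```python
def countdown(n):
    # Generator: execution is suspended at each yield and resumed later.
    while n > 0:
        yield n
        n -= 2


def make_adder(k):
    # Nested scope: add() closes over k from the enclosing function.
    def add(x):
        return x + k
    return add


class Table(object):
    # Class body: these statements run once, when the class is created.
    squares = [i * i for i in range(4)]
    total = sum(squares)


print(list(countdown(7)))   # [7, 5, 3, 1]
print(make_adder(3)(4))     # 7
print(Table.total)          # 14
```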
Hi Armin!

Wow! So at the risk of coming across as really stupid, I am going to hazard a guess at what is going on. :-)

As timeshifting is a translation-time thing, we can handle multiple or recursive merge points as a one-off atomic operation before we run. So now when we run we have two modes of execution - 'compile mode' and 'normal mode'. (I hope at least this part is right!)

So now we are running our timeshifted translated code, where the original RPython code defined a recursive interpreter, which in turn defined a recursive function with a merge point. When we hit a flexiswitch (the merge point?), we go from normal mode to compile mode and may promote / partially evaluate run-time variables for each new instance of a run-time variable (the promotion-hinted ones).

So what if a 'promotion' in compile mode triggers another promotion down the line (i.e. during the timeshifted residualizing code) and we go into an infinite loop of promoting?

OK - so more questions than statements - and a high probability that it made no sense at all - and maybe I should just wait for the docs! :-)

Cheers,
Richard

On Thu, 14 Dec 2006, Armin Rigo wrote:
Hi Richard,
On Thu, Dec 14, 2006 at 01:22:15AM +0000, Richard Emslie wrote:
Fixed now. Jitting execution seems to work as well as the normal one. We tried various things, including stuff not supported by Psyco (generators, nested scopes...), with success.
Very cool! But do recursive interpreters work now? :-)
Yes, everything works :-) Except the bytecode trace hook. The machine code is really terrible from every point of view, but you can JIT whatever piece of Python code you like - generators, nested scopes, class: statement bodies, all these cases where Psyco gives up. That's the point of the approach, really :-)
We have a minor detail to solve before it can be tested on larger examples, though - it exhausts the 32-bit address space far too early (without actually consuming much of the reserved pages) and then we get a MemoryError. It's a back-end problem; Arre started working on that today.
A bientot,
Armin.
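Richard's guess about promotion, in the message that quotes the text above, can be pictured with a toy model. The sketch below is not the PyPy implementation: PromotingDispatcher and specialize_power are invented names, and "compiling" is stood in for by building a Python closure. It only illustrates the idea of switching into a compile step the first time a run-time value is seen and reusing the specialized code afterwards.

```python
class PromotingDispatcher(object):
    """Toy model of promotion: specialize lazily, once per distinct value."""

    def __init__(self, specialize):
        self.specialize = specialize  # builds specialized code for one value
        self.cases = {}               # growing switch: value -> specialized code
        self.compilations = 0         # how many times "compile mode" was entered

    def __call__(self, promoted, *args):
        code = self.cases.get(promoted)
        if code is None:
            # "Compile mode": this value is new, so add a case for it.
            self.compilations += 1
            code = self.cases[promoted] = self.specialize(promoted)
        # "Normal mode": run the code specialized for this value.
        return code(*args)


def specialize_power(exponent):
    # exponent is fixed inside the closure; a real compiler could use that
    # knowledge to unroll the loop.
    def power(base):
        result = 1
        for _ in range(exponent):
            result *= base
        return result
    return power


power = PromotingDispatcher(specialize_power)
print(power(3, 2), power(3, 5), power(2, 10))  # 8 125 100
print(power.compilations)                      # 2
```

In this toy the amount of compiling is bounded by the number of distinct promoted values actually seen (two here, for exponents 3 and 2). Whether the real timeshifted code can chain promotions without bound is exactly the question Richard raises, and the thread does not answer it.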
participants (4)
- Armin Rigo
- Michael Hudson
- Niko Matsakis
- Richard Emslie