How to turn a crawling caterpillar of a VM into a graceful butterfly

Hi all,

As many of you know, over the past few months I've been creating an RPython VM for the Converge language <http://convergepl.org>. This is now mostly complete at a basic level - enough to run pretty much all of my Converge programs, at least on Linux and OpenBSD. [Major remaining issues are: no floating point numbers; 32 bit support is not finished; it may not compile on OS X.]

First, the good news. The RPython VM (~3 months' effort) is currently close to 3 times faster than the old C-based VM (~18 months' effort). I think the Converge VM is the first medium-scale VM to be created by someone outside the core PyPy group, so those numbers are a testament to the power of the RPython approach. I'd like to thank (alphabetically) Carl Friedrich Bolz, Maciej Fijalkowski, and Armin Rigo, who've offered help, encouragement, and (in Armin's case) big changes to RPython to help the Converge VM. I wouldn't have got this far without their help, or that of others on the PyPy IRC channel.

Now, the bad news: the RPython VM should be a lot faster than it currently is, as the old VM is (to say the least) not very good. In large part this is because I simply don't know how best to optimise an RPython VM, particularly the JIT (in fact, at the moment the JITted VM sometimes seems to be slower than the non-JIT VM: I don't have an explanation for why). I'm therefore soliciting advice / code from those more knowledgeable than myself on how to optimise an RPython VM. Note that I'm intentionally being more general than the Converge VM: I hope that some of the ideas generated will prove useful to the many VMs that I hope will come to be implemented in RPython. Some of my ignorance is simply because many parts of RPython have little or no documentation: by playing the part of a clueless outsider (it comes naturally to me!), I hope I may help pinpoint areas where further documentation is most needed.

If you want to test out the VM, it's here: https://github.com/ltratt/converge

Building should mostly be:

  $ export PYPY_SRC=<location of PyPy source directory>
  $ cd converge
  $ ./configure
  $ make

That will build a JITted VM. If you want a lower level of optimisation, specify it to configure, e.g. "./configure --opt=3". Please note that, for the time being, on Linux the JIT won't work with the default GC root finder, so you'll need to manually specify --gcrootfinder=shadowstack in vm/Makefile. Apart from that, building on 64 bit Unix should hopefully be reasonably simple.

A simple performance benchmark is along the lines of "make clean ; cd vm ; make ; cd .. ; time make regress", which builds the compiler, various examples, and the handful of tests that Converge comes with. [Please note, Converge wasn't built with a TDD philosophy; while I welcome contributions of tests, I am unable to make any major efforts in that regard myself in the short term. I know this sits uneasily with many in the PyPy community, but I hope you are able to overlook this difference in development philosophy.]

Here are some examples of questions I'd love to know answers to:

* Why is the JITted VM that's built sometimes 2x slower than --opt=3, but other times a few percent faster *on the same benchmark*?!
* What are virtualrefs?
* What is a more precise semantics of elidable? (A sketch of how I currently understand it follows this list.)
* What is 'specialize'?
* Is it worth manually inlining functions in the main VM loop?
* Can I avoid the (many) calls to rffi.charp[size]2str, given that they're mostly taken from the never-free'd mod_bc? Would this work well on some GCs but not others?
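To make the elidable question concrete, here's the kind of use I have in mind (a minimal sketch only, not code from the actual Converge VM - the bytecode_arg function is made up for illustration, and it assumes the pypy.rlib.jit decorators, with the PyPy source on PYTHONPATH as the build requires anyway):

    # Illustrative sketch only, not code from the Converge VM.
    from pypy.rlib.jit import elidable

    @elidable
    def bytecode_arg(bc, pc):
        # Decode a 4-byte little-endian operand from the (immutable)
        # bytecode string. The same arguments always give the same result
        # and there are no side effects, so - as I understand it - the JIT
        # may constant-fold this call when bc and pc are constants in a
        # trace.
        return (ord(bc[pc]) | (ord(bc[pc + 1]) << 8) |
                (ord(bc[pc + 2]) << 16) | (ord(bc[pc + 3]) << 24))

My (possibly wrong) understanding is that elidable promises the JIT that a function always returns the same result for the same arguments and has no observable side effects - but I'd like to know the precise rules.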
No doubt there are many other things I would do well to know, but plainly do not - please educate me! If you have any questions or comments about Converge or the VM, please don't hesitate to ask me - and thank you in advance for your help.

Yours,

Laurie
--
Personal                           http://tratt.net/laurie/
The Converge programming language  http://convergepl.org/
                                   https://github.com/ltratt
                                   http://twitter.com/laurencetratt

On Sat, Dec 31, 2011 at 12:03 PM, Laurence Tratt <laurie@tratt.net> wrote:
Hi Laurence,

Overall great work, but I have to point out one thing: if you want us to look into benchmarks, you have to provide a precise benchmark and a precise way to run it (a few examples would be awesome); otherwise, for people who are not aware of the language, this might pose a challenge.

PS. Glad to be of any help.

Cheers,
fijal

On Sat, Dec 31, 2011 at 12:29:40PM +0200, Maciej Fijalkowski wrote:

Hi Maciej,
At the moment, I'm not even sure I know what representative benchmarks might be - it's certainly something I'd like advice on! I did mention one simple benchmark in my message, which is "make regress" - it compiles the compiler, standard library, examples, and tests. This exercises most (not all, but most) of the infrastructure, so it's a pretty decent test (though it may not be entirely JIT friendly, as it's lots of small-ish tests; whether the JIT will warm up sufficiently is an open question in my mind). To run this test, I'd suggest something like:

  $ make clean ; cd vm ; make ; cd .. ; time make regress

The reason for that order is that it 1) cleans everything (so that it'll be rebuilt later), 2) compiles the VM on its own (we don't want to include that in the timings!), and 3) executes "make regress" and times it (without including the VM build in the figures).

If you're interested in creating new micro-benchmarks, the language manual is hopefully instructive: http://convergepl.org/documentation/1.2/quick_intro/

Let's say, for argument's sake, that you thought testing integer addition in a loop was a good idea. This program (saved as f.cv) will do the trick:

  func main():
      i := 0
      while i < 100000:
          i += 1

You can then compile it:

  $ convergec -m f.cv

and then execute it:

  $ time converge f

That way, you know you're not including the time needed to compile the program.

Yours,
Laurie

Hi Laurence,

On Sat, Dec 31, 2011 at 16:26, Laurence Tratt <laurie@tratt.net> wrote:
A quick update: on this program, with 100 times the number of iterations, "converge-opt3" runs in 2.6 seconds on my laptop, and "converge-jit" runs in less than 0.7 seconds. That's already a 4x speed-up :-) I think that you simply underestimated the warm-up times.

A bientôt,

Armin.

On Sat, Dec 31, 2011 at 05:45:35PM +0100, Armin Rigo wrote:

Hi Armin,
In fairness, I did know that that program benefits from the JIT at least somewhat :) I was wondering if there are other micro-benchmarks that the PyPy folk found particularly illuminating / surprising when optimising PyPy.

There's also something else that's weird. Try "time make regress" with --opt=3 and --opt=jit. The latter is often twice as slow as the former. I have no useful intuitions as to why at the moment.

Yours,
Laurie

On Sat, Dec 31, 2011 at 6:58 PM, Laurence Tratt <laurie@tratt.net> wrote:
The most coarse-grained test would be PYPYLOG=jit-summary:- <whatever commands you want to run>; that should provide you with some feedback on tracing and other warmup-related times, as well as some basic stats.

Note that pypy-jit is almost never slower than pypy-no-jit, but it's certainly slower than CPython for running tests (most of the time). Tests are a particular case of JIT-unfriendly code, because ideally they execute each piece of code only once.

Cheers,
fijal

On Sun, Jan 01, 2012 at 03:52:47PM +0200, Maciej Fijalkowski wrote:
I haven't noticed this particular output type before... but I'm not really sure what most of the numbers actually mean! Does anyone have any pointers?

Laurie

On Sun, Jan 01, 2012 at 06:47:08PM +0200, Maciej Fijalkowski wrote:
Thanks! Here's an example (hopefully a decent one!): http://pastebin.com/51QpPD7C

Laurie

On Sun, Jan 1, 2012 at 6:57 PM, Laurence Tratt <laurie@tratt.net> wrote:
So, for example, here tracing has taken 0.8s out of 2.5s total (a lot), plus 0.2s for the backend. Also, there were 16 loops aborted because the trace ran on for too long. That might mean a lot of things. If you can post some examples, I can probably look at the traces. For example - is the loop iteration from above a good one?

Cheers,
fijal

On Sun, Jan 01, 2012 at 07:02:14PM +0200, Maciej Fijalkowski wrote:
I don't know if it's a good one or not :) This is a real program executing under the Converge JIT - the Converge compiler. If you have a checkout of the Converge VM and have built the VM with a JIT, you can recreate this by doing:

  $ cd compiler
  $ PYPYLOG=jit-summary:stats ../vm/converge convergec -o Compiler/Code_Gen.cvb Compiler/Code_Gen.cv
> Also, there were 16 loops aborted because the trace ran on for too long. That might mean a lot of things.
I'm going to assume this is because (at least as I've done things so far) the VM is effectively inlining all calls to RPython-level functions (in other words, if the bytecode calls a builtin function, the latter is inlined). It's not clear to me whether this is entirely desirable - sometimes it might be sensible, but often not. How does PyPy handle this? Does it have a blanket "don't look inside builtin functions" rule, for example?

Laurie

On Sun, Jan 1, 2012 at 11:13 AM, Laurence Tratt <laurie@tratt.net> wrote:
Take a look at pypy/module/pypyjit/policy.py; it shows the JITPolicy for which functions can be inlined.

Alex

--
"I disapprove of what you say, but I will defend to the death your right to say it." -- Evelyn Beatrice Hall (summarizing Voltaire)
"The people's good is the highest law." -- Cicero
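For concreteness, a minimal sketch of the two knobs being discussed, assuming the pypy.rlib.jit and pypy.jit.codewriter APIs as used in PyPy's own translation targets (builtin_print below is a made-up stand-in, not a Converge VM function): the dont_look_inside decorator is the per-function "don't trace into this builtin" escape hatch, and the target's jitpolicy() hook returns the policy object that pypy/module/pypyjit/policy.py customises.

    # Illustrative sketch only, not code from the Converge VM.
    import os
    from pypy.rlib.jit import dont_look_inside

    @dont_look_inside
    def builtin_print(s):
        # Hypothetical builtin: the JIT emits a residual call to this
        # function rather than tracing into it, keeping traces of the
        # bytecode dispatch loop short.
        os.write(1, s)

    # In the translation target, the JIT policy is returned from a
    # jitpolicy() hook; pypy/module/pypyjit/policy.py subclasses the
    # default policy to make finer-grained, per-function decisions.
    def jitpolicy(driver):
        from pypy.jit.codewriter.policy import JitPolicy
        return JitPolicy()

Marking large, rarely-hot builtins this way keeps traces within the length limit, which seems like one plausible way to reduce the "trace too long" aborts mentioned above.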

Participants (4):

- Alex Gaynor
- Armin Rigo
- Laurence Tratt
- Maciej Fijalkowski