Re: [pypy-dev] Question on the future of RPython

On Thu, Sep 30, 2010 at 03:21, Terrence Cole <list-sink@trainedmonkeystudios.org> wrote:
On Wed, 2010-09-29 at 23:50 +0200, Paolo Giarrusso wrote:
Agreed, but "watching out for failure" is done by guards, and one further advantage of type inference, on top of the "type feedback" described there, is being able to remove some of those guards, which might help in some inner loops.

In some cases that is still possible, by "watching out" not during code execution but during infrequent events. In Java, loading a class might invalidate some optimization assumptions (like "this class has no subclass", which is useful for inlining without guards), but that is checked during class loading, so the compiled fast path stays guard-free. The same idea could be applied to allow cross-module type inference in Python. I don't know exactly how modules can differ between JIT compile time and run time, but I guess that invalidation at module loading should catch that, and invalidate lots of compiled code, which is usually fine.

The interaction of this with tracing is actually interesting: in a Python tracing JIT, you could keep the traces and restore omitted guards; but when your JIT traces the Python interpreter, I wonder how you express any of this. One can insert a guard only if needed, but saying "hey, this is invalid" requires some special API.

My proposal, here, would be a "virtual guard", which is recorded in the trace but omitted from the output. The omission is what can be invalidated, but the trace itself (not its compiled version) is kept, because it can still be executed. If some form of this actually makes sense (I've only thought about it for 5 minutes), it would be worth publishing, which would let me take some of my PhD time to work on it, if it's not already covered by papers on tracing JITs.

--
Paolo Giarrusso - Ph.D. Student
http://www.informatik.uni-marburg.de/~pgiarrusso/
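To make the "virtual guard" idea concrete, here is a minimal C sketch (all names and structures are hypothetical, not PyPy's actual data structures): a trace records a guard but omits it from the compiled code, and an assumption registry remembers which compiled traces depend on the omission. Breaking the assumption invalidates only the compiled code; the recorded trace survives and can be recompiled later with the guard restored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_DEPS 16

/* A recorded trace: its compiled form may be invalidated independently
 * of the trace itself, which is kept and can be recompiled. */
typedef struct Trace {
    bool compiled_valid;  /* compiled machine code still usable?     */
    bool guard_omitted;   /* the virtual guard was elided on compile */
} Trace;

/* An optimization assumption (e.g. "this class has no subclass") and
 * the traces whose compiled code relies on it. */
typedef struct Assumption {
    Trace *deps[MAX_DEPS];
    size_t ndeps;
} Assumption;

/* Compile a trace under this assumption: elide the guard, register
 * the dependency. */
static void assume(Assumption *a, Trace *t) {
    t->compiled_valid = true;
    t->guard_omitted = true;
    a->deps[a->ndeps++] = t;
}

/* The infrequent event (e.g. class or module loading) breaks the
 * assumption: throw away the compiled code of dependent traces, but
 * keep the traces; the next compilation emits the guard for real. */
static void invalidate(Assumption *a) {
    for (size_t i = 0; i < a->ndeps; i++) {
        a->deps[i]->compiled_valid = false;
        a->deps[i]->guard_omitted = false;
    }
    a->ndeps = 0;
}
```

The point of the sketch is the asymmetry: `invalidate` touches only `compiled_valid`, never the trace's existence, which is what distinguishes this from the throw-everything-out approach.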

Hi Paolo,

On Thu, Sep 30, 2010 at 8:33 AM, Paolo Giarrusso <p.giarrusso@gmail.com> wrote:
My proposal, here, would be a "virtual guard", (...)
Yes, this proposal makes sense. It's an optimization that is definitely done in regular JITs, and we have a "to-do" task about it in http://codespeak.net/svn/pypy/extradoc/planning/jit.txt (where they are called "out-of-line guards").

Armin.

Hi, on the topic of optimizations: I should get these out of my brain, and you probably don't want them in yours... but I'll write them anyway.

Have you considered Duff's device for loop unrolling? It divides the number of loop-counter updates by eight or more: http://en.wikipedia.org/wiki/Duff%27s_Device I'm guessing pypy already does this or something better.

What about outputting gcc extensions and pragmas, for example using __builtin_expect to help branch prediction:

#define likely(x) __builtin_expect((x),1)
#define unlikely(x) __builtin_expect((x),0)

as well as restrict to tell it about pointer aliasing? http://gcc.gnu.org/onlinedocs/gcc-3.3.6/gcc/Restricted-Pointers.html

Also, have you considered using PREFETCH* (or gcc's __builtin_prefetch) instructions when iterating over sequences? If you know some memory is coming and can slip in a few of these instructions, it's usually a win. http://gcc.gnu.org/projects/prefetch.html

__builtin_constant_p for constant detection?

SSE2-optimized hash functions? It seems that speeding up the hash function is a big win for interpreters. I guess pypy already uses an inline cache, but I hope it would still speed things up.

... I almost deleted this email, since I hate suggesting things that people might like to work on... but I didn't. Oops, sorry.
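For readers who haven't seen it, the Duff's device mentioned above looks like the following when used for its classic purpose, a byte copy (this is the textbook form, not anything PyPy-specific; the zero-length guard is needed because the device otherwise copies a full batch of eight):

```c
#include <stddef.h>

/* Duff's device: unroll a copy loop eight-fold, handling the remainder
 * by jumping into the middle of the unrolled body via the switch.
 * This divides the number of counter updates and branches by ~8. */
static void duff_copy(char *to, const char *from, size_t count) {
    if (count == 0)
        return;                      /* the device assumes count > 0 */
    size_t n = (count + 7) / 8;      /* number of (partial) batches  */
    switch (count % 8) {
    case 0: do { *to++ = *from++;    /* fall through by design       */
    case 7:      *to++ = *from++;
    case 6:      *to++ = *from++;
    case 5:      *to++ = *from++;
    case 4:      *to++ = *from++;
    case 3:      *to++ = *from++;
    case 2:      *to++ = *from++;
    case 1:      *to++ = *from++;
            } while (--n > 0);
    }
}
```

Whether this actually helps on modern hardware is debatable: compilers unroll loops themselves, and branch predictors handle simple loop branches well, which is presumably why the email hedges with "or something better".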

On Thu, Sep 30, 2010 at 13:01, Armin Rigo <arigo@tunes.org> wrote:
Hi Paolo,
On Thu, Sep 30, 2010 at 8:33 AM, Paolo Giarrusso <p.giarrusso@gmail.com> wrote:
My proposal, here, would be a "virtual guard", (...)
I see; but while I know what, for instance, Java does (I had an example), here one could reuse JITted traces (up to a point) rather than just throwing everything out (except maybe profiling data), adding the new code path, and redoing all optimizations when recompiling. I don't see how a method-at-a-time JIT could reuse more than that, and my class notes just state that code can be thrown out.

My proposal is that here the recorded traces could also be reused (if they are stored or can be recovered), exactly because one just adds a different code path and tracing JITs reason in terms of single paths. Ideally, the same trace could be recompiled (this time with a guard) the next time it is entered, without using the trace-recording interpreter, and without waiting again until it has been executed N times with N > compileThreshold. Potentially, in a trace-stitching JIT like PyPy, one could even just prepend a guard to the compiled assembly/binary code (if it is position-independent, as on x86-64, or can be relocated).

Of course, all of this depends on the representation used for the "code/trace caches", because I'm not sure all the needed data is kept; and it shouldn't increase storage requirements too much, especially given that invalidations might not be very frequent.

So, do you have pointers into the code? Would you prefer that I ask by chat? I just checked out the sources on my new dev machine (lack of disk space was one reason why I never did before). Is the History class in pypy/jit/metainterp/history.py what represents a trace? Grepping for "trace" in that folder didn't help a lot.

BTW, I am not used to bothering authors with questions when reading sources; but as far as I've seen, it is considered "socially acceptable" here, isn't it?

Best regards
--
Paolo Giarrusso - Ph.D. Student
http://www.informatik.uni-marburg.de/~pgiarrusso/
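The "recompile without re-tracing and without re-counting" idea above can be sketched as follows (a hypothetical model, not PyPy's actual loop-entry logic; the field names and threshold are invented for illustration). After invalidation, the stored trace short-circuits both the interpretation counter and the recording step:

```c
#include <stdbool.h>

enum { COMPILE_THRESHOLD = 1000 };

typedef struct Loop {
    int counter;         /* executions seen while interpreting       */
    bool has_trace;      /* recorded operations are still stored     */
    bool compiled;       /* machine code exists and is valid         */
    bool guard_in_code;  /* was the guard actually emitted?          */
} Loop;

/* Called on each entry to the loop header; returns true if compiled
 * code runs on this entry. */
static bool enter_loop(Loop *l) {
    if (l->compiled)
        return true;
    if (l->has_trace) {
        /* Invalidation kept the recorded trace: recompile at once,
         * this time with the previously virtual guard emitted. */
        l->compiled = true;
        l->guard_in_code = true;
        return true;
    }
    if (++l->counter > COMPILE_THRESHOLD) {
        /* Normal path: record the trace, then compile it with the
         * guard elided under the optimization assumption. */
        l->has_trace = true;
        l->compiled = true;
        l->guard_in_code = false;
        return true;
    }
    return false;  /* keep interpreting */
}

/* Assumption broken: discard machine code only; has_trace stays set,
 * so the next entry recompiles instead of re-tracing. */
static void invalidate_compiled_code(Loop *l) {
    l->compiled = false;
}
```

The storage-cost concern from the email shows up here as `has_trace`: keeping it true after invalidation is exactly the decision to retain the recorded operations in the trace cache.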

participants (3)
- Armin Rigo
- Paolo Giarrusso
- René Dudfield