Fwd: Question on the future of RPython
I'm forwarding this mail that Terrence Cole sent me privately for no reason apparent to me, because without it my mail makes less sense. ---------- Forwarded message ---------- From: Terrence Cole <list-sink@trainedmonkeystudios.org> Date: Thu, Sep 30, 2010 at 03:21 Subject: Re: [pypy-dev] Question on the future of RPython To: Paolo Giarrusso <p.giarrusso@gmail.com> On Wed, 2010-09-29 at 23:50 +0200, Paolo Giarrusso wrote:
The uselessness of static type analysis in dynamic languages was shown concretely in 1991 on a SmallTalk derivative, Self, the father of JavaScript, but I've seen the result only in the PhD thesis* of U. Hölzle, one of Self authors - and I'm pointing it out because I'm not sure how well known it is.
I've read parts of that at some point in the past.
That might explain why Brett Canon's PhD thesis was not that successful. Of course, after reading his thesis you might be able in theory to address his concerns, and he acknowledges that type analysis might help with a few specific cases. I'm not sure about Brendan Eich's points, but I don't have the time, unfortunately, to investigate them.
Dr. Canon achieved an ~1% performance improvement. From his conclusion, the most significant problem he found was that most program data is stored in object attributes and object attributes were not type-inferable with the methods he used.
* Title: "Adaptive optimization for Self: Reconciling High Performance with Exploratory Programming". See, among others, sec. 7.2, "Type analysis does not improve performance of object-oriented programs", sec. 7.4.1, "type analysis exhibits unstable performance". The conclusion is that in most cases it helps more to use "type feedback" , indicates profile-guided optimization applied
Scanning through the 7.2, they say: "In general, type analysis cannot infer the result types of non-inlined message sends, of arguments of non-inlined sends, or of assignable slots (i.e., instance variables). Since sends to such values are fairly frequent, a large fraction of message sends does not benefit from type analysis." So they seem to have run into similar problems: the type inferencing algorithm doesn't get to sink its teeth into any of the bits that most need it. Chapter3 of Cannon05 details what specific parts of python can't be type-inferred. It turns out this is most of python. For example, type inferencing can't cross module boundaries because the module used at runtime might be different from the one present at compile time. While perfectly true, this is probably going to be quite rare in practice. I think it makes sense to treat the sorts of optimizations you can do with static analysis of a dynamic language with the same sort of guarded optimism that is used in pypy's jit compiler: run with the assumption that it will all work out, but watch for failure and fallback to a safe slow-path nicely when things go off the rails.
See below further inline replies.
On Tue, Sep 28, 2010 at 21:49, Terrence Cole <list-sink@trainedmonkeystudios.org> wrote:
I think this is a disconnect. Applying a jit to a non-interpretted language -- Jacob here seems to think I was talking about a static, compiled subset of python -- makes little sense.
JIT just makes much _more_ sense for dynamic languages. Profile-guided optimization (PGO) give impressive performance improvements for C, but can also worsen performance when the execution profile changes (because of different inputs), and JIT would solve this problem. Since we are talking about optimized C, "impressive performance improvements" is more like 20% more (so to say) than like 10x-100x, and it's mostly about things like static branch prediction. For RPython, it makes more sense to reuse the support from the translation backend (JVM and .NET should do this by default, in C you can use PGO on the translation output.
There are some undesirable things about static analysis, but it can sure be useful from optimisation, security and reliability perspectives.
Brendan Eich agrees [2]. This is heartening, because javascript has much in common with python.
[1] http://www.ocf.berkeley.edu/~bac/thesis.pdf [2] http://brendaneich.com/2010/08/static-analysis-ftw/
-- Paolo Giarrusso - Ph.D. Student http://www.informatik.uni-marburg.de/~pgiarrusso/
Hi Terrence, I think that what you are describing is found in informal discussions about LLVM/HLVM, and more formally in the plans for Unladen Swallow at http://code.google.com/p/unladen-swallow/wiki/ProjectPlan . See in particular the section about Feedback-Directed Optimization. Maybe you want to discuss these ideas with the Unladen Swallow guys instead :-) Armin.
On Thu, 2010-09-30 at 08:35 +0200, Paolo Giarrusso wrote:
I'm forwarding this mail that Terrence Cole sent me privately for no reason apparent to me, because without it my mail makes less sense.
I replied to you privately because the mail from you that I was replying to did not have the list cc'd. I checked the headers several times to be sure that my mail client was not lying to me. I thought that you wanted to take the ensuing discussion off list, as it is starting to range even farther off topic for pypy-dev than it was before. Oh well. Sorry for the noise.
---------- Forwarded message ---------- From: Terrence Cole <list-sink@trainedmonkeystudios.org> Date: Thu, Sep 30, 2010 at 03:21 Subject: Re: [pypy-dev] Question on the future of RPython To: Paolo Giarrusso <p.giarrusso@gmail.com>
On Wed, 2010-09-29 at 23:50 +0200, Paolo Giarrusso wrote:
The uselessness of static type analysis in dynamic languages was shown concretely in 1991 on a SmallTalk derivative, Self, the father of JavaScript, but I've seen the result only in the PhD thesis* of U. Hölzle, one of Self authors - and I'm pointing it out because I'm not sure how well known it is.
I've read parts of that at some point in the past.
That might explain why Brett Canon's PhD thesis was not that successful. Of course, after reading his thesis you might be able in theory to address his concerns, and he acknowledges that type analysis might help with a few specific cases. I'm not sure about Brendan Eich's points, but I don't have the time, unfortunately, to investigate them.
Dr. Canon achieved an ~1% performance improvement. From his conclusion, the most significant problem he found was that most program data is stored in object attributes and object attributes were not type-inferable with the methods he used.
* Title: "Adaptive optimization for Self: Reconciling High Performance with Exploratory Programming". See, among others, sec. 7.2, "Type analysis does not improve performance of object-oriented programs", sec. 7.4.1, "type analysis exhibits unstable performance". The conclusion is that in most cases it helps more to use "type feedback" , indicates profile-guided optimization applied
Scanning through the 7.2, they say:
"In general, type analysis cannot infer the result types of non-inlined message sends, of arguments of non-inlined sends, or of assignable slots (i.e., instance variables). Since sends to such values are fairly frequent, a large fraction of message sends does not benefit from type analysis."
So they seem to have run into similar problems: the type inferencing algorithm doesn't get to sink its teeth into any of the bits that most need it.
Chapter3 of Cannon05 details what specific parts of python can't be type-inferred. It turns out this is most of python. For example, type inferencing can't cross module boundaries because the module used at runtime might be different from the one present at compile time. While perfectly true, this is probably going to be quite rare in practice.
I think it makes sense to treat the sorts of optimizations you can do with static analysis of a dynamic language with the same sort of guarded optimism that is used in pypy's jit compiler: run with the assumption that it will all work out, but watch for failure and fallback to a safe slow-path nicely when things go off the rails.
See below further inline replies.
On Tue, Sep 28, 2010 at 21:49, Terrence Cole <list-sink@trainedmonkeystudios.org> wrote:
I think this is a disconnect. Applying a jit to a non-interpretted language -- Jacob here seems to think I was talking about a static, compiled subset of python -- makes little sense.
JIT just makes much _more_ sense for dynamic languages. Profile-guided optimization (PGO) give impressive performance improvements for C, but can also worsen performance when the execution profile changes (because of different inputs), and JIT would solve this problem. Since we are talking about optimized C, "impressive performance improvements" is more like 20% more (so to say) than like 10x-100x, and it's mostly about things like static branch prediction. For RPython, it makes more sense to reuse the support from the translation backend (JVM and .NET should do this by default, in C you can use PGO on the translation output.
There are some undesirable things about static analysis, but it can sure be useful from optimisation, security and reliability perspectives.
Brendan Eich agrees [2]. This is heartening, because javascript has much in common with python.
[1] http://www.ocf.berkeley.edu/~bac/thesis.pdf [2] http://brendaneich.com/2010/08/static-analysis-ftw/
-- Paolo Giarrusso - Ph.D. Student http://www.informatik.uni-marburg.de/~pgiarrusso/ _______________________________________________ pypy-dev@codespeak.net http://codespeak.net/mailman/listinfo/pypy-dev
participants (3)
-
Armin Rigo -
Paolo Giarrusso -
Terrence Cole