Question on the future of RPython

I understand from various threads here that RPython is not for general-purpose use. Why this lack of focus on general use? I am looking at this and comparing it to a corporation working on an awesome product: they are so focused on the final product vision that they fail to realize the potential of some of its intermediate side deliverables.

PyPy is definitely gaining momentum. But as a strategy to build that momentum and gain new converts, it should put some focus on its niche strengths, things other Python implementations cannot do. One such niche is RPython and the RPython compiler. No other Python implementation can convert Python programs to executables. I am seeing growing interest in writing RPython code for performance-critical code, and even potentially compiling it to binaries:

http://olliwang.com/2009/12/20/aes-implementation-in-rpython/
http://alexgaynor.net/2010/may/15/pypy-future-python/

Is it possible the PyPy team may be understating the significance of RPython? Am I crazy to think this way? :-)

Sarvi

No other python implementation can convert python programs to executables.
There's shedskin, which is actually very good as these things go: http://code.google.com/p/shedskin/

Like RPython, you have to write in a small subset of python, which can be a little frustrating once you've gotten used to pythonic freedom. But I've found it very useful for some short numerical codes (putting on my OEIS associate editor hat). And Cython is pretty powerful these days.

ObPyPy: the other day I had cause to run a very short, unoptimized, mostly integer-arithmetic code. With shedskin, it took between ~42s (with ints) and ~1m43s (with longs), as compared with only ~3m30s or so to run under pypy. That's only a factor of two (if I'd needed longs). Both could be much improved, and a lower-level version in C would beat them both, but I was very impressed by how little difference there was. Major props!

For numerics it'd be interesting to have a JIT option which didn't care about compilation times, and instead of generating assembly itself generated assembly-like C which was then delegated to an external compiler.

Doug
--
Department of Earth Sciences
University of Hong Kong
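The kind of program both tools accept is the same restricted subset: module-level functions, monomorphic variables, no dynamic features. As an illustration (a hypothetical kernel, not the code Doug actually benchmarked), something like this is ordinary Python, valid RPython, and Shedskin-translatable:

```python
# A small integer-arithmetic kernel written in the common restricted
# subset: plain functions, each variable keeping a single type, no
# dynamic attributes or reflection.

def collatz_steps(n):
    # Count iterations of the Collatz map; n stays an int throughout.
    steps = 0
    while n != 1:
        if n % 2 == 0:
            n = n // 2
        else:
            n = 3 * n + 1
        steps += 1
    return steps

def total_steps(limit):
    # Sum the step counts for 1..limit-1.
    total = 0
    for i in range(1, limit):
        total += collatz_steps(i)
    return total
```

Run under CPython it is just slow Python; Shedskin would translate it to C++, and PyPy's JIT would trace the hot loop.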

On Thu, Sep 2, 2010 at 08:27, Douglas McNeil <mcneil@hku.hk> wrote:
A more interesting road (mentioned somewhere in the PyPy blog) is to use LLVM in place of this "external JIT compiler", so that you generate "assembly-like LLVM Intermediate Representation". A bit like Unladen Swallow is doing, with the difference of having a saner runtime model to start with (say, no reference counting). Once you start with LLVM, you are free to choose which optimization passes to run, from very few to -O3 to even more. The other C compilers incur huge startup costs for no good reason, and don't usually allow being used as a library, if only for engineering reasons. LLVM is so much cooler anyway, especially now that seemingly _everybody_ is switching to it.

About the compilation-time tradeoff, you can look up "tiered compilation", which is a general strategy for handling it automatically, possibly allowing different tunings (like java -server, which is tuned for performance rather than responsiveness). My authoritative reference is Cliff Click's blog [1], but you probably want to stop reading it after the introduction, as I did in this case.

[1] http://www.azulsystems.com/blog/cliff-click/2010-07-16-tiered-compilation

--
Paolo Giarrusso - Ph.D. Student
http://www.informatik.uni-marburg.de/~pgiarrusso/

If PyPy is using RPython for its compiler implementation, it should and eventually will be optimized so that its compiler/JIT is fast. Which tells me that the performance gap between Shedskin and PyPy will be narrowed or beaten pretty soon.

I would still rather work with just one interpreter/compiler, say PyPy. Better than using PyPy/CPython for an interpreter and Cython/Shedskin for compiling, without interpreter support during development. I am just seeing Cython/Shedskin as fragmentation of resources. A lot more could be accomplished if these projects came together with PyPy.

If you ask on the Shedskin/Cython lists why they shouldn't pool resources into making the PyPy RPython compiler a first-class citizen/goal of PyPy, they will immediately tell you it's not a goal of PyPy. Why not officially make it so? Formalize RPython and its compiler, obviate the need for Cython/Shedskin, and get them on board.

Like the example I quoted: Mercurial is 95% Python, 5% C for performance. It should be 95% Python and 5% RPython. We have Pickle and cPickle for performance; pickle could have simply been rewritten in RPython and compiled, and we wouldn't need different versions :-))

Sarvi

Awesome. The point I was making is that RPython (a static subset of Python) will be faster than dynamic Python code on a JIT, or compiled to machine code.

Sarvi

On Thu, Sep 2, 2010, Amaury Forgeot d'Arc <amauryfa@gmail.com> wrote:
The PyPy way is much simpler: there is only the original pickle.py, written in plain full Python, and it's as fast as a C or RPython implementation. -- Amaury Forgeot d'Arc
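Amaury's single-implementation model can be seen with a plain round-trip through the pure-Python pickle module; CPython users of that era reached for cPickle for speed, while under PyPy the same pickle.py is itself the fast path:

```python
import pickle

# One pure-Python implementation serves everyone: no separate C or
# RPython version is needed, because the JIT specializes the pure
# Python code at runtime.
data = {"name": "pypy", "versions": [1, 2, 3]}
blob = pickle.dumps(data)        # serialize to a byte string
restored = pickle.loads(blob)    # deserialize back to live objects
assert restored == data
```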

But what makes you think that? A dynamic compiler has more information, so it should be able to produce better code.

On 02/09/2010 6:37 PM, Saravanan Shanmugham <sarvi@yahoo.com> wrote:
We have Pickle and cPickle for performance. ...

On Thu, Sep 2, 2010 at 10:40, William Leslie <william.leslie.ttg@gmail.com> wrote:
But what makes you think that? A dynamic compiler has more information, so it should be able to produce better code.
Note that he's not arguing for a static compiler for the same code, which has no type information, and where you are obviously right. He's arguing for a statically typed language, where the type information is already there in the source, e.g. C; there is much less information missing. Actually, your point can still be made, but it becomes much less obvious. For this case, it's much more contested what's best: see the "Java faster than C" debate. Nobody has yet given a proof convincing enough to close the debate. I would say that there's a tradeoff between JIT and ahead-of-time compilation, where AOT makes sense (not in Python, Smalltalk, Self...).
Run-time specialization would allow exactly the same code to be generated, without any extra guards in the inner loop. Java can do that at times, and can even be better than C, but not always (see above). You'd need a static compiler with profile-guided optimization, and a profile which matches actual runtime behavior, to guarantee superior results.

Cheers
--
Paolo Giarrusso - Ph.D. Student
http://www.informatik.uni-marburg.de/~pgiarrusso/
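The "guards" Paolo mentions can be sketched in plain Python. This is a deliberately toy model (real tracing JITs operate on traces of interpreter operations, not on Python functions like this): the specialized fast path assumes the argument types it observed and re-checks that assumption with a cheap guard, falling back to generic code when the guard fails. Run-time specialization with a matching profile is what would let the guard be removed entirely.

```python
def make_specialized_add():
    # Toy model of guard-based specialization: the "compiled" fast
    # path assumes int arguments and re-checks (guards) that
    # assumption on every call; a guard failure counts as a
    # deoptimization and falls back to the generic path.
    stats = {"guard_failures": 0}

    def generic_add(a, b):
        return a + b

    def specialized_add(a, b):
        if type(a) is int and type(b) is int:  # the guard
            return a + b                       # fast path, types known
        stats["guard_failures"] += 1
        return generic_add(a, b)               # generic fallback

    return specialized_add, stats
```

Calling the specialized function only with ints never trips the guard; a single call with strings falls back once and still produces the right answer.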

So far as I can tell from Unladen Swallow and PyPy, it is some of the dynamic features of Python, such as dynamic typing, that make it hard to compile/optimize and hit C-like speeds. Hence the need for RPython in PyPy, or Restricted Python in Shedskin?

Sarvi
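The dynamic typing in question can be made concrete with a small hypothetical example: perfectly legal Python that RPython's whole-program type inference would reject, because one variable would need two incompatible low-level types.

```python
def describe(flag):
    # Valid Python, not valid RPython: "value" is an int on one branch
    # and a str on the other, so type inference cannot assign it a
    # single low-level (C) type.
    if flag:
        value = 42
    else:
        value = "forty-two"
    return value

def describe_rpython(flag):
    # An RPython-friendly rewrite keeps every variable monomorphic:
    # both branches produce a str.
    if flag:
        return str(42)
    return "forty-two"
```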

On 2 September 2010 19:00, Saravanan Shanmugham <sarvi@yahoo.com> wrote:
Hence the need for the JIT, not rpython. Rpython is an implementation detail, to support easy translation to C as well as CLI and JVM bytecode, to support translation aspects such as stackless, and to allow testing on top of a full python environment. Rewriting things in rpython for performance is a hack that should stop happening as the JIT matures.

Dynamic typing means you need to do more work to produce code of the same performance, but it's not impossible.

On 2 September 2010 18:56, Paolo Giarrusso <p.giarrusso@gmail.com> wrote:
Sure: having static type guarantees is another case of "more information". There is a little more room for discussion here, because there are cases where a dynamic compiler for a safe runtime can do better at certain optimisations, too.

We have been talking about stock-standard type systems here, which ensure that our object will have the field or method we are interested in at runtime, and perhaps (as long as it isn't an interface method, which we don't have in rpython anyway) the offset into the instance or vtable respectively. That makes for a pretty decent optimisation, but type systems can infer much more than this, including which objects may escape (via region typing, à la Cyclone), which fields may be None, and which instructions are loop-invariant. The point is that some of these type systems work fine with separate compilation, and some do significantly better with runtime or link-time specialisation.

On 2 September 2010 17:56, Paolo Giarrusso <p.giarrusso@gmail.com> wrote:
Rtyping is whole-program.
Functional languages allow separate compilation - is there any RPython-specific problem for that? I've omitted my guesses here.
Many do, yes. To use ML derivatives as an example, you require the signature of any modules you directly import. I was recently reading about MLKit's module system, which is quite interesting (it has region typing, and the way it unifies these types at the module boundary is cute: carrying region information around in the source text is fragile, so it must be inferred). Haskell is kind of a special case, requiring dictionaries to be passed around at runtime to determine which method of some typeclass to call.

For OCaml (most MLs are similar), see section 2.5, "Modules and separate compilation", of http://pauillac.inria.fr/ocaml/htmlman/manual004.html

On MLKit's module implementation and region inference: http://www.itu.dk/research/mlkit/index.php/Static_Interpretation

--
William Leslie

On 2 September 2010 15:54, Saravanan Shanmugham <sarvi@yahoo.com> wrote:
Because then we would have to support that general use. Python benefits from being reasonably standardised: you can be sure that most python you write will run on any implementation that supports the version you are targeting. On the other hand, if you are mangling cpython or pypy bytecode, you are asking for trouble. Rpython is an example of such an implementation detail; we* might like to change features of it here or there to better support some needed pattern. Introducing yet another incompatible and complicated language to the python ecosystem is not a worthwhile goal in itself.

* Just my opinion. Others might feel that standardising some amount of rpython is a worthwhile idea.
I can't see why you would ever want to do this - if you use py2exe or the like instead, you get a large standard library and a great language to work in, neither of which you get if you use rpython.
I am seeing growing interest in writing Rpython code for performance critical code and even potentially compiling it to binaries.
The intention is to get almost the same performance out of the JIT. For those who actually care about the last few percent, it would be nicer to provide hints to generate specialised code at module compile time; that way you can still work at the python level.
Is it possible the PyPy team may be understating the significance of RPython? Am I crazy to think this way? :-)
Supporting better integration between app-level python and other languages that interact with the interpreter level would be nice. CLI integration is good, and JVM integration is lagging just a little. But once you can interact with that level, there are much saner languages that you could use for your low-level code than rpython: languages /designed/ to be general-purpose languages.

At the moment, the lack of separate compilation is a real issue standing in the way of using rpython as a general-purpose language, or even as an extension language. Having to re-translate *everything* every time you want to install an extension module is not on; even C doesn't require that. The other issue is that type inference is global, so changes you make to one function can have far-reaching consequences. The error messages when you do screw up aren't very friendly either.

If you want a low-level general-purpose language with type inference and garbage collection that has implementations for every platform pypy targets, there are already plenty of options.

--
William Leslie

On Thu, Sep 2, 2010 at 9:56 AM, Paolo Giarrusso <p.giarrusso@gmail.com> wrote:
There is no notion of a "module" in RPython. RPython is compiled from live python objects (hence python is a metaprogramming language for RPython). There are a bunch of technical problems, but it's generally possible to implement separate compilation (it's work, though).

Cheers,
fijal
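Fijal's point, that translation starts from live Python objects rather than modules, shows up in the conventional shape of a standalone translation target. The sketch below follows the target()/entry_point() convention from PyPy's documentation of the time; treat the details as approximate, and note that real RPython of this era was Python 2 (a bytes literal is used here so the sketch also runs as modern Python):

```python
import os

# Sketch of a standalone RPython translation target.  The toolchain
# imports this file as ordinary Python, calls target() to obtain the
# live entry_point function object, and then type-infers the whole
# program reachable from it.  Untranslated, it still runs on plain
# Python, which is the "testing on top of a full python environment"
# mentioned earlier in the thread.

def entry_point(argv):
    # argv is the C-level argument list; I/O goes through low-level
    # helpers like os.write rather than high-level file objects.
    os.write(1, b"Hello from an RPython-style entry point\n")
    return 0  # process exit code

def target(driver, args):
    # Called by the translation driver to discover what to compile.
    return entry_point, None
```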

I'm afraid people are missing the point here. For an average engineer, it's better to be an expert in one language than average at four. That's my take on things. Take Mercurial (an SCM): 95% Python, 5% C, and it gives Git a run for its money. This could be 95% Python and 5% RPython.
From my reading on PyPy, that's why y'all chose to write PyPy in RPython. Y'all could have done this in C/C++, right?
So far as I can tell, RPython is a strict subset of Python. I don't see why it shouldn't continue to be. And even if y'all needed to make a very small set of static extensions to RPython, you wouldn't be any worse off than Cython and Shedskin.

I would still rather work with just one interpreter/compiler, say PyPy. Better than PyPy/CPython for the interpreter and Cython/Shedskin for compiling, without interpreter support during development. I am just seeing Cython/Shedskin as fragmentation of resources. A lot more could be accomplished if these projects came together.

Sarvi

Saravanan Shanmugham, 02.09.2010 09:57:
Well, it's certainly better to be an almost-expert in two, than a no-left-no-right expert in only one.
I am just seeing Cython/Shedskin as fragmentation of resources.
You might want to take a closer look at the projects and their goals before judging that way. Stefan

I have researched these projects quite extensively. They are quite similar beasts as far as I can tell. Cython/Pyrex are used to write python extensions; they use statically typed variants of Python which are compiled into C, which can then be compiled to binaries. Shedskin is a slightly more general-purpose Restricted-Python-to-C++ compiler. PyPy, as I understand it, can convert RPython into C code. Am I missing something here?

Sarvi

On Thu, Sep 2, 2010 at 10:18, Saravanan Shanmugham <sarvi@yahoo.com> wrote:
Maybe you have done extensive research, but the above is not enough for the conclusion, which might still be valid. There could be some cool way to reuse each other's code, and that would be great given the available manpower. The questions are:

=> Do different goals cause _incompatible_ design/implementation choices?

Currently, static typing versus global type inference already seems to be a fundamental difference. Modular type inference, if polymorphic (and I guess it has to be), would require using boxed or tagged integers more often, as far as I can see. RPython is intended to be compiled to various environments, with different variations (choice of GC, stackful or stackless, heaps of other choices), and its programmers are somewhat OK with its limitations; it has type inference, with its own set of tradeoffs. This for instance prevents reusing Shedskin, and probably even prevents reusing any of its code. Cython/Shedskin are intended to be used by more people and to be simpler.

==> Would making RPython usable for general users harm its usability for PyPy?

I see no trivial answer to the above questions which allows merging, but I don't develop any of these projects. However, a discussion of this could probably end up in a PyPy FAQ.

Best regards
--
Paolo Giarrusso - Ph.D. Student
http://www.informatik.uni-marburg.de/~pgiarrusso/

Thursday 02 September 2010 you wrote:
RPython was tried in a production environment some years ago, and while it produced some very nice results, it was quite difficult to work with. Dealing with those difficulties requires a group of people who are willing to build RPython code for general applications, run the code, and identify what the difficulties actually are. Then they need to come up with strategies for how to remedy the problems and implement them in code. This is a very large undertaking for which PyPy does not have the manpower. It also requires people who are interested in building support for compiled programming languages. PyPy is a volunteer effort, and the only person who was interested in this has retired from the project.

Jacob Hallén

In reply to Jacob Hallén's message above:

This makes sense. But wouldn't the answer to this problem be to invite people like the Shedskin/Cython developers to join forces with PyPy, so that they can pursue the general RPython use case you mention above while the others focus on the JIT and related work, on a common code base? Wouldn't that be a win-win for everybody? This collaboration feels so obvious to me that I am confused why it isn't to others, considering that Shedskin's goals feel almost like a strict subset of PyPy's.

Sarvi

On Thu, Sep 2, 2010 at 2:18 PM, Saravanan Shanmugham <sarvi@yahoo.com> wrote:
I think what you don't get is how open source works: there are always ten projects doing almost the same thing. Everyone has at least once thought "why does linux have so many media players/text editors/flash implementations/jvms when all we need is a really good one with lots of support". It does get me depressed sometimes, but this is the way it is. Cython has a big user base that they have to support and lots of programs in production today, Shedskin is looking for pure performance, and the pypy guys want a faster python. Although I also think that maybe RPython and the pypy python interpreter could solve all these problems someday, it doesn't do so right now. I have used RPython in the past, and the error messages alone would drive some people away. Some group of people could work to fix this, but I doubt it will happen soon.

What I think could be done to make pypy more visible to people would be to have a killer app running on pypy way faster/better than on cpython. For me this app is either mercurial or django.

--
Leonardo Santagada

In reply to Leonardo Santagada's message above:

Yeah, having this thread of conversation on 4 separate aliases (Python.org, Unladen Swallow, Shedskin, and PyPy) was just my attempt at seeing if these can come together. Oh well.

On the killer-app suggestion: very true.

Sarvi

Thursday 02 September 2010 you wrote:
It is a matter of personal pride, I think. If we made the invitation to the Shedskin people, they would see this as "PyPy thinks they are way cooler than us, so they invite us to be part of their project". This would naturally generate a refusal, because even though we don't make such value statements, it would be viewed that way. So we don't make such invitations, even if they make sense.

What we hope is that some people examine the PyPy project and find that it actually is a really cool piece of technology with lots of possible side projects and expansion possibilities. If they decide to join the project, we will give them all the help we are capable of. Most people have actually joined PyPy in this way. The most recent example is Håkan Ardö, who wanted to expand PyPy in the direction of numeric calculations. The learning curve is fairly steep, but there are quite a few people on the IRC channel who are ready to help you overcome the hurdles.

Jacob Hallén

I have heard repeatedly on this alias that PyPy's RPython is very difficult to use. I have also heard, here and elsewhere, that Shedskin is fast and great at what it does, i.e. translating its version of Restricted Python to C++. Which then begs the question: would it make sense for PyPy to adopt Shedskin to compile PyPy's RPython code into C++/binaries?

Sarvi

Let's not be presumptuous, shall we? This is the second time you seem to be claiming that I haven't done my research/reading. I have been following the progress of PyPy for over 2 years. It's great work. So is Shedskin. Just for the record, I have used Pyrex and Cython, and have read the documentation and/or sample code for both Shedskin and PyPy.

If people say that there are emotional and pragmatic reasons for the two projects not coming together, that makes sense. I just don't see any logical reasons, that's all. And I haven't heard any on this thread either.

BTW, just because I top-post for "readability" doesn't mean I haven't read all the threads in detail.

Sarvi

On Fri, Sep 3, 2010, Stefan Behnel <stefan_ml@behnel.de> wrote:
You should seriously read and try to understand the e-mails that you reply to, instead of top-posting them away. Stefan

Saravanan Shanmugham, 03.09.2010 19:22:
That's just the impression that I get from what you write and how you write it.
I just don't see any logical reasons, thats all. And I haven't heard any on this thread either.
Well, you are talking to people who know a lot more about what you are talking about than you do. It's normal that they are not equally enthusiastic about pie-in-the-sky ideas that someone throws at them.
BTW, Just because I top post for "readability" doesn't mean I haven't read all the threads in detail.
I like the fact that you put marks of irony around the word "readability". Stefan

Stefan, if I were to go with my impressions, based on you being the lead developer of Cython, I could have claimed you have an ulterior motive on this thread. But I didn't, because in spite of first impressions/scepticism, I believe we are all here with a genuine interest in improving the Python environment and getting more visibility and momentum for PyPy. Personally, I can't wait to see PyPy become the default Python. :-)

Let's start with an understanding that we are all smart people with good ideas, and let's also not be cocky enough to think we have all the answers. I saw some genuine synergies that I was calling out. I have heard some pragmatic arguments from others, though no one has claimed it is logically impossible for this to work. And I can understand that.

Sarvi

On Fri, Sep 3, 2010 at 21:06, Saravanan Shanmugham <sarvi@yahoo.com> wrote:
But then I didn't because, inspite of first impressions/scepticism I believe we are all here with a genuine interest to improve the Python environment
While you didn't initiate the flame, I think that's totally inappropriate, and I can say so even without knowing Stefan. a) You wrote: "... proper research if you just used the projects". You are welcome to be curious, but with such a comment you are the presumptuous one. Note that I already remarked that Stefan's comment was not appropriate in style. b) Your email client _is_ crappy, given the way you reply inline (I was mentioning crappy clients in my previous email). Socially speaking, in an Open Source community, not using a decent email client can look as bad as dressing very, very wrong. I'm not so picky, but it does make you look like you're not a hacker. Note I'm not a developer of PyPy, and I don't claim to be an expert, but I have some technical knowledge of its documentation about internals and of some literature, and some small experience with a Python implementation. Best regards -- Paolo Giarrusso - Ph.D. Student http://www.informatik.uni-marburg.de/~pgiarrusso/

----- Original Message ---- From: Paolo Giarrusso <p.giarrusso@gmail.com> To: Saravanan Shanmugham <sarvi@yahoo.com> Cc: Stefan Behnel <stefan_ml@behnel.de>; pypy-dev@codespeak.net Sent: Fri, September 3, 2010 5:15:47 PM Subject: Re: [pypy-dev] Question on the future of RPython On Fri, Sep 3, 2010 at 21:06, Saravanan Shanmugham <sarvi@yahoo.com> wrote:
While you didn't initiate the flame, I think that's totally inappropriate, and I can say so even without knowing Stefan.
Sarvi>> I believe it is an appropriate response to the flame bait. BTW, I was very careful not to make the accusation. No real offense meant. It was just a what-if argument to drive home the point that if everyone responded like that, based on impressions and presumptions, it would be wrong. So I stand by that.
You wrote: "... proper research if you just used the projects". You are welcome to be curious, but with such a comment you are the presumptuous one. Note that I already remarked that Stefan's comment was not appropriate in style.
Sarvi>> We may have to agree to disagree here. I don't believe my thread of discussion has anything to do with virtual machines at all. What I have been saying has more to do with compiling plain RPython code into C/C++/ASM executables. Shedskin uses a statically typed restricted version of Python that gets converted to C++. PyPy does convert a statically typed restricted version of Python to C that can then be compiled to an executable. So, though with different approaches, the final goal is to produce a compiled binary executable for the RPython code. Agreed, PyPy additionally allows using language/JIT hints to help write/generate JIT compilers as well. That does not rule out the possibility that the statically typed restricted Python used by Shedskin could be a full subset of PyPy's RPython. Nor the possibility of using PyPy as just a plain/pure restricted-Python compiler, pure and simple. This thought angle has nothing to do with virtual machines, really.
b) your email client _is_ crappy, given the way you reply inline (I was mentioning crappy clients in my previous email). Socially speaking, in an Open Source community, not using a decent email client can look as bad as dressing very very wrong. I'm not so picky, but it does mean you're not a hacker.
Sarvi>>> Point taken. I use plain Yahoo Web Mail. Do you have any suggestions on how I could do better with the Yahoo Web Mail client? I am open to learning a better way. :-) Will look into it. Thanks, Sarvi
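To make the "compile RPython to a binary" idea being discussed concrete: an RPython-translatable program conventionally follows the targetXXX.py shape from the PyPy documentation. The sketch below is from memory and hedged - driver details and translate.py options vary between PyPy versions - but it shows the key property that the same file also runs as plain Python, which is how such programs are usually tested before translation.

```python
# Minimal sketch of an RPython-translatable program (targetXXX.py
# convention; details may differ by PyPy version).

def entry_point(argv):
    # RPython requires code the annotator can type statically:
    # argv is a list of strings, the return value is the exit code.
    total = 0
    for arg in argv[1:]:
        total += len(arg)
    print(total)
    return 0

def target(driver, args):
    # translate.py calls this hook to find the program's entry point.
    return entry_point, None

if __name__ == "__main__":
    # Runs untranslated as plain Python for quick testing.
    import sys
    entry_point(sys.argv)
```

Under this convention, `python translate.py targetexample.py` would produce a standalone C executable, while `python targetexample.py a b c` runs the same logic interpreted.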

Hi, Can we please close this thread? The basic answer you will get from anybody that actually worked at least a bit with PyPy is that all your discussions are moving air around and nothing else. There is no one working with PyPy that is interested in using RPython for the purpose of compiling some RPython programs to C code statically (except interpreters). If anyone is really interested in this topic he can (again) give it a try. He would get some help from us, i.e. the rest of the PyPy team, but it would be a fork for now. I say "again" because there are some previous attempts at doing that, which all failed. As long as no such project exists and is successful -- and I have some doubts about it -- I will not believe in the nice (and, to me, completely bogus) claims made on this thread, like "let's bring RPython and Shedskin together". A bientot, Armin.

Hi Armin, Could you point me to some of these previous attempts at improving the RPython-to-executable capability? I would like to understand what was attempted. Hart's Antler, who seems to be working on RPython quite extensively, contacted me privately about doing some work in the RPython area. I am considering sponsoring him to do some work on PyPy, but only if it is done with the PyPy team's blessing and will help PyPy as a whole. Is there a wish list of RPython enhancements somewhere that the PyPy team might be considering? Stuff that would benefit RPython users in general. Sarvi ----- Original Message ----

Hi, On Mon, Sep 6, 2010 at 8:27 PM, Saravanan Shanmugham
I feel like I am repeating myself, so that's my last mail to this thread. There are no enhancements we are considering to benefit other RPython users because *there* *are* *no* *other* *RPython* *users.* There is only us, and RPython suits us just fine for the purpose for which it was designed. Again, feel free to make a fork or a branch of PyPy and try to develop a version of RPython that is more suited to writing general programs in. I don't know if there is a wish list of what is missing, but certainly I haven't given it much thought myself. Personally, I think that writing RPython programs is kind of fun, but in a perverse way -- if I could just write plain Python that was as fast or mostly as fast, it would be perfect. A bientôt, Armin.

i just had a (probably) silly idea. :) if some people like rpython so much, how about writing a rpython interpreter in rpython? wouldn't it be much easier for the jit to optimize rpython code? couldn't jitted rpython code theoretically be as fast as a program that got compiled to c from rpython? hm... but i wonder if this would make sense at all. maybe if you ran rpython code with pypy-c-jit, it already could be jitted as well as with a special rpython interpreter? ...if there were a special rpython interpreter, would the current jit generator have to be changed to take advantage of the more simple language? just curious... On Tue, Sep 7, 2010 at 11:07 AM, Stefan Behnel <stefan_ml@behnel.de> wrote:

The current JIT generator creates a tracing JIT, which gives a very different performance profile to static compilation. For tight loops etc. this might be OK, but might be different for the specific use case people are interested in (I admit I still don't know what that is).
Armin Rigo, 07.09.20...

On Sat, 2010-09-25 at 17:47 +0200, horace grant wrote:
An excellent question at least. A better idea, I think, would be to ask what subset of full-python will jit well. What I'd really like to see is a static analyzer that can display (e.g. by coloring names or lines) how "jit friendly" a piece of python code is. This would allow a programmer to get an idea of what help the jit is going to be when running their code and, hopefully, help people avoid tragic performance results. Naturally, for performance intensive code, you would still need to profile, but for a lot of uses, simply not having catastrophically bad performance is more than enough for a good user experience. With such a tool, it wouldn't really matter if the answer to "what is faster" is RPython -- it would be whatever python language subset happens to work well in a particular case. I've started working on something like this [1], but given that I'm doing a startup, I don't have nearly the time I would need to make this useful in the near-term. -Terrence [1] http://github.com/terrence2/melano

On Sun, 2010-09-26 at 23:57 -0700, Saravanan Shanmugham wrote:
What I wrote has apparently been widely misunderstood, so let me explain what I mean in more detail. What I want is _not_ RPython and it is _not_ Shedskin. What I want is not a compiler at all. What I want is a visual tool, for example, a plugin to an IDE. This tool would perform static analysis on a piece of python code. Instead of generating code with this information, it would mark up the python code in the text display with colors, weights, etc in order to show properties from the static analysis. This would be something like semantic highlighting, as opposed to syntax highlighting. I think it possible that this information would, if created and presented in the correct way, represent the sort of optimizations that pypy-c-jit -- a full python implementation, not a language subset -- would likely perform on the code if run. Given this sort of feedback, it would be much easier for a python coder to write code that works well with the jit: for example, moving a declaration inside a loop to avoid boxing, based on the information presented. Ideally, such a tool would perform instantaneous syntax highlighting while editing and do full parsing and analysis in the background to update the semantic highlighting as frequently as possible. Obviously, detailed static analysis will provide far more information than it would be possible to display on the code at once, so I see this gui as having several modes -- like predator vision -- that show different information from the analysis. Naturally, what those modes are will depend strongly on the details of how pypy-c-jit works internally, what sort of information can be sanely collected through static analysis, and, naturally, user testing. I was somewhat baffled at first as to how what I wrote before was interpreted as interest in a static python. I think the disconnect here is the assumption on many people's part that a static language will always be faster than a dynamic one. 
Given the existing tools that provide basically no feedback from the compiler / interpreter / jitter, this is inevitably true at the moment. I foresee a future, however, where better tools let us use the full power of a dynamic python AND let us tighten up our code for speed to get the full advantages of jit compilation as well. I believe that in the end, this combination will prove superior to any fully static compiler. -Terrence

On Mon, Sep 27, 2010 at 4:44 PM, Terrence Cole <list-sink@trainedmonkeystudios.org> wrote:
This all looks interesting, and if you can plug that into emacs or TextMate I would be really happy, but it is not what I want. I would settle for a tool that generates, at runtime, information about what the JIT is doing in a simple text format (JSON, YAML or something even simpler?) and a tool to visualize this, so you can optimize Python programs to run on PyPy easily. The biggest difference is that just collecting this info from the JIT appears to be much, much easier than somehow implementing a static processor for Python code that does some form of analysis. I think that fijal is at least thinking about doing such a tool, right? -- Leonardo Santagada

On Mon, Sep 27, 2010 at 21:58, Leonardo Santagada <santagada@gmail.com> wrote:
Have you looked at what the Azul Java VM supports for Java, in particular RTPM (Real Time Performance Monitoring)? Academic accounts are available, and from Cliff Click's presentations, it seems to be a production-quality solution for this (for Java), which could give interesting ideas. Azul business is exclusively centered around Java optimization at the JVM level, so while not-so-famous they are quite relevant. See slide 28 of: www.azulsystems.com/events/vee_2009/2009_VEE.pdf for some more details. See also wiki.jvmlangsummit.com/pdf/36_Click_fastbcs.pdf, and the account about JRuby's slowness (caused by unreliable performance analysis tools). Given that JIT can beat static compilation only through forms of profile-directed optimization, I also believe that the interesting information should be obtained through logs from the JIT. A static analyser can't do something better than a static compiler - not reliably at least. _However_, static semantic highlighting might still be interesting: while it does not help understanding profile-directed optimizations done by the JIT, it might help understanding the consequences of the execution model of the language itself, where it has a weird impact on performance. E.g., for CPython, it might be very useful simply highlighting usages of global variables, that require a dict lookup, as "bad", especially in tight loops. OTOH, that kind of optimization should be done by a JIT like PyPy, not by the programmer. I believe that CALL_LIKELY_BUILTIN and hidden classes already allow PyPy to fix the problem without changing the source code. The question then is: which kinds of constructs are unexpectedly slow in Python, even with a good JIT? Best regards -- Paolo Giarrusso - Ph.D. Student http://www.informatik.uni-marburg.de/~pgiarrusso/

On Tue, 2010-09-28 at 00:52 +0200, Paolo Giarrusso wrote:
Briefly, but it's not open source, and it's a Java thing, so it didn't pique my interest significantly.
I'd be pursuing the jit logging approach much more aggressively if I cared at all about Python2 anymore. All of the source I care about analyzing is in Python3. However, considering the rate I'm going, pypy will doubtless support Python3 by the time I get a half-way decent static analyzer working anyway, so it's probably worth considering.
Precisely. I'd love a good answer to that question. In addition to jitting, although it would not technically be python anymore, I see a place for something like SPUR or Jaegermonkey -- combined compilation and jitting. Naturally, the performance of such a beast over a jit alone would be dependent on how much boxing the compiler could remove. My goal for this work is about half geared towards answering that single question, just so I'll know if I should stop dreaming about python eventually having performance parity with C/C++. I tend to think that having a solid (if never perfect) static analyzer for python could help in many areas. I had thought that helping coders help the jit out would be a good first use, but as you say, there will be problems with that. Regardless, my hope is that a library for static analysis of python will be more generally useful than my own hare-brained schemes. In any case, I'm working on this in the form of a code editor first because, regardless of what the answer to the previous question is, I know from experience that highlighting for python like what SourceInsight does for C++ will be extremely useful. Thank you for the kind feedback, your comments are much appreciated. -Terrence
Best regards

Monday 27 September 2010 you wrote:
The JIT works because it has more information at runtime than what is available at compile time. If the information were available at compile time, we could do the optimizations then and not have to invoke the extra complexity required by the JIT. Examples of the extra information include things like knowing that introspection will not be used in the current evaluation of a loop, that specific argument types will be used in calls, and that some arguments will be known to be constant over part of the program execution. Knowing these bits allows you to optimize away large chunks of the code that otherwise would have been executed. Static analysis assumes that none of the above mentioned possibilities can actually take place. It is impossible to make such assumptions at compile time in a dynamic language. Therefore PyPy is a bad match for people wanting to statically compile subsets of Python. Applying the JIT to RPython code is not workable, because the JIT is optimized to remove bits of generated assembler code that never show up in the compilation of RPython code. These are very basic first-principle concepts, and it is a mystery to me why people can't work them out for themselves. Jacob Hallén

On Tue, 2010-09-28 at 01:57 +0200, Jacob Hallén wrote:
Yes, that idea is just dumb. It's also not what I suggested at all. I can see now that what I said would be easy to misinterpret, but on re-reading it, it clearly doesn't say what you think it does.
You are quite right that static analysis will be able to do little to help an optimal jit. However, I doubt that in the near term pypy's jit will cover all the dark corners of python equally well -- C has been around for 38 years and it's still got room for optimization. -Terrence
Jacob Hallén

On 28 September 2010 10:43, Terrence Cole <list-sink@trainedmonkeystudios.org> wrote:
It does make /some/ sense, I think. From the perspective of the JIT, operating at interp-level, the app-level python program *is the biggest part of* the "stuff you don't know about until runtime". That is, you don't know the program source at translation time, and most of the information the JIT is supposed to find are app-level constructs (eg app-level loops). Of course any such analysis will fall flat in certain cases, like eval(raw_input(...)). But you should still be able to gather enough information for most fairly hygienic code. What sort of analyses did you have in mind?
There are some undesirable things about static analysis, but it can sure be useful from optimisation, security and reliability perspectives. There's also code browsing, too; IDEs require a different (fuzzier) parser, but the question of 'what types does this object probably have' makes more sense with a little dependent region analysis. Optimising when you can be fairly confident of the types involved could be useful. That doesn't really sound like pypy at that point, though. -- William Leslie

On Tue, 2010-09-28 at 11:55 +1000, William Leslie wrote:
I think this is a disconnect. Applying a jit to a non-interpreted language -- Jacob here seems to think I was talking about a static, compiled subset of python -- makes little sense. Static analysis to provide help to an interpreter does, as you say, make some sense, and not just to me. Brett Cannon applied static type analysis to the CPython interpreter for his PhD thesis [1], looking for a speed boost by removing some typing abstraction. Unfortunately, it was not spectacularly helpful for CPython. I think for pypy-jit, however, it has much greater potential because of the possibility of full unboxing. Given past results however, it's not the first place I'd go looking for speedups. Others may have better ideas in this area than I do though.
This is one of the reasons that I had to pull together my own parsing (largely borrowed from pypy, actually) and analysis infrastructure, rather than just using pypy's off-the-shelf. Even without pypy's neat analysis code, the fact that it ditches character-level info when making an ast means you can't apply highlighting with it without groping about half-blindly in the source.
Given the choice between the status quo and an extremely slow eval, but much faster python overall, I think most people would pick the second.
What sort of analyses did you have in mind?
As this is a side project, for the moment I am focusing on simple stuff, mostly things I need/want for work. In the short term these include Python3 linting (which is almost working) and static type analysis. The second will be particularly interesting because we have (at work) annotated most of our interfaces with type data, so this will probably net much more specific and helpful data than it would in many projects. I am also, specifically, as I mentioned to Paolo yesterday, trying to find out how much of our code could be fully unboxed, given that we have extensive type contracts at our interfaces. If the answer is "most of it", then it may make sense for us to build something like Jaegermonkey for python someday.
Brendan Eich agrees [2]. This is heartening, because javascript has much in common with python. I agree too, for that matter, but that's probably a lot less heartening :-).
There's also code browsing, too; IDEs require a different (fuzzier) parser,
Reason number two that I have to maintain a separate parser/analyzer.
Given that I want to work with Python3 anyway (and that I'd never be able to beat pypy's performance before it supports Python3), I'm focusing mostly on a tool to help make reliable and correct code. However, performance is always in the back of my mind these days. It seems from this thread that I won't be able to do much in that regard with my current approach, unfortunately. Maybe by the time I can focus on it, pypy will support python3 and I can work on providing real-time jit feedback. -Terrence [1] http://www.ocf.berkeley.edu/~bac/thesis.pdf [2] http://brendaneich.com/2010/08/static-analysis-ftw/

On Tue, 2010-09-28 at 15:20 +0200, Maciej Fijalkowski wrote:
Lots. They're almost all probably wrong though, so be warned :-). I'm also not entirely clear on what you mean, so let me tell you what I have in mind and you can tell me if I'm way off base. I assume the workflow would go like this: 1) run pypy on a bunch of code in profiling mode, 2) pypy spits out lots of data about what happened in the jit when the program exits, 3) start up an external analysis program pointing it at this data, 4) browse the python source with the data from the jit overlaid as color, formatting, etc on top of the source. Potentially there would be several separate modes for viewing different aspects of the jit info. This could also include the ability to select different program elements (loops, variables, functions, etc) and get detailed information about their runtime usage in a side-pane. Ideally, this workflow would be taken care of automatically by pushing the run button in your IDE. As a more specific example of what the gui would do in, for instance, escape analysis mode: display local variables that do not escape any loops in green, others in red. Hovering over a red variable would show information about how, why, and where it escapes the loop in a tooltip or bubble. Selecting a red variable would show the same info in a pane and would draw arrows on the source showing where it escapes from a loop/function etc. In my ideal world, this profiling data analysis would sit side-by-side with various display modes that show useful static analysis feedback, all inside a full-fledged python IDE. This is all, of course, a long way off still. What I'm working on right now is basic linting for python3 so that I can add a lint step to our hudson server and start to get some graphs up. What I _really_ would like to work on, if I had the time, is making pypy support Python3 so that I could use it at work. However, I think I'd mostly just get in the way if I tried that, given my other time commitments.
I hope there was something helpful in that brain-dump, but I suspect I may be way off target at this point. -Terrence
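Step 4 of the workflow sketched above - overlaying jit data on the source - could start out as something very simple. The function below is purely hypothetical (no such tool exists in PyPy); it assumes a per-line "cost" metric has already been extracted from a jit dump and renders it as a text gutter, where a real tool would use IDE colors instead.

```python
def annotate_source(source, line_cost):
    """Render source with a per-line 'jit cost' gutter.  line_cost maps
    1-based line numbers to whatever metric the jit dump provided
    (hypothetical; a real tool would use colors, not a text gutter)."""
    out = []
    for i, text in enumerate(source.splitlines(), start=1):
        cost = line_cost.get(i, 0)
        marker = "!" * min(cost, 5)  # cap the gutter width at 5
        out.append("%5d %-5s %s" % (cost, marker, text))
    return "\n".join(out)
```

Fed with, say, `{2: 3}` for a two-line snippet, it would flag the second line with `!!!` while leaving the first unmarked - enough to spot hot spots at a glance in a terminal.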

Hi Terrence, hi all On 28/09/10 22:33, Terrence Cole wrote:
You can already do it (partially) by using the PYPYLOG environment variable like this:

    PYPYLOG=jit-log-opt:mylog ./pypy -m test.pystone

then mylog contains all the loops and bridges produced by the jit. The interesting point is that there are also special operations called "debug_merge_point" that are emitted for each python bytecode, so you can easily map the low-level jit instructions back to the original python source. E.g., take line 214 of pystone:

    Array1Par[IntLoc+30] = IntLoc

The corresponding python bytecode is this:

    214          38 LOAD_FAST                4 (IntLoc)
                 41 LOAD_FAST                0 (Array1Par)
                 44 LOAD_FAST                4 (IntLoc)
                 47 LOAD_CONST               3 (30)
                 50 BINARY_ADD
                 51 STORE_SUBSCR

By searching in the logs, you find the following (I edited it a bit to improve readability):

    debug_merge_point('<code object Proc8, line 208> #38 LOAD_FAST')
    debug_merge_point('<code object Proc8, line 208> #41 LOAD_FAST')
    debug_merge_point('<code object Proc8, line 208> #44 LOAD_FAST')
    debug_merge_point('<code object Proc8, line 208> #47 LOAD_CONST')
    debug_merge_point('<code object Proc8, line 208> #50 BINARY_ADD')
    debug_merge_point('<code object Proc8, line 208> #51 STORE_SUBSCR')
    p345 = new_with_vtable(ConstClass(W_IntObject))
    setfield_gc(p345, 8, descr=<SignedFieldDescr pypy.objspace.std.intobject.W_IntObject.inst_intval 8>)
    call(ConstClass(ll_setitem__dum_checkidxConst_listPtr_Signed_objectPtr), p333, 38, p345, descr=<VoidCallDescr>)
    guard_no_exception(, descr=<Guard146>) [p1, p0, p71, p345, p312, p3, p4, p6, p308, p315, p335, p12, p13, p14, p15, p16, p18, p19, p178, p26, p320, p328, i124, p25, i329]

Here, you can see that most opcodes are "empty" (i.e., there are no operations between one debug_merge_point and the next). In general, all the opcodes that manipulate the python stack are optimized away by the jit, because all the python variables on the stack become "local variables" in the assembler.
Moreover, you can see that BINARY_ADD is also empty: this probably means that the loop was specialized for the specific value of IntLoc, so the addition has been constant-folded away. Indeed, the only opcode that does real work is STORE_SUBSCR. What it does is allocate a new W_IntObject whose value is 8 (i.e., boxing IntLoc on the fly, because it's escaping), and store it into element 38 of the list stored in p333. Finally, we check that no exception was raised. Obviously, when presenting this information to the user you must consider that there is not a 1-to-1 mapping from python source to jit loops. In the example above, the very same opcodes are compiled also in another loop (which by chance has the same jit operations, but they might also be very different, depending on the case). As you can see, there is already a lot of information that can be useful to the user. However, don't ask me how to present it visually :-) ciao, anto
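The debug_merge_point markers described above can also be consumed mechanically. As a hedged illustration (not an existing tool - the real jit-log-opt output has section headers, loop boundaries, and bridges that this ignores), a minimal parser might attribute each low-level operation to the preceding opcode, making the "empty" opcodes show up as zero counts:

```python
import re

MERGE_POINT = re.compile(r"debug_merge_point\('(?P<loc>[^']*)'\)")

def ops_per_opcode(log_lines):
    """Count low-level jit operations attributed to each Python opcode,
    using the debug_merge_point markers from a PYPYLOG jit-log-opt dump.
    Minimal sketch only: a serious tool would also parse log sections,
    loop/bridge boundaries, and guard descriptors."""
    counts = {}
    current = None
    for line in log_lines:
        m = MERGE_POINT.search(line)
        if m:
            current = m.group("loc")
            counts.setdefault(current, 0)
        elif current is not None and line.strip():
            counts[current] += 1
    return counts
```

Run over the example log above, every LOAD_FAST and the BINARY_ADD would report 0 operations, while STORE_SUBSCR would absorb the allocation, the setfield, the call and the guard - exactly the per-opcode signal a source-overlay tool would want.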

On Wed, 2010-09-29 at 11:37 +0200, Antonio Cuni wrote:
I think that 'easily' in that last sentence is missing scare-quotes. :-)
Wow, thank you for the awesome explanation. I think the only surprising thing in there is that I actually understood all of that.
Currently, in my hacked-together parsing chain, the low-level parser keeps a reference to the underlying token when it creates a new node, and subsequently the ast builder keeps a reference to the low-level parse node when it creates an ast node. This way, I can easily map down to the individual source chars and full context when walking the AST to do highlighting, linting, etc. My first inclination would be to continue this chain and add a bytecode compiler on top of the ast builder. This would keep ast node references in the instructions it creates. If the algorithms don't diverge too much, I think this would allow the debug output to be mapped all the way back to the source chars with minimal effort. I'm not terrifically familiar with the specifics of how python emits bytecode from an ast, so I'd appreciate any feedback if you think this is crazy-talk.
As you can see, there is already lot of information that can be useful to the user. However, don't ask me how to present it visually :-)
Neither do I, but finding out is going to be the fun part.
ciao, anto
I'm excited to try some of this out, but unfortunately, there is an annoying problem in the way. All of my work in the last year has been on python3. Having worked in python3 for a while now, my opinion is that it's just a much better language -- surprisingly so, considering how little it changed from python2. If pypy supported python3, then I could maintain my parser as a diff against pypy (you are switching to Mercurial at some point, right?), which would make it much easier to avoid divergence. So what I'm getting at is: what is the pypy story for python3 support? I haven't seen anything on pypy-dev, or in my occasional looks at the repository, to suggest that it is being worked on, but I'm sure you have a plan of some sort. I'm willing to help out with python3 support, if I can do so without getting in anyone's way. It seems like the sort of thing that will be disruptive, however, so I have been leery of jumping in, considering how little time I have to contribute at the moment. In my mind, the python3 picture is something like: at the compilation level, it's easy enough to dump Grammar3.2 in pypy/interpreter/pyparser/data and to modify astbuilder for python3 -- I'll backport the changes I made, if you want. Interpreter support for the new language features will be harder, but that's probably already done since 2.7 is almost working. The only potential problems I see are the big string/unicode switch and the management of the fairly large changes to astbuilder -- I'm sure you want to continue supporting python2 into the future. I don't know how much the bytecode changed between 2 and 3, so I'm not sure if there are jit issues to worry about. Am I missing anything big? -Terrence

Hi Terrence, all, On Wed, Sep 29, 2010 at 13:40 -0700, Terrence Cole wrote:
In fact, there has been work from Benjamin Peterson and there is some work from Amaury and Alex to complete the http://codespeak.net/svn/pypy/branch/fast-forward/ branch. It aims at offering Python 2.7 compatibility. This is a good intermediate step for jumping to Python3 at some point. Most PyPy core devs are focusing on JIT-related tasks, so this is a good place to help out in general. If you'd like to help you can drop by at #pypy on freenode and/or maybe some of the involved persons can point to some tasks here. cheers, holger

Hi Maciej, On Thu, Sep 30, 2010 at 10:51 AM, Maciej Fijalkowski <fijall@gmail.com> wrote:
Other big issues are about RPython itself. Do we want RPython to be python3 compatible? How?
No, I'm pretty sure that even if we want to support python3 at some point, RPython will remain what it is now, and translate.py will remain a python2 tool. Armin

On 29/09/10 22:40, Terrence Cole wrote:
well, it's easy as long as you have a bytecode-compiled version around. With only the AST I agree that it might be a bit trickier. [cut]
Are you using your custom-made AST or the one from the standard library? In the latter case, you can just pass the ast to the compile() builtin function to get the corresponding bytecode.
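To sketch that route in standard-library terms: CPython's compile() builtin accepts an AST directly, and the dis module can then recover the bytecode-offset-to-source-line mapping, which is the link back to the source that a highlighter needs. This is plain documented CPython 3 behavior, shown here only as an illustration of the suggestion above.

```python
import ast
import dis

source = "x = 1 + 2\ny = x * 3\n"
tree = ast.parse(source)

# compile() accepts an AST as well as a string; the lineno attributes
# on the AST nodes survive into the code object's line-number table.
code = compile(tree, "<example>", "exec")

# dis.findlinestarts yields (bytecode_offset, source_line) pairs -
# the mapping a tool would walk to attribute bytecode to source.
line_starts = dict(dis.findlinestarts(code))
```

From there, an analysis tool holding its own token/parse-node references only has to join on the line numbers (or, for finer granularity, on col_offset) rather than re-implementing the bytecode compiler.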

You should seriously read and try to understand the e-mails that you reply to, instead of top-posting them away.
Stefan, there are different ways to argue the same valid thing, and the way you chose is IMHO counterproductive for you - the only result is offensive comments. Also, while I seldom top-post, especially in public forums/MLs, IIRC several PyPy contributors routinely top-post, and I see some sensible arguments (see http://en.wikipedia.org/wiki/Posting_style#Top-posting).

Saravanan, a small part of the issue is that many people consider top posting inappropriate and/or lame (for instance http://www.caliburn.nl/topposting.html). Be aware of that risk if you top-post. And please, never claim it increases readability - it makes your post readable only if you read the whole thread. Non-crappy email clients highlight the new text differently from the quoted text, making the former easy to find.

In particular, by top-posting you never address the comments which explain why merging does not necessarily make sense (like some of mine), or the ones which argue it's a bad idea (like Amaury's last mail). Interleaved replying instead leads to point-by-point answers. See other comments below.

On Fri, Sep 3, 2010 at 11:30, Stefan Behnel <stefan_ml@behnel.de> wrote:
Which then begs the question: would it make sense for PyPy to adopt Shedskin to compile its RPython code into C++/binary?
The answer is already implicit in one of my previous emails, and is a very clear "no, unless considerable extra merging effort is done, which might be more than the effort to make the RPython compiler better than Shedskin". I paste a relevant subset of that mail at the end; while I can believe that you have read it, I often do not understand all the implications of what I read the first time, if it's complex, like it is for everybody, so do not be offended if I suggest you re-read it. A similar, more detailed argument is made by Amaury Forgeot d'Arc in an email where he replies to you.

In other mails, you write that:
I just don't see any logical reasons [against the merger], that's all. And I haven't heard any on this thread either.
no one has necessarily claimed logical impossibility on why this may not work.
which strikes me as _wrong_. The mails I mentioned explain why there are different design goals - Amaury, who knows more about Shedskin than me, explained why it is less general. That's already an answer for me. Of course, this does not prove impossibility, it only suggests that it may not be a good idea to merge the projects. You shouldn't care about logical impossibility, which makes _NO SENSE_ in such questions in software engineering; what is possible but bad is of little interest. If you meant "nobody claimed that this is necessarily a bad idea", then I agree.

We believe there's no obvious way to combine the projects; anybody, including you, is welcome to address the specific issues and find some clever solution. You haven't even scratched them yet. And while you claimed experience with using the projects, or reading their documentation, it is not clear at all that you understand their internals, and this is required to address these problems.

The only idea which makes some sense is that instead of starting the development of Shedskin, the author could have tried achieving the same results by improving RPython, fixing its error messages and so on. However, I can imagine a ton of possible reasons for which he might have consciously decided to do something else. The keyword here is "design tradeoff": a design choice can make a product better in some respects and worse in others. Shedskin is less flexible, but possibly this gives technical advantages which are important. That's the same thing as explained below.

Best regards

=====

=> Do different goals cause _incompatible_ design/implementation choices? Currently, static typing versus global type inference seems to be already a fundamental difference. Modular type inference, if polymorphic (and I guess it has to be), would require using boxed or tagged integers more often, as far as I can see.
RPython is intended to be compiled to various environments, with different variations (choose GC/stackful or stackless/heaps of other choices), and its programmers are somewhat OK with its limitations; it has type inference, with its set of tradeoffs. This for instance prevents reusing shedskin, and probably even prevents reusing any of its code. -- Paolo Giarrusso - Ph.D. Student http://www.informatik.uni-marburg.de/~pgiarrusso/

----- Original Message ---- From: Paolo Giarrusso <p.giarrusso@gmail.com> To: Stefan Behnel <stefan_ml@behnel.de>; Saravanan Shanmugham <sarvi@yahoo.com> Cc: pypy-dev@codespeak.net Sent: Fri, September 3, 2010 4:34:24 PM Subject: Re: [pypy-dev] Question on the future of RPython
You should seriously read and try to understand the e-mails that you reply to, instead of top-posting them away.
Stefan, there are different ways to argue the same valid thing, and the way you chose is IMHO counterproductive for you - the only result is offensive comments. Also, while I seldom top-post, especially in public forums/MLs, IIRC several PyPy contributors routinely top-post, and I see some sensible arguments (see http://en.wikipedia.org/wiki/Posting_style#Top-posting). Saravanan, a small part of the issue is that many people consider top posting inappropriate and/or lame (for instance http://www.caliburn.nl/topposting.html). Be aware of that risk if you top-post. And please, never claim it increases readability - it makes your post readable only if you read the whole thread. Non-crappy email clients highlight the new text differently from the quoted text, making the former easy to find.

Sarvi>> Point taken. Will keep that in mind. It was a misguided notion of what would be readable. Sarvi

[rest of quoted mail snipped]

Hi, 2010/9/3 Saravanan Shanmugham <sarvi@yahoo.com>
But PyPy does not translate RPython code to C++. Or rather: before doing so, it performs transformations to the code that require the analysis of the program as a whole and that a C++ compiler cannot do, like the choice of a garbage collector, the stackless mode, and most of all the generation of a tracing JIT. It also operates on the bytecode, which offers interesting metaprogramming techniques that are used throughout the code (similar to C++ templates, for example, except that it's written in Python :-) )

Shedskin, on the other hand, performs a more direct translation of Python code (it uses the ast). The two projects don't have the same goals.

-- Amaury Forgeot d'Arc

No other python implementation can convert python programs to executables.
There's shedskin, which is actually very good as these things go: http://code.google.com/p/shedskin/ Like RPython, you have to write in a small subset of python, which can be a little frustrating once you've gotten used to pythonic freedom. But I've found it very useful for some short numerical codes (putting on my OEIS associate editor hat). And Cython is pretty powerful these days.

ObPyPy: the other day I had cause to run a very short, unoptimized, mostly integer-arithmetic code. With shedskin, it took between ~42s (with ints) and ~1m43 (with longs), as compared with only ~3m30 or so to run under pypy. That's only a factor of two (if I'd needed longs). Both could be much improved, and a lower-level version in C would beat them both, but I was very impressed by how little difference there was. Major props!

For numerics it'd be interesting to have a JIT option which didn't care about compilation times, and instead of generating assembly itself generated assembly-like C which was then delegated to an external compiler.

Doug

-- Department of Earth Sciences University of Hong Kong
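For reference, the kind of short, unoptimized, integer-heavy code being timed here looks something like the following (a hypothetical stand-in, not Doug's actual program; the timings above obviously depend on the real code):

```python
import time

def longest_collatz(limit):
    # A tight loop of pure integer arithmetic: the sort of workload
    # where shedskin, PyPy, and hand-written C differ mostly in loop
    # and dispatch overhead rather than in the algorithm itself.
    best = 0
    for n in range(1, limit):
        steps, m = 0, n
        while m != 1:
            m = m // 2 if m % 2 == 0 else 3 * m + 1
            steps += 1
        best = max(best, steps)
    return best

start = time.time()
print(longest_collatz(10000), "in %.2fs" % (time.time() - start))
```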

On Thu, Sep 2, 2010 at 08:27, Douglas McNeil <mcneil@hku.hk> wrote:
A more interesting road (which is mentioned somewhere in the PyPy blog) is to use LLVM in place of this "external JIT compiler", so that you generate "assembly-like LLVM Intermediate Representation". A bit like Unladen Swallow is doing, with the difference of having a saner runtime model to start with (say, no reference counting). Once you start with LLVM, you are free to choose which optimization passes to run, from very few to -O3 to even more. The other C compilers incur huge startup costs for no good reason, and don't usually allow being used as a library, if only for engineering problems. LLVM is so much cooler anyway, especially now that, say, _everybody_ is switching to it.

About the compilation-times tradeoff, you can look for "tiered compilation", which is a general strategy for doing it automatically, possibly allowing different tunings (say, like java -server, which is tuned for performance rather than responsiveness). My authoritative reference is Cliff Click's blog [1], but you probably want to stop reading it after the introduction, as I did in this case.

[1] http://www.azulsystems.com/blog/cliff-click/2010-07-16-tiered-compilation

-- Paolo Giarrusso - Ph.D. Student http://www.informatik.uni-marburg.de/~pgiarrusso/

If PyPy is using RPython for its compiler implementation, it should and will eventually be optimized so that its compiler/JIT is fast. Which just tells me that the performance gap between Shedskin and PyPy will be narrowed/beaten pretty soon. I would still rather work with just one interpreter/compiler, say PyPy. Better than using PyPy/CPython for an interpreter and Cython/Shedskin for compiling, without interpreter support during development. I am just seeing Cython/Shedskin as fragmentation of resources. A lot more could be accomplished if these projects came together with PyPy.

If you ask this question on the Shedskin/Cython alias, as to why they shouldn't pool resources into making the PyPy RPython compiler a first-class citizen/goal of PyPy, they will immediately tell you it's not a goal of PyPy. Why not officially make it so? Formalize RPython and its compiler. Obviate the need for Cython/Shedskin and get them on board.

Like the example I quoted: Mercurial is 95% Python, 5% C for performance. It should be 95% Python and 5% RPython. We have pickle and cPickle for performance. pickle could simply have been rewritten in RPython and probably compiled, and we wouldn't need different versions :-))

Sarvi

----- Original Message ---- From: Douglas McNeil <mcneil@hku.hk> To: Saravanan Shanmugham <sarvi@yahoo.com> Cc: "pypy-dev@codespeak.net" <pypy-dev@codespeak.net> Sent: Wed, September 1, 2010 11:27:32 PM Subject: Re: [pypy-dev] Question on the future of RPython

[quoted mail snipped]

awesome. The point I was making is that RPython (a static subset of Python) will be faster than dynamic Python code on a JIT, or compiled to machine code. Sarvi

----- Original Message ---- From: Amaury Forgeot d'Arc <amauryfa@gmail.com> To: Saravanan Shanmugham <sarvi@yahoo.com> Cc: Douglas McNeil <mcneil@hku.hk>; "pypy-dev@codespeak.net" <pypy-dev@codespeak.net> Sent: Thu, September 2, 2010 1:28:14 AM Subject: Re: [pypy-dev] Question on the future of RPython Hi, 2010/9/2 Saravanan Shanmugham <sarvi@yahoo.com>:
The PyPy way is much simpler: there is only the original pickle.py, written in plain full Python, and it's as fast as a C or RPython implementation. -- Amaury Forgeot d'Arc
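The split being obviated here is the familiar CPython accelerator-module idiom (a sketch; Amaury's point is that under a JIT the pure-Python fallback alone can be fast enough):

```python
# Classic Python 2 idiom: prefer the C accelerator when present,
# fall back to the pure-Python implementation otherwise.
try:
    import cPickle as pickle  # CPython 2's C accelerator
except ImportError:
    import pickle             # the pure-Python (or Python 3) module

# Either module exposes the same API, so code downstream is unchanged.
data = pickle.loads(pickle.dumps({"answer": 42}))
print(data)  # {'answer': 42}
```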

But what makes you think that? A dynamic compiler has more information, so it should be able to produce better code.

On 02/09/2010 6:37 PM, "Saravanan Shanmugham" <sarvi@yahoo.com> wrote: awesome. The point I was making is that RPython(a static subset of Python) will be faster than Dynamic Python code on a JIT or compiled to machine code. Sarvi [quoted mail snipped]

On Thu, Sep 2, 2010 at 10:40, William Leslie <william.leslie.ttg@gmail.com>wrote:
But what makes you think that? A dynamic compiler has more information, so it should be able to produce better code.
Note that he's not arguing about a static compiler for the same code, which has no type information, and where you are obviously right. He's arguing about a statically typed language, where the type information is already there in the source, e.g. C - there is much less information missing. Actually, your point can still be made, but it becomes much less obvious. For this case, it's much more contended what's best - see the "java faster than C" debate. Nobody has yet given a proof convincing enough to close the debate. I would say that there's a tradeoff between JIT and Ahead-Of-Time compilation, when AOT makes sense (not in Python, SmallTalk, Self...).
Run-time specialization would allow exactly the same code to be generated, without any extra guards in the inner loop. Java can do that at times, and can even be better than C, but not always (see above). You'd need a static compiler with Profile-Guided Optimization and have a profile which matches runtime, to guarantee superior results. Cheers -- Paolo Giarrusso - Ph.D. Student http://www.informatik.uni-marburg.de/~pgiarrusso/
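A toy illustration of the difference (plain Python, nothing like real JIT output): the generic loop re-dispatches on type every iteration, while the "specialized" version a tracing JIT would emit after observing only ints keeps a single up-front guard and a dispatch-free body, falling back when the guard fails.

```python
def generic_sum(xs):
    # What a naive interpreter effectively does: choose an
    # implementation of "+" on every single iteration.
    acc = 0
    for x in xs:
        if isinstance(x, int):
            acc = acc + x
        elif isinstance(x, float):
            acc = acc + x
        else:
            acc = acc.__add__(x)
    return acc

def int_sum(xs):
    # Roughly what run-time specialization buys: the type check is a
    # one-time guard, and the loop body contains no dispatch at all.
    if not all(isinstance(x, int) for x in xs):  # guard
        return generic_sum(xs)                   # "deoptimize" on failure
    acc = 0
    for x in xs:
        acc += x  # no type dispatch left in the hot loop
    return acc

print(generic_sum([1, 2.5, 3]), int_sum([1, 2, 3]))  # 6.5 6
```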

So far as I can tell from Unladen Swallow and PyPy, it is some of these dynamic features of Python, such as dynamic typing, that make it hard to compile/optimize and hit C-like speeds. Hence the need for RPython in PyPy or Restricted Python in Shedskin? Sarvi

________________________________ From: William Leslie <william.leslie.ttg@gmail.com> To: Saravanan Shanmugham <sarvi@yahoo.com> Cc: "pypy-dev@codespeak.net" <pypy-dev@codespeak.net>; Amaury Forgeot d'Arc <amauryfa@gmail.com> Sent: Thu, September 2, 2010 1:40:03 AM Subject: Re: [pypy-dev] Question on the future of RPython

But what makes you think that? A dynamic compiler has more information, so it should be able to produce better code. [quoted mail snipped]

On 2 September 2010 19:00, Saravanan Shanmugham <sarvi@yahoo.com> wrote:
Hence the need for the JIT, not rpython. Rpython is an implementation detail, to support translation easily to C as well as CLI and JVM bytecode, and to support translation aspects such as stackless, and testing on top of a full python environment. Rewriting things in rpython for performance is a hack that should stop happening as the JIT matures. Dynamic typing means you need to do more work to produce code of the same performance, but it's not impossible. On 2 September 2010 18:56, Paolo Giarrusso <p.giarrusso@gmail.com> wrote:
Sure - having static type guarantees is another case of "more information". There is a little more room for discussion here, because there are cases where a dynamic compiler for a safe runtime can do better at considering certain optimisations, too. We have been talking about our stock-standard type systems here, which ensure that our object will have the field or method that we are interested in at runtime, and perhaps (as long as it isn't an interface method, which we don't have in rpython anyway) the offset into the instance or vtable respectively. That makes for a pretty decent optimisation, but type systems can infer much more than this, including which objects may escape (via region typing a-la cyclone), which fields may be None, and which instructions are loop invariant. The point is that some of these type systems work fine with separate compilation, and some do significantly better with runtime or linktime specialisation. On 2 September 2010 17:56, Paolo Giarrusso <p.giarrusso@gmail.com> wrote:
Rtyping is whole-program.
Functional languages allow separate compilation - is there any RPython-specific problem for that? I've omitted my guesses here.
Many do, yes. To use ML derivatives as an example, you require the signature of any modules you directly import. I was recently reading about MLKit's module system, which is quite interesting (it has region typing, and the way it unifies these types at the module boundary is cute - carrying region information around in the source text is fragile, so must be inferred). Haskell is kind of a special case, requiring dictionaries to be passed around at runtime to determine which method of some typeclass to call. For OCaml (most MLs are similar) see section 2.5: "Modules and separate compilation" of http://pauillac.inria.fr/ocaml/htmlman/manual004.html On MLKit's module implementation and region inference: http://www.itu.dk/research/mlkit/index.php/Static_Interpretation -- William Leslie

On 2 September 2010 15:54, Saravanan Shanmugham <sarvi@yahoo.com> wrote:
Because then we would have to support that general use. Python benefits from being reasonably standardised: you can be sure that most python you write will run on any implementation that supports the version you are targeting. On the other hand, if you are mangling cpython or pypy bytecode, you are asking for trouble. Rpython is an example of such an implementation detail - we* might like to change features of it here or there to better support some needed pattern. Introducing yet another incompatible and complicated language to the python ecosystem is not a worthwhile goal in itself.

* Just my opinion. Others might feel like standardising some amount of rpython is a worthwhile idea.
I can't see why you would ever want to do this - if you use py2exe or the like instead, you get a large standard library and a great language to work in, neither of which you get if you use rpython.
I am seeing growing interest in writing Rpython code for performance critical code and even potentially compiling it to binaries.
The intention is to get almost the same performance out of the JIT. For those that actually care about the last few percent, it would be nicer to provide hints to generate specialised code at module compile time, that way you can still work at python level.
Is it possible the PyPy team may be understating the significance of RPython? Am I crazy to think this way? :-)
Supporting better integration between app-level python and other languages that interact with interpreter level would be nice. CLI integration is good, and JVM integration is lagging just a little. But once you can interact with that level, there are much saner languages that you could use for your low-level code than rpython - languages /designed/ to be general purpose languages.

At the moment, the lack of separate compilation is a real issue standing in the way of using rpython as a general purpose language, or even as an extension language. Having to re-translate *everything* every time you want to install an extension module is not on. Even C doesn't require that. The other is that type inference is global and changes you make to one function can have far-reaching consequences. The error messages when you do screw up aren't very friendly either.

If you want a low-level general purpose language with type inference and garbage collection that has implementations for every platform pypy targets, there are already plenty of options.

-- William Leslie

On Thu, Sep 2, 2010 at 9:56 AM, Paolo Giarrusso <p.giarrusso@gmail.com> wrote:
There is no notion of a "module" in RPython. RPython is compiled from live python objects (hence python is a metaprogramming language for RPython). There is a bunch of technical problems, but it's generally possible to implement separate compilation (it's work though). Cheers, fijal
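A hedged sketch of what "python is a metaprogramming language for RPython" means in practice (ordinary Python here, not actual translator code): the functions the translator receives can be manufactured at import time by plain Python, which is exactly why there is no static module boundary to compile separately.

```python
def make_unrolled_dot(n):
    # Plain Python running before translation builds a specialized
    # function; a translator like PyPy's only ever sees the resulting
    # live function object, never a source file or a "module".
    body = " + ".join("a[%d] * b[%d]" % (i, i) for i in range(n))
    ns = {}
    exec("def dot(a, b):\n    return " + body, ns)
    return ns["dot"]

# dot3 has a fully unrolled body: a[0]*b[0] + a[1]*b[1] + a[2]*b[2]
dot3 = make_unrolled_dot(3)
print(dot3([1, 2, 3], [4, 5, 6]))  # 1*4 + 2*5 + 3*6 = 32
```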

I'm afraid people are missing the point here. For an average engineer it's better to be an expert in 1 language than to be average at 4. That's my take on things. Take Mercurial (an SCM): 95% Python, 5% C, and it gives Git a run for its money. This could be 95% Python and 5% RPython.
From my reading on PyPy, that's why y'all chose to write PyPy in RPython. Y'all could have done this in C/C++, right?
So far as I can tell, RPython is a strict subset of Python. I don't see why it shouldn't continue to be. And even if y'all needed to make a very small set of static extensions to RPython, you wouldn't be any worse than Cython and Shedskin. I would still rather work with just one interpreter/compiler, say PyPy. Better than PyPy/CPython for an interpreter and Cython/Shedskin for compiling, without interpreter support during development. I am just seeing Cython/Shedskin as fragmentation of resources. A lot more could be accomplished if these projects came together.

Sarvi

----- Original Message ---- From: William Leslie <william.leslie.ttg@gmail.com> To: Saravanan Shanmugham <sarvi@yahoo.com> Cc: pypy-dev@codespeak.net Sent: Thu, September 2, 2010 12:09:03 AM Subject: Re: [pypy-dev] Question on the future of RPython

[quoted mail snipped]

Saravanan Shanmugham, 02.09.2010 09:57:
Well, it's certainly better to be an almost-expert in two, than a no-left-no-right expert in only one.
I am just seeing Cython/Shedskin as fragmentation of resources.
You might want to closer look at the projects and their goals before judging that way. Stefan

I have researched these projects quite extensively. Quite similar beasts as far as I can tell. Cython/Pyrex are used to write Python extensions. They use statically typed variants of Python which get compiled into C, which can then be compiled. Shedskin is a slightly more general-purpose Restricted Python to C++ compiler. PyPy, as I understand it, can convert RPython into C code. Am I missing something here?

Sarvi

----- Original Message ---- From: Stefan Behnel <stefan_ml@behnel.de> To: pypy-dev@codespeak.net Sent: Thu, September 2, 2010 1:11:10 AM Subject: Re: [pypy-dev] Question on the future of RPython

[quoted mail snipped]

On Thu, Sep 2, 2010 at 10:18, Saravanan Shanmugham <sarvi@yahoo.com> wrote:
Maybe you have done extensive research, but the above is not enough for the conclusion, which might still be valid. There could be some cool way to reuse each other's code, and that would be cool given the available manpower. The question is:

=> Do different goals cause _incompatible_ design/implementation choices? Currently, static typing versus global type inference seems to be already a fundamental difference. Modular type inference, if polymorphic (and I guess it has to be), would require using boxed or tagged integers more often, as far as I can see. RPython is intended to be compiled to various environments, with different variations (choose GC/stackful or stackless/heaps of other choices), and its programmers are somewhat OK with its limitations; it has type inference, with its set of tradeoffs. This for instance prevents reusing shedskin, and probably even prevents reusing any of its code. Cython/Shedskin are intended to be used by more people and to be simpler.

==> Would making RPython usable for people harm its usability for PyPy?

I see no trivial answer to the above questions which allows merging, but I don't develop any of them. However, a discussion of this could probably end in a PyPy FAQ.

Best regards

-- Paolo Giarrusso - Ph.D. Student http://www.informatik.uni-marburg.de/~pgiarrusso/

Thursday 02 September 2010 you wrote:
RPython was tried in a production environment some years ago and while it produced some very nice results, it was quite difficult to work with. Dealing with those difficulties requires a group of people who are willing to build RPython code for general applications, run the code and identify what the difficulties actually are. Then they need to come up with strategies for how to remedy the problems and implement them in code. This is a very large undertaking for which Pypy does not have the manpower. It also requires people who are interested in building support for compiled programming languages. Pypy is a volunteer effort, and the only person who was interested in this has retired from the project.

Jacob Hallén

one response inline

----- Original Message ---- From: Jacob Hallén <jacob@openend.se> To: pypy-dev@codespeak.net Sent: Thu, September 2, 2010 2:40:45 AM Subject: Re: [pypy-dev] Question on the future of RPython

[quoted mail snipped]

Sarvi>>>> This makes sense. But wouldn't the answer to this problem be to invite people like the Shedskin/Cython developers to join forces with PyPy? So that they can pursue the general RPython use case you mention above while the others focus on the JIT and such, on a common code base? Wouldn't that be a win-win for everybody? This collaboration feels so obvious to me that I am confused why it isn't to others. Considering that Shedskin's goals feel almost like a strict subset of PyPy's.

Sarvi

On Thu, Sep 2, 2010 at 2:18 PM, Saravanan Shanmugham <sarvi@yahoo.com> wrote:
I think what you don't get is how open source works: there are always ten projects doing almost the same thing. Everyone at least once thought "why does linux have this many media players/text editors/flash implementations/jvms when all we need is a really good one with lots of support". It does get me depressed sometimes, but this is the way it is. Cython has a big user base that they have to support and lots of programs that are in production today, shedskin is looking for pure performance and the pypy guys want to have a faster python. Although I also think that maybe RPython and the pypy python interpreter could solve all these problems someday, it doesn't do so right now. I have used RPython in the past and the error messages alone would drive some people away. Some group of people could work to fix this, but I doubt it will happen soon. What I think could be done to make pypy more visible to people would be to have a killer app running on pypy way faster/better than on cpython. For me this app is either mercurial or django. -- Leonardo Santagada

----- Original Message ---- From: Leonardo Santagada <santagada@gmail.com> To: Saravanan Shanmugham <sarvi@yahoo.com> Cc: Jacob Hallén <jacob@openend.se>; pypy-dev@codespeak.net Sent: Thu, September 2, 2010 11:18:32 AM Subject: Re: [pypy-dev] Question on the future of RPython On Thu, Sep 2, 2010 at 2:18 PM, Saravanan Shanmugham <sarvi@yahoo.com> wrote:
Sarvi>> Yeah, having this thread of conversation on 4 separate aliases (Python.org, Unladen Swallow, Shedskin and PyPy) was just my attempt at seeing if these can come together. Oh well.
What I think could be done to make pypy more visible to people would be to have a killer app running on pypy way faster/better than on cpython. For me this app is either mercurial or django.
Sarvi>>> Very true. Sarvi -- Leonardo Santagada

Thursday 02 September 2010 you wrote:
It is a matter of personal pride, I think. If we made the invitation to the Shedskin people they would see this as "Pypy thinks they are way cooler than us, so they invite us to be part of their project". This would naturally generate a refusal, because even though we don't make such value statements, it would be viewed that way. So, we don't make such invitations, even if they make sense. What we hope is that some people examine the Pypy project and find that it actually is a really cool piece of technology with lots of possible side projects and expansion possibilities. If they decide to join the project we will give them all the help we are capable of. Most people have actually joined Pypy in this way. The most recent example is Håkan Ardö who wanted to expand Pypy in the direction of numeric calculations. The learning curve is fairly steep, but there are quite a few people on the IRC channel who are ready to help you overcome the hurdles. Jacob Hallén

I have heard repeatedly in this alias that PyPy's RPython is very difficult to use. I have also heard here and elsewhere that Shedskin is fast and is great for what it does, i.e. translate its version of Restricted Python to C++. Which then begs the question: would it make sense for PyPy to adopt Shedskin to compile its PyPy RPython code into C++/binary? Sarvi ----- Original Message ---- From: Jacob Hallén <jacob@openend.se> To: Saravanan Shanmugham <sarvi@yahoo.com> Cc: pypy-dev@codespeak.net Sent: Thu, September 2, 2010 1:54:01 PM Subject: Re: [pypy-dev] Question on the future of RPython Thursday 02 September 2010 you wrote:

Let's not be presumptuous, shall we? This is the second time you seem to be claiming that I haven't done my research/reading. I have been following the progress of PyPy for over 2 years. It's great work. So is Shedskin. Just for the record, I have used Pyrex and Cython, and have read the documentation and/or sample code for both Shedskin and PyPy. If people say that there are emotional and pragmatic reasons for the 2 projects not coming together, that makes sense. I just don't see any logical reasons, that's all. And I haven't heard any on this thread either. BTW, just because I top post for "readability" doesn't mean I haven't read all the threads in detail. Sarvi ----- Original Message ---- From: Stefan Behnel <stefan_ml@behnel.de> To: pypy-dev@codespeak.net Sent: Fri, September 3, 2010 2:30:32 AM Subject: Re: [pypy-dev] Question on the future of RPython Saravanan Shanmugham, 03.09.2010 11:11:
You should seriously read and try to understand the e-mails that you reply to, instead of top-posting them away. Stefan

Saravanan Shanmugham, 03.09.2010 19:22:
That's just the impression that I get from what you write and how you write it.
I just don't see any logical reasons, that's all. And I haven't heard any on this thread either.
Well, you are talking to people who know a lot more about what you are talking about than you do. It's normal that they are not equally enthusiastic about pie-in-the-sky ideas that someone throws at them.
BTW, Just because I top post for "readability" doesn't mean I haven't read all the threads in detail.
I like the fact that you put ironic quotation marks around the word "readability". Stefan

Stefan, if I were to go with my impressions, based on you being the lead developer of Cython, I could have claimed you have an ulterior motive on this thread. But I didn't, because in spite of first impressions/scepticism I believe we are all here with a genuine interest in improving the Python environment and getting more visibility and momentum for PyPy. Personally I can't wait to see PyPy become the default Python. :-) Let's start with an understanding that we are all smart people with good ideas, and let's also not be cocky enough to think we have all the answers. I saw some genuine synergies that I was calling out. And I have heard some pragmatic arguments from others, though no one has necessarily claimed logical impossibility on why this may not work. And I can understand that. Sarvi ----- Original Message ---- From: Stefan Behnel <stefan_ml@behnel.de> To: pypy-dev@codespeak.net Sent: Fri, September 3, 2010 10:52:41 AM Subject: Re: [pypy-dev] Question on the future of RPython Saravanan Shanmugham, 03.09.2010 19:22:

On Fri, Sep 3, 2010 at 21:06, Saravanan Shanmugham <sarvi@yahoo.com> wrote:
But then I didn't because, inspite of first impressions/scepticism I believe we are all here with a genuine interest to improve the Python environment
While you didn't initiate the flame, I think that's totally inappropriate, and I can say so even without knowing Stefan. You wrote: proper research if you just used the projects. You are welcome to be curious, but with such a comment you are the presumptuous one. Note that I already remarked that Stefan's comment was not appropriate in style. b) your email client _is_ crappy, given the way you reply inline (I was mentioning crappy clients in my previous email). Socially speaking, in an Open Source community, not using a decent email client can look as bad as dressing very very wrong. I'm not so picky, but it does mean you're not a hacker. Note I'm not a developer of PyPy, and I don't claim being an expert, but I have some technical knowledge of its documentation about internals and of some literature, and some small experience with a Python implementation. Best regards -- Paolo Giarrusso - Ph.D. Student http://www.informatik.uni-marburg.de/~pgiarrusso/

----- Original Message ---- From: Paolo Giarrusso <p.giarrusso@gmail.com> To: Saravanan Shanmugham <sarvi@yahoo.com> Cc: Stefan Behnel <stefan_ml@behnel.de>; pypy-dev@codespeak.net Sent: Fri, September 3, 2010 5:15:47 PM Subject: Re: [pypy-dev] Question on the future of RPython On Fri, Sep 3, 2010 at 21:06, Saravanan Shanmugham <sarvi@yahoo.com> wrote:
While you didn't initiate the flame, I think that's totally inappropriate, and I can say so even without knowing Stefan.
Sarvi>> I believe it is an appropriate response to the flame bait. BTW, I was very careful not to make the accusation. No real offense meant. It was just a what-if argument to drive the point that if everyone responded like that, based on impressions and presumptions, it would be wrong. So I stand by that.
You are welcome to be curious, but with such a comment you are the presumptuous one.
Sarvi>> We may have to agree to disagree here. I don't believe my thread of discussion has anything to do with Virtual Machines at all. What I have been saying has more to do with compiling plain RPython code into C/C++/ASM executables. Shedskin uses a statically typed restricted version of Python that gets converted to C++. PyPy does convert a statically typed restricted version of Python to C that can then be compiled to an executable. So though with different approaches, the final goal is to produce a compiled binary executable for the RPython code. Agreed, PyPy does additionally allow using language/JIT hints to help write/generate JIT compilers as well. That does not remove the possibility that the statically typed version of Restricted Python used by Shedskin could be a full subset of PyPy's RPython, nor the possibility of using PyPy as just a plain/pure Restricted Python compiler, pure and simple. This thought angle has nothing to do with Virtual Machines, really.
b) your email client _is_ crappy, given the way you reply inline.
Sarvi>>> Point taken. I use plain Yahoo Web Mail. Do you have any suggestions for how I could do better with the Yahoo Web Mail client? I am open to learning a better way. :-) Will look into it. Thanks, Sarvi -- Paolo Giarrusso - Ph.D. Student http://www.informatik.uni-marburg.de/~pgiarrusso/

Hi, Can we please close this thread? The basic answer you will get from anybody that actually worked at least a bit with PyPy is that all your discussions are moving air around and nothing else. There is no one working with PyPy that is interested in using RPython for the purpose of compiling some RPython programs to C code statically (except interpreters). If anyone is really interested in this topic he can (again) give it a try. He would get some help from us, i.e. the rest of the PyPy team, but it would be a fork for now. I say "again" because there are some previous attempts at doing that, which all failed. As long as no such project exists and is successful -- and I have some doubts about it -- I will not believe in the nice (and, to me, completely bogus) claims made on this thread, like "let's bring RPython and Shedskin together". A bientot, Armin.

Hi Armin, Could you point me to some of these previous attempts at improving the RPython-to-executable capability? I would like to understand what was attempted. Hart's Antler, who seems to be working on RPython quite extensively, contacted me privately about doing some work in the RPython area. I am considering sponsoring him to do some work on PyPy, only if it is done with the PyPy team's blessing and will help PyPy as a whole. Is there a wish list of RPython enhancements somewhere that the PyPy team might be considering? Stuff that would benefit RPython users in general. Sarvi ----- Original Message ----

Hi, On Mon, Sep 6, 2010 at 8:27 PM, Saravanan Shanmugham
I feel like I am repeating myself, so that's my last mail to this thread. There are no enhancements we are considering to benefit other RPython users because *there* *are* *no* *other* *RPython* *users.* There is only us, and RPython suits us just fine for the purpose for which it was designed. Again, feel free to make a fork or a branch of PyPy and try to develop a version of RPython that is more suited to writing general programs in. I don't know if there is a wish list of what is missing, but certainly I haven't given it much thought myself. Personally, I think that writing RPython programs is kind of fun, but in a perverse way -- if I could just write plain Python that was as fast or mostly as fast, it would be perfect. A bientôt, Armin.

i just had a (probably) silly idea. :) if some people like rpython so much, how about writing a rpython interpreter in rpython? wouldn't it be much easier for the jit to optimize rpython code? couldn't jitted rpython code theoretically be as fast as a program that got compiled to c from rpython? hm... but i wonder if this would make sense at all. maybe if you ran rpython code with pypy-c-jit, it already could be jitted as well as with a special rpython interpreter? ...if there were a special rpython interpreter, would the current jit generator have to be changed to take advantage of the more simple language? just curious... On Tue, Sep 7, 2010 at 11:07 AM, Stefan Behnel <stefan_ml@behnel.de> wrote:

The current JIT generator creates a tracing jit, which gives a very different performance profile from static compilation. For tight loops etc this might be ok, but might be different for the specific use case people are interested in (I admit I still don't know what that is). On 26/09/2010 1:47 AM, "horace grant" <horace3d@gmail.com> wrote:

On Sat, 2010-09-25 at 17:47 +0200, horace grant wrote:
An excellent question at least. A better idea, I think, would be to ask what subset of full-python will jit well. What I'd really like to see is a static analyzer that can display (e.g. by coloring names or lines) how "jit friendly" a piece of python code is. This would allow a programmer to get an idea of what help the jit is going to be when running their code and, hopefully, help people avoid tragic performance results. Naturally, for performance intensive code, you would still need to profile, but for a lot of uses, simply not having catastrophically bad performance is more than enough for a good user experience. With such a tool, it wouldn't really matter if the answer to "what is faster" is RPython -- it would be whatever python language subset happens to work well in a particular case. I've started working on something like this [1], but given that I'm doing a startup, I don't have nearly the time I would need to make this useful in the near-term. -Terrence [1] http://github.com/terrence2/melano

On Sun, 2010-09-26 at 23:57 -0700, Saravanan Shanmugham wrote:
What I wrote has apparently been widely misunderstood, so let me explain what I mean in more detail. What I want is _not_ RPython and it is _not_ Shedskin. What I want is not a compiler at all. What I want is a visual tool, for example, a plugin to an IDE. This tool would perform static analysis on a piece of python code. Instead of generating code with this information, it would mark up the python code in the text display with colors, weights, etc in order to show properties from the static analysis. This would be something like semantic highlighting, as opposed to syntax highlighting. I think it possible that this information would, if created and presented in the correct way, represent the sort of optimizations that pypy-c-jit -- a full python implementation, not a language subset -- would likely perform on the code if run. Given this sort of feedback, it would be much easier for a python coder to write code that works well with the jit: for example, moving a declaration inside a loop to avoid boxing, based on the information presented. Ideally, such a tool would perform instantaneous syntax highlighting while editing and do full parsing and analysis in the background to update the semantic highlighting as frequently as possible. Obviously, detailed static analysis will provide far more information than it would be possible to display on the code at once, so I see this gui as having several modes -- like predator vision -- that show different information from the analysis. Naturally, what those modes are will depend strongly on the details of how pypy-c-jit works internally, what sort of information can be sanely collected through static analysis, and, naturally, user testing. I was somewhat baffled at first as to how what I wrote before was interpreted as interest in a static python. I think the disconnect here is the assumption on many people's part that a static language will always be faster than a dynamic one. 
Given the existing tools that provide basically no feedback from the compiler / interpreter / jitter, this is inevitably true at the moment. I foresee a future, however, where better tools let us use the full power of a dynamic python AND let us tighten up our code for speed to get the full advantages of jit compilation as well. I believe that in the end, this combination will prove superior to any fully static compiler. -Terrence
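Terrence's semantic-highlighting idea can be prototyped with nothing but the stdlib `ast` module. The sketch below (all names are made up for this example, and it is far cruder than anything a real tool would need) flags names that are read inside a loop but never bound in the enclosing function, i.e. likely global or builtin lookups that a GUI could color differently:

```python
import ast

def flag_global_loads_in_loops(source):
    """Return (lineno, name) pairs for names read inside a for/while loop
    that are never bound in the enclosing function, i.e. names that are
    probably global or builtin lookups."""
    hits = []
    for func in ast.walk(ast.parse(source)):
        if not isinstance(func, ast.FunctionDef):
            continue
        # Treat parameters and anything assigned in the function as local.
        bound = {a.arg for a in func.args.args}
        for node in ast.walk(func):
            if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
                bound.add(node.id)
        # Flag loads of unbound names occurring inside loops.
        for loop in ast.walk(func):
            if isinstance(loop, (ast.For, ast.While)):
                for node in ast.walk(loop):
                    if (isinstance(node, ast.Name)
                            and isinstance(node.ctx, ast.Load)
                            and node.id not in bound):
                        hits.append((node.lineno, node.id))
    return hits

sample = """
def f(xs):
    total = 0
    for x in xs:
        total += helper(x)
    return total
"""
# 'helper' is looked up outside the function on every iteration
print(flag_global_loads_in_loops(sample))
```

A real analyzer would also have to handle comprehensions, nested scopes, `global` declarations and builtins, but even this toy version shows that the raw information for such highlighting is cheap to extract statically.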

On Mon, Sep 27, 2010 at 4:44 PM, Terrence Cole <list-sink@trainedmonkeystudios.org> wrote:
This all looks interesting, and if you can plug that into emacs or textmate I would be really happy, but it is not what I want. I would settle for a tool that generates at runtime information about what the jit is doing in a simple text format (json, yaml or something even simpler?) and a tool to visualize this so you can optimize python programs to run on pypy easily. The biggest difference is that just collecting this info from the JIT appears to be much, much easier than somehow implementing a static processor for python code that does some form of analysis. I think that fijal is at least thinking about doing such a tool, right? -- Leonardo Santagada

On Mon, Sep 27, 2010 at 21:58, Leonardo Santagada <santagada@gmail.com> wrote:
Have you looked at what the Azul Java VM supports for Java, in particular RTPM (Real Time Performance Monitoring)? Academic accounts are available, and from Cliff Click's presentations, it seems to be a production-quality solution for this (for Java), which could give interesting ideas. Azul business is exclusively centered around Java optimization at the JVM level, so while not-so-famous they are quite relevant. See slide 28 of: www.azulsystems.com/events/vee_2009/2009_VEE.pdf for some more details. See also wiki.jvmlangsummit.com/pdf/36_Click_fastbcs.pdf, and the account about JRuby's slowness (caused by unreliable performance analysis tools). Given that JIT can beat static compilation only through forms of profile-directed optimization, I also believe that the interesting information should be obtained through logs from the JIT. A static analyser can't do something better than a static compiler - not reliably at least. _However_, static semantic highlighting might still be interesting: while it does not help understanding profile-directed optimizations done by the JIT, it might help understanding the consequences of the execution model of the language itself, where it has a weird impact on performance. E.g., for CPython, it might be very useful simply highlighting usages of global variables, that require a dict lookup, as "bad", especially in tight loops. OTOH, that kind of optimization should be done by a JIT like PyPy, not by the programmer. I believe that CALL_LIKELY_BUILTIN and hidden classes already allow PyPy to fix the problem without changing the source code. The question then is: which kinds of constructs are unexpectedly slow in Python, even with a good JIT? Best regards -- Paolo Giarrusso - Ph.D. Student http://www.informatik.uni-marburg.de/~pgiarrusso/
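Paolo's remark about global lookups in tight loops is the classic CPython micro-optimization of binding a global to a local name before the loop. A minimal sketch (function names invented for the example; the whole point of a JIT like PyPy's is to make this trick unnecessary):

```python
import math

def norm_global(values):
    # math.sqrt is re-resolved on every iteration: a dict lookup for
    # 'math' in the globals, plus an attribute lookup for 'sqrt'.
    return [math.sqrt(v) for v in values]

def norm_local(values, _sqrt=math.sqrt):
    # Binding sqrt once (here via a default argument) turns each use
    # into a cheap local access in CPython.
    return [_sqrt(v) for v in values]

# Both produce identical results; only the lookup cost differs.
print(norm_global([4.0, 9.0]))
print(norm_local([4.0, 9.0]))
```

On CPython the local-binding version is measurably faster in tight loops (try it with `timeit`); a highlighting tool of the kind discussed here could mark `math.sqrt` inside the loop as a repeated non-local lookup.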

On Tue, 2010-09-28 at 00:52 +0200, Paolo Giarrusso wrote:
Briefly, but it's not open source, and it's a Java thing, so it didn't pique my interest significantly.
I'd be pursuing the jit logging approach much more aggressively if I cared at all about Python2 anymore. All of the source I care about analyzing is in Python3. However, considering the rate I'm going, pypy will doubtless support python3 by the time I get a half-way decent static analyzer working anyway, so it's probably worth considering.
Precisely. I'd love a good answer to that question. In addition to jitting, although it would not technically be python anymore, I see a place for something like SPUR or Jaegermonkey -- combined compilation and jitting. Naturally, the performance of such a beast over a jit alone would be dependent on how much boxing the compiler could remove. My goal for this work is about half geared towards answering that single question, just so I'll know if I should stop dreaming about python eventually having performance parity with C/C++. I tend to think that having a solid (if never perfect) static analyzer for python could help in many areas. I had thought that helping coders help the jit out would be a good first use, but as you say, there will be problems with that. Regardless, my hope is that a library for static analysis of python will be more generally useful than my own hare-brained schemes. In any case, I'm working on this in the form of a code editor first because, regardless of what the answer to the previous question is, I know from experience that highlighting for python like what SourceInsight does for C++ will be extremely useful. Thank you for the kind feedback, your comments are much appreciated. -Terrence
Best regards

Monday 27 September 2010 you wrote:
The JIT works because it has more information at runtime than what is available at compile time. If the information was available at compile time we could do the optimizations then and not have to invoke the extra complexity required by the JIT. Examples of the extra information include things like knowing that introspection will not be used in the current evaluation of a loop, that specific argument types will be used in calls, and that some arguments will be known to be constant over part of the program execution. Knowing these bits allows you to optimize away large chunks of the code that otherwise would have been executed. Static analysis assumes that none of the above mentioned possibilities can actually take place. It is impossible to make such assumptions at compile time in a dynamic language. Therefore PyPy is a bad match for people wanting to statically compile subsets of Python. Applying the JIT to RPython code is not workable, because the JIT is optimized to remove bits of generated assembler code that never show up in the compilation of RPython code. These are very basic first principle concepts, and it is a mystery to me why people can't work them out for themselves. Jacob Hallén
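Jacob's point about runtime information can be caricatured in plain Python: the wrapper below observes the argument types on the first call, then guards a specialized path with a type check, roughly the way JIT-generated code guards its assumptions. This is only an illustration of the principle, not how PyPy's JIT actually works:

```python
def make_adder():
    """A toy 'specializing' dispatcher: remember the type observed at
    runtime and guard a fast path on it, falling back to generic
    dispatch when the guard fails."""
    state = {"seen": None}

    def generic_add(a, b):
        return a + b  # full dynamic dispatch

    def add(a, b):
        if state["seen"] is type(a) is type(b):
            # Guard passed: in a real JIT this would be specialized
            # machine code with the type checks hoisted out of the loop.
            return generic_add(a, b)
        if state["seen"] is None and type(a) is type(b):
            state["seen"] = type(a)  # specialize on the observed type
        return generic_add(a, b)     # fallback path

    return add

add = make_adder()
print(add(2, 3))      # first call: specializes on int
print(add(2.5, 0.5))  # guard fails: generic fallback
```

None of this type information is available to a static compiler for a dynamic language, which is exactly why the JIT has to collect it while the program runs.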

On Tue, 2010-09-28 at 01:57 +0200, Jacob Hallén wrote:
Yes, that idea is just dumb. It's also not what I suggested at all. I can see now that what I said would be easy to misinterpret, but on re-reading it, it clearly doesn't say what you think it does.
You are quite right that static analysis will be able to do little to help an optimal jit. However, I doubt that in the near term pypy's jit will cover all the dark corners of python equally well -- C has been around for 38 years and it's still got room for optimization. -Terrence
Jacob Hallén

On 28 September 2010 10:43, Terrence Cole <list-sink@trainedmonkeystudios.org> wrote:
It does make /some/ sense, I think. From the perspective of the JIT, operating at interp-level, the app-level python program *is the biggest part of* the "stuff you don't know about until runtime". That is, you don't know the program source at translation time, and most of the information the JIT is supposed to find are app-level constructs (eg app-level loops). Of course any such analysis will fall flat in certain cases, like eval(raw_input(...)). But you should still be able to gather enough information for most fairly hygienic code. What sort of analyses did you have in mind?
There are some undesirable things about static analysis, but it can sure be useful from optimisation, security and reliability perspectives. There's also code browsing, too; IDEs require a different (fuzzier) parser, but the question of 'what types does this object probably have' makes more sense with a little dependent region analysis. Optimising when you can be fairly confident of the types involved could be useful. That doesn't really sound like pypy at that point, though. -- William Leslie

On Tue, 2010-09-28 at 11:55 +1000, William Leslie wrote:
I think this is a disconnect. Applying a jit to a non-interpreted language -- Jacob here seems to think I was talking about a static, compiled subset of python -- makes little sense. Static analysis to provide help to an interpreter does, as you say, make some sense, and not just to me. Brett Cannon applied static type analysis to the CPython interpreter for his PhD thesis [1], looking for a speed boost by removing some typing abstraction. Unfortunately, it was not spectacularly helpful for CPython. I think for pypy-jit, however, it has much greater potential because of the possibility of full unboxing. Given past results however, it's not the first place I'd go looking for speedups. Others may have better ideas in this area than I do though.
This is one of the reasons that I had to pull together my own parsing (largely borrowed from pypy, actually) and analysis infrastructure, rather than just using pypy's off-the-shelf. Even without pypy's neat analysis code, the fact that it ditches character-level info when making an ast means you can't apply highlighting with it without groping about half-blindly in the source.
Given the choice between the status quo and an extremely slow eval, but much faster python overall, I think most people would pick the second.
What sort of analyses did you have in mind?
As this is a side project, for the moment I am focusing on simple stuff, mostly things I need/want for work. In the short term these include Python3 linting (which is almost working) and static type analysis. The second will be particularly interesting because we have (at work) annotated most of our interfaces with type data, so this will probably net much more specific and helpful data than it would in many projects. I am also, specifically, as I mentioned to Paolo yesterday, trying to find out how much of our code could be fully unboxed, given that we have extensive type contracts at our interfaces. If the answer is "most of it", then it may make sense for us to build something like Jaegermonkey for python someday.
Brendan Eich agrees [2]. This is heartening, because javascript has much in common with python. I agree too, for that matter, but that's probably a lot less heartening :-).
There's also code browsing, too; IDEs require a different (fuzzier) parser,
Reason number two that I have to maintain a separate parser/analyzer.
Given that I want to work with Python3 anyway (and that I'd never be able to beat pypy's performance before it supports Python3), I'm focusing mostly on a tool to help make reliable and correct code. However, performance is always in the back of my mind these days. It seems from this thread that I won't be able to do much in that regard with my current approach, unfortunately. Maybe by the time I can focus on it, pypy will support python3 and I can work on providing real-time jit feedback. -Terrence [1] http://www.ocf.berkeley.edu/~bac/thesis.pdf [2] http://brendaneich.com/2010/08/static-analysis-ftw/

On Tue, 2010-09-28 at 15:20 +0200, Maciej Fijalkowski wrote:
Lots. They're almost all probably wrong though, so be warned :-). I'm also not entirely clear on what you mean, so let me tell you what I have in mind and you can tell me if I'm way off base. I assume workflow would go like this: 1) run pypy on a bunch of code in profiling mode, 2) pypy spits out lots of data about what happened in the jit when the program exits, 3) start up external analysis program pointing it at this data, 4) browse the python source with the data from the jit overlayed as color, formatting, etc on top of the source. Potentially there would be several separate modes for viewing different aspects of the jit info. This could also include the ability to select different program elements (loops, variables, functions, etc) and get detailed information about their runtime usage in a side-pane. Ideally, this workflow would be taken care of automatically by pushing the run button in your IDE. As a more specific example of what the gui would do in, for instance, escape analysis mode: display local variables that do not escape any loops in green, others in red. Hovering over a red variable would show information about how, why, and where it escapes the loop in a tooltip or bubble. Selecting a red variable show the same info in a pane and would draw arrows on the source showing where it escapes from a loop/function etc. In my ideal world, this profiling data analysis would sit side-by-side with various display modes that show useful static analysis feedback, all inside a full-fledged python IDE. This is all, of course, a long way off still. What I'm working on right now is basic linting for python3 so that I can add a lint step to our hudson server and start to get some graphs up. What I _really_ would like to work on, if I had the time, is making pypy support Python3 so that I could use it at work. However, I think I'd mostly just get in the way if I tried that, given my other time commitments. 
I hope there was something helpful in that brain-dump, but I suspect I may be way off target at this point. -Terrence

Hi Terrence, hi all On 28/09/10 22:33, Terrence Cole wrote:
You can already do it (partially) by using the PYPYLOG environment variable like this:

    PYPYLOG=jit-log-opt:mylog ./pypy -m test.pystone

then mylog contains all the loops and bridges produced by the jit. The interesting point is that there are also special operations called "debug_merge_point" that are emitted for each python bytecode, so you can easily map the low-level jit instructions back to the original python source. E.g., take line 214 of pystone:

    Array1Par[IntLoc+30] = IntLoc

The corresponding python bytecode is this:

    214          38 LOAD_FAST                4 (IntLoc)
                 41 LOAD_FAST                0 (Array1Par)
                 44 LOAD_FAST                4 (IntLoc)
                 47 LOAD_CONST               3 (30)
                 50 BINARY_ADD
                 51 STORE_SUBSCR

By searching in the logs, you find the following (I edited it a bit to improve readability):

    debug_merge_point('<code object Proc8, line 208> #38 LOAD_FAST')
    debug_merge_point('<code object Proc8, line 208> #41 LOAD_FAST')
    debug_merge_point('<code object Proc8, line 208> #44 LOAD_FAST')
    debug_merge_point('<code object Proc8, line 208> #47 LOAD_CONST')
    debug_merge_point('<code object Proc8, line 208> #50 BINARY_ADD')
    debug_merge_point('<code object Proc8, line 208> #51 STORE_SUBSCR')
    p345 = new_with_vtable(ConstClass(W_IntObject))
    setfield_gc(p345, 8, descr=<SignedFieldDescr pypy.objspace.std.intobject.W_IntObject.inst_intval 8>)
    call(ConstClass(ll_setitem__dum_checkidxConst_listPtr_Signed_objectPtr), p333, 38, p345, descr=<VoidCallDescr>)
    guard_no_exception(, descr=<Guard146>) [p1, p0, p71, p345, p312, p3, p4, p6, p308, p315, p335, p12, p13, p14, p15, p16, p18, p19, p178, p26, p320, p328, i124, p25, i329]

Here, you can see that most opcodes are "empty" (i.e., there are no operations between one debug_merge_point and the next). In general, all the opcodes that manipulate the python stack are optimized away by the jit, because all the python variables on the stack become "local variables" in the assembler.
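A tool like the one Terrence describes could start from exactly this output. Below is a minimal sketch of grouping the jit operations under the bytecode they came from; it assumes only the two line shapes visible in the excerpt above (debug_merge_point lines and plain jit-operation lines), while the real log contains more section markers:

```python
import re
from collections import defaultdict

# Assumed line shape, based on the jit-log-opt excerpt above:
#   debug_merge_point('<code object Proc8, line 208> #50 BINARY_ADD')
# Any other non-blank line is treated as a low-level jit operation
# belonging to the most recent merge point.
MERGE_RE = re.compile(r"debug_merge_point\('(.+?) #(\d+) (\w+)'\)")

def ops_per_opcode(log_lines):
    """Group jit operations under the (code object, offset, opname) they follow."""
    ops = defaultdict(list)
    current = None
    for line in log_lines:
        m = MERGE_RE.search(line)
        if m:
            code_obj, offset, opname = m.groups()
            current = (code_obj, int(offset), opname)
        elif current is not None and line.strip():
            ops[current].append(line.strip())
    return ops

sample = """\
debug_merge_point('<code object Proc8, line 208> #50 BINARY_ADD')
debug_merge_point('<code object Proc8, line 208> #51 STORE_SUBSCR')
p345 = new_with_vtable(ConstClass(W_IntObject))
setfield_gc(p345, 8, descr=<SignedFieldDescr ...>)
""".splitlines()

result = ops_per_opcode(sample)
# BINARY_ADD ends up with no operations (it was optimized away),
# while STORE_SUBSCR keeps its two jit operations.
```

An "empty" opcode, in Antonio's sense, is then simply a key with an empty list, which is exactly the signal a gui would color green.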
Moreover, you can see that BINARY_ADD is also empty: this probably means that the loop was specialized for the specific value of IntLoc, so the addition has been constant-folded away. Indeed, the only opcode that does real work is STORE_SUBSCR. What it does is allocate a new W_IntObject whose value is 8 (i.e., boxing IntLoc on the fly, because it's escaping), and store it into element 38 of the list stored in p333. Finally, we check that no exception was raised.

Obviously, when presenting this information to the user you must consider that there is not a 1-to-1 mapping from python source to jit loops. In the example above, the very same opcodes are compiled also in another loop (which by chance has the same jit operations, but they might also be very different, depending on the case).

As you can see, there is already a lot of information that can be useful to the user. However, don't ask me how to present it visually :-) ciao, anto

As you can see, there is already a lot of information that can be useful to the user. However, don't ask me how to present it visually :-)
As you've probably noticed, it takes quite a bit of skill to actually read it and say which variables are unescaped locals, for example.

On Wed, 2010-09-29 at 11:37 +0200, Antonio Cuni wrote:
I think that 'easily' in that last sentence is missing scare-quotes. :-)
Wow, thank you for the awesome explanation. I think the only surprising thing in there is that I actually understood all of that.
Currently, in my hacked-together parsing chain, the low-level parser keeps a reference to the underlying token when it creates a new node, and subsequently the ast builder keeps a reference to the low-level parse node when it creates an ast node. This way, I can easily map down to the individual source chars and full context when walking the AST to do highlighting, linting, etc. My first inclination would be to continue this chain and add a bytecode compiler on top of the ast builder. This would keep ast node references in the instructions it creates. If the algorithms don't diverge too much, I think this would allow the debug output to be mapped all the way back to the source chars with minimal effort. I'm not terrifically familiar with the specifics of how python emits bytecode from an ast, so I'd appreciate any feedback if you think this is crazy-talk.
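The back-reference chain Terrence describes can be sketched in a few lines. The class names here are hypothetical (his real parser classes will differ); the point is only that each layer keeps a pointer one level down, so an instruction can walk back to a source position:

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical names illustrating the chain:
# Instruction -> AstNode -> ParseNode -> Token -> (line, col)

@dataclass
class Token:
    text: str
    line: int
    col: int

@dataclass
class ParseNode:
    kind: str
    token: Optional[Token] = None           # low-level parser keeps the token
    children: List["ParseNode"] = field(default_factory=list)

@dataclass
class AstNode:
    kind: str
    parse_node: ParseNode                   # ast builder keeps the parse node

@dataclass
class Instruction:
    opname: str
    ast_node: AstNode                       # bytecode compiler would keep the ast node

    def source_pos(self):
        """Walk the chain back to the first underlying source token."""
        node = self.ast_node.parse_node
        while node.token is None and node.children:
            node = node.children[0]
        return (node.token.line, node.token.col) if node.token else None

# A STORE_SUBSCR produced for pystone's "Array1Par[IntLoc+30] = IntLoc"
# would map straight back to its first source character:
tok = Token("Array1Par", line=214, col=8)
instr = Instruction("STORE_SUBSCR",
                    AstNode("Assign", ParseNode("expr_stmt", token=tok)))
```

With something like this in place, the debug_merge_point offsets in the jit log could be resolved to exact source spans rather than just bytecode offsets.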
As you can see, there is already a lot of information that can be useful to the user. However, don't ask me how to present it visually :-)
Neither do I, but finding out is going to be the fun part.
ciao, anto
I'm excited to try some of this out, but unfortunately, there is an annoying problem in the way. All of my work in the last year has been on python3. Having worked in python3 for a while now, my opinion is that it's just a much better language -- surprisingly so, considering how little it changed from python2. If pypy supported python3, then I could maintain my parser as a diff against pypy (you are switching to Mercurial at some point, right?), which would make it much easier to avoid divergence.

So what I'm getting at is: what is the pypy story for python3 support? I haven't seen anything on pypy-dev, or in my occasional looks at the repository, to suggest that it is being worked on, but I'm sure you have a plan of some sort. I'm willing to help out with python3 support, if I can do so without getting in anyone's way. It seems like the sort of thing that will be disruptive, however, so I have been leery of jumping in, considering how little time I have to contribute at the moment.

In my mind, the python3 picture is something like: At the compilation level, it's easy enough to dump Grammar3.2 in pypy/interpreter/pyparser/data and to modify astbuilder for python3 -- I'll backport the changes I made, if you want. Interpreter support for the new language features will be harder, but that's probably already done since 2.7 is almost working. The only potential problems I see are the big string/unicode switch and the management of the fairly large changes to astbuilder -- I'm sure you want to continue supporting python2 into the future. I don't know how much the bytecode changed between 2 and 3, so I'm not sure if there are jit issues to worry about. Am I missing anything big? -Terrence
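On the bytecode question: the disassembly Antonio quoted comes from Python 2, and under any Python 3 interpreter the stdlib dis module shows what the same line compiles to. Opcode names have shifted across 3.x releases (for instance, BINARY_ADD was folded into BINARY_OP in 3.11), so the exact output depends on the version, but opcodes like STORE_SUBSCR survive:

```python
import dis

def proc8_line(Array1Par, IntLoc):
    # mirrors pystone line 214: Array1Par[IntLoc+30] = IntLoc
    Array1Par[IntLoc + 30] = IntLoc

# dis.Bytecode iterates over the instructions of the compiled function
opnames = [ins.opname for ins in dis.Bytecode(proc8_line)]
print(opnames)
```

Diffing such opname lists between interpreter versions is a quick way to gauge how much of the jit's opcode handling would need to change.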

Hi Terrence, all, On Wed, Sep 29, 2010 at 13:40 -0700, Terrence Cole wrote:
In fact, there has been work from Benjamin Peterson, and there is ongoing work from Amaury and Alex, to complete the http://codespeak.net/svn/pypy/branch/fast-forward/ branch. It aims at offering Python 2.7 compatibility. This is a good intermediate step before jumping to Python3 at some point. Most PyPy core devs are focusing on JIT-related tasks, so this is a good place to help out in general. If you'd like to help you can drop by at #pypy on freenode and/or maybe some of the involved persons can point you to some tasks. cheers, holger
--

Hi Maciej, On Thu, Sep 30, 2010 at 10:51 AM, Maciej Fijalkowski <fijall@gmail.com> wrote:
Other big issues are about RPython itself. Do we want RPython to be python3 compatible? How?
No, I'm pretty sure that even if we want to support python3 at some point, RPython will remain what it is now, and translate.py will remain a python2 tool. Armin

On 29/09/10 22:40, Terrence Cole wrote:
well, it's easy as long as you have a bytecode-compiled version around. With only the AST I agree that it might be a bit trickier. [cut]
Are you using your custom-made AST or the one from the standard library? In the latter case, you can just pass the ast to the compile() builtin function to get the corresponding bytecode.
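For the standard-library case Antonio mentions, the round trip is short: ast.parse builds the tree and the compile() builtin accepts the tree directly, so there is no need to go back through source text:

```python
import ast

# Parse source into a stdlib AST, then hand the tree itself to compile()
# to get a code object for the bytecode-level view.
tree = ast.parse("x = [IntLoc + 30 for IntLoc in range(3)]")
code = compile(tree, filename="<ast>", mode="exec")

ns = {}
exec(code, ns)
# ns["x"] is now [30, 31, 32]
```

The resulting code object also carries co_firstlineno and line-number information derived from the AST nodes, which is the hook a custom tree would need to replicate.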

You should seriously read and try to understand the e-mails that you reply to, instead of top-posting them away.
Stefan, there are different ways to argue the same valid thing, and the way you chose is IMHO counterproductive for you - the only result is offensive comments. Also, while I seldom top-post, especially in public forums/MLs, IIRC several PyPy contributors routinely top-post, and I see some sensible arguments (see http://en.wikipedia.org/wiki/Posting_style#Top-posting).

Saravanan, a small part of the issue is that many people consider top-posting inappropriate and/or lame (for instance http://www.caliburn.nl/topposting.html). Be aware of that risk if you top-post. And please, never claim it increases readability - it makes your post readable only if you read the whole thread. Non-crappy email clients highlight the new text differently from the quoted text, making the former easy to find.

In particular, by top-posting you never address the comments which explain why merging does not necessarily make sense (like some of mine), or the ones which argue it's a bad idea (like Amaury's last mail). Interleaved replying, instead, leads to point-by-point answers. See other comments below. On Fri, Sep 3, 2010 at 11:30, Stefan Behnel <stefan_ml@behnel.de> wrote:
Which then begs the question, would it make sense for PyPy to adopt Shedskin to compile its PyPy RPython code into C++/binary.
The answer is already implicit in one of my previous emails, and is a very clear "no, unless considerable extra merging effort is done, which might be more than the effort needed to make the RPython compiler better than Shedskin". I paste a relevant subset of that mail at the end; while I can believe that you have read it, I often do not understand all the implications of complex material the first time I read it, like everybody, so do not be offended if I suggest you re-read it. A similar, more detailed argument is made by Amaury Forgeot d'Arc in an email where he replies to you. In other mails, you write that:
I just don't see any logical reasons [against the merger], that's all. And I haven't heard any on this thread either.
no one has necessarily claimed logical impossibility on why this may not work.
which strikes me as _wrong_. The mails I mentioned explain why there are different design goals - Amaury, who knows more about Shedskin than me, explained why it is less general. That's already an answer for me. Of course, this does not prove impossibility; it only suggests that it may not be a good idea to merge the projects. You shouldn't care about logical impossibility, which makes _NO SENSE_ for such questions in software engineering; something can be perfectly possible and still be a bad idea. If you meant "nobody claimed that this is necessarily a bad idea", then I agree.

We believe there's no obvious way to combine the projects; anybody, including you, is welcome to address the specific issues and find some clever solution. You haven't even scratched them yet. And while you claimed experience with using the projects, or reading their documentation, it is not clear at all that you understand their internals, and this is required to address these problems.

The only idea which makes some sense is that instead of starting the development of Shedskin, the author could have tried achieving the same results by improving RPython, fixing its error messages and so on. However, I can imagine a ton of possible reasons for which he might have consciously decided to do something else. The keyword here is "design tradeoff": a design choice can make a product better in some respects and worse in others. Shedskin is less flexible, but possibly this gives technical advantages which are important. That's the same thing as explained below. Best regards

===== => Do different goals cause _incompatible_ design/implementation choices? Currently, static typing versus global type inference seems to be already a fundamental difference. Modular type inference, if polymorphic (and I guess it has to be), would require using boxed or tagged integers more often, as far as I can see.
RPython is intended to be compiled to various environments, with different variations (choice of GC, stackful or stackless, heaps of other choices), and its programmers are somewhat OK with its limitations; it has type inference, with its set of tradeoffs. This, for instance, prevents reusing Shedskin, and probably even prevents reusing any of its code. -- Paolo Giarrusso - Ph.D. Student http://www.informatik.uni-marburg.de/~pgiarrusso/

----- Original Message ---- From: Paolo Giarrusso <p.giarrusso@gmail.com> To: Stefan Behnel <stefan_ml@behnel.de>; Saravanan Shanmugham <sarvi@yahoo.com> Cc: pypy-dev@codespeak.net Sent: Fri, September 3, 2010 4:34:24 PM Subject: Re: [pypy-dev] Question on the future of RPython
You should seriously read and try to understand the e-mails that you reply to, instead of top-posting them away.
And please, never claim it increases readability - it makes your post only readable if you read the whole thread. Non-crappy email clients highlight differently the new text from the quoted text, making the former easy to find.

Sarvi>> Point taken. Will keep that in mind. It was a misguided notion of what would be readable. Sarvi

[cut]

Hi, 2010/9/3 Saravanan Shanmugham <sarvi@yahoo.com>
But PyPy does not translate RPython code to C++. Or rather, before doing so, it performs transformations on the code that require analysis of the program as a whole and that a C++ compiler cannot do, like the choice of a garbage collector, the stackless mode, and most of all the generation of a tracing JIT. It also operates on the bytecode, which offers interesting metaprogramming techniques that are used throughout the code (similar to C++ templates, for example, except that it's written in Python :-) ). Shedskin, on the other hand, performs a more direct translation of the Python code (it uses the ast). The two projects don't have the same goals. -- Amaury Forgeot d'Arc
participants (14)
- Amaury Forgeot d'Arc
- Antonio Cuni
- Armin Rigo
- Douglas McNeil
- holger krekel
- horace grant
- Jacob Hallén
- Leonardo Santagada
- Maciej Fijalkowski
- Paolo Giarrusso
- Saravanan Shanmugham
- Stefan Behnel
- Terrence Cole
- William Leslie