
Hi all, I am an undergraduate student (at a top-5 CS university in the UK) looking for a fun final-year project to sink my teeth into. Over the last couple of months I have become quite interested in programming languages, interpreters, virtual machines, and so on. I found out about PyPy from Holger Krekel's talks at 21C3 in December, and it now seems like an ideal way to do all of these things! Can anyone suggest pointers to areas of PyPy that need work? Or, better yet, ideas that need further development -- research in the literature, prototyping, and implementation? Thanks for your time. Please CC me -- I am not on the list. -- Boris

Hi there. On Fri, 23 Sep 2005, Boria Feigin wrote:
I will try to answer this -- maybe the others can fill in? First of all, thank you for your interest in PyPy! We have lots of work ;-) -- the main questions are how familiar you are with Python specifically, and whether you have read the documentation that exists on Codespeak. As for the research aspect: we are just now entering the second phase of the funded (EU) side of PyPy, and the upcoming 9 months are all about optimizations and exploration, so there should be interesting stuff to do that could fit a final-year project well. In October (the 10th to the 16th, to be exact) we will meet up to sprint in Paris, and we will devote some time there to planning the phase 2 work in more detail. Maybe we could get back to you after that? Meanwhile (if you haven't done so already) you could read some of the documentation and identify areas of interest to you?
Cheers, Beatrice Düring, Assistant project manager/PyPy -- the non-techie person ;-)

Hi Boria, We are indeed starting to think about more focused research areas in PyPy. For example, along these lines, we will need more work on compiler optimizations and on generating code for various architectures, either statically or just-in-time. More along the lines of interpreters and virtual machines, we could start investigating new aspects that would be useful to code into the interpreter or the translation process: continuations, persistence (either dumping a whole-process image or something more fine-grained), security (running code in a sandbox), and much more, all of which is hinted at in some documentation on Codespeak. Finally, there is also the idea of supporting dynamic languages other than Python by writing an interpreter for them in PyPy. As Bea pointed out, we should collect these ideas and make them more precise soon. On Fri, Sep 23, 2005 at 04:28:46PM +0100, Boria Feigin wrote:
In addition, note that this sprint is not meant to focus exclusively on discussion; it's a coding sprint, and we specifically welcome newcomers. If possible and interesting for you, feel invited :-) That's the best way to grasp the basics of PyPy and discuss. Also feel free to say hello in the #pypy IRC channel (irc.freenode.net) and discuss your interests. A bientot, Armin.

``A Survey of Adaptive Optimization in Virtual Machines'' by M. Arnold, S. J. Fink, D. Grove, M. Hind, and P. F. Sweeney, Proceedings of the IEEE, Vol. 93, No. 2, February 2005 (http://scholar.google.com finds the paper itself and an earlier preprint). A nice survey by the IBM Jikes people, with plenty of further references: looks like it might be a good place to start. Though I don't know enough about PyPy yet to see exactly how (or if) this would fit in. -- Boris

Hello, pypy people. Armin Rigo wrote:
I have been looking into PyPy for a few days, trying to understand how to make the Lisp backend work. I now understand that, due to PyPy's heavily layered nature, it is possible to use a language like Lisp as a target at different levels: for instance, at the bytecode interpreter level (by providing a proper set of functions to handle the bytecodes?), or at the compiler level (by translating the function graph into low-level Lisp). Right now I am sticking with the second way, as it was started in gencl.py (btw, one question that is not clear to me is about the function graph: does it contain Python or RPython opcodes?). I am also thinking about another possibility (with respect to your last sentence): translating the parse tree directly into high-level Lisp. Maybe this idea is nonsense. In fact, the distance from de-sugared Python to the opcodes is unknown to me (even though I suspect there is no 1:1 mapping between the two). Whatever; that could be the start of a strategy for translating Python towards other high-level languages (Ruby, or JS as in the other thread) without paying the full price of opcode interpretation in the target (that is: parts which are semantically similar could be translated easily, and the others would be emulated, at a cost). Thoughts? (aka: does it look like gibberish?) Thanks, Aurélien.

Aurélien Campéas <aurelien.campeas@free.fr> writes:
Cool!
Um, neither. It roughly goes through three potential stages: unannotated SpaceOperations, annotated SpaceOperations and finally LowLevelOperations. I think a lisp translator would probably want to work with annotated SpaceOperations; LowLevelOperations would be too low level (probably :).
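(To poke at the first stage yourself, something along these lines should work -- an untested sketch, with the import path and printed output from memory, so treat it as an assumption about the current tree rather than gospel:

    # Build the flow graph of a tiny function and dump its unannotated
    # SpaceOperations.
    from pypy.objspace.flow import FlowObjSpace

    def f(x):
        return x + 1

    space = FlowObjSpace()
    graph = space.build_flow(f)
    for op in graph.startblock.operations:
        print op.opname, op.args, op.result   # e.g.: add [v0, 1] v1

The annotated stage is the same graph after the annotator has attached inferred types to the variables.)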
It's not necessarily nonsense, but it's not really The PyPy Way.
Thing is, I don't know how feasible this is. It's pretty hard, without some kind of type inference, to translate, say, this Python:

    a + b

into anything significantly more efficient than this Common Lisp:

    (py:add a b)

And making type inference possible is what RPython is all about. You could make #'py:add a generic function and see if a given CLOS implementation is fast enough to give a useful speed (but I think the coercion rules would probably drive you insane first). Cheers, mwh -- ARTHUR: Yes. It was on display in the bottom of a locked filing cabinet stuck in a disused lavatory with a sign on the door saying "Beware of the Leopard". -- The Hitch-Hikers Guide to the Galaxy, Episode 1
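(To spell out why that py:add is hard to beat without type information: whatever the Lisp compiler does, the generated code still contains all of Python's runtime dispatch. A simplified Python sketch of what a generic py:add must cover -- ignoring coercion and the subclass rule quoted later in the thread:

    def py_add(a, b):
        # fast paths for common built-in types
        if isinstance(a, int) and isinstance(b, int):
            return int.__add__(a, b)
        if isinstance(a, str) and isinstance(b, str):
            return str.__add__(a, b)
        # generic protocol: try __add__, then the reflected __radd__
        meth = getattr(a, '__add__', None)
        if meth is not None:
            result = meth(b)
            if result is not NotImplemented:
                return result
        meth = getattr(b, '__radd__', None)
        if meth is not None:
            result = meth(a)
            if result is not NotImplemented:
                return result
        raise TypeError("unsupported operand types for +")

Only type inference lets a compiler collapse all of that to a single machine add.)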

Michael Hudson wrote:
Ok, thanks :) But then, I still don't completely get the relation between Python, RPython, and the three potential stages of the function graph (probably I have to reread all the docs on codespeak).
ok
The mere fact that it will be compiled will make it more efficient I guess. I mean, not on CLISP, but with a real lisp compiler.
And making type inference possible is what RPython is all about.
Sure, but then, it is a restricted subset of Python, and I like python completely unrestricted ;)
In this case, CLOS would add overhead. In fact, the Python add operator (and some arithmetic ops) seems close enough to the Lisp one that one is tempted to translate it into the Lisp equivalent with only minor adaptation (like the printer appending an "L" to bignums, etc.). I am not sure I would use CLOS at all, in fact (at least for a first attempt at producing a Lisp backend). BTW, what's this "insanity with coercion rules" that you mention -- can you expand a little on this?
Cheers, mwh
Many thanks, Aurélien.

On 27.09.2005, at 14:17, Aurélien Campéas wrote:
BTW, what's this "insanity with coercion rules" that you mention - can you expand a little on this ?
See http://docs.python.org/ref/coercion-rules.html Excerpt:
===================
- Below, __op__() and __rop__() are used to signify the generic method names corresponding to an operator; __iop__() is used for the corresponding in-place operator. For example, for the operator `+', __add__() and __radd__() are used for the left and right variant of the binary operator, and __iadd__() for the in-place variant.
- For objects x and y, first x.__op__(y) is tried. If this is not implemented or returns NotImplemented, y.__rop__(x) is tried. If this is also not implemented or returns NotImplemented, a TypeError exception is raised. But see the following exception:
- Exception to the previous item: if the left operand is an instance of a built-in type or a new-style class, and the right operand is an instance of a proper subclass of that type or class, the right operand's __rop__() method is tried before the left operand's __op__() method. This is done so that a subclass can completely override binary operators. Otherwise, the left operand's __op__() method would always accept the right operand: when an instance of a given class is expected, an instance of a subclass of that class is always acceptable.
- When either operand type defines a coercion, this coercion is called before that type's __op__() or __rop__() method is called, but no sooner. If the coercion returns an object of a different type for the operand whose coercion is invoked, part of the process is redone using the new object.
===================
However, this is just an attempt at documenting the current behavior; AFAIK there is no definitive reference except for what CPython implements. (I'm working on running [a subset of] Python in Smalltalk.) - Bert -
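In executable form, the lookup order described above comes out roughly as the following sketch (simplified: it drops the __coerce__ step, and real CPython also skips the reflected call when both operands have exactly the same type):

    def binary_op(x, y, opname):
        # opname is e.g. 'add'; derive the two method names
        op, rop = '__%s__' % opname, '__r%s__' % opname
        # the exception: a proper subclass on the right is tried first,
        # so that it can completely override the operator
        if isinstance(y, type(x)) and type(y) is not type(x):
            order = [(y, rop, x), (x, op, y)]
        else:
            order = [(x, op, y), (y, rop, x)]
        for obj, name, other in order:
            meth = getattr(obj, name, None)
            if meth is not None:
                result = meth(other)
                if result is not NotImplemented:
                    return result
        raise TypeError("unsupported operand type(s) for %s" % opname)

binary_op(2, 3, 'add') gives 5; with an int subclass on the right that defines __radd__, the subclass's method wins, as the third rule requires.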

Aurélien Campéas <aurelien.campeas@free.fr> writes:
RPython is a subset of Python, with the main constraint being that some level of static type analysis must be possible (for full Python, the amount of static analysis you can do is really very small; you can read Brett Cannon's thesis about this: http://www.ocf.berkeley.edu/~bac/thesis.pdf). So, when translating, the code that implements the interpreter (roughly interpreter/*.py, objspace/std/*.py) is imported and a flow graph is built from it. This is then annotated (code in annotator/*.py), a fairly hairy process. This (for the C and LLVM backends, at least) is then turned into a graph containing low-level operations (like "dereference this pointer"). Python is just the language all of the above happens to be implemented in (and also the interpreter/*.py code is involved in making a flow graph which includes itself, but this isn't that important -- just confusing :).
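To make the restriction concrete, a small illustration of my own (not taken from the PyPy docs):

    # Annotator-friendly: every variable keeps one consistent type, so
    # starting from the entry point's argument types everything can be
    # inferred statically.
    def longest(words):            # words: a list of strings
        best = 0
        for w in words:
            if len(w) > best:
                best = len(w)
        return best

    # Not RPython: 'x' is an int on one path and a string on the other,
    # so no useful static type can be assigned to it.
    def confused(flag):
        if flag:
            x = 3
        else:
            x = "three"
        return x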
Not much. The "interpreter overhead" for Python is usually estimated at about 20%, though it obviously depends to some extent on what code you're running.
Well, so do we all, but then you can't have type inference. There is no simple answer to this.
Er... no. #'cl:+ is not that much like python's + (e.g. the latter operates on strings and you can't overload the former).
I am not sure I would use CLOS at all, in fact (at least for a first attempt at producing a lisp backend).
Fair enough.
BTW, what's this "insanity with coercion rules" that you mention - can you expand a little on this ?
Think about things like 2L**135 < 1.0e40, or range(10)*3, or... mixed-type operations are not that simple in Python. Cheers, mwh -- Considering that this thread is completely on-topic in the way only c.l.py threads can be, I think I can say that you should replace "Oblivion" with "Gravity", and increase your Radiohead quotient. -- Ben Wolfson, comp.lang.python
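To make those two examples concrete (a Python 2 session; note that 2**135 is about 4.4e40, so the comparison genuinely has to look at the values, not just the types):

    >>> 2L**135 < 1.0e40        # a long compared against a float
    False
    >>> range(10) * 3           # '*' as sequence repetition, not arithmetic
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

A naive one-rule-per-operator translation gets no purchase on either of these.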

Michael Hudson wrote:
So the same graph can exist in three states: raw Python, annotated Python (possible whenever the raw Python matches RPython), and low-level stuff? Then, to translate unrestricted Python, I have to work on the first pass/state of the graph.
Hmmmm, 20% relative to what? Is this an observed quantity from benchmarking, say, C-translated RPython vs CPython? Also, "on average" can, as you say, cover a lot of differences depending on what we have to do.
Has the possibility of extending Python with optional type annotations been studied? (I guess the pypy-dev archives could answer this.)
Thanks for reminding me of that.
... but the more I think about it, the more it looks like CLOS might help with these overloaded operators ...
... and coercion rules ... Well, for now I will go on tweaking a CL backend based on the Pyrex one. I guess I will discover the real problems as I progress. Thanks also to Bert and Samuele, Aurélien.

Aurélien Campéas <aurelien.campeas@free.fr> writes:
Yes. Though I don't know if you can even form the flow graph of unrestricted Python (not saying you can't, just that it wasn't a design goal).
Then, to translate unrestricted python, I have to work on the first pass/state of the graph.
Yes. But believe us on this one: ahead-of-time analysis on unrestricted python is not going to lead to significant speedups.
Well, it's very much a guesstimate, but what I was meaning to compare was interpreted Python against compiled C code that executes the same operations in the same order, but without the overhead of dispatching the opcodes (this is more or less what the now more-or-less dead Python2C project did. There's a reason it's more-or-less dead).
Also, "on average" can, as you say, cover a lot of differences depending on what we have to do.
Indeed! If your computation is bound, say, by the speed of multiplying really large integers, the interpreter overhead is roughly zero. If you're manipulating small ints, then it's probably quite large.
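One way to see this: both cases execute exactly the same bytecode, so the fixed dispatch cost is identical -- only the useful work per opcode changes. A quick look with the dis module (output abridged, and from memory):

    import dis

    def mul(a, b):
        return a * b

    dis.dis(mul)
    #   LOAD_FAST         0 (a)
    #   LOAD_FAST         1 (b)
    #   BINARY_MULTIPLY
    #   RETURN_VALUE

For mul(3, 4), dispatching those four opcodes dominates the runtime; for mul(10**10000, 10**10000), it vanishes next to the multi-precision multiplication hidden behind BINARY_MULTIPLY.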
Well, there's been a fair amount of hot air emitted about it, but no actual work that I know about.
Maybe :) Please don't let me discourage you too much; I'd be genuinely interested in what you find. But please don't expect miracles. Cheers, mwh -- <MFen> want to write my requirements for me? <radix> Sure! <radix> "show a dancing monkey in the about box" -- from Twisted.Quotes

Michael Hudson wrote:
Still confused I am, then. I thought you could annotate the graph because you programmed in RPython and thus enabled type inference, thereby allowing the production of a low-level, C-friendly version of the graph ... ok, it's not that important right now for me to understand it all anyway.
But there are maybe other benefits to an unrestricted Python-to-Lisp translator than just getting it natively compiled for small perf gains. The ability to quickly develop arbitrary extensions to Python could be one beneficial side effect, for instance.
Builtin ops like bignum arithmetic, or whatever is implemented in C, are obviously fast. OTOH, I wonder if some implementation choices in current CPython, and part of its slowness, were made balancing simplicity of the code against speed (Stackless could be an example of a faster implementation, couldn't it?). I remember having read stuff about that in some distant past.
Thanks, I am not discouraged and don't expect miracles :) What PyPy brings me right now is a nice way to enhance my {python,lisp}-fu at the same time. I am surely quite far from discovering anything not already known; should that happen, I'd keep the PyPy people informed. Cheers, Aurélien.

Aurélien Campéas wrote: ...
At that time this was true: Stackless was about 5% slower than normal Python, somewhere around 1.5.2 :-) But at the same time, I had implemented an acceleration of the interpreter loop of 10-15%, which worked especially well with the Windows compiler. Python was not interested in my patch; only recently have they been selling their grandma for a little speed. So I took the chance to speed up my slightly slower Stackless, to get a little advantage for those who didn't realize the real benefits of Stackless. Summary: no, it isn't faster, maybe even slightly slower. But it can be much faster if you use its features to implement your algorithms in a way Python cannot. Though this is getting more relative, because they have pushed generators to the extremes (still limited, but good). ciao - chris
--
Christian Tismer             :^)   <mailto:tismer@stackless.com>
tismerysoft GmbH             :     Have a break! Take a ride on Python's
Johannes-Niemeyer-Weg 9A     :    *Starship* http://starship.python.net/
14109 Berlin                 :     PGP key -> http://wwwkeys.pgp.net/
work +49 30 802 86 56  mobile +49 173 24 18 776  fax +49 30 80 90 57 05
PGP 0x57F3BF04       9064 F4E1 D754 C2FF 1619  305B C09C 5A3B 57F3 BF04
     whom do you want to sponsor today?   http://www.stackless.com/

Aurélien Campéas wrote: ...
Sure it is, and both server- and client-side.
--
Christian Tismer             :^)   <mailto:tismer@stackless.com>
tismerysoft GmbH             :     Have a break! Take a ride on Python's
Johannes-Niemeyer-Weg 9A     :    *Starship* http://starship.python.net/
14109 Berlin                 :     PGP key -> http://wwwkeys.pgp.net/
work +49 30 802 86 56  mobile +49 173 24 18 776  fax +49 30 80 90 57 05
PGP 0x57F3BF04       9064 F4E1 D754 C2FF 1619  305B C09C 5A3B 57F3 BF04
     whom do you want to sponsor today?   http://www.stackless.com/

Hi Aurelien, On Wed, Sep 28, 2005 at 01:17:56PM +0200, Aurélien Campéas wrote:
Well, for now I will go on tweaking a CL backend based on pyrex's one. I guess I will discover the real problems as I progress.
Any reason not to start from Seo's GenCL backend (pypy/translator/gencl) ? A bientot, Armin

Armin Rigo wrote:
That's what I tried first. Then I found it was much easier to work from the Pyrex translator (maybe because the latter is slightly more complete than the current gencl.py?). I have yet to learn how to correctly walk the function graph, and it is easier to use genpyrex as a starting template than gencl. Moreover, gencl is Lisp-wise wrong (SETQ-ing previously undefined variables, while accepted with warnings by some compilers, is not correct; starting from genpyrex, it was easy to generate a big top-level LET form for each function, for instance). Also note that I am working very slowly on this, since I am currently busy looking for a job :) (hunting down positions and tailoring motivation letters and CVs is not an easy task...) ... but nevertheless, hacking on gencl is a nice way to recover from the anxiety :)
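For non-Lispers, the point is the difference between SETQ-ing names that were never bound and declaring every local once, up front. A hypothetical emitter in the style I mean (my own sketch, not the actual gencl or genpyrex code):

    def emit_defun(name, args, localvars, body_forms):
        # one top-level LET binds all the locals, so the SETQs in the
        # body assign bound variables instead of undefined ones
        lines = ['(defun %s (%s)' % (name, ' '.join(args)),
                 '  (let (%s)' % ' '.join(localvars)]
        for form in body_forms:
            lines.append('    ' + form)
        lines.append('  ))')
        return '\n'.join(lines)

    print emit_defun('f', ['x'], ['v0', 'v1'],
                     ['(setq v0 (+ x 1))', '(setq v1 (* v0 v0))', 'v1'])
    # (defun f (x)
    #   (let (v0 v1)
    #     (setq v0 (+ x 1))
    #     (setq v1 (* v0 v0))
    #     v1
    #   ))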
A bientot,
Armin
A+ Aurélien

Hi Aurelien, On Thu, Sep 29, 2005 at 02:54:01PM +0200, Aurélien Campéas wrote:
Er, they are really both old pieces of code... Note that the approach they use has a fundamental problem -- not trying to stop you, as it's indeed quite some fun :-) The problem is that after the initial basic types like integers and strings, there are lots of methods to implement e.g. on lists and dicts, and then there are rather more obscure annotations like SomePBC() that are quite involved. Just to frighten you a bit, you'd also need at some point to do something like pypy/rpython/normalizecalls.py :-) What we have in mind is to support targets like CL by a modified RTyper (pypy/rpython/r*.py); it means that there would be an additional processing step between the graph-with-annotations and the CL backend which would simplify the graphs and replace complex operations and annotations with more primitive ones. This will make the task of the backend far easier. A bientot, Armin.

Armin Rigo wrote:
Yes, I have had this feeling.
Note that the approach they use has a fundamental problem -- not trying to stop you, as it's indeed quite some fun :-)
This is an important point :)
I would tend to provide a huge prelude implementing a close approximation of those in CL (that is my plan).
I can't really be frightened by something I neither know about nor understand ...
You have lost me :) A+ Aurélien.

Hi Aurelien, On Sat, Oct 01, 2005 at 02:44:50PM +0200, Aurélien Campéas wrote:
I would tend to provide a huge prelude implementing a close approximation of those in CL (that is my plan).
Yes, this would be possible. I suppose even dark corners like dictionaries with user-specified ways of computing the hash of keys could be done this way.
That's the problem, I suppose. RPython is larger and more delicate than it seems to be at first. After a lot of trashed efforts in GenC, we eventually found out the correct approach, which is:
Compare the flow graphs after annotation and after rtyping, e.g. with

    translate_pypy.py targetrichards.py --annotate
    translate_pypy.py targetrichards.py --rtype

The latter are in a very controlled format, using strict C-like types (structures and arrays and nothing more). Even the constants are prebuilt structures and arrays. It is the rtyper that does the hard job of converting from the annotated graphs to these low-level graphs. For backends that are a bit higher-level than C, we need some intermediate solution that uses some parts of the rtyper and some parts of the direct code generation you have in mind. That's also something we should discuss at the Paris sprint with Bert, as it is what a Smalltalk back-end would need as well. But for now, I suppose that writing the direct code generation for the easy parts is the best way to start. A bientot, Armin.

Armin Rigo wrote:
Yes, through sxhash probably. Well, rarely used obscure corner cases are not my priority; I need to get going on more basic stuff :)
It also seems very entangled with various parts of PyPy. And even after having read the doc again, I still don't really get its justification (for instance http://codespeak.net/pypy/dist/pypy/doc/coding-guide.html#our-runtime-interp... is not too clear to me).
But never mind, I don't want to work on a low-level graph! Right now I'm investigating interpreter/transformer, stablecompiler and friends, to see if it is possible to get an early, unannotated, unrestricted, and close-to-Python parse tree that could undergo an almost trivial transformation towards sexps (and then, a prefix Python notation).
Yes, I'm sorry, the RPython stuff is currently beyond me and will remain so for a while. The current value of PyPy for me is the existence of a nice Python source code -> AST transformer usable from Python. As a final by-product of this work, I'd like to get an embedded-in-Lisp Python compiler. Someone earlier said it was not a goal of PyPy to proceed like that. We'll see. Anyway, thanks for the snippets of light, Aurélien.

Hi Aurelien, On Sat, Oct 01, 2005 at 09:58:00PM +0200, Aurélien Campéas wrote:
That's rather a by-product. The basic source code -> AST transformer is already in the standard Python library, where it's available with a simple 'import compiler'. We just cleaned it up a bit (or rather a lot, as it was quite buggy).
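For reference, in a Python 2 session (the exact repr is from memory and may differ in detail):

    >>> import compiler
    >>> compiler.parse("a + 1")
    Module(None, Stmt([Discard(Add((Name('a'), Const(1))))]))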
I'm sorry I scared you off flow graphs. I can only refer you again to Michael's e-mail earlier in this thread. People commonly underestimate the problems he mentioned; this has led to a number of Python-in-Insert-Language-Here projects, all aborted for basically the same reason: the basic built-in types and semantics are easy enough, but supporting the complete Python semantics is rather more involved, and impossible to add incrementally on top of a simple and fast approach. PyPy was born from realizing this; we (painfully) wrote the complete semantics once and won't start duplicating that logic. Instead, we rewrite support for the infinitely simpler RPython semantics, and even then we try to share code between back-ends (which is the purpose of the RTyper). Using the RTyper is not mandatory for back-ends, but you definitely cannot analyse RPython code from the AST, for the good reason that it's a language without source code. Anyway, instead of hand-waving, I guess we really need to implement a non-C back-end soon enough, just to show that we *can*. It's the whole point of the rather complicated translation architecture we've built. A bientot, Armin.

Armin Rigo wrote:
Well, I'm glad PyPy brings that :)
You didn't; you only compactly pointed out some of the complexity one perceives when reading the code. I need an understandable entry point anyway, so beginning with the "easy" stuff and rediscovering already-known mistakes is somewhat unavoidable; it'll help me understand the rationale behind PyPy's architecture.
I guess completely supporting every quirk of one "real world" programming language from another is never a trivial task.
Can you expand on this? I tend to see the AST as a programmatically manageable representation of the source code. Is this expectation wrong?
I am glad it will be possible to see it at work. The knowledge embodied in PyPy is surely not easy to convey without a nice demo (and I haven't yet dared look at the C backend). Cheers, Aurélien.

Hi Aurelien, On Sun, Oct 02, 2005 at 03:02:48AM +0200, Aurélien Campéas wrote:
No, you are right, but RPython has neither a source nor an AST representation:

* First, we take a normal Python program and we import it in CPython. During the import, lots of initialization stuff happens. We may even call more initialization functions that will set up more stuff. This phase creates module, class, function and even user instances -- e.g. in the case where the normal Python program is the PyPy interpreter, we build a complete object space during this phase, which is highly non-trivial and creates a lot of instances of our own classes, referencing each other in complex ways.

* If all this is carefully done, then the resulting set of objects (classes, functions, prebuilt instances) may conform to the RPython rules. This is what an "RPython program" really is: it's a point in time, a snapshot, a set of objects referencing each other in such a way that the functions left in this set of objects will no longer perform dynamic stuff after this point in time.

An RPython program has no immediate syntactic equivalent. You can consider that each function individually has got an AST, but there is no *global* or even module-level AST to translate. In PyPy, instead of the AST, we turn each function into a flow graph and look at that. We perform the "annotation" (type inference) on the family of all flow graphs.
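A toy version of that phase distinction (my own example, far simpler than bootstrapping an object space):

    # Import time: unrestricted Python.  This loop and the closure
    # machinery run in CPython and never appear in the snapshot.
    FIELD_NAMES = ['x', 'y', 'z']

    def make_getter(name):
        def getter(obj):
            # by the time the snapshot is taken, 'name' is a fixed
            # constant string for each generated function
            return getattr(obj, name)
        return getter

    GETTERS = [make_getter(n) for n in FIELD_NAMES]

    # The "RPython program" is the snapshot taken after this point:
    # three separate getter functions, each reading one constant
    # attribute.  Those get flow-graphed and annotated; the loop and
    # make_getter itself do not.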
Indeed. I guess I moved too quickly into overly obscure considerations. In one sentence, I should have said that it makes sense in PyPy to start from annotated flow graphs, as gencl and genpyrex do, instead of considering ASTs, because that is the approach that can eventually evolve into a complete RPython solution with the help of other parts of PyPy. At this point it definitely makes sense to rebuild a CL backend from scratch to get a feeling for flow graphs. A bientot, Armin.

Hi, On Sat, Oct 01, 2005 at 10:23:31PM +0200, Armin Rigo wrote:
BTW, I didn't forget LLVM here -- LLVM is very similar to C. I meant that we are missing a back-end for a higher-level language that comes with its own complex VM, with its own concepts closer to RPython -- e.g. the RPython strings, instances and classes typically have more direct equivalents. A bientot, Armin

On Friday 23 Sep 2005 13:02, Boria Feigin wrote: ...
I am an undergraduate student (at a top-5 CS university in the UK) looking for a fun final-year project to sink my teeth into.
As I understand PyPy, it can theoretically take a number of alternate front ends as well as back ends. I don't know how much work has gone into that, but a whimsical idea that sprang to mind just now as an alternate project idea (one that I'm not convinced should ever be done >:-) is this:

Last summer, for fun, I wrote a generic language parser that can handle almost anything you throw at it. The basic underlying idea was "what if in Lisp you removed all the brackets and used a python-like syntax instead?" (Or, depending on the time of day, "What if we removed all the keywords from Python -- could we still parse something that looked close to Python?") The idea was that you would use :<newline><indent> stuff </dedent> where you would use nested brackets. I did leave in a small amount of infix support though, and did think about taking that further. I only worked on the parsing/AST building, but it might be an "interesting" alternate front end.

Unlike Python, I found that in order to support control flows like if...elif...elif...else, try...except...finally, and similar, you needed an end tag (which could be anything though) to simplify the parser. (Hints to the lexer would have been an alternate approach, but would've required 2 passes the way I was thinking.) In case that doesn't make any sense:

    if result is None:
        print "Parse failed"
    else:
        print "Success"
    end

is a function call to an "if" function that among its arguments takes 2 anonymous code blocks, with an "else" tag in between and followed by an "end" tag. Similarly:

    pen down
    repeat 4:
        forward 10
        rotate 90
    end

is two function calls -- one to pen, and one to repeat. Repeat takes 3 arguments here: 4, the code block, and the "end" tag.

Anyway, the code is here: cerenity.org/SWP/ -- that also includes a pdf of the lightning talk I gave at ACCU (though not the slightly nicer version I gave at EuroPython, I notice... Hmm...). Slide 8 shows how nuts this can get, mixing an L-System description language with an SML-esque syntax for a structural stack and a logo-type language, all started off with a python-esque bootstrap. At EuroPython I also had an SQL-like statement in there for added silliness. (The point was that the example parsed cleanly, not whether it meant anything.)

The entire parser was written test-first, with the tests being sample programs, which you can find here:
* http://cerenity.org/SWP/progs/
The parser itself is quite short; the grammar is LL(1) compliant and consists of 13 productions. Anyway, as an alternate front end this could be a mad (but fun) little project. I'm not convinced it would actually be a GOOD idea to do though :)

Regards,
Michael.
--
Michael Sparks, Senior R&D Engineer, Digital Media Group
Michael.Sparks@rd.bbc.co.uk, http://kamaelia.sourceforge.net/
British Broadcasting Corporation, Research and Development
Kingswood Warren, Surrey KT20 6NP
This e-mail may contain personal views which are not the views of the BBC.

Hi,
[snipped]
(The point was that the example parsed cleanly, not whether it meant anything.)

There's the rub. For example, one could write:
    if foo > 10:
        threshold = 20
    end

    select:
        from titles where (id < threshold)
    end

The question then becomes: is there a way to leverage PyPy (type inference) to _statically_ verify that `id' and `threshold' are of the same type?
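I can't answer for PyPy's machinery, but the shape of the check itself is easy to sketch (hypothetical code, nothing to do with the actual annotator):

    # Given a type environment inferred for the enclosing scope -- by
    # PyPy's annotator or by any other means -- verifying the comparison
    # is a lookup plus a unification step.
    def check_same_type(type_env, lhs, rhs):
        t1 = type_env.get(lhs)
        t2 = type_env.get(rhs)
        if t1 is None or t2 is None:
            raise TypeError("no inferred type for one of the operands")
        if t1 is not t2:
            raise TypeError("%s is %s but %s is %s" % (lhs, t1, rhs, t2))

    # 'threshold = 20' would let the inferencer record int for
    # threshold; the select block's use of 'id' must then unify with
    # int as well.
    env = {'threshold': int, 'id': int}
    check_same_type(env, 'id', 'threshold')    # passes

The hard part, of course, is computing the type environment in the first place -- especially since `threshold' is only bound when foo > 10.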
P.S. This may be off-topic, but you may also like to have a look at this piece by Martin Fowler: http://www.martinfowler.com/articles/languageWorkbench.html -- Boris

participants (9): Armin Rigo, Aurélien Campéas, Beatrice Düring, Bert Freudenberg, Boria Feigin, Christian Tismer, Michael Hudson, Michael Sparks, Samuele Pedroni