sprints goals / restricted python?

Hello pypy and hello Armin :-)
AFAIK your idea of restricted python is to implement a TranslationObjSpace which emits C-code. We would e.g. run the interpreter (by running e.g. tests) against StdObjSpace and use the TranslationObjSpace to generate a C-file which is - when compiled and run - our restricted interpreter which can execute bytecode.
I don't know but i think going the TranslationObjSpace way might turn into complicated code. For example how do you "detect" a loop from an objectspace's POV? It just sees a series of ObjSpace-Method accesses and it seems hard to reverse engineer the loop. (I guess you say this loop-unfolding is basically a space/time trade-off :-)
please correct me, if/where i am wrong.
Anyway, I think the above approach is pretty ambitious for a three to four day sprint but this isn't the only "goal" criterion, of course.
If i am not mistaken, we don't have any framework to generate C-code and make it "self-runnable" at the moment. That's why i started thinking about Pyrex which provides a framework (written in python) for generating C-code. The input is mostly python-compatible with some additional "type" information. It seems to me that if we focus on
- making the interpreter record/annotate names with types (all names: function names, local names, global names, ...)
- using this "annotation" information to generate Pyrex modules
- using the pyrexc compiler to generate C-Modules from the generated Pyrex modules
we might get a working (partial-)C-interpreter module by the end of the sprint.
To actually perform the annotation we could use an AnnotationObjSpace which proxies to the StdObjSpace but additionally annotates "names" with the types of the objects being referenced. Basically we might just need to "intercept" all setattr__ANY_String_ANY calls so that for a given AnnotationObjSpace.setattr(x,y,z) call we roughly perform some hands-on annotation by storing it in "x.__pypytypemap__" which is a dictionary containing name-dictionary pairs. The latter dictionary then records all the types (or even values?) that have ever been associated with the name.
Of course, the above generation series would still require CPython (because the generated C file compiles against C-Python). But later on, we could
- hack up pyrex to generate C-files which don't require a full CPython anymore
- hack up pyrex to be used with a TranslationObjSpace as a second means to generate the restricted interpreter.
One of the positive points about this approach is that we use existing tools/frameworks (written in python) and we can probably split the group on various sub-tasks easily.
But i am very open to your corrections ...
cheers,
holger
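To make the setattr interception concrete, here is a minimal sketch in plain Python. It is an illustration only, not code from the project: StdObjSpaceStub and Frame are hypothetical stand-ins, and counting how often each type was seen is an assumption about what the inner dictionary might hold (the message leaves open whether it records types or values).

```python
class StdObjSpaceStub:
    """Hypothetical stand-in for StdObjSpace; only setattr is modelled."""

    def setattr(self, w_obj, name, w_value):
        setattr(w_obj, name, w_value)


class AnnotationObjSpace:
    """Proxies to a wrapped space and records the types bound to each name."""

    def __init__(self, space):
        self._space = space

    def __getattr__(self, opname):
        # Operations that are not intercepted go straight to the wrapped space.
        return getattr(self._space, opname)

    def setattr(self, w_obj, name, w_value):
        # x.__pypytypemap__ maps each attribute name to a dictionary of the
        # type names seen for it (here with a count, which is an assumption).
        typemap = w_obj.__dict__.setdefault("__pypytypemap__", {})
        seen = typemap.setdefault(name, {})
        type_name = type(w_value).__name__
        seen[type_name] = seen.get(type_name, 0) + 1
        return self._space.setattr(w_obj, name, w_value)


class Frame:
    """Hypothetical object whose attributes get annotated."""


space = AnnotationObjSpace(StdObjSpaceStub())
frame = Frame()
space.setattr(frame, "counter", 0)
space.setattr(frame, "counter", 1)
space.setattr(frame, "counter", "done")
print(frame.__pypytypemap__)
```

The assignments still reach the wrapped space, so the program behaves as before while the type map fills up on the side.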

At 14:39 17.06.2003 +0200, holger krekel wrote:
Hello pypy and hello Armin :-)
AFAIK your idea of restricted python is to implement a TranslationObjSpace which emits C-code. We would e.g. run the interpreter (by running e.g. tests) against StdObjSpace and use the TranslationObjSpace to generate a C-file which is - when compiled and run - our restricted interpreter which can execute bytecode.
I don't know but i think going the TranslationObjSpace way might turn into complicated code. For example how do you "detect" a loop from an objectspace's POV? It just sees a series of ObjSpace-Method accesses and it seems hard to reverse engineer the loop. (I guess you say this loop-unfolding is basically a space/time trade-off :-)
please correct me, if/where i am wrong.
You can use some kind of ObjSpace to do abstract interpretation, but you can't simply use all of the normal interpreter mechanism. You either start from the AST, where loops and control flow are clear, or from the bytecode and use it as it is as a flow graph: each bytecode is a node, and there are edges from each bytecode to the bytecodes that can follow it in execution. Or you reduce it to basic blocks etc... regards.
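The bytecode-as-flow-graph view can be tried out with CPython's standard dis module. The following is only a sketch under assumptions: the list of opcode names that never fall through is a guess meant to cover recent CPython versions, exception edges are ignored, and flow_graph and back_edges are made-up helper names.

```python
import dis

# Opcode names after which execution never continues with the next
# instruction. This list is an assumption, not an exhaustive table.
NO_FALLTHROUGH = {
    "JUMP_FORWARD", "JUMP_ABSOLUTE", "JUMP_BACKWARD",
    "JUMP_BACKWARD_NO_INTERRUPT", "RETURN_VALUE", "RETURN_CONST",
    "RAISE_VARARGS", "RERAISE",
}
# All opcodes that carry a jump target, whatever the CPython version calls them.
JUMP_OPCODES = (
    set(dis.hasjrel) | set(dis.hasjabs) | set(getattr(dis, "hasjump", ()))
)


def flow_graph(func):
    """Map each bytecode offset to the sorted offsets of its successors."""
    instrs = list(dis.get_instructions(func))
    graph = {}
    for index, ins in enumerate(instrs):
        successors = set()
        if ins.opcode in JUMP_OPCODES:
            successors.add(ins.argval)  # argval holds the target offset
        if ins.opname not in NO_FALLTHROUGH and index + 1 < len(instrs):
            successors.add(instrs[index + 1].offset)
        graph[ins.offset] = sorted(successors)
    return graph


def back_edges(graph):
    """Edges that lead to the same or an earlier offset, i.e. loops."""
    return [(src, dst) for src, dsts in graph.items()
            for dst in dsts if dst <= src]


def count_up(n):
    i = 0
    while i < n:
        i += 1
    return i


def straight(a, b):
    return a + b


print(back_edges(flow_graph(count_up)))   # at least one edge for the loop
print(back_edges(flow_graph(straight)))   # no backward edge
```

A loop shows up here as an edge that points backwards, which is exactly the information an object space cannot see from the operations alone.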

Hello Samuele,
[Samuele Pedroni Tue, Jun 17, 2003 at 05:31:48PM +0200]
At 14:39 17.06.2003 +0200, holger krekel wrote:
Hello pypy and hello Armin :-)
AFAIK your idea of restricted python is to implement a TranslationObjSpace which emits C-code. We would e.g. run the interpreter (by running e.g. tests) against StdObjSpace and use the TranslationObjSpace to generate a C-file which is - when compiled and run - our restricted interpreter which can execute bytecode.
I don't know but i think going the TranslationObjSpace way might turn into complicated code. For example how do you "detect" a loop from an objectspace's POV? It just sees a series of ObjSpace-Method accesses and it seems hard to reverse engineer the loop. (I guess you say this loop-unfolding is basically a space/time trade-off :-)
please correct me, if/where i am wrong.
you can use some kind of ObjSpace to do abstract interpretation, but you can't simply use all the normal interpreter mechanism.
You either start from the AST, where loops and control flow are clear, or from the bytecode and use it as it is as a flow graph: each bytecode is a node, and there are edges from each bytecode to the bytecodes that can follow it in execution. Or you reduce it to basic blocks etc...
but this means you can't simply implement a TranslationObjSpace and use it to generate the C-code for a restricted Python interpreter, right? cheers, holger

At 18:05 17.06.2003 +0200, holger krekel wrote:
but this means you can't simply implement a TranslationObjSpace and use it to generate the C-code for a restricted Python interpreter, right?
you should also change the execution engine to work somehow like psyco, or take some more static approach.

Hello Holger,
On Tue, Jun 17, 2003 at 02:39:36PM +0200, holger krekel wrote:
I don't know but i think going the TranslationObjSpace way might turn into complicated code.
Actually, you describe a nice AnnotationObjSpace that is almost exactly what I had in mind. In other words I don't think there is much more to the AnnotationObjSpace until we can make it produce usable C code. (Btw I'm not sure emitting Pyrex-annotated code would be any different nor particularly easier.) This is the crucial example:
For example how do you "detect" a loop from an objectspace's POV? It just sees a series of ObjSpace-Method accesses and it seems hard to reverse engineer the loop.
Here is the idea. You are right in that the object space cannot see that loop at all; but we don't necessarily want to produce a nice C-style 'for' loop for it -- who said we wanted to produce *nice* C code ;-)
Instead, the translator keeps variable type annotations like you described for AnnotationObjSpace, but it keeps one such structure for *each* bytecode position. Hoping that this doesn't completely blow away the memory when the translator is run, we can detect when the annotation work is done by the fact that we have got back to an already-seen position and our current annotations for it are not more general than what we have already recorded.
Translating the result to C is then quite straightforward. The object space's methods like space.add(), when called, would attach one or a few lines of C code to the current bytecode position. When the annotation pass completes, we just emit a C function whose body groups all these lines in the bytecode order.
For jumps (like loops), we add in the body a label for each bytecode position (e.g. 'b123:' for the bytecode position 123) and then the C equivalent of 'JUMP_IF_TRUE 123' is 'if (...) goto b123;'. It doesn't make nice C loops -- looks more like assembly code. Similarly I don't imagine that Python "if" constructs will give nice "if(...) {...} else {...}" in C, but again some horrible block-less stuff with gotos.
Actually getting nice C code would be another matter, but I don't really mind right now.
A bientôt,
Armin.
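The stopping rule (come back to an already-seen position with annotations that are not more general than the recorded ones) can be illustrated on a toy instruction set. Everything in this sketch is hypothetical: the tuple-based instructions, the add_type rule, and the choice to model an annotation as a set of type names per variable, where "more general" means "a larger set".

```python
# Toy program: (opcode, operands...). Positions are list indexes.
PROGRAM = [
    ("const", "i", 0),                     # 0
    ("const", "total", 0),                 # 1
    ("jump_if_ge", "i", "n", 6),           # 2: loop head, exits to 6
    ("add_const", "total", "total", 0.5),  # 3: total turns into a float
    ("add_const", "i", "i", 1),            # 4
    ("jump", 2),                           # 5: back edge to the loop head
    ("return", "total"),                   # 6
]


def add_type(left, right):
    """Result type of an addition in this toy type system."""
    return "int" if left == right == "int" else "float"


def annotate(program, entry):
    """Record, for each position, the set of possible types of each variable."""
    recorded = {}
    pending = [(0, entry)]
    empty = frozenset()
    while pending:
        pos, ann = pending.pop()
        old = recorded.get(pos)
        if old is not None and all(
                types <= old.get(var, empty) for var, types in ann.items()):
            continue  # not more general than what is recorded: stop here
        merged = dict(old or {})
        for var, types in ann.items():
            merged[var] = merged.get(var, empty) | types
        recorded[pos] = merged
        op = program[pos]
        out = dict(merged)
        if op[0] == "const":
            out[op[1]] = frozenset([type(op[2]).__name__])
            pending.append((pos + 1, out))
        elif op[0] == "add_const":
            const_type = type(op[3]).__name__
            out[op[1]] = frozenset(
                add_type(t, const_type) for t in merged[op[2]])
            pending.append((pos + 1, out))
        elif op[0] == "jump_if_ge":
            pending.append((op[3], out))   # branch taken
            pending.append((pos + 1, out)) # fall through
        elif op[0] == "jump":
            pending.append((op[1], out))
        # "return" has no successor
    return recorded


result = annotate(PROGRAM, {"n": frozenset(["int"])})
print(sorted(result[2]["total"]))  # types of total at the loop head
```

The loop head is visited twice: the second visit widens total from int to int-or-float, and the third arrival adds nothing new, so the pass stops without ever having recognised the loop as such.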

[Armin Rigo Wed, Jun 18, 2003 at 04:11:27PM +0200]
Hello Holger,
On Tue, Jun 17, 2003 at 02:39:36PM +0200, holger krekel wrote:
I don't know but i think going the TranslationObjSpace way might turn into complicated code.
Actually, you describe a nice AnnotationObjSpace that is almost exactly what I had in mind. In other words I don't think there is much more to the AnnotationObjSpace until we can make it produce usable C code. (Btw I'm not sure emitting Pyrex-annotated code would be any different nor particularly easier.)
It seems to me that
a) Pyrex would allow for a smoother transition to RPython because we could run parts of our interpreter as C-modules. Once we have the whole thing running we can migrate to our own RPython-C-base library.
b) Pyrex' extended python syntax seems to be easier to generate, especially starting from annotated python-source.
Btw, i have no stake in Pyrex and only glanced at documentation+src without using it much.
This is the crucial example:
For example how do you "detect" a loop from an objectspace's POV? It just sees a series of ObjSpace-Method accesses and it seems hard to reverse engineer the loop.
Here is the idea. You are right in that the object space cannot see that loop at all; but we don't necessarily want to produce a nice C-style 'for' loop for it -- who said we wanted to produce *nice* C code ;-)
... space-time tradeoff, i knew it :-) ...
Instead, the translator keeps variable type annotations like you described for AnnotationObjSpace, but it keeps one such structure for *each* bytecode position. Hoping that this doesn't completely blow away the memory when the translator is run, we can detect when the annotation work is done by the fact that we have got back to an already-seen position and our current annotations for it are not more general than what we have already recorded.
Translating the result to C is then quite straightforward. The object space's methods like space.add(), when called, would attach one or a few lines of C code to the current bytecode position. When the annotation pass completes, we just emit a C function whose body groups all these lines in the bytecode order.
ok, now i understand it more. I am not sure if starting from the interpreter source would be as/less/more efficient, though :-) Any guesses?
For jumps (like loops), we add in the body a label for each bytecode position (e.g. 'b123:' for the bytecode position 123) and then the C equivalent of 'JUMP_IF_TRUE 123' is 'if (...) goto b123;'. It doesn't make nice C loops -- looks more like assembly code. Similarly I don't imagine that Python "if" constructs will give nice "if(...) {...} else {...}" in C, but again some horrible block-less stuff with gotos.
sounds like psyco :-)
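For illustration, the label-and-goto layout could be produced by something as small as the sketch below. It is not code from the thread: emit_c, the fixed 'long f(long n)' signature, and the hand-written C lines per bytecode position are all assumptions made for the example. Declarations are kept apart from the labelled lines because older C standards do not allow a label directly in front of a declaration.

```python
def emit_c(name, declarations, lines_by_pos):
    """Build C source from {bytecode position: [C statements]}.

    Every position gets a label 'b<pos>:' so that jumps can target it.
    """
    body = ["    " + decl for decl in declarations]
    for pos in sorted(lines_by_pos):
        body.append("b%d:" % pos)
        body.extend("    " + line for line in lines_by_pos[pos])
    return "long %s(long n) {\n%s\n}\n" % (name, "\n".join(body))


# C lines that the object space operations might have attached to each
# bytecode position while annotating a simple summing loop (made up here).
lines = {
    0: ["i = 0;"],
    3: ["total = 0;"],
    6: ["if (i >= n) goto b21;"],
    12: ["total = total + i;"],
    15: ["i = i + 1;"],
    18: ["goto b6;"],
    21: ["return total;"],
}

source = emit_c("sum_below", ["long i;", "long total;"], lines)
print(source)
```

The result has no 'for' or 'while' in it at all; the loop exists only as the backward 'goto b6;', which is the assembly-like shape described above.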
Actually getting nice C code would be another matter, but I don't really mind right now.
who would. It's nice that we agree about the AnnotationObjSpace so this is a starting point. Are you in Louvain-La-Neuve on Friday evening or are you staying in Bruxelles? cheers, holger
participants (3)
- Armin Rigo
- holger krekel
- Samuele Pedroni