
Hi all, I have a Python parser done in "pure" Python (no C extension module dependencies, etc.). I even have an implementation of pgen in Python. Now, I am wondering what the next step is. Shall I continue onward to bytecode compilation? Eventually, I am going to need some instruction on integrating my code with PyPy. I would be more than happy to press onward and attempt to reimplement a static Python-to-C translator, and I am also interested in more blue-sky stuff like adding optional static typing or playing with type inference. Let me know what y'all think. Thanks! -Jon

[Jeremy Hylton Wed, Jul 30, 2003 at 06:09:41PM -0400]
Note that we mirror the Python CVS trunk into our svn repository. This makes it easy to have a compiler package within PyPy and merge back and forth with CPython's compiler package. However, I am not sure how many design goals a PyPy compiler and the CPython compiler will share in the longer run. E.g. we probably want the PyPy compiler to be configurable at run time:

    from pypy import compiler
    compiler.set_minimal()      # only compile very basic constructs
    compiler.allow_listcomprehension()
    compiler.allow_classes()
    # set a different way to compile try/except/finally constructs [*]
    compiler.set_exception_compiler(MyExcCompiler)
    compiler.compile(testsource)
    # ...

CPython's compiler package probably doesn't have such design goals. But then again, it wouldn't hurt if CPython's compiler package could do such things :-)

cheers, holger

[*] This e.g. means that a code object effectively determines opcode (and possibly frame) implementations. Probably 'MyExcCompiler' could just provide the implementations for its "own" opcodes and the machinery would make them available at interpretation time.
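The run-time configurability described above could be sketched roughly as follows. This is a hypothetical illustration only (the `Compiler`, `ExceptionCompiler`, and construct names are invented, not PyPy's actual API): a compiler object keeps a set of enabled constructs and a pluggable strategy object for try/except/finally compilation.

```python
# Hypothetical sketch: a compiler configurable at run time, with a
# swappable strategy for compiling exception-handling constructs.

class ExceptionCompiler:
    """Default strategy for compiling try/except/finally constructs."""
    def compile_try(self, node):
        return ["SETUP_EXCEPT", node]

class Compiler:
    def __init__(self):
        self.allowed = set()
        self.exc_compiler = ExceptionCompiler()

    def set_minimal(self):
        # only compile very basic constructs
        self.allowed = {"assign", "call"}

    def allow(self, construct):
        self.allowed.add(construct)

    def set_exception_compiler(self, exc_compiler):
        # swap in a different try/except/finally compilation strategy
        self.exc_compiler = exc_compiler

    def compile(self, construct, node):
        if construct == "try":
            return self.exc_compiler.compile_try(node)
        if construct not in self.allowed:
            raise SyntaxError("construct %r is disabled" % construct)
        return [construct.upper(), node]

c = Compiler()
c.set_minimal()
c.allow("listcomp")
print(c.compile("listcomp", "x for x in y"))  # ['LISTCOMP', 'x for x in y']
```

The point of the footnote [*] shows up here: because the exception strategy is an object attached to the compiler, a custom `MyExcCompiler`-style class could both emit its own opcodes and carry their implementations.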

On Thu, 31 Jul 2003, holger krekel wrote:
Sure thing. Let me know what you'd like done. Perhaps we can fire up the compiler SIG one more time? (*grin*)
Heh, this would make three repositories for my code. I already have it integrated into my Basil CVS tree. :) I guess the more the merrier!
This is cool stuff, Holger. Could you give me pointers as to where I should head with this? I've had pretty firm ideas in the past, but I'm not sure how they'd be received. For example, using the parse tree of the input grammar, I either have or could easily create a tool that would build a base tree walker class, and then developers could derive custom behavior from there. Perhaps the compiler would not be stateful (in your example you use several method/function calls to set compiler state) so much as a module that exposes multiple tree walker classes.

A lot of this technology is key to the Basil project, as I was going to build a control flow and data flow analysis toolset for C/C++/Python.

FYI, as I stated above, the code (or most of it, anyway) is in the Basil SF CVS repository (basil.sf.net) if anyone wants a sneak peek (see the basil.lang.python & basil.parsing packages). The actual Python parser is still hidden in a test of the PyPgen stuff and is built at run time, but I should get around to pretty-printing it and packaging it soon. Thanks! -Jon
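The generated-base-tree-walker idea above can be sketched in a few lines. This is a hypothetical illustration (the class and node shapes are invented, not Basil's actual output): a base class dispatches on node kind to `visit_*` methods, and developers subclass it for custom behavior.

```python
# Hypothetical sketch: a base tree walker that could be generated from a
# grammar, with one visit_* hook per production; subclasses add behavior.

class BaseWalker:
    def walk(self, node):
        # a node is a (kind, children_or_value) pair
        kind, payload = node
        method = getattr(self, "visit_" + kind, self.default_visit)
        return method(payload)

    def default_visit(self, payload):
        # by default, just recurse into child nodes
        if isinstance(payload, list):
            for child in payload:
                self.walk(child)

class NameCollector(BaseWalker):
    """Custom behavior derived from the generated base class."""
    def __init__(self):
        self.names = []

    def visit_name(self, value):
        self.names.append(value)

tree = ("module", [("name", "x"), ("num", 1), ("name", "y")])
w = NameCollector()
w.walk(tree)
print(w.names)  # ['x', 'y']
```

This also shows why a module of walker classes can replace a stateful compiler: each analysis or compilation pass is its own subclass rather than a mode flag on one object.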

Hi Jonathan, [Jonathan David Riehl Wed, Jul 30, 2003 at 06:20:23PM -0500]
For starters, just mail me your preferred username and password and I'll set up an account and help you through any problems that occur. I have repeatedly merged an experimental CPython fork from Python 2.2 to Python 2.3 from our svn CPython mirror without any problems.
After reading about your GRAD/Path and Basil I think that you know much more about compiler & parser technology than me :-) Nevertheless, here are my *wishes* for a *compiler* package as I currently see it fitting PyPy:

- separate "aspects" of compiling the Python language out into classes so that they can be easily subclassed/replaced.

- a completely dynamic approach to bytecodes. Each "aspect" adds/references bytecodes including their implementation. Actually the aspect classes should generate the bytecode as a sequence of arbitrary Bytecode instances, I guess.

- the list of Bytecode instances might be converted into a standard CPython code object or into an enriched PyPy one (or into Jython/Java code?).

- I wonder if the enriched PyPy code objects could just be a pickle of the 'internal compiler result', so that at run time the PyPy interpreter calls the 'code.create_frame' function, passing in an ObjSpace and wrapped args/kwargs, and then the frame.eval method executes the code object. This model might also fit with gatewaying to C or other languages.

- there seems to be a chicken-and-egg problem with the implementation of a Bytecode instance 'B': it needs to be compiled, too. The closure of all Bytecode instances needed for this specific 'B' implementation should obviously not contain itself. Maybe bytecode implementations should be compiled to a different (restricted) set of Bytecodes altogether?

I haven't thought about how parsers & grammars fit into this picture but ...
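The "each Bytecode instance carries its own implementation" wish above can be sketched concretely. This is a hypothetical illustration (the `Bytecode` class, opcode names, and the frame loop are invented for the sketch): the interpreter loop knows nothing about individual opcodes, so an aspect can register new ones, implementations included, without touching the machinery.

```python
# Hypothetical sketch: Bytecode instances bundle name, implementation,
# and argument; a minimal "frame" just dispatches to each implementation.

class Bytecode:
    def __init__(self, name, impl, arg=None):
        self.name = name
        self.impl = impl   # callable(stack, arg)
        self.arg = arg

def op_load_const(stack, arg):
    stack.append(arg)

def op_binary_add(stack, arg):
    b, a = stack.pop(), stack.pop()
    stack.append(a + b)

def run(code):
    """Minimal frame.eval: execute a list of Bytecode instances."""
    stack = []
    for bc in code:
        bc.impl(stack, bc.arg)
    return stack[-1]

code = [Bytecode("LOAD_CONST", op_load_const, 2),
        Bytecode("LOAD_CONST", op_load_const, 3),
        Bytecode("BINARY_ADD", op_binary_add)]
print(run(code))  # 5
```

The chicken-and-egg problem mentioned above is visible here too: the `impl` functions themselves are ordinary compiled Python, i.e. they must already exist in some base (restricted) form before the dynamic opcodes built from them can run.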
... this sounds like an excellent starting point. From an experimenter's/implementor's POV it would probably be nice to bundle all aspects into one simple unit you can pass to the parser/compiler package: providing a grammar, providing a compiler for the constructs in that grammar, and providing the implementations of the compiled bytecodes. I am unsure how decomposable a grammar like CPython's is.
Sure, I didn't mean to suggest an actual API, just the functionality.
A lot of this technology is key to the Basil project as I was going to build a control flow and data flow analysis toolset for C/C++/Python.
Here Armin or others need to chime in. The whole abstract interpretation/annotated object space thing is still somewhat opaque to me.
"The SourceForge.net Website is currently down for maintenance."

cheers, holger

P.S.: If I used (PyPy) terminology you don't know or deem dubious, please feel free to ask/correct me. So far we have managed pretty well through all the vagueness, but questioning and clarifying concepts can't hurt.

Hello, On Fri, Aug 01, 2003 at 01:03:50PM +0200, holger krekel wrote:
- there seems to be a chicken-and-egg problem with the implementation of a Bytecode instance 'B': it needs to be compiled, too.
I think we should not worry about this problem. Currently we need CPython for bootstrapping. When we can create (using CPython) a stand-alone PyPy that can interpret regular Python code, then we can simply switch from CPython to that PyPy to bootstrap the next versions of PyPy.
There is certainly room for both approaches to code analysis: the syntactic one (a parse tree walker is cool) and the semantic one, which is the purpose of the annotated object space (abstract interpretation). The two approaches complement each other.

The first one is suited to high-level analysis of the code, close to the source; we can reason about the constructs used in the source. It is good for checking global properties of the code, or for control flow analysis. The second one is closer to the actual execution, and I'd say it is better suited for things like type inference or data flow analysis, because we follow step by step what the interpreter would do with real data, at a low level. In other words, abstract interpretation is excellent for questions like "what would the interpreter do at run time if...". It cannot answer questions about the source.

Our "annotation object space" tries to do type inference, and I thought we could then use another object space to translate the code to low-level C code using the information previously inferred. Now that I think about it, it may be even nicer to combine the semantic analysis of the type inference with a *syntactic* translator, to keep the higher-level "feeling" of the source code in the generated C; not that it is really needed in our particular case, but I guess some would like it (e.g. for a Pyrex-like project).

A bientôt, Armin.
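The "follow step by step what the interpreter would do" idea behind abstract interpretation can be shown in miniature. This is a hypothetical sketch (the op format and `infer` function are invented, far simpler than PyPy's annotation object space): the same interpreter loop runs, but on *types* instead of values.

```python
# Hypothetical sketch of abstract interpretation for type inference:
# execute a tiny straight-line program over types rather than values.

def abstract_binary_add(t1, t2):
    # abstract version of '+': combine types, losing precision when unsure
    if t1 is int and t2 is int:
        return int
    if t1 is str and t2 is str:
        return str
    return object   # unknown combination: fall back to "any object"

def infer(ops, env):
    """Abstractly interpret ops: ('load', var) or ('add',).
    env maps variable names to their (assumed) argument types."""
    stack = []
    for op in ops:
        if op[0] == "load":
            stack.append(env[op[1]])
        elif op[0] == "add":
            t2, t1 = stack.pop(), stack.pop()
            stack.append(abstract_binary_add(t1, t2))
    return stack[-1]

# infer the result type of 'a + b' given the argument types
prog = [("load", "a"), ("load", "b"), ("add",)]
print(infer(prog, {"a": int, "b": int}))  # <class 'int'>
```

As Armin says, this answers "what would the interpreter do at run time if the inputs had these types", but it says nothing about the source syntax; that is the tree walker's job.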

My original goal for the compiler package was to make it possible to experiment with variants and extensions to the core language. The best place to start was a compiler for the current language, and I haven't had much time to pursue it beyond that. But it is exactly in line with the long-term goals for the compiler package. I can help a little. I'll have more spare time now that the 2.3 release is done, but I also want to finish off Python's ast branch. It would be nice if the new py-parser could generate that AST with less effort than the current transformer. Jeremy

participants (4)
- Armin Rigo
- holger krekel
- Jeremy Hylton
- Jonathan David Riehl