AST Transformation Hooks for Domain Specific Languages

(Oops, let's try that again with the correct destination address this time...)

A few odds and ends from recent discussions finally clicked into something potentially interesting earlier this evening. Or possibly just something insane. I'm not quite decided on that point as yet (but leaning towards the latter). Anyway, without further ado, I present:

AST Transformation Hooks for Domain Specific Languages
======================================================

Consider:

    # In some other module
    ast.register_dsl("dsl.sql", dsl.sql.TransformAST)

    # In a module using that DSL
    import dsl.sql

    def lookup_address(name : dsl.sql.char, dob : dsl.sql.date) from dsl.sql:
        select address
        from people
        where name = {name} and dob = {dob}

Suppose that the standard AST for the latter looked something like:

    DSL(syntax="dsl.sql", name='lookup_address',
        args=arguments(
            args=[arg(arg='name', annotation=<Normal AST for "dsl.sql.char">),
                  arg(arg='dob', annotation=<Normal AST for "dsl.sql.date">)],
            vararg=None, varargannotation=None, kwonlyargs=[], kwarg=None,
            kwargannotation=None, defaults=[], kw_defaults=[]),
        body=[Expr(value=Str(s='select address\nfrom people\nwhere name = {name} and dob = {dob}'))],
        decorator_list=[], returns=None)

(For those not familiar with the AST, the above is actually just the existing Function node with a "syntax" attribute added.)

At *compile* time (note, *not* function definition time), the registered AST transformation hook would be invoked and would replace that DSL node with "standard" AST nodes. For example, depending on the design of the DSL and its support code, the above example might be equivalent to:

    @dsl.sql.escape_and_validate_args
    def lookup_address(name: dsl.sql.char, dob: dsl.sql.date):
        args = dict(name=name, dob=dob)
        query = "select address\nfrom people\nwhere name = {name} and dob = {dob}"
        return dsl.sql.cursor(query, args)

As a simpler example, consider something like:

    def f() from all_nonlocal:
        x += 1
        y -= 2

That would be translated at compile time into:

    def f():
        nonlocal x, y
        x += 1
        y -= 2

My first pass at a rough protocol for the AST transformers suggests they would only need two methods:

    get_cookie() - Magic cookie to add to PYC files containing instances of the DSL (allows recompilation to be forced if the DSL is updated)
    transform_AST(node) - a DSL() node is passed in, expected to return an AST containing no DSL nodes (SyntaxError if one is found)

Attempts to use an unregistered DSL would trigger SyntaxError.

So there you are, that's the crazy idea. The stoning of the heretic may now commence :)

Where this idea came from was the various discussions about "make statement" style constructs and a conversation I had with Eric Snow at PyCon about function definition time really being *too late* to do anything particularly interesting that couldn't already be handled better in other ways. Some tricks Dave Malcolm had done to support Python level manipulation of the AST during compilation also played a big part, as did Eugene Toder's efforts to add an AST optimisation step to the compilation process.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
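To make the two-method protocol a little more concrete, here is a minimal sketch of what the "all_nonlocal" transformer might look like. It is purely illustrative: neither ast.register_dsl() nor the DSL node exist in CPython, so the node passed in is assumed to be the Function-node-with-a-"syntax"-attribute described above.

    import ast

    class AllNonlocalTransform:

        def get_cookie(self):
            # Bump this whenever the transform changes, so cached PYC files
            # built with an older version get recompiled.
            return "all_nonlocal-v1"

        def transform_AST(self, node):
            # Declare every augmented-assignment target as nonlocal, then
            # hand back a plain FunctionDef containing no DSL nodes.
            names = sorted({stmt.target.id for stmt in node.body
                            if isinstance(stmt, ast.AugAssign)
                            and isinstance(stmt.target, ast.Name)})
            new_body = [ast.Nonlocal(names=names)] + list(node.body)
            return ast.FunctionDef(name=node.name, args=node.args,
                                   body=new_body,
                                   decorator_list=node.decorator_list,
                                   returns=node.returns)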

On Fri, Apr 8, 2011 at 9:32 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
get_cookie() - Magic cookie to add to PYC files containing instances of the DSL (allows recompilation to be forced if the DSL is updated)
An alternative might be to require that the cookie be provided when the DSL is registered. That would make cookie validity checking faster. That kind of implementation detail is in the noise though, compared to the possible implications of the overall idea. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On 8 April 2011 13:58, Nick Coghlan <ncoghlan@gmail.com> wrote:
You *really* ought to implement this as an extension module and an import hook so that we can try it out. :-) All the best, Michael Foord
-- http://www.voidspace.org.uk/ May you do good and not evil May you find forgiveness for yourself and forgive others May you share freely, never taking more than you give. -- the sqlite blessing http://www.sqlite.org/different.html

One of the tricky details is where to put ast.register() so that it runs before the module is parsed. Doing this in dsl.sql and importing it in 'sql' modules is not enough. SQL modules will have to rely on someone registering the dsl early enough, e.g. with

    import dsl.sql.register  # register dsl
    import sqlmodule         # now import the module using the sql dsl

So registering the dsl will be the client's responsibility, rather than something the module can do for itself. If this is OK, we can achieve a similar effect without any changes to Python -- for example, with import hooks. One can write a hook that applies whatever AST transformations to modules loaded from specific locations.

We can also make the AST transformation a part of the module itself, e.g. with some kind of "eager decorator". Taking your example:

    @@dsl.sql.query
    def lookup_address(name : dsl.sql.char, dob : dsl.sql.date) from dsl.sql:
        select address
        from people
        where name = {name} and dob = {dob}

The eager decorator has to be used by its fully qualified name. The parser will import (and execute) the defining module (dsl.sql in this example) while compiling a module that uses it (not when the module is executed, as with a normal decorator).

Eugene
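As an illustration of the import-hook route, here is a rough sketch using today's importlib machinery. The transform argument is a stand-in for whatever AST rewriter a particular dsl provides; nothing here is part of any existing dsl package.

    import ast
    import importlib.machinery
    import importlib.util
    import sys

    class TransformingLoader(importlib.machinery.SourceFileLoader):
        # Applies an AST transformation while loading a module from source.
        def __init__(self, fullname, path, transform):
            super().__init__(fullname, path)
            self.transform = transform   # e.g. an ast.NodeTransformer instance

        def source_to_code(self, data, path, *, _optimize=-1):
            tree = ast.parse(data, filename=path)
            tree = ast.fix_missing_locations(self.transform.visit(tree))
            return compile(tree, path, "exec", optimize=_optimize)

    def import_with_transform(name, path, transform):
        # Minimal driver: import a single file through the transforming loader.
        loader = TransformingLoader(name, path, transform)
        spec = importlib.util.spec_from_file_location(name, path, loader=loader)
        module = importlib.util.module_from_spec(spec)
        sys.modules[name] = module
        spec.loader.exec_module(module)
        return module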

On Fri, Apr 8, 2011 at 11:15 PM, Eugene Toder <eltoder@gmail.com> wrote:
So registering the dsl will be the client's responsibility, rather than something the module can do for itself.
Yep, but if you're at the point of using a DSL, that's likely to be part of a larger framework which can take care of registering the DSL for you before you import any modules that need it. That answer applies whether this is a standard language feature or part of an import hook that uses its own custom compiler.
Yeah, the big downside is having to almost completely reimplement the AST compiler in order to do it that way (since the built-in one would choke on the syntax extension). That isn't *hard* so much as it is tedious (especially compared to tweaking the existing one in place on a clone of the main repo). Once the AST has been transformed, of course, the existing compiler could still be used.

Note something I haven't looked into yet is whether or not the CPython parser can even generate a different node type for this, or manage the "automatic stringification" of the DSL body. The former issue wouldn't be too hard to handle (just add a "syntax" attribute to the existing Function node, with "node.syntax is None" indicating standard Python code), but I'm not sure about the second one (although the worst case would be to require use of the docstring to cover anything that didn't fit with standard Python syntax - arguably not a bad idea anyway, since it would be a lot friendlier to non-DSL-aware Python tools).
But what would the eager decorator buy you over just specifying a different dialect in the "from" clause? I'm not sure I'll ever actually create a prototype of this (lots of other things on the to-do list), but I found the idea too intriguing not to share it. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

Not necessarily. And registering makes debugging small pieces and playing in the interactive interpreter less convenient.
I did not realize you're proposing to skip parsing the body of the dsl function and just tuck all the code into a Str literal. I thought the idea was to actually do an AST transformation, i.e. use the Python parser first and rewrite the resulting AST into something else. This is a classical approach and something people already do in Python, though it's not integrated with the language at the moment (e.g. there's no quote and no way to get the AST for a function).

If the idea is to use completely custom syntax in the dsl function and implement custom parsers for dsls, it's harder to do with an import hook, but not too hard. A preprocessor step that puts the body of dsl functions into a """ string will do.

I don't know if I like the stringification idea. It allows much more freedom in dsl syntax, but to use that freedom one needs to implement a parser. Just rewriting the AST is simpler. Also, the implementation is very different from an AST transform -- basically, the grammar needs to say that after you've seen a dsl function header, the next suite needs to be saved as a string.
But what would the eager decorator buy you over just specifying a different dialect in the "from" clause?
The point of the eager decorator is to avoid the need for registration -- the code becomes self-sufficient. It doesn't need the "from" clause; I forgot to delete it. To reiterate, the idea is that code of the form (straw-man syntax)

    @@a.dec(args)
    def foo():
        ...

in module b will import module a at compile time and execute a.dec on foo's AST, and the resulting AST will be used. This can be done with a 'from' clause as well, if it gains importing capabilities -- that's just a syntactic difference. However, this seems very close to existing decorators, so similar syntax and semantics seem to make sense.

Eugene
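As a sketch of the "classical approach" mentioned above -- rewriting a function's AST at runtime from a decorator -- something like the following works today. The transformer argument is assumed to be an ast.NodeTransformer supplied by the caller; note that this cannot fix up closure cells, which is exactly why the thread argues that runtime is too late for some transformations.

    import ast
    import inspect
    import textwrap

    def ast_rewrite(transformer):
        # Runtime AST rewriting: re-parse the decorated function's source,
        # let the given ast.NodeTransformer rewrite it, recompile, and
        # return the new function object.
        def decorator(func):
            source = textwrap.dedent(inspect.getsource(func))
            tree = ast.parse(source)
            # Drop this decorator from the parsed copy to avoid recursion.
            tree.body[0].decorator_list = []
            tree = ast.fix_missing_locations(transformer.visit(tree))
            namespace = {}
            exec(compile(tree, "<ast_rewrite>", "exec"),
                 func.__globals__, namespace)
            return namespace[func.__name__]
        return decorator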

I did not realize you're proposing to skip parsing the body of the dsl function and just tuck all the code into a Str literal.
If I can ask a noobish question: in that case, what's the advantage of the new syntax over the currently existing ability to have a decorator parse a docstring? E.g., an SQL DSL could be done today as

    import dsl.sql

    @dsl.sql
    def lookup_address(name : dsl.sql.char, dob : dsl.sql.date):
        """select address
        from people
        where name = {name} and dob = {dob}"""
        pass

If you can't bear to lose the docstring, you could instead write something like

    import dsl.sql

    @dsl.sql
    def lookup_address(name : dsl.sql.char, dob : dsl.sql.date):
        "Looks up the address of a name and DOB."
        return """select address
        from people
        where name = {name} and dob = {dob}"""

What's the advantage of the "def from" over this? Bear in mind that this way of doing it has the advantage of working with today's Python syntax highlighters. ;-)
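For concreteness, a sketch of what such a docstring-based decorator could look like with today's Python. For illustration it just fills in the template; a real implementation would escape/validate the values and hand the parameterised query to a database cursor.

    import functools

    def sql(func):
        # Decorator-based DSL sketch: the SQL text lives in the docstring
        # and the function's parameter names become the query parameters.
        template = func.__doc__
        params = func.__code__.co_varnames[:func.__code__.co_argcount]

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            bound = dict(zip(params, args))
            bound.update(kwargs)
            return template.format(**bound)   # real code: pass to a cursor

        return wrapper

    @sql
    def lookup_address(name, dob):
        """select address
        from people
        where name = {name} and dob = {dob}"""

    print(lookup_address("Nick", "1970-01-01"))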

On Sat, Apr 9, 2011 at 4:06 PM, Eric Snow <ericsnowcurrently@gmail.com> wrote:
Yep - it's the move to compile time metaprogramming that makes ideas like this one and tools like Mython interesting. Everything else is too close to what can already be done with decorators and metaclasses to justify the additional complexity. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On 4/9/2011 2:06 AM, Eric Snow wrote:
My understanding is that the AST transformation is done at compile-time, while the decorator happens at run-time.
Nope. The decorator call happens just after compilation, but before any calls. A decorator could compile a new code object and replace the original. If the decorator returns a wrapper instead of the original function, then, of course, the *wrapper* gets called with each call. Or a decorator can return a replacement that does not call the original function. -- Terry Jan Reedy

On 9 Apr 2011, at 22:10, Terry Reedy wrote:
I don't understand what you are saying. A decorator call occurs just after the execution of the def statement it decorates, which is definitely at run-time. And that's completely unrelated to compilation time. In fact most modules are executed many times between compilations. And yes, a decorator can compile a new code object, but how is that relevant? -- Arnaud

On Sun, Apr 10, 2011 at 8:13 AM, Terry Reedy <tjreedy@udel.edu> wrote:
I was thinking Eric meant function call time, but that is probably wrong, so ignore my comment.
Yeah, he was talking about function definition time rather than call time. From the point of view of PYC file generation, both function definition time and function call time are "runtime" events that happen after the PYC file has been generated. The typical flow of events is:

- module compilation (including PYC caching)
- module execution (including definition of top-level functions and classes)
- function execution (including definition of any nested functions and classes)

The latter two are both "runtime", but functions are special because key events happen at all three stages in the chain:

- At compilation time, the compiler's symbol table analysis and code generation figure out all the variable scoping and decide whether to emit local, nonlocal or global operations for variable access, and which variables need to be stored in cells for correct access from closures
- At function definition time, default arguments, annotations and decorators are all evaluated, any cell references are linked up to the relevant outer scopes, and the function name is bound in the containing scope
- At function call time, the arguments are populated and the actual function body is executed

Generators add a fourth stage to the process, since they separate generator construction time from iteration time (and don't check their arguments until they start iterating).

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Sat, Apr 9, 2011 at 10:27 AM, Eugene Toder <eltoder@gmail.com> wrote:
Not really. All you would need is for it to be standard practice for DSL definition modules to provide a __main__ clause that registers themselves and uses runpy to execute the passed-in arguments (once we start converting some of the stdlib utilities like "pdb" to do that, I'll even see if there is sufficient commonality to make it worth factoring out a "runpy.delegate_main()" helper function). So a simple "python -mdsl.sql myfile.py" would run a file that uses the DSL, while "python -i -mdsl.sql" would get you an interactive interpreter that understood that DSL dialect.

Embedding imports inside functions has long been fragile, and is a good recipe for deadlocking code (especially if process forks are involved). I *really* don't want to advocate enshrining them as an implicit part of a standard piece of syntax (even if that syntax has no realistic chance of ever making it into the core language).
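A heavily hypothetical sketch of that pattern (ast.register_dsl() is only a proposal; runpy.run_path() is real):

    # dsl/sql/__main__.py -- imagined layout for "python -mdsl.sql myfile.py"
    import ast
    import runpy
    import sys

    import dsl.sql   # the made-up package providing TransformAST

    # Register the dialect before any client code gets compiled.
    ast.register_dsl("dsl.sql", dsl.sql.TransformAST)   # proposed hook, not real

    # Run the script named on the command line under the registered dialect.
    script = sys.argv[1]
    sys.argv = sys.argv[1:]   # make the script think it was run directly
    runpy.run_path(script, run_name="__main__")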
That part is actually orthogonal to the core AST construction hook concept. It just occurred to me as a possibility when I was writing the SQL example. Certainly, if I try implementing this, I won't do it that way - I'll just add the "syntax" attribute to the Function node and do the experiment as a true AST transformation process rather than implicitly quoting the function body. That will make the AST generation tweaks significantly simpler than the full-blown Mython-style quoting concept.
Reusing Python syntax would still be easy - you'd simply invoke ast.parse() on the stringified body. However, I'm not that fond of the idea either, so I doubt I'll experiment with it (particularly since Mython already works that way).
It also has the virtue of getting the DSL name in a more prominent place, and allowing easier composition of DSLs. I'm sold :) Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
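Regarding reusing the Python parser on the stringified body, something along these lines would do it (dsl_node being the hypothetical DSL() node from the original post, whose body is a single string constant holding the suite's source):

    import ast

    # Re-parse the implicitly stringified suite with the ordinary Python
    # parser, then splice the real statements back into the function body.
    body_source = dsl_node.body[0].value.s
    dsl_node.body = ast.parse(body_source).body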

On Sun, Apr 10, 2011 at 3:00 AM, Eugene Toder <eltoder@gmail.com> wrote:
The difference is actually pretty huge. If it is done at compile time:

1. You consume the entire module at once, permitting non-local effects. This is significant, as it allows you to reference variables in outer scopes, converting them into cell references (e.g. see the "all_nonlocal" example from my original post). Runtime is far too late to do that, since the compiler has already finished the symbol table analysis and code generation that handles nested scopes.

2. Compile time operations can have their results cached in the generated PYC files. This cannot happen with runtime operations.

If this was handled as a runtime operation, you couldn't really do anything that can't already be done with decorators and metaclasses.

People sometimes get confused about how Python's compilation and execution model differs from that of a static language like C. To give a quick rundown of the different major phases:

C build-and-execution model:
- compile time (.c -> .o)
- link time (multiple .o -> executable)
- dynamic linking (loading additional modules at runtime)
- runtime (actual code execution)

Only the last two happen when the program is executed.

Python:
- compile time (.py -> bytecode)
- definition time (only significant for functions and classes - the time when the "def" or "class" statement is executed)
- runtime (actual code execution, guaranteed to be after definition time for code inside a function body and during definition time for a class body)

Since compilation is implicit, and there is no pre-linking step, all of these steps happen when the program is executed (although the first step can optionally be performed in advance). It's the separation of compile and definition time that is the major difference between a scripting language like Python and a more traditional language like C.

In a traditional language, the "top-level" code is handled entirely by the compiler, and never actually touched at runtime, so you can't do things like use loops or conditional logic or exception handling to affect how your program is defined (and if you can, the syntax will typically be completely different from the "normal" syntax of the language). In a scripting language, top-level code has access to all the same constructs as code inside functions (and, typically, vice-versa - hence first class functions and type definitions).

Currently Python lets you do lots of things at runtime (i.e. most code) and at definition time (decorators, metaclasses, default arguments, annotations). There are, however, no compile time hooks other than creating your own import hook as Mython does, and completely taking over the compilation process.
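A tiny demonstration of the split between those phases, using nothing but the builtins:

    # Compile time: the def statement is turned into bytecode, nothing runs yet.
    code = compile("def f(x=make_default()):\n    return x * 2\n", "<demo>", "exec")

    namespace = {"make_default": lambda: 21}
    exec(code, namespace)     # definition time: the default argument is evaluated

    print(namespace["f"]())   # call time: the body finally runs and prints 42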
Yes, but it's the only way to make this work as a compile-time operation (since compilation is completed before module execution starts). If it's runtime only, then there's no point in doing it at all. Decorators and metaclasses have that space well and truly covered.
Python-AST is a reasonable place to start though, since non-Python syntax can easily be written inside a docstring. It also creates a subtle social pressure in favour of staying within the spirit of Python syntax and semantics, and clearly demarcating (via triple-quoted strings) when you're straying away from that. An SQL DSL, for example, would most likely go the route of triple-quoting the entire SQL statement, but could also do something more novel like using assignments to define SQL clauses:

    select = address
    tables = people
    where = name == {name} and dob == {dob}

Such is the power and danger of compile-time metaprogramming :)

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On 8 April 2011 12:32, Nick Coghlan <ncoghlan@gmail.com> wrote:
Oops, sent my reply to the wrong list as well. The essence of the proposal is to allow arbitrary syntax within "standard python files". I don't think it stands much of a chance in core. It would be an awesome tool for experimenting with new syntax and DSLs though. :-) Michael
-- http://www.voidspace.org.uk/ May you do good and not evil May you find forgiveness for yourself and forgive others May you share freely, never taking more than you give. -- the sqlite blessing http://www.sqlite.org/different.html

participants (7)
- Arnaud Delobelle
- Carl M. Johnson
- Eric Snow
- Eugene Toder
- Michael Foord
- Nick Coghlan
- Terry Reedy