Idea: The DSL operator.
![](https://secure.gravatar.com/avatar/c3eddec3d2fbdeb60b81f8fcfb6c5d23.jpg?s=120&d=mm&r=g)
This is in response to the various suggestions for embedding SQL syntax in Python and various other suggestions made over the years. Note that this suggestion is only a starting point for discussion, I don't have any illusion that the idea as presented here would have strong support. I'm fully aware that the actual proposal given below looks "butt-ugly", but maybe someone can invent something better. Background: Python is frequently used as a foundation for Domain Specific Languages (DSLs). Examples include SQLObject, SCons, and many others. These are all examples of a mini-language which is constructed on top of the Python syntax, using operator overloading. The DSL is often embedded within Python code (as in the case of SQLObject), and can be invoked programmatically from regular (non-DSL) Python code. However, there are some limitations on what can be expressed. For example, in current interpreters, it is not possible to override the 'and' and 'or' operator. While there currently is a PEP that proposes adding this capability, there is some disagreement as to how much this would impact the efficiency of the non-override case. The reason is because in the non-override case, the compiler is able to take advantage of the "shortcut" properties of these operators; If the operators can be overridden (which is a post-compiler operation), then the compiler would no longer be able to make these optimization assumptions. A similar limitation in DSLs is the treatment of unbound identifiers. Identifiers which have already been bound to objects are not a problem, since you can simply override the various operators of those objects. However, for identifiers which have not yet been assigned to a variable, a number of ugly hacks are required. Typically what is done is to have a set of "special" objects which act as placeholders for unbound variable names, for example _x, _y, _z, and so on. Unfortunately, this requires the person writing the DSL expression to pre-declare unbound variables, which is exactly the opposite of how regular Python works. Another method that is sometimes used is to declare unbound variables as attributes of some general "unbound variable" object, so you can say something like Query.x, Query.y, and so forth. But again, this is rather wordy and robs the DSL of its ability to concisely express a particular semantic intent, which is the whole point of DSLs in the first place. Another limitation of Python's ability to represent DSLs is that you cannot override the assignment operator. My idea is to alleviate these problems by giving DSLs some help at the parsing level. By informing the parser that a given expression is a DSL, we can relax some of the normal assumptions of the Python syntax, and instead require the DSL interpreter to handle them, which allows the DSL to have greater flexibility. Two things we want to avoid: First, we want to avoid "programmable syntax", meaning that we cannot define DSL-specific semantics at parse time. The reason for this is that DSLs are generally defined in their own specific imported modules, and the Python parser has no knowledge of imports, and therefore has no access to the DSL's syntax rules. Secondly, we want to avoid conflicts between multiple DSLs. So for example, if I am using two different DSLs in the same module, we can't have two different meanings for "and", for example. The solution to both of these concerns is to say that the parser's support for DSLs is completely generic - in other words, we can tell the parser that a given expression is a DSL, but we can tell it nothing specific about that *particular* DSL. Instead, it is up to the DSL interpreter - which operates at runtime, not parse time - to take up where the parser left off. This is similar to PJE's suggestion of a way to "quote" Python syntax, returning an AST. In such a scheme, the Python parser doesn't completely compile the quoted expression to Python byte code, but instead produces an AST (or AST-like data structure which might not exactly be an AST), which is then interpreted by the DSL. So far this all seems reasonable enough to my ear :) Next is the part where I go mad. (As my friend Sara said to me recently, "I'm not a mad scientist - I'm a mad natural philosopher.") For purposes of dicussion I am going to introduce the token '??' (two consecutive question marks) as the "DSL operator", in other words a signal to the parser that a given expression is a DSL. I'm sure that there could be a long and intense bikeshed discussion debating the specific operator syntax, but for now I'll choose ?? since it's unambiguously parsable. The '??' operator can be used in one of three ways. First, you can declare an entire expression as a DSL: a = ??(c + d) The parser will take all of the tokens inside the parenthesized group and return them as an AST, it will not attempt to generate code to evaluate any of the operators within the group. There's an extra bit of syntactic sugar: If the DSL expression is the only argument to a function, you can combine the function call and the DSL quote: func??(c+d) Meaning "pass the AST for 'c + d' to the function 'func'". For most DSLs, the 'func' will actually be the name of the DSL, so the typical use pattern will be 'name_of_dsl??(expr)'. The second use is to quote an identifier: a = ??c + d Now, you would think that this would quote the variable 'c' and attempt to add it to 'd', but this is not the way it works. Instead, the ?? operator invokes a kind of operator overloading that happens at the parser level, meaning that any attempt to create an expression that operates on a quoted variable also gets quoted. This only affects binary and unary operators, not functions calls, array references or assignment. So in this case, the entire expression "??c + d" gets quoted. The third way a DSL operator can be used is to quote an operator: a = c ??= d Again, this allows us to generate the AST "c = d" and assign it to the variable "a". (I recognize that these last two syntactical variations would make the interpretation of regular Python ASTs significantly more complex, which is a serious drawback. Part of the motivation for both of these variations is to deal with the 'and', 'or' and '=' problem, by overriding the Python compiler's normal treatment of these operators. However, it may be that the first form alone is sufficient, which would be relatively easy to handle in the Python compiler I think - you would simply have the AST.c module look for a DSL operator, and don't call the normal AST processing steps for arguments to that operator.) Some examples of DSLs using the operator: An SQL query as a DSL: query = SQL??(name == "John" and age > 25) result_iter = query.execute() Syntax for an algebraic solver: @arity??(x + x) def reduce(x): return 2 * x @arity??(x + 0) def reduce(x): return x @arity??(x * 0) def reduce(x): return 0 @arity??(x * 1) def reduce(x): return x @arity??(x * a + y * a) def reduce(x): return (x+y)*a # (and so on for all the other reduce() rules) print(reduce??((a+b*3) + (a+b*3)))
a*2+b*6
(In the above, we define a function 'reduce' that is similar to a generic function, except that the dispatch logic is based on the form of the AST being passed as an argument, using a one-way unification algorithm to do argument pattern matching.) I realize that there are some aspects of this proposal that may seem silly, and that's true, and the only reason I'm even suggesting this at all is that much of what I'm writing here is already been suggested or at least implied by previous discussions. Flame on! -- -- Talin
![](https://secure.gravatar.com/avatar/b60683e6dacd4556ae15430a7c9b6f13.jpg?s=120&d=mm&r=g)
Talin wrote:
An SQL query as a DSL: query = SQL??(name == "John" and age > 25) result_iter = query.execute()
Can you flesh this out with the added complication that my SQL?? thingy wants to include in its expression some kind of Python function that sits at the module level, such convert_birthday_to_age? My fear is that even with an AST at my DSL library's disposal, I would still want hints from Python as to better ways to scope things in the AST given to me, and without actually changing the Python bytecode, I still have my hands tied. Or am I being too pessimistic? Or does your proposal already account for that? ____________________________________________________________________________________Pinpoint customers who are looking for what you sell. http://searchmarketing.yahoo.com/
![](https://secure.gravatar.com/avatar/7e4e7569d64e14de784aca9f9a8fffb4.jpg?s=120&d=mm&r=g)
On Tue, May 29, 2007, Talin wrote:
This is in response to the various suggestions for embedding SQL syntax in Python and various other suggestions made over the years.
Why not try to figure out a way to build a Quixote-like system into Python? IOW, any DSL should *not* go into Python files; instead, they should go into separate files that can be processed by the Python parser and then executed. -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ "as long as we like the same operating system, things are cool." --piranha
![](https://secure.gravatar.com/avatar/c3eddec3d2fbdeb60b81f8fcfb6c5d23.jpg?s=120&d=mm&r=g)
Aahz wrote:
On Tue, May 29, 2007, Talin wrote:
This is in response to the various suggestions for embedding SQL syntax in Python and various other suggestions made over the years.
Why not try to figure out a way to build a Quixote-like system into Python? IOW, any DSL should *not* go into Python files; instead, they should go into separate files that can be processed by the Python parser and then executed.
That really depends on the DSL. Some DSLs want to be embedded - having to segregate SQL query statements into a separate file is a non-starter IMHO. -- Talin
participants (3)
-
Aahz
-
Steve Howell
-
Talin