On Mar 28, 2015, at 18:51, Steven D'Aprano <steve@pearwood.info> wrote:
On Sat, Mar 28, 2015 at 12:50:09PM -0700, Andrew Barnert wrote:
On Mar 28, 2015, at 10:26, Steven D'Aprano <steve@pearwood.info> wrote:
On Sat, Mar 28, 2015 at 09:53:48AM -0700, Matthew Rocklin wrote:
[...]
The goal is to create things that look like functions but have access to the expression that was passed in.
    assertRaises(ZeroDivisionError, 1/0)  # evaluate the rhs 1/0 within the assertRaises function, not before
Generally one constructs something that looks like a function but, rather than receiving a pre-evaluated input, receives a syntax tree along with the associated context. This allows that function-like-thing to manipulate the expression and to control the context in which the evaluation occurs.
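For comparison, the closest thing available today is a higher-order function: the caller wraps the expression in a lambda so evaluation is delayed until inside the call. A minimal sketch (`assert_raises` here is a hypothetical helper, not unittest's assertRaises):

```python
# A higher-order-function version of the macro idea: instead of receiving
# the unevaluated expression 1/0, the function receives a lambda wrapping
# it, so evaluation happens inside the call rather than at the call site.
# (`assert_raises` is a made-up helper, not unittest's assertRaises.)
def assert_raises(exc_type, thunk):
    try:
        thunk()
    except exc_type:
        return True
    raise AssertionError(f"{exc_type.__name__} was not raised")

assert_raises(ZeroDivisionError, lambda: 1 / 0)  # passes: evaluation is delayed
```

The cost is that the caller must remember to write the lambda; the point of a macro is to make that wrapping invisible.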
How will the Python compiler determine that assertRaises should receive the syntax tree rather than the evaluated 1/0 expression (which of course will raise)? The information that assertRaises is a "macro" is not available at compile time.
Well, it _could_ be available. At the time you're compiling a scope (a function, class, or top-level module code), if it uses an identifier that's the name of a macro in scope, the compiler expands the macro instead of compiling in a function call.
Perhaps I trimmed out too much of Matthew's comment, but he did say he isn't talking about C-style preprocessor macros, so I think that if you are imagining "expanding the macro" C-style, you're barking up the wrong tree.
No, I'm not imagining C-style expansion. What I'm imagining is closer to Lisp-style, and closer still to Dylan-style.* I didn't want to get into details because they're complicated, there may even be multiple complicated ways to do things that have to be chosen between, and none of that is relevant to your question. But if you're curious, let me give a more specific explanation:

A macro is compiled by transforming its AST into AST-transforming bytecode, similar to a function, but then, instead of just embedding a MAKE_FUNCTION opcode into the defining scope's bytecode, you do that and _also_ effectively call it in the current scope and bind the macro to the name there.**

A macro is expanded by parsing the arguments into ASTs, calling the AST-transforming function on those ASTs, and substituting the result into the tree at the point of the macro call.*** You don't need to explicitly "pass" a context; the context in which the function is called (the compile-time scope, etc.) is implicitly available, just as for runtime functions.

As I mentioned before, there are a number of additional issues you'd have to resolve (again, think of import and .pyc files, for an example), some of which may make the feature undesirable once you think them through, but I don't think any of them are relevant to your question, and I think something like this design is what he was asking about.

* In Lisp, because there is no syntax and hence no separate step to turn a parenthesized token stream into an AST, it's ambiguous at which stage--before or after that non-existent step--macros are applied. In languages with syntax and grammars, the usual answer is to apply them after parsing to an AST. You can conceptually define macros at any stage in the pipeline, but that's where they turn out to be most useful. Also, I'm ignoring the issue of hygiene, but I think most people want macros to be hygienic by default and unhygienic only on explicit demand, rather than what Lisp does.
** The details here are tricky, because the compiler's notion of "current scope" isn't defined by the language and doesn't correspond to anything defined at runtime, but the intuitive idea is clear: while compiling a module or other scope, when it reaches a def or class, a compiler has to do the equivalent of recursively calling itself, or manually pushing a scope onto a stack; that stack defines the compile-time scope. So the name bound to the macro goes away when you exit that recursive call/pop that scope from the stack. The compiler already keeps track of all variables assigned to in the current scope, to determine local variables and closures; I believe macro bindings could be piled on top of that. If not, this is something completely new that you'd have to bolt on.

*** In most languages with syntax and macros, a macro can only take an expression and must return an expression--which is fine for most of those languages, where almost everything (flow control, variable binding, etc.) is an expression, but not so much in Python, where many of those things can only be done in a statement. Allowing a macro to take a statement or an expression (or any arbitrary AST node), and likewise to "return" either (or any) type, isn't straightforward once you think through some examples.
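The expand step described above can be simulated by hand with the ast module: a "macro" is just an AST-to-AST function, and expansion parses the argument, applies the function, and compiles the result. A sketch, assuming Python 3.9+ for ast.unparse; `log_macro` and `expand` are invented names:

```python
import ast

# A hand-run version of macro expansion: the "macro" is a function from
# AST to AST; "expansion" parses the argument source into a tree, calls
# the macro on it, splices the result into an Expression, and compiles.
def log_macro(expr: ast.expr) -> ast.expr:
    # Rewrite `expr` into the tuple `(source_text, value)`, so the caller
    # sees both the unevaluated source and its evaluated result.
    src = ast.Constant(value=ast.unparse(expr))
    return ast.Tuple(elts=[src, expr], ctx=ast.Load())

def expand(source: str, macro) -> object:
    tree = ast.parse(source, mode='eval')       # the argument, as an AST
    new = macro(tree.body)                      # the macro transforms it
    expr = ast.Expression(body=ast.fix_missing_locations(new))
    return eval(compile(expr, '<macro>', 'eval'))  # splice, compile, run

print(expand('2 + 3', log_macro))   # ('2 + 3', 5)
```

In the real proposal the splicing would happen inside the compiler, at the point of the macro call, rather than through a string and eval at runtime.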
From his description, I don't think Lisp macros are quite the right description either. (As always, I welcome correction if I'm the one who is mistaken.)
C macros are more or less equivalent to a source-code rewriter: if you define a macro for "foo", whenever the compiler sees a token "foo", it replaces it with the body of the macro. More or less.
Lisp macros are different, and more powerful:
http://c2.com/cgi/wiki?LispMacro
http://cl-cookbook.sourceforge.net/macros.html
but I don't think that's what Matthew wants either. The critical phrase is, I think:
"one constructs something that looks like a function but, rather than receiving a pre-evaluated input, receives a syntax tree along with the associated context"
which I interpret in this way:
Suppose we have this chunk of code creating then using a macro:
    macro mymacro(expr):
        # process expr somehow

    mymacro(x + 1)
That is *roughly* equivalent (ignoring the part about context) to what we can do today:
    import ast

    def myfunction(expr):
        assert isinstance(expr, ast.Expression)
        # process expr somehow

    tree = ast.parse('x + 1', mode='eval')
    myfunction(tree)
The critical difference being that instead of the author writing code to manually generate the syntax tree from a string at runtime, the compiler automatically generates the tree from the source code at compile time.
This is why I think that it can't be done by Python. What should the compiler do here?
    import random

    callables = [myfunction, mymacro]
    random.shuffle(callables)
    for f in callables:
        f(x + 1)
Well, that depends on how much of Python you want to be available at compile time.

One possibility is that only the defmacro statement is executed at compile time. You may want to add imports to that. You may want to add assignments. You may want to add some large, ridiculously complex, but well-defined subset of Python (I would ask you to think of C++'s constexpr rules, but that request might be construed as torture). Or you may even want the entire language. And any of the above could be modified by making some or all of those constructs compile-time only when explicitly requested (again, like constexpr).

Any of these is conceptually sensible (although none of them may be desirable...), and they give different answers here. For example, if you only execute defmacro and import at compile time, then f is never known to be a macro at compile time, so the call isn't expanded; it's just compiled as an ordinary function call, which will probably raise a TypeError at runtime (because the current value of x + 1 is probably not an AST node...).
If that strikes you as too artificial, how about a simpler case?
    from mymodule import name

    name(x + 1)
If `name` will refer to a function at runtime, the compiler needs to generate code which evaluates x+1 and passes the result to `name`; but if `name` will refer to a macro, the compiler needs to generate an AST (plus context) and pass it without evaluating it. To do that, it needs to know at compile time which objects are functions and which are macros, but that information is not available until runtime.
This is exactly what I was referring to when I said that you need to make significant changes to other parts of the language, such as revising the import machinery and .pyc files, and that this problem does not apply to future statements. I don't want to go over all the details in as much depth as the last question, so hopefully you'll just accept that the answer is the same: it's not available today, but it could be available, in a variety of different ways that you'd have to choose between, all of which would have different knock-on effects.
But we might be able to rescue this proposal by dropping the requirement that the compiler knows when to pass the syntax tree and when to evaluate it. Suppose instead we had a lightweight syntax for generating the AST plus grabbing the current context:
    x = 23
    spam(x + 1, !(x + 1))    # macro syntax !( ... )
That's a lot closer to what MacroPy does. And notice that if the !() syntax were added to the grammar, MacroPy or something like it could be significantly simpler.* Which means it might be easier to integrate it directly into the builtin compiler--but it also means it might be less desirable to do so, as leaving it as an externally-supplied import hook has all the benefits of externally-supplied modules in general.

(However, there is still the disadvantage that you have to install an import hook before importing, which effectively means you can't use or define macros in your top-level script. If necessary, you could fix that as well, with special syntax--which must come before anything but comments and future statements--that adds an import hook. This is the kind of thing I was talking about in my first message, about finding smaller changes to the language that make MacroPy or something like it simpler and/or more flexible.)

* I believe OCaml recently added something similar, for similar purposes, but I haven't used it in a few years, so I may be misinterpreting what I saw while skimming the what's-new for the last major version.
Now the programmer is responsible for deciding when to use an AST and when to evaluate it, not the compiler, and "macros" become regular functions which just happen to expect an AST as their argument.
No, not quite. What do you _do_ with the AST returned by the macro? And when do you do it? You still have to substitute it into the tree of the current compilation target in place of the macro call, which means it still has to be available at compile time. It does provide a simpler way to resolve some of the other issues, but it doesn't resolve the most fundamental one.
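Setting aside that compile-time substitution problem, the evaluate-later half of the idea can be approximated at runtime today: a helper that bundles an expression's AST with the caller's namespace. A sketch; `quote` and `QuotedExpr` are invented names, and the expression has to be written as a string, which is exactly what real !( ... ) syntax would avoid:

```python
import ast
import sys

# A runtime approximation of the proposed !( ... ) quote: capture an
# expression's AST *and* a snapshot of the caller's namespace, so they
# can travel together for later manipulation or evaluation.
class QuotedExpr:
    def __init__(self, tree, context):
        self.tree = tree        # an ast.Expression node
        self.context = context  # snapshot of the caller's names

    def evaluate(self):
        code = compile(self.tree, '<quoted>', 'eval')
        return eval(code, self.context)

def quote(source):
    caller = sys._getframe(1)
    context = {**caller.f_globals, **caller.f_locals}
    return QuotedExpr(ast.parse(source, mode='eval'), context)

x = 23
q = quote('x + 1')     # the AST and the context travel together
print(q.evaluate())    # prints 24
```

A macro-processing function could equally rewrite q.tree before evaluating it; what this can't do is splice the result back into the enclosing code at compile time.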
[...]
If such a light-lambda syntax reduced the desire for macros down to the point where it could be ignored, and if that desire weren't _already_ low enough that it can be ignored, it would be worth adding. I think the second "if" is where it fails, not the first, but I could be wrong.
I presume that Matthew wants the opportunity to post-process the AST, not merely evaluate it. If all you want is to wrap some code and an environment in a bundle for later evaluation, you are right: a function will do the job. But it's hard to manipulate bytecode, hence the desire for a syntax tree.
Sure, but most of the examples people give for wanting macros--including his example that you quoted--don't actually do anything that can't be done with a higher-order function. Which means they may not actually want macros; they just think they do. Ask a Haskell lover why Haskell doesn't need macros, and he'll tell you that it's because you don't need them, you only think you do because of your Lisp prejudices. Of course that isn't 100% true,* but it's true enough that most people are happy without macros in Haskell. Similarly, while it would be even farther from 100% true in Python,** it might still be true enough that most people are happy without macros in Python. (Except, as I said, most people are _already_ happy without macros in Python, which means we may have an even simpler option: just do nothing.)

* For one reasonably well-known example (although I may be misremembering, so take this as "this kind of thing" rather than "exactly this"): if Haskell98 had macros, you could have used them to simulate GADTs, which didn't exist until a later version of the language. An equivalent example in Python: you could have used macros to simulate with statements in Python 2.5. As long as non-silly cases for macros are rare enough, people are satisfied with evaluating them at language design time (a discussion on the list followed by an update to the language and a patch to GHC/CPython) instead of compile time. :)

** The main reason it would be less true in Python is eager evaluation; a lazy language like Haskell (or, even better, a dataflow language) can replace even more uses of macros with HOFs than an eager language can. For example, in Haskell, the equivalent of "def foo(x, y): return y if x else 0" doesn't need the value of y unless x is true, so it doesn't matter that y is a value rather than an expression. But OCaml, for example, also doesn't have lazy evaluation by default, and people seem to have the same attitude toward macros there too.
(Although OCaml does have a powerful preprocessor, it's not that much different from what Python has with import hooks.)

Well, despite trying to skim over some parts, I still wrote a whole book here; apologies for that, to anyone who's still reading. :)
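The eager-vs-lazy point in that last footnote can be demonstrated in Python with an explicit thunk: pass a zero-argument callable for y, and the "expression" for y only runs when x is true. A small sketch with made-up names (`foo`, `expensive`):

```python
# Simulating lazy evaluation in eager Python with an explicit thunk:
# `y_thunk` is a zero-argument callable, so the expression standing in
# for y is only evaluated when x is true. (`foo` and `expensive` are
# illustrative names, not anything from the discussion above.)
def foo(x, y_thunk):
    return y_thunk() if x else 0

def expensive():
    raise RuntimeError("this should never run")

print(foo(False, expensive))      # prints 0; `expensive` is never called
print(foo(True, lambda: 2 + 2))   # prints 4
```

In a lazy language the wrapping and forcing are implicit, which is why so many would-be macros there are just ordinary functions.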