Two laments about CPython's AST Nodes

Before I start complaining, I want to mention what a huge help it has been to be able to directly compare the AST exposed by ast.py in making Jython a better Python. Thanks for that! Now on to the complaints: Though I recently added support for this in Jython, I don't like that nodes can be defined without required attributes, for example: node = ast.Assign() Is valid, even though it requires "node.targets" and "node.value" (I'm less concerned about the required lineno and col_offset, as I can understand holding off on these so that you can just use fix_missing_locations to fill these in for you). My other (bigger) gripe is that you can define these values with arbitrary objects that will blow up at parse time. So for example you can write: node = ast.Assign() node.targets = "whatever" Which, when you try to parse, blows up with "TypeError: Assign field "targets" must be a list, not a str". I'd be much happier if this blew up right away when you try to make the assignment. At the moment, this is how it works in Jython (though I could support this with some contorting). BTW -- I *am* volunteering to attempt to implement these things in CPython if there is support :) -Frank

On Wed, Aug 19, 2009 at 11:10, Frank Wierzbicki<fwierzbicki@gmail.com> wrote:
+1 from me for adding this support. While I can see people wanting to create the node as soon as it is known to be needed and then plug in the parts as they get parsed, postponing the node creation to later once the subnodes have been done is not exactly a challenge. -Brett

2009/8/19 Frank Wierzbicki <fwierzbicki@gmail.com>:
+1
I also think this is a good idea, but this also causes an asymmetry. I would still be able to do this: node = ast.Module([]) node.body.append("random stuff") and not have it type checked until it is compiled. This would be hard to fix, though, and I think it is worth living with.
BTW -- I *am* volunteering to attempt to implement these things in CPython if there is support :)
Very generous. :) -- Regards, Benjamin

I think we disagree in two points in our evaluation of this behavior: a) ast.Assign is *not* a node of the CPython AST. The CPython AST is a set of C structures in Include/Python-ast.h. ast.Assign is merely a mirror structure of that. b) it is, IMO, not reasonable to require users who create AST trees out of nothing to have correct trees at all times. I.e. it must be possible to represent incorrect trees. c) the AST is *not* part of the Python language or library. It may change at any time without notice, and Jython is not required to duplicate its behavior exactly. [so that's three items - as there should be in any good list of two items :-]
What precisely is it that makes this difficult to implement. If you would follow CPython's implementation strategy (i.e. generate glue code out of ASDL), I feel that it should be straight-forward to provide exactly the same behavior in Jython.
BTW -- I *am* volunteering to attempt to implement these things in CPython if there is support :)
I'm not sure I can support such a change. Giving the child nodes at creation time, optionally, would be fine with me. Requiring the tree to conform to the grammar at all times is unreasonable, IMO (you can't simultaneously create all nodes in the tree and glue them together, so you have to create the tree in steps - which means that it will be intermittently incorrect). You seem to propose some middle ground, but I'm not sure I understand what that middle ground is. Regards, Martin

On Wed, Aug 19, 2009 at 5:00 PM, "Martin v. Löwis"<martin@v.loewis.de> wrote: that are used in real parsing)
x = compile("def foo():pass", "foo", "exec", _ast.PyCF_ONLY_AST) Does x contain real AST nodes or does it contain mirror structures (feel free to just tell me: don't be lazy, go read the code). If it contains real nodes, this is where I have some implementation trouble. If a tree of real nodes is then manipulated so that you end up with a mix, I don't want to walk the entire thing over again to find the mirror objects (that might be incomplete) and replace them with real nodes. If this creates a tree of mirror nodes, then I may want to consider doing the same thing on the Jython side (it makes sense, now that I understand CPython better I realize that the cost I am incurring is probably due to having the real and mirror AST as the same beast).
-Frank

On Thu, Aug 20, 2009 at 06:20, Frank Wierzbicki<fwierzbicki@gmail.com> wrote:
I am using PEP 339 to help me with this. Looking at Python/bltinmodule.c:builtin_compile() you will notice that when you take an AST and compile it the call goes to Python/Python-ast.c:PyAST_obj2mod(). That function calls obj2ast_mod() which turns a ast.Module object into an mod_ty (defined in Python/Python-ast.c). The reverse is done with ast2obj_mod() when you get the AST out. And looking at those conversion functions it seems that the PyObject values convert as needed or reuse constants like ints and strings (see obj2ast_unaryop() to see a conversion from object AST to internal AST). But I would double-check me on all of this. =) -Brett

It only contains a mirror structure. See pythonrun.c:Py_CompileStringFlags, and the (generated) PyAST_mod2obj function. There is no way for a Python script to get hold of the real AST.
Couldn't you just generate a check function for your tree that would be invoked before you try to process a tree that a script got access to?
If you are asking that a type check is made on assigning a value to these fields - I'm not quite sure whether you could implement that check reliably. Wouldn't it be possible to bypass it by filling a value directly into __dict__? If you can come up with a patch that checks in a reliable manner, I would be in favor of adding that (in 2.7 and 3.2), taking out the corresponding checks when converting to the internal AST. Regards, Martin

On Wed, Aug 19, 2009 at 11:10, Frank Wierzbicki<fwierzbicki@gmail.com> wrote:
+1 from me for adding this support. While I can see people wanting to create the node as soon as it is known to be needed and then plug in the parts as they get parsed, postponing the node creation to later once the subnodes have been done is not exactly a challenge. -Brett

2009/8/19 Frank Wierzbicki <fwierzbicki@gmail.com>:
+1
I also think this is a good idea, but this also causes an asymmetry. I would still be able to do this: node = ast.Module([]) node.body.append("random stuff") and not have it type checked until it is compiled. This would be hard to fix, though, and I think it is worth living with.
BTW -- I *am* volunteering to attempt to implement these things in CPython if there is support :)
Very generous. :) -- Regards, Benjamin

I think we disagree in two points in our evaluation of this behavior: a) ast.Assign is *not* a node of the CPython AST. The CPython AST is a set of C structures in Include/Python-ast.h. ast.Assign is merely a mirror structure of that. b) it is, IMO, not reasonable to require users who create AST trees out of nothing to have correct trees at all times. I.e. it must be possible to represent incorrect trees. c) the AST is *not* part of the Python language or library. It may change at any time without notice, and Jython is not required to duplicate its behavior exactly. [so that's three items - as there should be in any good list of two items :-]
What precisely is it that makes this difficult to implement. If you would follow CPython's implementation strategy (i.e. generate glue code out of ASDL), I feel that it should be straight-forward to provide exactly the same behavior in Jython.
BTW -- I *am* volunteering to attempt to implement these things in CPython if there is support :)
I'm not sure I can support such a change. Giving the child nodes at creation time, optionally, would be fine with me. Requiring the tree to conform to the grammar at all times is unreasonable, IMO (you can't simultaneously create all nodes in the tree and glue them together, so you have to create the tree in steps - which means that it will be intermittently incorrect). You seem to propose some middle ground, but I'm not sure I understand what that middle ground is. Regards, Martin

On Wed, Aug 19, 2009 at 5:00 PM, "Martin v. Löwis"<martin@v.loewis.de> wrote: that are used in real parsing)
x = compile("def foo():pass", "foo", "exec", _ast.PyCF_ONLY_AST) Does x contain real AST nodes or does it contain mirror structures (feel free to just tell me: don't be lazy, go read the code). If it contains real nodes, this is where I have some implementation trouble. If a tree of real nodes is then manipulated so that you end up with a mix, I don't want to walk the entire thing over again to find the mirror objects (that might be incomplete) and replace them with real nodes. If this creates a tree of mirror nodes, then I may want to consider doing the same thing on the Jython side (it makes sense, now that I understand CPython better I realize that the cost I am incurring is probably due to having the real and mirror AST as the same beast).
-Frank

On Thu, Aug 20, 2009 at 06:20, Frank Wierzbicki<fwierzbicki@gmail.com> wrote:
I am using PEP 339 to help me with this. Looking at Python/bltinmodule.c:builtin_compile() you will notice that when you take an AST and compile it the call goes to Python/Python-ast.c:PyAST_obj2mod(). That function calls obj2ast_mod() which turns a ast.Module object into an mod_ty (defined in Python/Python-ast.c). The reverse is done with ast2obj_mod() when you get the AST out. And looking at those conversion functions it seems that the PyObject values convert as needed or reuse constants like ints and strings (see obj2ast_unaryop() to see a conversion from object AST to internal AST). But I would double-check me on all of this. =) -Brett

It only contains a mirror structure. See pythonrun.c:Py_CompileStringFlags, and the (generated) PyAST_mod2obj function. There is no way for a Python script to get hold of the real AST.
Couldn't you just generate a check function for your tree that would be invoked before you try to process a tree that a script got access to?
If you are asking that a type check is made on assigning a value to these fields - I'm not quite sure whether you could implement that check reliably. Wouldn't it be possible to bypass it by filling a value directly into __dict__? If you can come up with a patch that checks in a reliable manner, I would be in favor of adding that (in 2.7 and 3.2), taking out the corresponding checks when converting to the internal AST. Regards, Martin
participants (4)
-
"Martin v. Löwis"
-
Benjamin Peterson
-
Brett Cannon
-
Frank Wierzbicki