Two laments about CPython's AST Nodes
Before I start complaining, I want to mention what a huge help it has been to be able to directly compare the AST exposed by ast.py in making Jython a better Python. Thanks for that! Now on to the complaints: Though I recently added support for this in Jython, I don't like that nodes can be defined without required attributes, for example: node = ast.Assign() Is valid, even though it requires "node.targets" and "node.value" (I'm less concerned about the required lineno and col_offset, as I can understand holding off on these so that you can just use fix_missing_locations to fill these in for you). My other (bigger) gripe is that you can define these values with arbitrary objects that will blow up at parse time. So for example you can write: node = ast.Assign() node.targets = "whatever" Which, when you try to parse, blows up with "TypeError: Assign field "targets" must be a list, not a str". I'd be much happier if this blew up right away when you try to make the assignment. At the moment, this is how it works in Jython (though I could support this with some contorting). BTW -- I *am* volunteering to attempt to implement these things in CPython if there is support :) -Frank
On Wed, Aug 19, 2009 at 11:10, Frank Wierzbicki<fwierzbicki@gmail.com> wrote:
Before I start complaining, I want to mention what a huge help it has been to be able to directly compare the AST exposed by ast.py in making Jython a better Python. Thanks for that!
Now on to the complaints: Though I recently added support for this in Jython, I don't like that nodes can be defined without required attributes, for example:
node = ast.Assign()
Is valid, even though it requires "node.targets" and "node.value" (I'm less concerned about the required lineno and col_offset, as I can understand holding off on these so that you can just use fix_missing_locations to fill these in for you).
My other (bigger) gripe is that you can define these values with arbitrary objects that will blow up at parse time. So for example you can write:
node = ast.Assign() node.targets = "whatever"
Which, when you try to parse, blows up with "TypeError: Assign field "targets" must be a list, not a str". I'd be much happier if this blew up right away when you try to make the assignment. At the moment, this is how it works in Jython (though I could support this with some contorting).
BTW -- I *am* volunteering to attempt to implement these things in CPython if there is support :)
+1 from me for adding this support. While I can see people wanting to create the node as soon as it is known to be needed and then plug in the parts as they get parsed, postponing the node creation to later once the subnodes have been done is not exactly a challenge. -Brett
2009/8/19 Frank Wierzbicki <fwierzbicki@gmail.com>:
Before I start complaining, I want to mention what a huge help it has been to be able to directly compare the AST exposed by ast.py in making Jython a better Python. Thanks for that!
Now on to the complaints: Though I recently added support for this in Jython, I don't like that nodes can be defined without required attributes, for example:
node = ast.Assign()
Is valid, even though it requires "node.targets" and "node.value" (I'm less concerned about the required lineno and col_offset, as I can understand holding off on these so that you can just use fix_missing_locations to fill these in for you).
+1
My other (bigger) gripe is that you can define these values with arbitrary objects that will blow up at parse time. So for example you can write:
node = ast.Assign() node.targets = "whatever"
Which, when you try to parse, blows up with "TypeError: Assign field "targets" must be a list, not a str". I'd be much happier if this blew up right away when you try to make the assignment. At the moment, this is how it works in Jython (though I could support this with some contorting).
I also think this is a good idea, but this also causes an asymmetry. I would still be able to do this: node = ast.Module([]) node.body.append("random stuff") and not have it type checked until it is compiled. This would be hard to fix, though, and I think it is worth living with.
BTW -- I *am* volunteering to attempt to implement these things in CPython if there is support :)
Very generous. :) -- Regards, Benjamin
Now on to the complaints: Though I recently added support for this in Jython, I don't like that nodes can be defined without required attributes, for example:
node = ast.Assign()
I think we disagree in two points in our evaluation of this behavior: a) ast.Assign is *not* a node of the CPython AST. The CPython AST is a set of C structures in Include/Python-ast.h. ast.Assign is merely a mirror structure of that. b) it is, IMO, not reasonable to require users who create AST trees out of nothing to have correct trees at all times. I.e. it must be possible to represent incorrect trees. c) the AST is *not* part of the Python language or library. It may change at any time without notice, and Jython is not required to duplicate its behavior exactly. [so that's three items - as there should be in any good list of two items :-]
Which, when you try to parse, blows up with "TypeError: Assign field "targets" must be a list, not a str". I'd be much happier if this blew up right away when you try to make the assignment. At the moment, this is how it works in Jython (though I could support this with some contorting).
What precisely is it that makes this difficult to implement. If you would follow CPython's implementation strategy (i.e. generate glue code out of ASDL), I feel that it should be straight-forward to provide exactly the same behavior in Jython.
BTW -- I *am* volunteering to attempt to implement these things in CPython if there is support :)
I'm not sure I can support such a change. Giving the child nodes at creation time, optionally, would be fine with me. Requiring the tree to conform to the grammar at all times is unreasonable, IMO (you can't simultaneously create all nodes in the tree and glue them together, so you have to create the tree in steps - which means that it will be intermittently incorrect). You seem to propose some middle ground, but I'm not sure I understand what that middle ground is. Regards, Martin
Now on to the complaints: Though I recently added support for this in Jython, I don't like that nodes can be defined without required attributes, for example:
node = ast.Assign()
I think we disagree in two points in our evaluation of this behavior:
a) ast.Assign is *not* a node of the CPython AST. The CPython AST is a set of C structures in Include/Python-ast.h. ast.Assign is merely a mirror structure of that. Ah -- that is different from Jython's current design (The nodes that are constructed by ast.Assign() in Jython actually are the exact nodes
On Wed, Aug 19, 2009 at 5:00 PM, "Martin v. Löwis"<martin@v.loewis.de> wrote: that are used in real parsing)
b) it is, IMO, not reasonable to require users who create AST trees out of nothing to have correct trees at all times. I.e. it must be possible to represent incorrect trees. That does seem reasonable. Luckily it was easy to implement for me :)
c) the AST is *not* part of the Python language or library. It may change at any time without notice, and Jython is not required to duplicate its behavior exactly. Sure, I'm really just talking about ast.py (though for Jython ATM they are the same thing). So that I understand better: when this call is made in Python:
x = compile("def foo():pass", "foo", "exec", _ast.PyCF_ONLY_AST) Does x contain real AST nodes or does it contain mirror structures (feel free to just tell me: don't be lazy, go read the code). If it contains real nodes, this is where I have some implementation trouble. If a tree of real nodes is then manipulated so that you end up with a mix, I don't want to walk the entire thing over again to find the mirror objects (that might be incomplete) and replace them with real nodes. If this creates a tree of mirror nodes, then I may want to consider doing the same thing on the Jython side (it makes sense, now that I understand CPython better I realize that the cost I am incurring is probably due to having the real and mirror AST as the same beast).
[so that's three items - as there should be in any good list of two items :-] :)
What precisely is it that makes this difficult to implement. If you would follow CPython's implementation strategy (i.e. generate glue code out of ASDL), I feel that it should be straight-forward to provide exactly the same behavior in Jython. I do use the ASDL to generate this stuff, but again, the real and the mirror nodes are not separated ATM, and that is what makes it difficult.
BTW -- I *am* volunteering to attempt to implement these things in CPython if there is support :)
I'm not sure I can support such a change. Giving the child nodes at creation time, optionally, would be fine with me. Requiring the tree to conform to the grammar at all times is unreasonable, IMO (you can't simultaneously create all nodes in the tree and glue them together, so you have to create the tree in steps - which means that it will be intermittently incorrect). That is quite reasonable, I'll withdraw gripe #1 -- in fact the reason I have already implemented this in Jython is that there is already real world use out there. I still need to understand gripe #2 a little better before I back down on that one.
-Frank
On Thu, Aug 20, 2009 at 06:20, Frank Wierzbicki<fwierzbicki@gmail.com> wrote:
Now on to the complaints: Though I recently added support for this in Jython, I don't like that nodes can be defined without required attributes, for example:
node = ast.Assign()
I think we disagree in two points in our evaluation of this behavior:
a) ast.Assign is *not* a node of the CPython AST. The CPython AST is a set of C structures in Include/Python-ast.h. ast.Assign is merely a mirror structure of that. Ah -- that is different from Jython's current design (The nodes that are constructed by ast.Assign() in Jython actually are the exact nodes
On Wed, Aug 19, 2009 at 5:00 PM, "Martin v. Löwis"<martin@v.loewis.de> wrote: that are used in real parsing)
b) it is, IMO, not reasonable to require users who create AST trees out of nothing to have correct trees at all times. I.e. it must be possible to represent incorrect trees. That does seem reasonable. Luckily it was easy to implement for me :)
c) the AST is *not* part of the Python language or library. It may change at any time without notice, and Jython is not required to duplicate its behavior exactly. Sure, I'm really just talking about ast.py (though for Jython ATM they are the same thing). So that I understand better: when this call is made in Python:
x = compile("def foo():pass", "foo", "exec", _ast.PyCF_ONLY_AST)
Does x contain real AST nodes or does it contain mirror structures (feel free to just tell me: don't be lazy, go read the code). If it contains real nodes, this is where I have some implementation trouble. If a tree of real nodes is then manipulated so that you end up with a mix, I don't want to walk the entire thing over again to find the mirror objects (that might be incomplete) and replace them with real nodes. If this creates a tree of mirror nodes, then I may want to consider doing the same thing on the Jython side (it makes sense, now that I understand CPython better I realize that the cost I am incurring is probably due to having the real and mirror AST as the same beast).
I am using PEP 339 to help me with this. Looking at Python/bltinmodule.c:builtin_compile() you will notice that when you take an AST and compile it the call goes to Python/Python-ast.c:PyAST_obj2mod(). That function calls obj2ast_mod() which turns a ast.Module object into an mod_ty (defined in Python/Python-ast.c). The reverse is done with ast2obj_mod() when you get the AST out. And looking at those conversion functions it seems that the PyObject values convert as needed or reuse constants like ints and strings (see obj2ast_unaryop() to see a conversion from object AST to internal AST). But I would double-check me on all of this. =) -Brett
[so that's three items - as there should be in any good list of two items :-] :)
What precisely is it that makes this difficult to implement. If you would follow CPython's implementation strategy (i.e. generate glue code out of ASDL), I feel that it should be straight-forward to provide exactly the same behavior in Jython. I do use the ASDL to generate this stuff, but again, the real and the mirror nodes are not separated ATM, and that is what makes it difficult.
BTW -- I *am* volunteering to attempt to implement these things in CPython if there is support :)
I'm not sure I can support such a change. Giving the child nodes at creation time, optionally, would be fine with me. Requiring the tree to conform to the grammar at all times is unreasonable, IMO (you can't simultaneously create all nodes in the tree and glue them together, so you have to create the tree in steps - which means that it will be intermittently incorrect). That is quite reasonable, I'll withdraw gripe #1 -- in fact the reason I have already implemented this in Jython is that there is already real world use out there. I still need to understand gripe #2 a little better before I back down on that one.
-Frank _______________________________________________ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/brett%40python.org
x = compile("def foo():pass", "foo", "exec", _ast.PyCF_ONLY_AST)
Does x contain real AST nodes or does it contain mirror structures (feel free to just tell me: don't be lazy, go read the code).
It only contains a mirror structure. See pythonrun.c:Py_CompileStringFlags, and the (generated) PyAST_mod2obj function. There is no way for a Python script to get hold of the real AST.
I do use the ASDL to generate this stuff, but again, the real and the mirror nodes are not separated ATM, and that is what makes it difficult.
Couldn't you just generate a check function for your tree that would be invoked before you try to process a tree that a script got access to?
That is quite reasonable, I'll withdraw gripe #1 -- in fact the reason I have already implemented this in Jython is that there is already real world use out there. I still need to understand gripe #2 a little better before I back down on that one.
If you are asking that a type check is made on assigning a value to these fields - I'm not quite sure whether you could implement that check reliably. Wouldn't it be possible to bypass it by filling a value directly into __dict__? If you can come up with a patch that checks in a reliable manner, I would be in favor of adding that (in 2.7 and 3.2), taking out the corresponding checks when converting to the internal AST. Regards, Martin
On Thu, Aug 20, 2009 at 6:11 PM, "Martin v. Löwis"<martin@v.loewis.de> wrote:
Couldn't you just generate a check function for your tree that would be invoked before you try to process a tree that a script got access to? That would be one way, though now that I understand CPython's AST design better, I am tempted to follow the lead. If I had a private AST and and a public mirror for ast.py, the design could become much simpler, and probably faster for normal parsing.
If you are asking that a type check is made on assigning a value to these fields - I'm not quite sure whether you could implement that check reliably. Wouldn't it be possible to bypass it by filling a value directly into __dict__?
If you can come up with a patch that checks in a reliable manner, I would be in favor of adding that (in 2.7 and 3.2), taking out the corresponding checks when converting to the internal AST. Great, I may give it a try, but changing the AST impl for Jython 2.6 will probably be my short term answer.
-Frank
participants (4)
-
"Martin v. Löwis"
-
Benjamin Peterson
-
Brett Cannon
-
Frank Wierzbicki