Retain string form of AST Numbers
Currently, parsing a Python script into an AST discards a huge amount of information. Some of that (line and column information) is retained as metadata, but a lot isn't. One of the potentially-useful things that's currently dropped is the string form of numbers:
>>> print(ast.dump(ast.parse("x = 1 + 2.000 + 3.0j")))
Module(body=[Assign(targets=[Name(id='x', ctx=Store())], value=BinOp(left=BinOp(left=Num(n=1), op=Add(), right=Num(n=2.0)), op=Add(), right=Num(n=3j)))])
The three Num nodes have an n with the already-constructed int/float/complex. At this point, it's impossible to return to the original string that was entered; any rounding will have already taken place.
If the Num nodes also had a .str member with the unparsed string ("1", "2.000", and "3.0j"), it would be possible to influence their interpretation with an AST transform - for instance, turning them into decimal.Decimal construction calls, or wrapping them in something that automatically keeps track of significant figures (which would mean that "2.000" is considered more precise than a mere "2.0" or "2.").
It shouldn't be too difficult to do, and it could simply be ignored in most cases; any sort of change to the actual number could leave the original string out of sync with it, which isn't a problem.
Thoughts?
ChrisA
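A minimal sketch of what access to the literal text looks like, assuming Python 3.8 or later (newer than this thread), where number literals parse to ast.Constant, nodes record end positions, and ast.get_source_segment can slice the original source. The helper name numeric_literals is hypothetical and is not the .str attribute proposed above; it only shows that the spelling can be recovered when the source string is still at hand.

```python
import ast


def numeric_literals(source):
    """Return (original_text, value) pairs for number literals, in source order."""
    tree = ast.parse(source)
    nodes = [
        node for node in ast.walk(tree)
        if isinstance(node, ast.Constant)
        and isinstance(node.value, (int, float, complex))
        and not isinstance(node.value, bool)  # True/False are ints too
    ]
    # ast.walk is breadth-first, so sort by position to get reading order.
    nodes.sort(key=lambda node: (node.lineno, node.col_offset))
    return [(ast.get_source_segment(source, node), node.value) for node in nodes]


for text, value in numeric_literals("x = 1 + 2.000 + 3.0j"):
    print(repr(text), "->", repr(value))
```

The text "2.000" survives next to the float 2.0, which is the pairing a significant-figures wrapper would need.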
On Sat, Apr 5, 2014 at 4:50 AM, Chris Angelico <rosuav@gmail.com> wrote:
> If the Num nodes also had a .str member with the unparsed string ("1", "2.000", and "3.0j"), it would be possible to influence their interpretation with an AST transform - for instance, turning them into decimal.Decimal construction calls, or wrapping them in something that automatically keeps track of significant figures (which would mean that "2.000" is considered more precise than a mere "2.0" or "2.").
> It shouldn't be too difficult to do, and it could simply be ignored in most cases; any sort of change to the actual number could leave the original string out of sync with it, which isn't a problem.
> Thoughts?
I like it. Apart from the AST transformation possibilities opened up, I have some AST-based style checkers that would have benefited from access to the raw numeric strings.
So as long as there's no significant adverse effect on total compilation time or space requirements, +1 from me.
Mark
On 5 April 2014 18:09, Mark Dickinson <dickinsm@gmail.com> wrote:
> I like it. Apart from the AST transformation possibilities opened up, I have some AST-based style checkers that would have benefited from access to the raw numeric strings.
> So as long as there's no significant adverse effect on total compilation time or space requirements, +1 from me.
Also somewhat tangentially related to this issue about being able to render particular args differently in a signature: http://bugs.python.org/issue16801
I think the discussion on http://bugs.python.org/issue10769 suggests a broader philosophical discussion may be appropriate, since it makes a big difference to what is in scope for the ast module if its purpose is expanded from "easier to manipulate intermediate representation between source and bytecode" to "supports lossless source to source transformations".
Articles like http://time-loop.tumblr.com/post/47664644/python-ast-preserving-whitespace-a... suggest to me that we may need to start pushing people *away* from the ast module and add a lib2to3 inspired API specifically for lossless transformations based on the *actual* grammar of the running Python version (the grammar 2to3 itself uses is a bit mixed up, because it needs to tolerate Python 2 code as input, and it currently doesn't handle "yield from" expressions - the latter isn't really an issue in practice for 2/3 transition related use cases, since hybrid 2/3 code can't use "yield from" anyway).
(A Google search for "Python AST preserve comments" provides other interesting links on the topic of source->source transformations that aren't lossy the way those based on the ast module inevitably are)
Cheers,
Nick.
--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
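To make the lossy-versus-lossless distinction concrete, here is a small sketch that sends the same line through both routes. It uses the standard tokenize module as a stand-in for a lib2to3-style tree, which is a substitution and not the API being suggested above, and it assumes Python 3.9 or later for ast.unparse.

```python
import ast
import io
import tokenize

source = "x = 2.000  # three decimal places\n"

# AST route: the comment and the literal's spelling never reach the tree,
# so regenerated source cannot contain them.
via_ast = ast.unparse(ast.parse(source))

# Token route: each token keeps its exact text and position, and comments
# are tokens too, so untokenize can rebuild the line from full tuples.
tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
via_tokens = tokenize.untokenize(tokens)

print(repr(via_ast))
print(repr(via_tokens))
```

A token stream is still not a parse tree, so any real transformation tool would need structure on top of it; the point here is only where the information is lost.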
On Sat, Apr 5, 2014 at 8:25 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
> Articles like http://time-loop.tumblr.com/post/47664644/python-ast-preserving-whitespace-a... suggest to me that we may need to start pushing people *away* from the ast module and add a lib2to3 inspired API specifically for lossless transformations based on the *actual* grammar of the running Python version
Ooh that looks promising. I might try to knock up a recipe for some sort of "interactive Python in decimal mode", which would cover one of the important cases for which people ask for decimal literals. It'd not be perfect without some monkey-patch for int/int -> float, unless simply every non-complex number becomes a Decimal.
Or if I don't get around to it, maybe someone else can?
ChrisA
I could probably try that if you give a slightly clearer description.
Chris Angelico <rosuav@gmail.com> wrote:
> On Sat, Apr 5, 2014 at 8:25 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
>> Articles like http://time-loop.tumblr.com/post/47664644/python-ast-preserving-whitespace-a... suggest to me that we may need to start pushing people *away* from the ast module and add a lib2to3 inspired API specifically for lossless transformations based on the *actual* grammar of the running Python version
> Ooh that looks promising. I might try to knock up a recipe for some sort of "interactive Python in decimal mode", which would cover one of the important cases for which people ask for decimal literals. It'd not be perfect without some monkey-patch for int/int -> float, unless simply every non-complex number becomes a Decimal.
> Or if I don't get around to it, maybe someone else can?
> ChrisA
> _______________________________________________
> Python-ideas mailing list
> Python-ideas@python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
On Sun, Apr 6, 2014 at 12:32 AM, Ryan <rymg19@gmail.com> wrote:
> I could probably try that if you give a slightly clearer description.
Currently, you can fire up IDLE or interactive Python and use it as a super calculator:
>>> 1 + 2
3
And when you use it with non-integers, you get floats:
>>> 0.1 + 0.2
0.30000000000000004
We could turn this into a Decimal calculator with two important changes:
1) Every instance of a number in the code must become a Decimal (which really means wrapping every number with Decimal("...") and importing Decimal from decimal)
2) Monkey-patch Decimal.__repr__ = Decimal.__str__ to make the display tidy
The latter might be better done by subclassing Decimal and putting something into the builtins, but that's implementation detail (now that Decimal is implemented in C, its attributes can't be set).
As long as you don't need multiple contexts or anything complicated like that, this should in theory work. (Changing the default context shouldn't break anything AFAICT.) It'd then be usable in the same way that REXXTry is: a convenient calculator that's backed by a full programming language. It'd be better than REXXTry, in fact, as limitations on REXX's 'INTERPRET' command mean you can't define functions that way (though you can call functions defined in REXXTry itself - I had my own Enhanced REXXTry with gobs of mathematical utility functions).
If that can be done without any core language changes, that would be awesome.
ChrisA
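One possible shape for step 1, sketched with the standard tokenize module so that each literal's original spelling reaches Decimal untouched. The names is_decimal_literal, decimalize and decimal_eval are hypothetical, and two choices are assumptions of this sketch, not part of the proposal: complex, hex, octal and binary literals are left alone, and the tidy display of step 2 is obtained by printing str() of the result, with no patching of __repr__.

```python
import io
import tokenize
from decimal import Decimal


def is_decimal_literal(text):
    """True for literals Decimal can parse: not complex, hex, octal or binary."""
    lowered = text.lower()
    return not lowered.endswith("j") and not lowered.startswith(("0x", "0o", "0b"))


def decimalize(source):
    """Rewrite source so each such literal becomes Decimal('<original text>')."""
    result = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NUMBER and is_decimal_literal(tok.string):
            result.extend([
                (tokenize.NAME, "Decimal"),
                (tokenize.OP, "("),
                (tokenize.STRING, repr(tok.string)),
                (tokenize.OP, ")"),
            ])
        else:
            result.append((tok.type, tok.string))
    # With (type, string) pairs untokenize picks its own spacing, which is
    # fine because the output is only fed back to the interpreter.
    return tokenize.untokenize(result)


def decimal_eval(expression):
    """Evaluate one expression with its number literals read as Decimals."""
    return eval(decimalize(expression).strip(), {"Decimal": Decimal})


print(decimal_eval("0.1 + 0.2"))
print(decimal_eval("2.000 + 1"))
```

Because the literal text is passed as a string, "2.000" keeps its trailing zeros through the addition. A real interactive mode would also need to handle statements, which eval does not.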
On 4/5/2014 10:03 AM, Chris Angelico wrote:
> Currently, you can fire up IDLE or interactive Python and use it as a super calculator:
> >>> 1 + 2
> 3
> And when you use it with non-integers, you get floats:
> >>> 0.1 + 0.2
> 0.30000000000000004
> We could turn this into a Decimal calculator with two important changes:
> 1) Every instance of a number in the code must become a Decimal (which really means wrapping every number with Decimal("...") and importing Decimal from decimal)
> 2) Monkey-patch Decimal.__repr__ = Decimal.__str__ to make the display tidy
> The latter might be better done by subclassing Decimal and putting something into the builtins, but that's implementation detail (now that Decimal is implemented in C, its attributes can't be set).
> If that can be done without any core language changes, that would be awesome.
One could certainly do something with Idle without changing Python. Idle has an extension mechanism that operates by modifying runtime structures. I believe it is mainly used to add menu entries for functions that operate on the edit buffer wrapped by an editor window. Thus it should be possible to 'decimalize' a file by adding an import (or subclass definition) at the top of the buffer and wrapping literals with D("...").
I don't think that the existing hooks are enough to modify the behavior of the Shell. For that, one could write Python code to permanently patch the Shell to add a decimal mode. It would need to send initialization code to each user subprocess and wrap literals as above, either as typed or before sending.
When I have more time than I do now, I would be willing to share what I know about how either could be done.
--
Terry Jan Reedy
On Sat, Apr 5, 2014 at 9:03 AM, Chris Angelico <rosuav@gmail.com> wrote:
> Currently, you can fire up IDLE or interactive Python and use it as a super calculator:
> >>> 1 + 2
> 3
> And when you use it with non-integers, you get floats:
> >>> 0.1 + 0.2
> 0.30000000000000004
> We could turn this into a Decimal calculator with two important changes:
> 1) Every instance of a number in the code must become a Decimal (which really means wrapping every number with Decimal("...") and importing Decimal from decimal)
Simple enough.
> 2) Monkey-patch Decimal.__repr__ = Decimal.__str__ to make the display tidy
I actually found a module to do that: https://pypi.python.org/pypi/forbiddenfruit/0.1.1. Dangerous but awesome.
> The latter might be better done by subclassing Decimal and putting something into the builtins, but that's implementation detail (now that Decimal is implemented in C, its attributes can't be set).
> As long as you don't need multiple contexts or anything complicated like that, this should in theory work. (Changing the default context shouldn't break anything AFAICT.) It'd then be usable in the same way that REXXTry is: a convenient calculator that's backed by a full programming language. It'd be better than REXXTry, in fact, as limitations on REXX's 'INTERPRET' command mean you can't define functions that way (though you can call functions defined in REXXTry itself - I had my own Enhanced REXXTry with gobs of mathematical utility functions).
> If that can be done without any core language changes, that would be awesome.
--
Ryan
If anybody ever asks me why I prefer C++ to C, my answer will be simple: "It's becauseslejfp23(@#Q*(E*EIdc-SEGFAULT. Wait, I don't think that was nul-terminated."
On 05.04.14 06:50, Chris Angelico wrote:
> Currently, parsing a Python script into an AST discards a huge amount of information. Some of that (line and column information) is retained as metadata, but a lot isn't. One of the potentially-useful things that's currently dropped is the string form of numbers:
> >>> print(ast.dump(ast.parse("x = 1 + 2.000 + 3.0j")))
> Module(body=[Assign(targets=[Name(id='x', ctx=Store())], value=BinOp(left=BinOp(left=Num(n=1), op=Add(), right=Num(n=2.0)), op=Add(), right=Num(n=3j)))])
> The three Num nodes have an n with the already-constructed int/float/complex. At this point, it's impossible to return to the original string that was entered; any rounding will have already taken place.
> If the Num nodes also had a .str member with the unparsed string ("1", "2.000", and "3.0j"), it would be possible to influence their interpretation with an AST transform - for instance, turning them into decimal.Decimal construction calls, or wrapping them in something that automatically keeps track of significant figures (which would mean that "2.000" is considered more precise than a mere "2.0" or "2.").
> It shouldn't be too difficult to do, and it could simply be ignored in most cases; any sort of change to the actual number could leave the original string out of sync with it, which isn't a problem.
> Thoughts?
What about strings?
>>> print(ast.dump(ast.parse(r"x = '''\x0a\u000d''' r'x'")))
Module(body=[Assign(targets=[Name(id='x', ctx=Store())], value=Str(s='\n\rx'))])
Tuples?
>>> print(ast.dump(ast.parse("x = (1, 2,)")))
Module(body=[Assign(targets=[Name(id='x', ctx=Store())], value=Tuple(elts=[Num(n=1), Num(n=2)], ctx=Load()))])
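The same loss can be checked directly by comparing dumps of differently spelled sources. This is only a sketch, and the helper name same_ast is hypothetical; it relies on ast.dump leaving out line and column attributes by default.

```python
import ast


def same_ast(source_a, source_b):
    """Return True when two source strings parse to identical trees."""
    # Positions are omitted from the dump, so only the structure and the
    # constant values are compared.
    return ast.dump(ast.parse(source_a)) == ast.dump(ast.parse(source_b))


# Escape spellings, triple quotes, raw prefixes and implicit concatenation
# all collapse into a single string constant.
strings_match = same_ast(r"x = '''\x0a\u000d''' r'x'", r"x = '\n\rx'")

# The trailing comma and the parentheses leave no trace either.
tuples_match = same_ast("x = (1, 2,)", "x = 1, 2")

print(strings_match, tuples_match)
```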
On Sun, Apr 6, 2014 at 3:39 AM, Serhiy Storchaka <storchaka@gmail.com> wrote:
> What about strings?
> >>> print(ast.dump(ast.parse(r"x = '''\x0a\u000d''' r'x'")))
> Module(body=[Assign(targets=[Name(id='x', ctx=Store())], value=Str(s='\n\rx'))])
> Tuples?
> >>> print(ast.dump(ast.parse("x = (1, 2,)")))
> Module(body=[Assign(targets=[Name(id='x', ctx=Store())], value=Tuple(elts=[Num(n=1), Num(n=2)], ctx=Load()))])
I don't have any concrete use cases for those, but the same could be done for every type of node as a general mechanism for recreating a file more exactly. Nick linked to a tracker issue with some discussion on docstrings, which could benefit from that. Or maybe it would be better to avoid the AST altogether and do a source-level translation, as mentioned in the blog post he also linked to.
ChrisA
Lib2to3 has its own parser and tree so it can reconstruct the file, with comments. AST JUST THROWS AWAY TOO MUCH.
On Saturday, April 5, 2014, Chris Angelico <rosuav@gmail.com> wrote:
> I don't have any concrete use cases for those, but the same could be done for every type of node as a general mechanism for recreating a file more exactly. Nick linked to a tracker issue with some discussion on docstrings, which could benefit from that. Or maybe it would be better to avoid the AST altogether and do a source-level translation, as mentioned in the blog post he also linked to.
-- --Guido van Rossum (on iPad)
Whoops, caps lock error. Didn't mean to shout. :-)
On Saturday, April 5, 2014, Guido van Rossum <guido@python.org> wrote:
> Lib2to3 has its own parser and tree so it can reconstruct the file, with comments. AST JUST THROWS AWAY TOO MUCH.
-- --Guido van Rossum (on iPad)
On Sun, Apr 6, 2014 at 8:09 AM, Guido van Rossum <guido@python.org> wrote:
> Whoops, caps lock error. Didn't mean to shout. :-)
Heh, glad I hadn't offended you there :) I've put "Learn the 2to3 parser and use it for source code transformation" (with Nick's link) in my TODO. No telling when it'll actually be deployed, though.
ChrisA
On 04/05/2014 03:08 PM, Guido van Rossum wrote:
> Lib2to3 has its own parser and tree so it can reconstruct the file, with comments. AST just throws away too much.
Are you saying it throws away too much for the question at hand, or that it throws away too much and should be modified to not throw away too much?
--
~Ethan~
The former. Its purpose is to generate code, not to reproduce the source.
On Sunday, April 6, 2014, Ethan Furman <ethan@stoneleaf.us> wrote:
> Are you saying it throws away too much for the question at hand, or that it throws away too much and should be modified to not throw away too much?
-- --Guido van Rossum (on iPad)
participants (9)
- Chris Angelico
- Ethan Furman
- Guido van Rossum
- Mark Dickinson
- Nick Coghlan
- Ryan
- Ryan Gonzalez
- Serhiy Storchaka
- Terry Reedy