Mailman 3 Retain string form of AST Numbers - Python-ideas

Retain string form of AST Numbers

Chris Angelico

April 5, 2014

3:50 a.m.

Currently, parsing a Python script into an AST discards a huge amount of information. Some of that (line and column information) is retained as metadata, but a lot isn't. One of the potentially-useful things that's currently dropped is the string form of numbers:

>>> print(ast.dump(ast.parse("x = 1 + 2.000 + 3.0j")))
Module(body=[Assign(targets=[Name(id='x', ctx=Store())], value=BinOp(left=BinOp(left=Num(n=1), op=Add(), right=Num(n=2.0)), op=Add(), right=Num(n=3j)))])

The three Num nodes have an n with the already-constructed int/float/complex. At this point, it's impossible to return to the original string that was entered; any rounding will have already taken place.

If the Num nodes also had a .str member with the unparsed string ("1", "2.000", and "3.0j"), it would be possible to influence their interpretation with an AST transform - for instance, turning them into decimal.Decimal construction calls, or wrapping them in something that automatically keeps track of significant figures (which would mean that "2.000" is considered more precise than a mere "2.0" or "2.").

It shouldn't be too difficult to do, and it could simply be ignored in most cases; any sort of change to the actual number could leave the original string out of sync with it, which isn't a problem.

Thoughts?

ChrisA
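For comparison, later Python releases partly cover this without a new field. Assuming Python 3.8 or newer, where numeric literals parse to ast.Constant and nodes record their end positions, the original spelling can be looked up with ast.get_source_segment() as long as the source string is still at hand. A minimal sketch (the NumberText class and number_texts helper are names made up for illustration):

```python
import ast

class NumberText(ast.NodeVisitor):
    """Collect (value, original spelling) pairs for numeric literals."""

    def __init__(self, source):
        self.source = source
        self.found = []

    def visit_Constant(self, node):
        # bool is a subclass of int, so exclude True/False explicitly.
        if isinstance(node.value, (int, float, complex)) and not isinstance(node.value, bool):
            text = ast.get_source_segment(self.source, node)
            self.found.append((node.value, text))

def number_texts(source):
    collector = NumberText(source)
    collector.visit(ast.parse(source))
    return collector.found

print(number_texts("x = 1 + 2.000 + 3.0j"))
# [(1, '1'), (2.0, '2.000'), (3j, '3.0j')]
```

This only works while the source text is kept alongside the tree; the node itself still stores just the constructed value.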


Mark Dickinson

April 2014

8:09 a.m.

On Sat, Apr 5, 2014 at 4:50 AM, Chris Angelico <rosuav@gmail.com> wrote:

> If the Num nodes also had a .str member with the unparsed string ("1", "2.000", and "3.0j"), it would be possible to influence their interpretation with an AST transform - for instance, turning them into decimal.Decimal construction calls, or wrapping them in something that automatically keeps track of significant figures (which would mean that "2.000" is considered more precise than a mere "2.0" or "2.").
>
> It shouldn't be too difficult to do, and it could simply be ignored in most cases; any sort of change to the actual number could leave the original string out of sync with it, which isn't a problem.
>
> Thoughts?

I like it. Apart from the AST transformation possibilities opened up, I have some AST-based style checkers that would have benefited from access to the raw numeric strings.

So as long as there's no significant adverse affect on total compilation time or space requirements, +1 from me.

Mark

Mark Dickinson

8:10 a.m.

On Sat, Apr 5, 2014 at 9:09 AM, Mark Dickinson <dickinsm@gmail.com> wrote:

> So as long as there's no significant adverse affect on total compilation time or space requirements, +1 from me.

I can't quite believe I wrote that. adverse *effect*!

-- Mark

Nick Coghlan

9:25 a.m.

On 5 April 2014 18:09, Mark Dickinson <dickinsm@gmail.com> wrote:

> On Sat, Apr 5, 2014 at 4:50 AM, Chris Angelico <rosuav@gmail.com> wrote:
>> If the Num nodes also had a .str member with the unparsed string ("1", "2.000", and "3.0j"), it would be possible to influence their interpretation with an AST transform - for instance, turning them into decimal.Decimal construction calls, or wrapping them in something that automatically keeps track of significant figures (which would mean that "2.000" is considered more precise than a mere "2.0" or "2.").
>>
>> It shouldn't be too difficult to do, and it could simply be ignored in most cases; any sort of change to the actual number could leave the original string out of sync with it, which isn't a problem.
>>
>> Thoughts?
>
> I like it. Apart from the AST transformation possibilities opened up, I have some AST-based style checkers that would have benefited from access to the raw numeric strings.
>
> So as long as there's no significant adverse affect on total compilation time or space requirements, +1 from me.

Also somewhat tangentially related to this issue about being able to render particular args differently in a signature: http://bugs.python.org/issue16801

I think the discussion on http://bugs.python.org/issue10769 suggests a broader philosophical discussion may be appropriate, since it makes a big difference to what is in scope for the ast module if its purpose is expanded from "easier to manipulate intermediate representation between source and bytecode" to "supports lossless source to source transformations".

Articles like http://time-loop.tumblr.com/post/47664644/python-ast-preserving-whitespace-a... suggest to me that we may need to start pushing people *away* from the ast module and add a lib2to3 inspired API specifically for lossless transformations based on the *actual* grammar of the running Python version (the grammar 2to3 itself uses is a bit mixed up, because it needs to tolerate Python 2 code as input, and it currently doesn't handle "yield from" expressions - the latter isn't really an issue in practice for 2/3 transition related use cases, since hybrid 2/3 code can't use "yield from" anyway).

(A Google search for "Python AST preserve comments" provides other interesting links on the topic of source->source transformations that aren't lossy the way those based on the ast module inevitably are)

Cheers,
Nick.

-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
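The gap between the two purposes is easy to see on a small input. A rough sketch, assuming Python 3.9 or newer for ast.unparse() and using the standard tokenize module as a stand-in for a lossless tree (it is not lib2to3 itself, only an illustration of keeping the source text around):

```python
import ast
import io
import tokenize

SOURCE = "x = 2.000  # keep me\n"

# Round trip through the AST: the comment and the literal's spelling are dropped.
via_ast = ast.unparse(ast.parse(SOURCE))

# Round trip through the token stream: each token keeps its text and position.
tokens = list(tokenize.generate_tokens(io.StringIO(SOURCE).readline))
via_tokens = tokenize.untokenize(tokens)

print(repr(via_ast))
print(repr(via_tokens))
```

The AST version comes back as a plain assignment of 2.0, while the token version still carries both "2.000" and the comment.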

Chris Angelico

11:53 a.m.

On Sat, Apr 5, 2014 at 8:25 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:

> Articles like http://time-loop.tumblr.com/post/47664644/python-ast-preserving-whitespace-a... suggest to me that we may need to start pushing people *away* from the ast module and add a lib2to3 inspired API specifically for lossless transformations based on the *actual* grammar of the running Python version

Ooh that looks promising. I might try to knock up a recipe for some sort of "interactive Python in decimal mode", which would cover one of the important cases for which people ask for decimal literals. It'd not be perfect without some monkey-patch for int/int -> float, unless simply every non-complex number becomes a Decimal.

Or if I don't get around to it, maybe someone else can?

ChrisA

Ryan

1:32 p.m.

I could probably try that if you give a slightly clearer description.

Chris Angelico <rosuav@gmail.com> wrote:

> On Sat, Apr 5, 2014 at 8:25 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
>> Articles like http://time-loop.tumblr.com/post/47664644/python-ast-preserving-whitespace-a... suggest to me that we may need to start pushing people *away* from the ast module and add a lib2to3 inspired API specifically for lossless transformations based on the *actual* grammar of the running Python version
>
> Ooh that looks promising. I might try to knock up a recipe for some sort of "interactive Python in decimal mode", which would cover one of the important cases for which people ask for decimal literals. It'd not be perfect without some monkey-patch for int/int -> float, unless simply every non-complex number becomes a Decimal.
>
> Or if I don't get around to it, maybe someone else can?
>
> ChrisA
> _______________________________________________
> Python-ideas mailing list
> Python-ideas@python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/

-- Sent from my Android phone with K-9 Mail. Please excuse my brevity.

Chris Angelico

2:03 p.m.

On Sun, Apr 6, 2014 at 12:32 AM, Ryan <rymg19@gmail.com> wrote:

> I could probably try that if you give a slightly clearer description.

Currently, you can fire up IDLE or interactive Python and use it as a super calculator:

>>> 1 + 2
3

And when you use it with non-integers, you get floats:

>>> 0.1 + 0.2
0.30000000000000004

We could turn this into a Decimal calculator with two important changes:

1) Every instance of a number in the code must become a Decimal (which really means wrapping every number with Decimal("...") and importing Decimal from decimal)
2) Monkey-patch Decimal.__repr__ = Decimal.__str__ to make the display tidy

The latter might be better done by subclassing Decimal and putting something into the builtins, but that's implementation detail (now that Decimal is implemented in C, its attributes can't be set).

As long as you don't need multiple contexts or anything complicated like that, this should in theory work. (Changing the default context shouldn't break anything AFAICT.) It'd then be usable in the same way that REXXTry is: a convenient calculator that's backed by a full programming language. It'd be better than REXXTry, in fact, as limitations on REXX's 'INTERPRET' command mean you can't define functions that way (though you can call functions defined in REXXTry itself - I had my own Enhanced REXXTry with gobs of mathematical utility functions).

If that can be done without any core language changes, that would be awesome.

ChrisA
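Change 1 can be approximated at the token level without touching the AST at all, in the spirit of the decistmt example in the tokenize documentation. A minimal sketch, with the function name decimalize chosen here for illustration; it leaves complex, hexadecimal, octal and binary literals alone, since Decimal() cannot parse those spellings:

```python
import io
import tokenize
from decimal import Decimal

def decimalize(source):
    """Rewrite decimal-notation number literals as Decimal('...') calls."""
    out = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        text = tok.string.lower()
        plain = not text.endswith("j") and not text.startswith(("0x", "0o", "0b"))
        if tok.type == tokenize.NUMBER and plain:
            # Pass the literal's original spelling, so "2.000" keeps its zeros.
            out.extend([
                (tokenize.NAME, "Decimal"),
                (tokenize.OP, "("),
                (tokenize.STRING, repr(tok.string)),
                (tokenize.OP, ")"),
            ])
        else:
            out.append((tok.type, tok.string))
    return tokenize.untokenize(out)

namespace = {"Decimal": Decimal}
print(eval(decimalize("0.1 + 0.2"), namespace))    # 0.3
print(eval(decimalize("2.000 + 1"), namespace))    # 3.000
```

Because every non-complex literal becomes a Decimal, the int/int -> float concern does not arise for literal operands: both sides of a division are already Decimal.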

Terry Reedy

8:31 p.m.

On 4/5/2014 10:03 AM, Chris Angelico wrote:

> On Sun, Apr 6, 2014 at 12:32 AM, Ryan <rymg19@gmail.com> wrote:
>> I could probably try that if you give a slightly clearer description.
>
> Currently, you can fire up IDLE or interactive Python and use it as a super calculator:
>
> >>> 1 + 2
> 3
>
> And when you use it with non-integers, you get floats:
>
> >>> 0.1 + 0.2
> 0.30000000000000004
>
> We could turn this into a Decimal calculator with two important changes:
> 1) Every instance of a number in the code must become a Decimal (which really means wrapping every number with Decimal("...") and importing Decimal from decimal)
> 2) Monkey-patch Decimal.__repr__ = Decimal.__str__ to make the display tidy
>
> The latter might be better done by subclassing Decimal and putting something into the builtins, but that's implementation detail (now that Decimal is implemented in C, its attributes can't be set).
>
> If that can be done without any core language changes, that would be awesome.

One could certainly do something with Idle without changing Python. Idle has an extension mechanism that operates by modifying runtime structures. I believe it is mainly used to add menu entries for functions that operate on the edit buffer wrapped by an editor window. Thus it should be possible to 'decimalize' a file by adding an import (or subclass definition) at the top of the buffer and wrapping literals with D("...").

I don't think that the existing hooks are enough to modify the behavior of the Shell. For that, one could write python code to permanently patch the Shell to add a decimal mode. It would need to send initialization code to each user subprocess and wrap literals as above, either as typed or before sending.

When I have more time than I do now, I would be willing to share what I know about how either could be done.

-- Terry Jan Reedy

Ryan Gonzalez

8:42 p.m.

On Sat, Apr 5, 2014 at 9:03 AM, Chris Angelico <rosuav@gmail.com> wrote:

> On Sun, Apr 6, 2014 at 12:32 AM, Ryan <rymg19@gmail.com> wrote:
>> I could probably try that if you give a slightly clearer description.
>
> Currently, you can fire up IDLE or interactive Python and use it as a super calculator:
>
> >>> 1 + 2
> 3
>
> And when you use it with non-integers, you get floats:
>
> >>> 0.1 + 0.2
> 0.30000000000000004
>
> We could turn this into a Decimal calculator with two important changes:
> 1) Every instance of a number in the code must become a Decimal (which really means wrapping every number with Decimal("...") and importing Decimal from decimal)

Simple enough.

> 2) Monkey-patch Decimal.__repr__ = Decimal.__str__ to make the display tidy

I actually found a module to do that: https://pypi.python.org/pypi/forbiddenfruit/0.1.1. Dangerous but awesome.


-- Ryan If anybody ever asks me why I prefer C++ to C, my answer will be simple: "It's becauseslejfp23(@#Q*(E*EIdc-SEGFAULT. Wait, I don't think that was nul-terminated."
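For the display side, one alternative that avoids patching Decimal altogether is to replace sys.displayhook, which the interactive interpreter calls for each expression result. This is a different technique from the monkey-patch and the subclass discussed above; a minimal sketch, with decimal_displayhook as an illustrative name:

```python
import builtins
import sys
from decimal import Decimal

def decimal_displayhook(value):
    """Show Decimal results via str(); defer to the default hook otherwise."""
    if isinstance(value, Decimal):
        builtins._ = value      # keep the usual "_" shortcut working
        print(str(value))
    else:
        sys.__displayhook__(value)

# Install the hook for the current interpreter session.
sys.displayhook = decimal_displayhook
```

With the hook installed, a bare expression that evaluates to a Decimal is echoed in its str() form, and everything else is displayed as usual.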

Serhiy Storchaka

5:39 p.m.

On 05.04.14 06:50, Chris Angelico wrote:

> Currently, parsing a Python script into an AST discards a huge amount of information. Some of that (line and column information) is retained as metadata, but a lot isn't. One of the potentially-useful things that's currently dropped is the string form of numbers:
>
> >>> print(ast.dump(ast.parse("x = 1 + 2.000 + 3.0j")))
> Module(body=[Assign(targets=[Name(id='x', ctx=Store())], value=BinOp(left=BinOp(left=Num(n=1), op=Add(), right=Num(n=2.0)), op=Add(), right=Num(n=3j)))])
>
> The three Num nodes have an n with the already-constructed int/float/complex. At this point, it's impossible to return to the original string that was entered; any rounding will have already taken place.
>
> If the Num nodes also had a .str member with the unparsed string ("1", "2.000", and "3.0j"), it would be possible to influence their interpretation with an AST transform - for instance, turning them into decimal.Decimal construction calls, or wrapping them in something that automatically keeps track of significant figures (which would mean that "2.000" is considered more precise than a mere "2.0" or "2.").
>
> It shouldn't be too difficult to do, and it could simply be ignored in most cases; any sort of change to the actual number could leave the original string out of sync with it, which isn't a problem.
>
> Thoughts?

What about strings?

>>> print(ast.dump(ast.parse(r"x = '''\x0a\u000d''' r'x'")))
Module(body=[Assign(targets=[Name(id='x', ctx=Store())], value=Str(s='\n\rx'))])

Tuples?

>>> print(ast.dump(ast.parse("x = (1, 2,)")))
Module(body=[Assign(targets=[Name(id='x', ctx=Store())], value=Tuple(elts=[Num(n=1), Num(n=2)], ctx=Load()))])
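Both of these details survive one level below the AST, in the token stream. A small sketch using the standard tokenize module (token_strings is an illustrative helper name) that lists the token texts for the two examples:

```python
import io
import tokenize

def token_strings(source):
    """Return the text of each token, skipping line-structure tokens."""
    skip = {tokenize.NEWLINE, tokenize.NL, tokenize.ENDMARKER}
    readline = io.StringIO(source).readline
    return [tok.string for tok in tokenize.generate_tokens(readline)
            if tok.type not in skip]

# The two adjacent string literals stay separate, escapes and prefix intact.
print(token_strings(r"x = '''\x0a\u000d''' r'x'"))

# The trailing comma is still a token of its own.
print(token_strings("x = (1, 2,)"))
# ['x', '=', '(', '1', ',', '2', ',', ')']
```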

Chris Angelico

9:55 p.m.

On Sun, Apr 6, 2014 at 3:39 AM, Serhiy Storchaka <storchaka@gmail.com> wrote:

> What about strings?
>
> >>> print(ast.dump(ast.parse(r"x = '''\x0a\u000d''' r'x'")))
> Module(body=[Assign(targets=[Name(id='x', ctx=Store())], value=Str(s='\n\rx'))])
>
> Tuples?
>
> >>> print(ast.dump(ast.parse("x = (1, 2,)")))
> Module(body=[Assign(targets=[Name(id='x', ctx=Store())], value=Tuple(elts=[Num(n=1), Num(n=2)], ctx=Load()))])

I don't have any concrete use cases for those, but the same could be done for every type of node as a general mechanism for recreating a file more exactly. Nick linked to a tracker issue with some discussion on docstrings, which could benefit from that. Or maybe it would be better to avoid the AST altogether and do a source-level translation, as mentioned in the blog post he also linked to.

ChrisA

Guido van Rossum

10:08 p.m.

Lib2to3 has its own parser and tree so it can reconstruct the file, with comments. AST JUST THROWS AWAY TOO MUCH.

-- --Guido van Rossum (on iPad)

Guido van Rossum

10:09 p.m.

Whoops, caps lock error. Didn't mean to shout. :-)

-- --Guido van Rossum (on iPad)

Chris Angelico

10:11 p.m.

On Sun, Apr 6, 2014 at 8:09 AM, Guido van Rossum <guido@python.org> wrote:

> Whoops, caps lock error. Didn't mean to shout. :-)

Heh, glad I hadn't offended you there :) I've put "Learn the 2to3 parser and use it for source code transformation" (with Nick's link) in my TODO. No telling when it'll actually be deployed, though.

ChrisA

Ethan Furman

4:20 p.m.

On 04/05/2014 03:08 PM, Guido van Rossum wrote:

> Lib2to3 has its own parser and tree so it can reconstruct the file, with comments. AST just throws away too much.

Are you saying it throws away too much for the question at hand, or that it throws away too much and should be modified to not throw away too much?

-- ~Ethan~

Guido van Rossum

4:49 p.m.

The former. Its purpose is to generate code, not to reproduce the source.

On Sunday, April 6, 2014, Ethan Furman <ethan@stoneleaf.us> wrote:

> On 04/05/2014 03:08 PM, Guido van Rossum wrote:
>> Lib2to3 has its own parser and tree so it can reconstruct the file, with comments. AST just throws away too much.
>
> Are you saying it throws away too much for the question at hand, or that it throws away too much and should be modified to not throw away too much?
>
> -- ~Ethan~

-- --Guido van Rossum (on iPad)
