Hi Python-Dev,
I'm trying to get my head around what's accepted in f-strings -- https://www.python.org/dev/peps/pep-0498/ seems very light on the details of what it accepts as an expression and how things should actually be parsed (and the current implementation still doesn't seem to be in a state for a final release, so I thought asking on python-dev would be a reasonable option).
I was thinking there'd be some grammar for it (something like https://docs.python.org/3.6/reference/grammar.html), but all I could find related to this is a quote saying that f-strings should be something like:
f ' <text> { <expression> <optional !s, !r, or !a> <optional : format specifier> } <text> ... '
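For concreteness, a couple of examples of my own of how an actual f-string would map onto that template, as I understand it:

value = 4 * 20
f'The value is {value}.'            # <expression> only
f'The value is {value!r:>10}.'      # <expression>, !r conversion and a format specifier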
So, given that, is it safe to assume that <expression> would be equal to the "test" node from the official grammar?
I initially thought it would obviously be, but the PEP says that using a lambda inside the expression would conflict because of the colon (which wouldn't happen if a proper grammar was actually used for this parsing, as there'd be no conflict: the lambda would properly consume the colon). So, I guess some pre-parsing step takes place to separate the expression before it's parsed, and I'm interested in knowing how exactly that should work when the implementation is finished -- lots of plus points if there's actually a grammar to back it up :)
Thanks,
Fabio
On 11/3/2016 3:06 PM, Fabio Zadrozny wrote:
Hi Python-Dev,
I'm trying to get my head around what's accepted in f-strings -- https://www.python.org/dev/peps/pep-0498/ seems very light on the details of what it accepts as an expression and how things should actually be parsed (and the current implementation still doesn't seem to be in a state for a final release, so I thought asking on python-dev would be a reasonable option).
In what way do you think the implementation isn't ready for a final release?
I was thinking there'd be some grammar for it (something like https://docs.python.org/3.6/reference/grammar.html), but all I could find related to this is a quote saying that f-strings should be something like:
f ' <text> { <expression> <optional !s, !r, or !a> <optional : format specifier> } <text> ... '
So, given that, is it safe to assume that <expression> would be equal to the "test" node from the official grammar?
No. There are really three phases here:
1. The f-string is tokenized as a regular STRING token, like all other strings (f-, b-, u-, r-, etc.).
2. The parser sees that it's an f-string, and breaks it into expression and text parts.
3. For each expression found, the expression is compiled with PyParser_ASTFromString(..., Py_eval_input, ...).
Step 2 is the part that limits what types of expressions are allowed. While scanning for the end of an expression, it stops at the first '!', ':', or '}' that isn't inside of a string and isn't nested inside of parens, braces, and brackets.
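If it helps to see the idea in code, here's a rough Python sketch of my own of that scan (not the real C implementation; it ignores escapes, triple-quoted strings, the '!=' case and other details, but shows the idea):

def find_expr_end(s, start):
    """Index of the first '!', ':' or '}' at or after `start` that is not
    inside a string literal and not nested inside (), [] or {}."""
    depth = 0
    quote = None
    for i in range(start, len(s)):
        ch = s[i]
        if quote:                       # inside a nested string literal
            if ch == quote:
                quote = None
        elif ch in '\'"':
            quote = ch
        elif ch in '([{':
            depth += 1
        elif ch in ')]}' and depth:
            depth -= 1
        elif depth == 0 and ch in '!:}':
            return i                    # end of the expression part
    raise SyntaxError("expecting '}'")

find_expr_end("(lambda x:3)!s:.20}", 0)   # -> 12: stops at the '!'
find_expr_end("lambda x:3}", 0)           # -> 8: stops at the ':', so only 'lambda x' gets compiled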
The nesting-tracking is why these work:
>>> f'{(lambda x:3)}'
'<function <lambda> at 0x000000000296E560>'
>>> f'{(lambda x:3)!s:.20}'
'<function <lambda> a'
But this doesn't:
>>> f'{lambda x:3}'
  File "<fstring>", line 1
    (lambda x)
             ^
SyntaxError: unexpected EOF while parsing
Also, backslashes are not allowed anywhere inside of the expression. This was a late change right before beta 1 (I think), and differs from the PEP and docs. I have an open item to fix them.
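For example, with that restriction in place (the exact error message may differ by version):

>>> f'{ord("\n")}'
  File "<stdin>", line 1
SyntaxError: f-string expression part cannot include a backslash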
I initially thought it would obviously be, but the PEP says that using a lambda inside the expression would conflict because of the colon (which wouldn't happen if a proper grammar was actually used for this parsing, as there'd be no conflict: the lambda would properly consume the colon). So, I guess some pre-parsing step takes place to separate the expression before it's parsed, and I'm interested in knowing how exactly that should work when the implementation is finished -- lots of plus points if there's actually a grammar to back it up :)
I've considered using the grammar and tokenizer to implement f-string parsing, but I doubt it will ever happen. It's a lot of work, and everything that produced or consumed tokens would have to be aware of it. As it stands, if you don't need to look inside of f-strings, you can just treat them as regular STRING tokens.
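To illustrate that last point, with the stdlib tokenize module today the whole f-string comes back as a single STRING token (a quick check; nothing here needs to know about the internal structure):

import io
import tokenize

src = "x = f'{a!r:>{width}}'\n"
toks = list(tokenize.generate_tokens(io.StringIO(src).readline))
print([(tokenize.tok_name[t.type], t.string) for t in toks[:3]])
# -> [('NAME', 'x'), ('OP', '='), ('STRING', "f'{a!r:>{width}}'")]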
I hope that helps.
Eric.
On Fri, Nov 4, 2016 at 9:56 AM, Eric V. Smith eric@trueblade.com wrote:
- The parser sees that it's an f-string, and breaks it into expression and text parts.
I'm with Fabio here. It would be really nice to have a grammar specified and documented for this step, even if it's not implemented that way. Otherwise it's going to be very hard for, e.g., syntax highlighters to know what is intended to be allowed.
On 4 November 2016 at 08:36, Simon Cross hodgestar+pythondev@gmail.com wrote:
On Fri, Nov 4, 2016 at 9:56 AM, Eric V. Smith eric@trueblade.com wrote:
- The parser sees that it's an f-string, and breaks it into expression and text parts.
I'm with Fabio here. It would be really nice to have a grammar specified and documented for this step, even if it's not implemented that way. Otherwise it's going to be very hard for, e.g., syntax highlighters to know what is intended to be allowed.
I think that if the docs explain the process, essentially as noted by Eric above:
Step 2 is the part that limits what types of expressions are allowed. While scanning for the end of an expression, it stops at the first '!', ':', or '}' that isn't inside of a string and isn't nested inside of parens, braces, and brackets.
[...]
Also, backslashes are not allowed anywhere inside of the expression.
then that would be fine. Possibly a bit more detail would be helpful, but I'm pretty sure I could reimplement the behaviour (for a syntax highlighter, for example) based on the above.
I assume that the open item Eric mentions to fix the PEP and docs is sufficient to cover this, so it'll be documented in due course.
Paul
Answers inline...
On Fri, Nov 4, 2016 at 5:56 AM, Eric V. Smith eric@trueblade.com wrote:
On 11/3/2016 3:06 PM, Fabio Zadrozny wrote:
Hi Python-Dev,
I'm trying to get my head around on what's accepted in f-strings -- https://www.python.org/dev/peps/pep-0498/ seems very light on the details on what it does accept as an expression and how things should actually be parsed (and the current implementation still doesn't seem to be in a state for a final release, so, I thought asking on python-dev would be a reasonable option).
In what way do you think the implementation isn't ready for a final release?
Well, the cases listed in the docs (https://hg.python.org/cpython/file/default/Doc/reference/lexical_analysis.rst) don't work in the latest release (with SyntaxErrors) -- and the bug I created related to it: http://bugs.python.org/issue28597 was promptly closed as duplicate -- so, I assumed (maybe wrongly?) that the parsing still needs work.
I was thinking there'd be some grammar for it (something like https://docs.python.org/3.6/reference/grammar.html), but all I could find related to this is a quote saying that f-strings should be something like:
f ' <text> { <expression> <optional !s, !r, or !a> <optional : format specifier> } <text> ... '
So, given that, is it safe to assume that <expression> would be equal to the "test" node from the official grammar?
No. There are really three phases here:
1. The f-string is tokenized as a regular STRING token, like all other strings (f-, b-, u-, r-, etc.).
2. The parser sees that it's an f-string, and breaks it into expression and text parts.
3. For each expression found, the expression is compiled with PyParser_ASTFromString(..., Py_eval_input, ...).
Step 2 is the part that limits what types of expressions are allowed. While scanning for the end of an expression, it stops at the first '!', ':', or '}' that isn't inside of a string and isn't nested inside of parens, braces, and brackets.
It'd be nice if at least this description could be added to the PEP (as all other language implementations and IDEs will have to work the same way and will probably reference it) -- a grammar example, even if not actually used, would be helpful (personally, I think hand-crafted parsers are always worse in the long run than a proper grammar with a parser, although I understand that if you're not used to it, it may be more work to set up).
Also, I find it a bit troubling that PyParser_ASTFromString is used there rather than just the node which would be related to an expression. I understand it's probably an easier approach, but in the end you probably have to filter it and end up just accepting what's beneath the "test" node from the grammar, no? (i.e.: that's what a lambda body accepts).
The nesting-tracking is why these work:
>>> f'{(lambda x:3)}'
'<function <lambda> at 0x000000000296E560>'
>>> f'{(lambda x:3)!s:.20}'
'<function <lambda> a'
But this doesn't:
>>> f'{lambda x:3}'
  File "<fstring>", line 1
    (lambda x)
             ^
SyntaxError: unexpected EOF while parsing
Also, backslashes are not allowed anywhere inside of the expression. This was a late change right before beta 1 (I think), and differs from the PEP and docs. I have an open item to fix them.
I initially thought it would obviously be, but the PEP says that using a lambda inside the expression would conflict because of the colon (which wouldn't happen if a proper grammar was actually used for this parsing, as there'd be no conflict: the lambda would properly consume the colon). So, I guess some pre-parsing step takes place to separate the expression before it's parsed, and I'm interested in knowing how exactly that should work when the implementation is finished -- lots of plus points if there's actually a grammar to back it up :)
I've considered using the grammar and tokenizer to implement f-string parsing, but I doubt it will ever happen. It's a lot of work, and everything that produced or consumed tokens would have to be aware of it. As it stands, if you don't need to look inside of f-strings, you can just treat them as regular STRING tokens.
Well, I think all language implementations / IDEs (or at least those which want to give syntax errors) will *have* to look inside f-strings.
Also, you could still have a separate grammar saying how to look inside f-strings (this would make the lives of other implementors easier) even if it was a post-processing step as you're doing now.
I hope that helps.
Eric.
It does, thank you very much.
Best Regards,
Fabio
On 11/4/2016 10:50 AM, Fabio Zadrozny wrote:
In what way do you think the implementation isn't ready for a final release?
Well, the cases listed in the docs (https://hg.python.org/cpython/file/default/Doc/reference/lexical_analysis.rst) don't work in the latest release (with SyntaxErrors) -- and the bug I created related to it: http://bugs.python.org/issue28597 was promptly closed as duplicate -- so, I assumed (maybe wrongly?) that the parsing still needs work.
It's not the parsing that needs work, it's the documentation. Those examples used to work, but the parser was deliberately changed to not support them. There's a long discussion on python-ideas about it, starting at https://mail.python.org/pipermail/python-ideas/2016-August/041727.html
It'd be nice if at least this description could be added to the PEP (as all other language implementations and IDEs will have to work the same way and will probably reference it) -- a grammar example, even if not used would be helpful (personally, I think hand-crafted parsers are always worse in the long run compared to having a proper grammar with a parser, although I understand that if you're not really used to it, it may be more work to set it up).
I've written a parser generator just to understand how they work, so I'm completely sympathetic to this. However, in this case, I don't think it would be any easier. I'm basically writing a tokenizer, not an expression parser. It's much simpler. The actual parsing is handled by PyParser_ASTFromString. And as I state below, you have to also consider the parser consumers.
Also, I find it a bit troubling that PyParser_ASTFromString is used there rather than just the node which would be related to an expression. I understand it's probably an easier approach, but in the end you probably have to filter it and end up just accepting what's beneath the "test" node from the grammar, no? (i.e.: that's what a lambda body accepts).
Using PyParser_ASTFromString is the easiest possible way to do this. Given a string, it returns an AST node. What could be simpler?
Well, I think all language implementations / IDEs (or at least those which want to give syntax errors) will *have* to look inside f-strings.
While it's probably true that IDEs (and definitely language implementations) will want to parse f-strings, I think there are many more code scanners that are not language implementations or IDEs. And by being "just" regular strings with a new prefix, it's trivial to get any parser that doesn't care about the internal structure to at least see f-strings as normal strings.
Also, you could still have a separate grammar saying how to look inside f-strings (this would make the lives of other implementors easier) even if it was a post-processing step as you're doing now.
Yes. I've contemplated exposing the f-string scanner. That's the part that returns expressions (as strings) and literal strings. I realize that won't help 3.6.
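For what it's worth, string.Formatter().parse() in the stdlib already does something similar for str.format()-style templates -- it knows nothing about f-strings, but it shows the rough shape such a scanner could return, as (literal_text, field_name, format_spec, conversion) tuples:

>>> import string
>>> list(string.Formatter().parse('a {x!r:>10} b {y} c'))
[('a ', 'x', '>10', 'r'), (' b ', 'y', '', None), (' c', None, None, None)]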
Eric.
On Fri, Nov 4, 2016 at 3:15 PM, Eric V. Smith eric@trueblade.com wrote:
On 11/4/2016 10:50 AM, Fabio Zadrozny wrote:
In what way do you think the implementation isn't ready for a final release?
Well, the cases listed in the docs (https://hg.python.org/cpython/file/default/Doc/reference/lexical_analysis.rst) don't work in the latest release (with SyntaxErrors) -- and the bug I created related to it: http://bugs.python.org/issue28597 was promptly closed as duplicate -- so, I assumed (maybe wrongly?) that the parsing still needs work.
It's not the parsing that needs work, it's the documentation. Those examples used to work, but the parser was deliberately changed to not support them. There's a long discussion on python-ideas about it, starting at https://mail.python.org/pipermail/python-ideas/2016-August/041727.html
Understood ;)
It'd be nice if at least this description could be added to the PEP (as all other language implementations and IDEs will have to work the same way and will probably reference it) -- a grammar example, even if not actually used, would be helpful (personally, I think hand-crafted parsers are always worse in the long run than a proper grammar with a parser, although I understand that if you're not used to it, it may be more work to set up).
I've written a parser generator just to understand how they work, so I'm completely sympathetic to this. However, in this case, I don't think it would be any easier. I'm basically writing a tokenizer, not an expression parser. It's much simpler. The actual parsing is handled by PyParser_ASTFromString. And as I state below, you have to also consider the parser consumers.
Also, I find it a bit troubling that PyParser_ASTFromString is used there rather than just the node which would be related to an expression. I understand it's probably an easier approach, but in the end you probably have to filter it and end up just accepting what's beneath the "test" node from the grammar, no? (i.e.: that's what a lambda body accepts).
Using PyParser_ASTFromString is the easiest possible way to do this. Given a string, it returns an AST node. What could be simpler?
I think that for implementation purposes, given the python infrastructure, it's fine, but for specification purposes, probably incorrect... As I don't think f-strings should accept:
f"start {import sys; sys.version_info[0];} end" (i.e.: PyParser_ASTFromString doesn't just return an expression, it accepts any valid Python code, even code which can't be used in an f-string).
Well, I think all language implementations / IDEs (or at least those which want to give syntax errors) will *have* to look inside f-strings.
While it's probably true that IDEs (and definitely language implementations) will want to parse f-strings, I think there are many more code scanners that are not language implementations or IDEs. And by being "just" regular strings with a new prefix, it's trivial to get any parser that doesn't care about the internal structure to at least see f-strings as normal strings.
Also, you could still have a separate grammar saying how to look inside f-strings (this would make the lives of other implementors easier) even if it was a post-processing step as you're doing now.
Yes. I've contemplated exposing the f-string scanner. That's the part that returns expressions (as strings) and literal strings. I realize that won't help 3.6.
Nice...
As a note, just for the record, my own interest in f-strings is knowing how exactly they are parsed, in order to provide a preview of PyDev with syntax highlighting and preliminary support for f-strings (which, at the very minimum, besides syntax highlighting for the parts of f-strings, should also show syntax errors inside them).
Cheers,
Fabio
On 11/4/2016 2:03 PM, Fabio Zadrozny wrote:
Using PyParser_ASTFromString is the easiest possible way to do this. Given a string, it returns an AST node. What could be simpler?
I think that for implementation purposes, given the python infrastructure, it's fine, but for specification purposes, probably incorrect... As I don't think f-strings should accept:
f"start {import sys; sys.version_info[0];} end" (i.e.: PyParser_ASTFromString doesn't just return an expression, it accepts any valid Python code, even code which can't be used in an f-string).
Not so. It should only accept expressions, not statements:
f"start {import sys; sys.version_info[0];} end"
File "<fstring>", line 1 (import sys; sys.version_info[0];) ^ SyntaxError: invalid syntax
Also, you could still have a separate grammar saying how to look inside f-strings (this would make the lives of other implementors easier) even if it was a post-processing step as you're doing now.

Yes. I've contemplated exposing the f-string scanner. That's the part that returns expressions (as strings) and literal strings. I realize that won't help 3.6.
Nice...
As a note, just for the record, my own interest in f-strings is knowing how exactly they are parsed, in order to provide a preview of PyDev with syntax highlighting and preliminary support for f-strings (which, at the very minimum, besides syntax highlighting for the parts of f-strings, should also show syntax errors inside them).
I understand there's a need to make the specification more rigorous. Hopefully we'll get there.
Eric.
On 5 November 2016 at 04:03, Fabio Zadrozny fabiofz@gmail.com wrote:
On Fri, Nov 4, 2016 at 3:15 PM, Eric V. Smith eric@trueblade.com wrote:
Using PyParser_ASTFromString is the easiest possible way to do this. Given a string, it returns an AST node. What could be simpler?
I think that for implementation purposes, given the python infrastructure, it's fine, but for specification purposes, probably incorrect... As I don't think f-strings should accept:
f"start {import sys; sys.version_info[0];} end" (i.e.: PyParser_ASTFromString doesn't just return an expression, it accepts any valid Python code, even code which can't be used in an f-string).
f-strings use the "eval" parsing mode, which starts from the "eval_input" node in the grammar (which is only a couple of nodes higher than 'test', allowing tuples via 'testlist' as well as trailing newlines and EOF):
>>> ast.parse("import sys; sys.version_info[0];", mode="eval")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python3.5/ast.py", line 35, in parse
    return compile(source, filename, mode, PyCF_ONLY_AST)
  File "<example>", line 1
    import sys; sys.version_info[0];
         ^
SyntaxError: invalid syntax
You have to use "exec" mode to get the parser to allow statements, which is why f-strings don't do that:
>>> ast.dump(ast.parse("import sys; sys.version_info[0];", mode="exec"))
"Module(body=[Import(names=[alias(name='sys', asname=None)]), Expr(value=Subscript(value=Attribute(value=Name(id='sys', ctx=Load()), attr='version_info', ctx=Load()), slice=Index(value=Num(n=0)), ctx=Load()))])"
The unique aspect for f-strings that means they don't permit some otherwise valid Python expressions is that it also does the initial pre-tokenisation based on:
1. Look for an opening '{'
2. Look for a closing '!', ':' or '}' accounting for balanced string quotes, parentheses, brackets and braces
Ignoring the surrounding quotes, and using the `atom` node from Python's grammar to represent the nesting tracking, and TEXT to stand in for arbitrary text, it's something akin to:
fstring: (TEXT ['{' maybe_pyexpr ('!' | ':' | '}')])+
maybe_pyexpr: (atom | TEXT)+
That isn't quite right, since it doesn't properly account for brace nesting, but it gives the general idea - there's an initial really simple tokenising pass that picks out the potential Python expressions, and then those are run through the AST parser's equivalent of eval().
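For anyone who wants a concrete feel for that two-pass structure, here's a deliberately naive Python sketch of my own (it ignores nested braces, quotes, conversions, format specs and escaped '{{' / '}}' -- it just shows the split-then-parse-with-eval idea):

import ast

def scan_fstring(body):
    """Split an f-string body into ('text', str) and ('expr', AST) parts,
    validating each expression with the eval-mode parser."""
    parts, i = [], 0
    while i < len(body):
        j = body.find('{', i)
        if j == -1:
            parts.append(('text', body[i:]))
            break
        parts.append(('text', body[i:j]))
        k = body.index('}', j)            # naive: no nesting or quote tracking
        parts.append(('expr', ast.parse(body[j + 1:k], mode='eval')))
        i = k + 1
    return parts

scan_fstring('hello {name}, you are {age + 1}')
# [('text', 'hello '), ('expr', <Expression>), ('text', ', you are '), ('expr', <Expression>)]
scan_fstring('{import sys}')   # raises SyntaxError, just like the real thing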
Cheers, Nick.
On Sat, Nov 5, 2016 at 10:36 AM, Nick Coghlan ncoghlan@gmail.com wrote:
On 5 November 2016 at 04:03, Fabio Zadrozny fabiofz@gmail.com wrote:
On Fri, Nov 4, 2016 at 3:15 PM, Eric V. Smith eric@trueblade.com wrote:
Using PyParser_ASTFromString is the easiest possible way to do this. Given a string, it returns an AST node. What could be simpler?
I think that for implementation purposes, given the python infrastructure, it's fine, but for specification purposes, probably incorrect... As I don't think f-strings should accept:
think f-strings should accept:
f"start {import sys; sys.version_info[0];} end" (i.e.: PyParser_ASTFromString doesn't just return an expression, it accepts any valid Python code, even code which can't be used in an f-string).
f-strings use the "eval" parsing mode, which starts from the "eval_input" node in the grammar (which is only a couple of nodes higher than 'test', allowing tuples via 'testlist' as well as trailing newlines and EOF):
>>> ast.parse("import sys; sys.version_info[0];", mode="eval")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python3.5/ast.py", line 35, in parse
    return compile(source, filename, mode, PyCF_ONLY_AST)
  File "<example>", line 1
    import sys; sys.version_info[0];
         ^
SyntaxError: invalid syntax
You have to use "exec" mode to get the parser to allow statements, which is why f-strings don't do that:
>>> ast.dump(ast.parse("import sys; sys.version_info[0];", mode="exec"))
"Module(body=[Import(names=[alias(name='sys', asname=None)]), Expr(value=Subscript(value=Attribute(value=Name(id='sys', ctx=Load()), attr='version_info', ctx=Load()), slice=Index(value=Num(n=0)), ctx=Load()))])"
The unique aspect for f-strings that means they don't permit some otherwise valid Python expressions is that it also does the initial pre-tokenisation based on:
1. Look for an opening '{'
2. Look for a closing '!', ':' or '}' accounting for balanced string quotes, parentheses, brackets and braces
Ignoring the surrounding quotes, and using the `atom` node from Python's grammar to represent the nesting tracking, and TEXT to stand in for arbitrary text, it's something akin to:
fstring: (TEXT ['{' maybe_pyexpr ('!' | ':' | '}')])+
maybe_pyexpr: (atom | TEXT)+
That isn't quite right, since it doesn't properly account for brace nesting, but it gives the general idea - there's an initial really simple tokenising pass that picks out the potential Python expressions, and then those are run through the AST parser's equivalent of eval().
Cheers, Nick.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
Hi Nick and Eric,
Just wanted to say thanks for the feedback, and to point to a grammar I ended up doing on my side (in JavaCC), just in case someone else decides to do a formal grammar later on -- it can probably be used as a reference (it shouldn't be hard to convert it to a BNF grammar):
https://github.com/fabioz/Pydev/blob/master/plugins/org.python.pydev.parser/...
Also, as a piece of feedback, I found it a bit odd that there can't be any space or newline between the conversion specifier (e.g. !r) and the '}'
I.e.:
f'''{ dict( a = 10 ) !r } '''
is not valid, whereas
f'''{ dict( a = 10 ) !r} '''
is valid -- as a note, this means my grammar has a bug, as both versions are accepted -- and I currently don't care enough about that difference from the implementation to fix it ;)
Cheers,
Fabio
On 9 November 2016 at 16:20, Fabio Zadrozny fabiofz@gmail.com wrote:
Also, as a piece of feedback, I found it a bit odd that there can't be any space or newline between the conversion specifier (e.g. !r) and the '}'
FWIW, that is the case for normal format strings, as well:
print("{!r\n}".format(12))
Traceback (most recent call last): File "<stdin>", line 1, in <module> ValueError: expected ':' after conversion specifier
I guess the behaviour is simply inherited from there.
Paul
On 11/9/2016 11:35 AM, Paul Moore wrote:
On 9 November 2016 at 16:20, Fabio Zadrozny fabiofz@gmail.com wrote:
Also, as a piece of feedback, I found it a bit odd that there can't be any space or newline between the conversion specifier (e.g. !r) and the '}'
FWIW, that is the case for normal format strings, as well:
print("{!r\n}".format(12))
Traceback (most recent call last): File "<stdin>", line 1, in <module> ValueError: expected ':' after conversion specifier
I guess the behaviour is simply inherited from there.
Right. That and the fact that whitespace is significant inside the format specifier portion:
>>> '{:%H:%M:%S\n}'.format(datetime.datetime.now())
'17:14:10\n'
I don't think it's worth changing this to allow whitespace after the optional one-character conversion flag.
Eric.