[Python-Dev] Parsing f-strings from PEP 498 -- Literal String Interpolation

Eric V. Smith eric at trueblade.com
Fri Nov 4 03:56:35 EDT 2016

On 11/3/2016 3:06 PM, Fabio Zadrozny wrote:
> Hi Python-Dev,
> I'm trying to get my head around what's accepted in f-strings --
> https://www.python.org/dev/peps/pep-0498/ seems very light on the
> details of what it accepts as an expression and how things should
> actually be parsed (and the current implementation still doesn't seem
> to be in a state for a final release, so I thought asking on python-dev
> would be a reasonable option).

In what way do you think the implementation isn't ready for a final release?

> I was thinking there'd be some grammar for it (something as
> https://docs.python.org/3.6/reference/grammar.html), but all I could
> find related to this is a quote saying that f-strings should be
> something as:
> f ' <text> { <expression> <optional !s, !r, or !a> <optional : format
> specifier> } <text> ... '
> So, given that, is it safe to assume that <expression> would be equal to
> the "test" node from the official grammar?

No. There are really three phases here:

1. The f-string is tokenized as a regular STRING token, like all other 
strings (f-, b-, u-, r-, etc.).
2. The parser sees that it's an f-string, and breaks it into expression 
and text parts.
3. For each expression found, the expression is compiled with 
PyParser_ASTFromString(..., Py_eval_input, ...).
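
To sketch phase 3 at the Python level (a rough analogy only, not the C 
code itself): compile() in 'eval' mode behaves much like 
PyParser_ASTFromString with Py_eval_input, and accepts only expressions:

```python
# Rough Python-level analogy for phase 3: each extracted expression is
# compiled in eval mode, as PyParser_ASTFromString(..., Py_eval_input, ...)
# does in C.
code = compile('x + 1', '<fstring>', 'eval')
print(eval(code, {'x': 41}))   # 42

# Eval mode rejects statements, which is why nothing statement-like can
# ever appear inside the braces:
try:
    compile('x = 1', '<fstring>', 'eval')
except SyntaxError:
    print('SyntaxError')
```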

Step 2 is the part that limits what types of expressions are allowed. 
While scanning for the end of an expression, it stops at the first '!', 
':', or '}' that isn't inside a string and isn't nested inside 
parentheses, brackets, or braces.

The nesting-tracking is why these work:
 >>> f'{(lambda x:3)}'
'<function <lambda> at 0x000000000296E560>'
 >>> f'{(lambda x:3)!s:.20}'
'<function <lambda> a'

But this doesn't:
 >>> f'{lambda x:3}'
   File "<fstring>", line 1
     (lambda x)
SyntaxError: unexpected EOF while parsing
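
A much-simplified sketch of the phase-2 scan, in pure Python (the real 
code is C inside the compiler; the function name here is invented, and 
special cases such as '!=' not counting as a conversion are omitted):

```python
# Hypothetical, simplified sketch of the phase-2 scan: find where an
# f-string expression ends, i.e. the first '!', ':', or '}' that is not
# inside a string literal and not nested in (), [], or {}.
def find_expression_end(s):
    """Return the index in s where the expression part ends, or -1."""
    depth = 0          # nesting depth of (), [], {}
    quote = None       # the quote character while inside a string literal
    for i, ch in enumerate(s):
        if quote:
            if ch == quote:
                quote = None
        elif ch in ('"', "'"):
            quote = ch
        elif ch in '([{':
            depth += 1
        elif ch in ')]}':
            if ch == '}' and depth == 0:
                return i          # closing brace ends the expression
            depth -= 1
        elif depth == 0 and ch in '!:':
            return i              # conversion or format spec starts here
    return -1                     # unterminated expression

# '{(lambda x:3)}': the colon is nested in parens, so the scan runs on
# to the closing brace:
print(find_expression_end('(lambda x:3)}'))   # 12
# '{lambda x:3}': the colon is at depth 0, so the expression is cut off
# at 'lambda x', which then fails to compile in phase 3:
print(find_expression_end('lambda x:3}'))     # 8
```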

Also, backslashes are not allowed anywhere inside of the expression. 
This was a late change right before beta 1 (I think), and differs from 
the PEP and docs. I have an open item to fix them.
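
Since the expression part rejects backslashes, the usual workaround is 
to bind the escape to a name outside the f-string first; a small sketch 
(variable names are just for illustration):

```python
# Backslashes are rejected inside the expression part of an f-string
# (as of the 3.6 implementation), so hoist the escape into a variable:
names = ['a', 'b']
newline = '\n'
# f'{newline.join(names)}' is fine, where f'{"\n".join(names)}' would be
# a SyntaxError in 3.6.
print(f'{newline.join(names)}')
```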

> I initially thought it obviously would be, but the PEP says that using
> a lambda inside the expression would conflict because of the colon
> (which wouldn't happen if a proper grammar were actually used for this
> parsing, as there'd be no conflict -- the lambda would properly consume
> the colon), so I guess some pre-parsing step takes place to separate
> out the expression before it's parsed. I'm interested in knowing how
> exactly that should work when the implementation is finished -- lots of
> plus points if there's actually a grammar to back it up :)

I've considered using the grammar and tokenizer to implement f-string 
parsing, but I doubt it will ever happen. It's a lot of work, and 
everything that produced or consumed tokens would have to be aware of 
it. As it stands, if you don't need to look inside of f-strings, you can 
just treat them as regular STRING tokens.

I hope that helps.
