[Python-Dev] Parsing f-strings from PEP 498 -- Literal String Interpolation

Eric V. Smith eric at trueblade.com
Fri Nov 4 03:56:35 EDT 2016

On 11/3/2016 3:06 PM, Fabio Zadrozny wrote:
> Hi Python-Dev,
> I'm trying to get my head around what's accepted in f-strings --
> https://www.python.org/dev/peps/pep-0498/ seems very light on the
> details of what it accepts as an expression and how things should
> actually be parsed (and the current implementation still doesn't seem
> to be in a state for a final release, so I thought asking on python-dev
> would be a reasonable option).

In what way do you think the implementation isn't ready for a final release?

> I was thinking there'd be some grammar for it (something as
> https://docs.python.org/3.6/reference/grammar.html), but all I could
> find related to this is a quote saying that f-strings should be
> something as:
> f ' <text> { <expression> <optional !s, !r, or !a> <optional : format
> specifier> } <text> ... '
> So, given that, is it safe to assume that <expression> would be equal to
> the "test" node from the official grammar?

No. There are really three phases here:

1. The f-string is tokenized as a regular STRING token, like all other 
strings (f-, b-, u-, r-, etc.).
2. The parser sees that it's an f-string, and breaks it into expression 
and text parts.
3. For each expression found, the expression is compiled with 
PyParser_ASTFromString(..., Py_eval_input, ...).
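
To sketch phase 3 at the Python level (a rough analogy only, not the C 
code itself): compile() in 'eval' mode behaves much like 
PyParser_ASTFromString with Py_eval_input, and accepts only expressions:

```python
# Rough Python-level analogy for phase 3: each extracted expression is
# compiled in eval mode, as PyParser_ASTFromString(..., Py_eval_input, ...)
# does in C.
code = compile('x + 1', '<fstring>', 'eval')
print(eval(code, {'x': 41}))   # 42

# Eval mode rejects statements, which is why nothing statement-like can
# ever appear inside the braces:
try:
    compile('x = 1', '<fstring>', 'eval')
except SyntaxError:
    print('SyntaxError')
```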

Step 2 is the part that limits what types of expressions are allowed. 
While scanning for the end of an expression, it stops at the first '!', 
':', or '}' that isn't inside a string and isn't nested inside 
parentheses, brackets, or braces.

The nesting-tracking is why these work:
 >>> f'{(lambda x:3)}'
'<function <lambda> at 0x000000000296E560>'
 >>> f'{(lambda x:3)!s:.20}'
'<function <lambda> a'

But this doesn't:
 >>> f'{lambda x:3}'
   File "<fstring>", line 1
     (lambda x)
SyntaxError: unexpected EOF while parsing
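
A much-simplified sketch of the phase-2 scan, in pure Python (the real 
code is C inside the compiler; the function name here is invented, and 
special cases such as '!=' not counting as a conversion are omitted):

```python
# Hypothetical, simplified sketch of the phase-2 scan: find where an
# f-string expression ends, i.e. the first '!', ':', or '}' that is not
# inside a string literal and not nested in (), [], or {}.
def find_expression_end(s):
    """Return the index in s where the expression part ends, or -1."""
    depth = 0          # nesting depth of (), [], {}
    quote = None       # the quote character while inside a string literal
    for i, ch in enumerate(s):
        if quote:
            if ch == quote:
                quote = None
        elif ch in ('"', "'"):
            quote = ch
        elif ch in '([{':
            depth += 1
        elif ch in ')]}':
            if ch == '}' and depth == 0:
                return i          # closing brace ends the expression
            depth -= 1
        elif depth == 0 and ch in '!:':
            return i              # conversion or format spec starts here
    return -1                     # unterminated expression

# '{(lambda x:3)}': the colon is nested in parens, so the scan runs on
# to the closing brace:
print(find_expression_end('(lambda x:3)}'))   # 12
# '{lambda x:3}': the colon is at depth 0, so the expression is cut off
# at 'lambda x', which then fails to compile in phase 3:
print(find_expression_end('lambda x:3}'))     # 8
```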

Also, backslashes are not allowed anywhere inside of the expression. 
This was a late change right before beta 1 (I think), and differs from 
the PEP and docs. I have an open item to fix them.
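
Since the expression part rejects backslashes, the usual workaround is 
to bind the escape to a name outside the f-string first; a small sketch 
(variable names are just for illustration):

```python
# Backslashes are rejected inside the expression part of an f-string
# (as of the 3.6 implementation), so hoist the escape into a variable:
names = ['a', 'b']
newline = '\n'
# f'{newline.join(names)}' is fine, where f'{"\n".join(names)}' would be
# a SyntaxError in 3.6.
print(f'{newline.join(names)}')
```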

> I initially thought it obviously would be, but the PEP says that using
> a lambda inside the expression would conflict because of the colon
> (which wouldn't happen if a proper grammar were actually used for this
> parsing, as there'd be no conflict -- the lambda would properly consume
> the colon), so I guess some pre-parsing step takes place to separate
> out the expression before it's parsed. I'm interested in knowing how
> exactly that should work when the implementation is finished -- lots of
> plus points if there's actually a grammar to back it up :)

I've considered using the grammar and tokenizer to implement f-string 
parsing, but I doubt it will ever happen. It's a lot of work, and 
everything that produced or consumed tokens would have to be aware of 
it. As it stands, if you don't need to look inside of f-strings, you can 
just treat them as regular STRING tokens.

I hope that helps.
