[Python-Dev] PEP 498: Literal String Interpolation is ready for pronouncement

Eric V. Smith eric at trueblade.com
Sat Sep 5 22:00:06 CEST 2015


On 9/5/2015 3:23 PM, Nathaniel Smith wrote:
> On Sep 5, 2015 11:32 AM, "Eric V. Smith" <eric at trueblade.com> wrote:
>> Ignore the part about non-doubled '}'. The actual description is:
>>
>> To find the end of an expression, it looks for a '!', ':', or '}', not
>> inside of a string or (), [], or {}. There's a special case for '!=' so
>> the bang isn't seen as ending the expression.
> 
> Sounds like you're reimplementing a lot of the lexer... I guess that's
> doable, but how confident are you that your definition of "inside a
> string" matches the original in all corner cases?

Well, this is 35 lines of code (including comments), and it's much
simpler than a lexer (in the sense of "something that generates
tokens"). So I don't think I'm reimplementing a lot of the lexer.

However, your point is valid: if I don't do the same thing the lexer
would do, I could either prematurely find the end of an expression or
look too far. In either case, when I call ast.parse() I'll get a syntax
error, an error when parsing/lexing the remainder of the string, or
both.
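
For example (my own toy demonstration, using the stdlib ast module):

    import ast

    # Suppose the scanner wrongly cut the expression short at 'a[1':
    try:
        ast.parse('a[1', mode='eval')
    except SyntaxError as e:
        print('caught:', e.msg)    # the mistake still surfaces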

But my scanner doesn't have to agree with the lexer exactly: no larger
error will occur if I get it wrong. Everything is confined to a single
f-string, since I've already used the lexer to find the f-string in its
entirety. I only need to make sure users understand how expressions are
extracted from f-strings.

I did look at using the actual lexer (Parser/tokenizer.c) to do this,
but it would require a large amount of surgery. I think it's overkill
for this task.

So far, I've tested it enough to have reasonable confidence that it's
correct. But the implementation could always be swapped out for an
improved version. I'm certainly open to that, if we find cases that the
simple scanner can't deal with.

> In any case the abstract language definition part should be phrased in
> terms of the python lexer -- the expression ends when you encounter the
> first } *token* that is not nested inside () [] {} *tokens*, and then
> you can implement it however makes sense...

I'm not sure that's an improvement on Guido's description when you're
trying to explain it to a user. But when the time comes to write the
documentation, we can discuss it then.
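
For what it's worth, that token-based rule is easy enough to sketch on
top of the stdlib tokenize module (illustrative only; this handles just
the '}' terminator, not '!' or ':'):

    import io
    import tokenize

    def expr_end_by_tokens(src):
        """Column of the first '}' token not nested in (), [], {}."""
        depth = 0
        for tok in tokenize.generate_tokens(io.StringIO(src).readline):
            if tok.type == tokenize.OP:
                if tok.string in '([{':
                    depth += 1
                elif tok.string in ')]}':
                    if tok.string == '}' and depth == 0:
                        return tok.start[1]
                    depth -= 1
        raise ValueError('no top-level closing brace')

    # expr_end_by_tokens('x + {1: 2}} tail') -> 10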

> (This is then the same rule that patsy uses to find the end of python
> expressions embedded inside patsy formula strings:
> http://patsy.readthedocs.org)

I don't see where patsy looks for expressions in parts of strings. Let
me know if I'm missing it.

Eric.
