[Python-Dev] PEP-498: Literal String Formatting
Steve Dower
steve.dower at python.org
Tue Aug 18 00:06:14 CEST 2015
On 17Aug2015 0813, Barry Warsaw wrote:
> On Aug 18, 2015, at 12:58 AM, Chris Angelico wrote:
>
>> The linters could tell you that you have no 'end' or 'start' just as
>> easily when it's in that form as when it's written out in full.
>> Certainly the mismatched brackets could easily be caught by any sort
>> of syntax highlighter. The rules for f-strings are much simpler than,
>> say, the PHP rules and the differences between ${...} and {$...},
>> which I've seen editors get wrong.
>
> I'm really asking whether it's technically feasible and realistically possible
> for them to do so. I'd love to hear from the maintainers of pyflakes, pylint,
> Emacs, vim, and other editors, linters, and other static analyzers on a rough
> technical assessment of whether they can support this and how much work it
> would be.
With the current format string proposals (allowing arbitrary
expressions), I think I'd implement it in our parser with a
FORMAT_STRING_TOKEN, a FORMAT_STRING_JOIN_OPERATOR, and a
FORMAT_STRING_FORMAT_OPERATOR.

A FORMAT_STRING_TOKEN would be started by f('|"|'''|""") and ended by
the matching quotes, or cut short just before an open brace that is
not escaped.

A FORMAT_STRING_JOIN_OPERATOR would then be emitted for the '{', which
we'd colour either as part of the string or with the regular brace
colour. This also opens a parsing context in which a colon becomes the
FORMAT_STRING_FORMAT_OPERATOR, and the right-hand side of that binary
operator is another FORMAT_STRING_TOKEN. The final close brace becomes
another FORMAT_STRING_JOIN_OPERATOR, and the rest of the string is a
FORMAT_STRING_TOKEN.
So it'd translate something like this:
f"This {text} is my {string:>{length+3}}"
FORMAT_STRING_TOKEN[f"This ]
FORMAT_STRING_JOIN_OPERATOR[{]
IDENTIFIER[text]
FORMAT_STRING_JOIN_OPERATOR[}]
FORMAT_STRING_TOKEN[ is my ]
FORMAT_STRING_JOIN_OPERATOR[{]
IDENTIFIER[string]
FORMAT_STRING_FORMAT_OPERATOR[:]
FORMAT_STRING_TOKEN[>]
FORMAT_STRING_JOIN_OPERATOR[{]
IDENTIFIER[length]
OPERATOR[+]
NUMBER[3]
FORMAT_STRING_JOIN_OPERATOR[}]
FORMAT_STRING_TOKEN[]
FORMAT_STRING_JOIN_OPERATOR[}]
FORMAT_STRING_TOKEN["]
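As a quick proof of concept (my own sketch, not anyone's actual
parser: it assumes a simple double-quoted literal, no escaped braces
or quotes, no strings or containers inside the embedded expressions,
and no error handling for unterminated fields), a scanner along these
lines reproduces that token stream exactly:

import re

def scan(src):
    """Tokenise a simple f"..." literal into the kinds above."""
    out, i, seg = [], 0, 0
    while i < len(src):
        if src[i] == '{':
            out.append(('FORMAT_STRING_TOKEN', src[seg:i]))
            out.append(('FORMAT_STRING_JOIN_OPERATOR', '{'))
            i = replacement_field(src, i + 1, out)
            seg = i
        else:
            i += 1
    out.append(('FORMAT_STRING_TOKEN', src[seg:]))
    return out

def replacement_field(src, i, out):
    """Consume 'expr}' or 'expr:spec}' after an opening brace."""
    start = i
    while src[i] not in ':}':
        i += 1
    expression(src[start:i], out)
    if src[i] == ':':
        out.append(('FORMAT_STRING_FORMAT_OPERATOR', ':'))
        i = format_spec(src, i + 1, out)
    out.append(('FORMAT_STRING_JOIN_OPERATOR', '}'))
    return i + 1

def format_spec(src, i, out):
    """Consume the spec, which may hold nested {fields}; stop at '}'."""
    seg = i
    while src[i] != '}':
        if src[i] == '{':
            out.append(('FORMAT_STRING_TOKEN', src[seg:i]))
            out.append(('FORMAT_STRING_JOIN_OPERATOR', '{'))
            i = replacement_field(src, i + 1, out)
            seg = i
        else:
            i += 1
    out.append(('FORMAT_STRING_TOKEN', src[seg:i]))
    return i

def expression(expr, out):
    """Crude expression tokeniser: identifiers, numbers, operators."""
    for tok in re.findall(r'\w+|\S', expr):
        kind = ('NUMBER' if tok.isdigit()
                else 'IDENTIFIER' if tok.isidentifier()
                else 'OPERATOR')
        out.append((kind, tok))

for kind, text in scan('f"This {text} is my {string:>{length+3}}"'):
    print('%s[%s]' % (kind, text))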
I *believe* (without having tried it in our real parser) that this
would let us produce a valid tokenisation (in our model) without too
much difficulty, and highlight/analyse correctly, including validating
that braces match. Getting the precedence right on the operators might
be more difficult, but we could also just produce an AST that looks
like a function call, since that gives us "good enough" handling once
we're past tokenisation.
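To make the function-call idea concrete (this is my reading of the
suggestion, not a spec, and it needs Python 3.6+ to run): the analyser
could pretend the f-string is the equivalent str.format() call and
reuse all the existing Call-node machinery:

import ast

# Names (text, string, length) come from the example above; the
# rewrite below is hand-made, not produced by any real tool.
text, string, length = 'token', 'stream', 5

node = ast.parse(
    '"This {0} is my {1:>{2}}".format(text, string, length + 3)',
    mode='eval',
).body
print(type(node).__name__)  # -> Call

# And the rewrite is behaviour-preserving:
assert ('This {0} is my {1:>{2}}'.format(text, string, length + 3)
        == f'This {text} is my {string:>{length+3}}')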
A simpler tokenisation, which would probably be sufficient for many
editors, is to treat the first and last segments ([f"This {] and
[}"]) as groupings and each intervening section of text as a
separator, giving this:
OPEN_GROUPING[f"This {]
EXPRESSION[text]
COMMA[} is my {]
EXPRESSION[string]
COMMA[:>{]
EXPRESSION[length+3]
COMMA[}}]
CLOSE_GROUPING["]
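As an aside, the stdlib already produces a split much like this
grouping view: string.Formatter.parse() yields (literal_text,
field_name, format_spec, conversion) tuples for the body of the
literal. It won't tokenise the embedded expressions, which is exactly
the part the first scheme adds:

from string import Formatter

# parse() takes the body of the literal, without the f prefix/quotes.
for literal, field, spec, conv in Formatter().parse(
        'This {text} is my {string:>{length+3}}'):
    print((literal, field, spec, conv))
# ('This ', 'text', '', None)
# (' is my ', 'string', '>{length+3}', None)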
Initial parsing may be a little harder, but it should mean less
trouble when expressions span multiple lines, since that is already
handled for other kinds of groupings. And if any code analysis is
occurring, it should already be happening for dict/list/etc. contents,
and so format strings will get it too.
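As a toy illustration of that reuse (the helper and its inputs are
mine, not from any real editor, and quote handling is ignored for
brevity): one brace-matching routine can serve list displays and
replacement fields alike, across lines or not:

PAIRS = {'(': ')', '[': ']', '{': '}'}

def match_groupings(text):
    """Return (open, close) index pairs; raise on a mismatch."""
    stack, spans = [], []
    for i, ch in enumerate(text):
        if ch in PAIRS:
            stack.append((ch, i))
        elif ch in PAIRS.values():
            if not stack or PAIRS[stack[-1][0]] != ch:
                raise SyntaxError('unexpected %r at %d' % (ch, i))
            spans.append((stack.pop()[1], i))
    if stack:
        raise SyntaxError('unclosed grouping')
    return spans

# The routine that already checks a multi-line list...
print(match_groupings('[1,\n 2,\n 3]'))               # [(0, 10)]
# ...validates a replacement field spanning lines just the same:
print(match_groupings('{\n    price * quantity\n}'))  # [(0, 23)]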
So I'm confident we can support it, and I expect either of these two
approaches will work for most tools without too much trouble. (There's
also a middle ground, where you create new tokens for the format
string components but combine them as in the second example.)
Cheers,
Steve