[Python-Dev] PEP-498: Literal String Formatting
Steve Dower
steve.dower at python.org
Tue Aug 18 01:08:11 CEST 2015
On 17Aug2015 1506, Steve Dower wrote:
> On 17Aug2015 0813, Barry Warsaw wrote:
>> On Aug 18, 2015, at 12:58 AM, Chris Angelico wrote:
>>
>>> The linters could tell you that you have no 'end' or 'start' just as
>>> easily when it's in that form as when it's written out in full.
>>> Certainly the mismatched brackets could easily be caught by any sort
>>> of syntax highlighter. The rules for f-strings are much simpler than,
>>> say, the PHP rules and the differences between ${...} and {$...},
>>> which I've seen editors get wrong.
>>
>> I'm really asking whether it's technically feasible and realistically
>> possible for them to do so. I'd love to hear from the maintainers of
>> pyflakes, pylint, Emacs, vim, and other editors, linters, and other
>> static analyzers on a rough technical assessment of whether they can
>> support this and how much work it would be.
>
> With the current format string proposals (allowing arbitrary
> expressions) I think I'd implement it in our parser with a
> FORMAT_STRING_TOKEN, a FORMAT_STRING_JOIN_OPERATOR and a
> FORMAT_STRING_FORMAT_OPERATOR.
>
> A FORMAT_STRING_TOKEN would be started by f followed by any quote
> form ('|"|'''|"""), and ended by the matching quotes or just before
> an unescaped open brace.
>
> The '{' itself would become the FORMAT_STRING_JOIN_OPERATOR, which
> we'd colour either as part of the string or in the regular brace
> colour. It also opens a parsing context in which a colon becomes the
> FORMAT_STRING_FORMAT_OPERATOR, with the right-hand side of that
> binary operator being another FORMAT_STRING_TOKEN. The final close
> brace becomes another FORMAT_STRING_JOIN_OPERATOR, and the rest of
> the string is FORMAT_STRING_TOKEN.
>
> So it'd translate something like this:
>
> f"This {text} is my {string:>{length+3}}"
>
> FORMAT_STRING_TOKEN[f"This ]
> FORMAT_STRING_JOIN_OPERATOR[{]
> IDENTIFIER[text]
> FORMAT_STRING_JOIN_OPERATOR[}]
> FORMAT_STRING_TOKEN[ is my ]
> FORMAT_STRING_JOIN_OPERATOR[{]
> IDENTIFIER[string]
> FORMAT_STRING_FORMAT_OPERATOR[:]
> FORMAT_STRING_TOKEN[>]
> FORMAT_STRING_JOIN_OPERATOR[{]
> IDENTIFIER[length]
> OPERATOR[+]
> NUMBER[3]
> FORMAT_STRING_JOIN_OPERATOR[}]
> FORMAT_STRING_TOKEN[]
> FORMAT_STRING_JOIN_OPERATOR[}]
> FORMAT_STRING_TOKEN["]
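>
> For illustration, here's a rough sketch of that scheme in plain
> Python. The token names are invented, and it ignores escaped braces,
> quote styles other than f"...", conversions like !r, and expressions
> that contain their own ':' or braces -- just enough to reproduce the
> stream above:
>
> import io
> import tokenize
>
> # Map Python's own token types onto the invented names above.
> NAMES = {tokenize.NAME: 'IDENTIFIER',
>          tokenize.NUMBER: 'NUMBER',
>          tokenize.OP: 'OPERATOR'}
>
> def expr_tokens(expr):
>     """Tokenise the expression inside a replacement field."""
>     for tok in tokenize.generate_tokens(io.StringIO(expr).readline):
>         if tok.type in NAMES:
>             yield (NAMES[tok.type], tok.string)
>
> def fstring_tokens(source):
>     """Yield (kind, text) pairs for a simple f"..." literal."""
>     body, i, literal, open_specs = source[:-1], 0, '', 0
>     while i < len(body):
>         if body[i] == '{':
>             yield ('FORMAT_STRING_TOKEN', literal)
>             yield ('FORMAT_STRING_JOIN_OPERATOR', '{')
>             j = i + 1                 # expression runs to ':' or '}';
>             while body[j] not in ':}':  # no unterminated-field handling
>                 j += 1
>             yield from expr_tokens(body[i + 1:j])
>             if body[j] == ':':
>                 yield ('FORMAT_STRING_FORMAT_OPERATOR', ':')
>                 open_specs += 1
>             else:
>                 yield ('FORMAT_STRING_JOIN_OPERATOR', '}')
>             literal, i = '', j + 1
>         elif body[i] == '}':          # closes a field that had a spec
>             if not open_specs:
>                 raise SyntaxError("single '}' in format string")
>             yield ('FORMAT_STRING_TOKEN', literal)
>             yield ('FORMAT_STRING_JOIN_OPERATOR', '}')
>             literal, i, open_specs = '', i + 1, open_specs - 1
>         else:
>             literal, i = literal + body[i], i + 1
>     yield ('FORMAT_STRING_TOKEN', literal + '"')
>
> for kind, text in fstring_tokens(
>         'f"This {text} is my {string:>{length+3}}"'):
>     print('{0}[{1}]'.format(kind, text))
>
> Running that prints exactly the stream above, and the unmatched-brace
> check falls out of the same loop (a missing close brace would need
> real error handling, of course).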
>
> I *believe* (without having tried it) that this would let us produce a
> valid tokenisation (in our model) without too much difficulty, and
> highlight/analyse correctly, including validating matching braces.
> Getting the precedence correct on the operators might be more difficult,
> but we may also just produce an AST that looks like a function call,
> since that will give us "good enough" handling once we're past
> tokenisation.
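>
> To make the "function call" idea concrete, the example above could be
> treated past tokenisation as roughly this expression (the exact
> runtime form is the PEP's business; this is just the shape):
>
> ("This " + format(text) + " is my "
>          + format(string, ">" + format(length + 3)))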
>
> A simpler tokenisation, probably sufficient for many editors, would
> be to treat the opening and closing segments ([f"This {] and ["]) as
> groupings and each intervening run of text as a separator, giving
> this:
>
> OPEN_GROUPING[f"This {]
> EXPRESSION[text]
> COMMA[} is my {]
> EXPRESSION[string]
> COMMA[:>{]
> EXPRESSION[length+3]
> COMMA[}}]
> CLOSE_GROUPING["]
>
> Initial parsing may be a little harder, but it should mean less trouble
> when expressions spread across multiple lines, since that is already
> handled for other types of groupings. And if any code analysis is
> occurring, it should be happening for dict/list/etc. contents already,
> and so format strings will get it too.
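>
> A quick-and-dirty version of that split (the token names are again
> invented, and escaped braces, nested quotes and empty fields are all
> ignored):
>
> import re
>
> def grouping_tokens(source):
>     """Return (kind, text) pairs for the grouping-style split."""
>     out, pos = [], 0
>     # An expression is whatever follows a '{' up to the next ':' or '}'.
>     for m in re.finditer(r'\{([^{}:]+)(?=[:}])', source):
>         kind = 'OPEN_GROUPING' if not out else 'COMMA'
>         out.append((kind, source[pos:m.start(1)]))
>         out.append(('EXPRESSION', m.group(1)))
>         pos = m.end(1)
>     tail = source[pos:]                 # e.g. '}}"'
>     if len(tail) > 1:
>         out.append(('COMMA', tail[:-1]))
>     out.append(('CLOSE_GROUPING', tail[-1]))
>     return out
>
> For the example above this returns exactly the eight segments listed.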
>
> So I'm confident we can support it, and I expect either of these two
> approaches will work for most tools without too much trouble. (There's
> also a middle ground where you create new tokens for format string
> components, but combine them like the second example.)
The middle ground would probably be required for correct highlighting. I
implied but didn't specify that the tokens in my second example would
get special treatment here.
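Concretely, I mean keeping the shape of the second stream but giving
each segment a dedicated token type, so a highlighter can colour the
string-ish pieces as string and the expressions as code. Something
like this (names invented):

FORMAT_STRING_OPEN[f"This {]
EXPRESSION[text]
FORMAT_STRING_SEPARATOR[} is my {]
EXPRESSION[string]
FORMAT_STRING_SEPARATOR[:>{]
EXPRESSION[length+3]
FORMAT_STRING_SEPARATOR[}}]
FORMAT_STRING_CLOSE["]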
> Cheers,
> Steve
>
>> Cheers,
>> -Barry