[Python-Dev] PEP-498: Literal String Formatting
Steve Dower
steve.dower at python.org
Tue Aug 18 00:06:14 CEST 2015
On 17Aug2015 0813, Barry Warsaw wrote:
> On Aug 18, 2015, at 12:58 AM, Chris Angelico wrote:
>
>> The linters could tell you that you have no 'end' or 'start' just as
>> easily when it's in that form as when it's written out in full.
>> Certainly the mismatched brackets could easily be caught by any sort
>> of syntax highlighter. The rules for f-strings are much simpler than,
>> say, the PHP rules and the differences between ${...} and {$...},
>> which I've seen editors get wrong.
>
> I'm really asking whether it's technically feasible and realistically possible
> for them to do so. I'd love to hear from the maintainers of pyflakes, pylint,
> Emacs, vim, and other editors, linters, and other static analyzers on a rough
> technical assessment of whether they can support this and how much work it
> would be.
With the current format string proposals (allowing arbitrary
expressions), I think I'd implement it in our parser with a
FORMAT_STRING_TOKEN, a FORMAT_STRING_JOIN_OPERATOR, and a
FORMAT_STRING_FORMAT_OPERATOR.

A FORMAT_STRING_TOKEN would be started by f('|"|'''|""") and ended by
the matching quotes, or cut short just before an open brace that is
not escaped.

A FORMAT_STRING_JOIN_OPERATOR would then be emitted for the '{', which
we'd colour either as part of the string or with the regular brace
colour. This also opens a parsing context in which a colon becomes the
FORMAT_STRING_FORMAT_OPERATOR, and the right-hand side of that binary
operator is another FORMAT_STRING_TOKEN. The final close brace becomes
another FORMAT_STRING_JOIN_OPERATOR, and the rest of the string is a
FORMAT_STRING_TOKEN.
So it'd translate something like this:
f"This {text} is my {string:>{length+3}}"
FORMAT_STRING_TOKEN[f"This ]
FORMAT_STRING_JOIN_OPERATOR[{]
IDENTIFIER[text]
FORMAT_STRING_JOIN_OPERATOR[}]
FORMAT_STRING_TOKEN[ is my ]
FORMAT_STRING_JOIN_OPERATOR[{]
IDENTIFIER[string]
FORMAT_STRING_FORMAT_OPERATOR[:]
FORMAT_STRING_TOKEN[>]
FORMAT_STRING_JOIN_OPERATOR[{]
IDENTIFIER[length]
OPERATOR[+]
NUMBER[3]
FORMAT_STRING_JOIN_OPERATOR[}]
FORMAT_STRING_TOKEN[]
FORMAT_STRING_JOIN_OPERATOR[}]
FORMAT_STRING_TOKEN["]
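As a quick proof of concept (my own sketch, not anyone's actual
parser: it assumes a simple double-quoted literal, no escaped braces
or quotes, no strings or containers inside the embedded expressions,
and no error handling for unterminated fields), a scanner along these
lines reproduces that token stream exactly:

import re

def scan(src):
    """Tokenise a simple f"..." literal into the kinds above."""
    out, i, seg = [], 0, 0
    while i < len(src):
        if src[i] == '{':
            out.append(('FORMAT_STRING_TOKEN', src[seg:i]))
            out.append(('FORMAT_STRING_JOIN_OPERATOR', '{'))
            i = replacement_field(src, i + 1, out)
            seg = i
        else:
            i += 1
    out.append(('FORMAT_STRING_TOKEN', src[seg:]))
    return out

def replacement_field(src, i, out):
    """Consume 'expr}' or 'expr:spec}' after an opening brace."""
    start = i
    while src[i] not in ':}':
        i += 1
    expression(src[start:i], out)
    if src[i] == ':':
        out.append(('FORMAT_STRING_FORMAT_OPERATOR', ':'))
        i = format_spec(src, i + 1, out)
    out.append(('FORMAT_STRING_JOIN_OPERATOR', '}'))
    return i + 1

def format_spec(src, i, out):
    """Consume the spec, which may hold nested {fields}; stop at '}'."""
    seg = i
    while src[i] != '}':
        if src[i] == '{':
            out.append(('FORMAT_STRING_TOKEN', src[seg:i]))
            out.append(('FORMAT_STRING_JOIN_OPERATOR', '{'))
            i = replacement_field(src, i + 1, out)
            seg = i
        else:
            i += 1
    out.append(('FORMAT_STRING_TOKEN', src[seg:i]))
    return i

def expression(expr, out):
    """Crude expression tokeniser: identifiers, numbers, operators."""
    for tok in re.findall(r'\w+|\S', expr):
        kind = ('NUMBER' if tok.isdigit()
                else 'IDENTIFIER' if tok.isidentifier()
                else 'OPERATOR')
        out.append((kind, tok))

for kind, text in scan('f"This {text} is my {string:>{length+3}}"'):
    print('%s[%s]' % (kind, text))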
I *believe* (without having tried it in our real parser) that this
would let us produce a valid tokenisation (in our model) without too
much difficulty, and highlight/analyse correctly, including validating
that braces match. Getting the precedence right on the operators might
be more difficult, but we could also just produce an AST that looks
like a function call, since that gives us "good enough" handling once
we're past tokenisation.
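To make the function-call idea concrete (this is my reading of the
suggestion, not a spec, and it needs Python 3.6+ to run): the analyser
could pretend the f-string is the equivalent str.format() call and
reuse all the existing Call-node machinery:

import ast

# Names (text, string, length) come from the example above; the
# rewrite below is hand-made, not produced by any real tool.
text, string, length = 'token', 'stream', 5

node = ast.parse(
    '"This {0} is my {1:>{2}}".format(text, string, length + 3)',
    mode='eval',
).body
print(type(node).__name__)  # -> Call

# And the rewrite is behaviour-preserving:
assert ('This {0} is my {1:>{2}}'.format(text, string, length + 3)
        == f'This {text} is my {string:>{length+3}}')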
A simpler tokenisation, which would probably be sufficient for many
editors, is to treat the first and last segments ([f"This {] and
[}"]) as groupings and each intervening section of text as a
separator, giving this:
OPEN_GROUPING[f"This {]
EXPRESSION[text]
COMMA[} is my {]
EXPRESSION[string]
COMMA[:>{]
EXPRESSION[length+3]
COMMA[}}]
CLOSE_GROUPING["]
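As an aside, the stdlib already produces a split much like this
grouping view: string.Formatter.parse() yields (literal_text,
field_name, format_spec, conversion) tuples for the body of the
literal. It won't tokenise the embedded expressions, which is exactly
the part the first scheme adds:

from string import Formatter

# parse() takes the body of the literal, without the f prefix/quotes.
for literal, field, spec, conv in Formatter().parse(
        'This {text} is my {string:>{length+3}}'):
    print((literal, field, spec, conv))
# ('This ', 'text', '', None)
# (' is my ', 'string', '>{length+3}', None)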
Initial parsing may be a little harder, but it should mean less
trouble when expressions span multiple lines, since that is already
handled for other kinds of groupings. And if any code analysis is
occurring, it should already be happening for dict/list/etc. contents,
and so format strings will get it too.
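As a toy illustration of that reuse (the helper and its inputs are
mine, not from any real editor, and quote handling is ignored for
brevity): one brace-matching routine can serve list displays and
replacement fields alike, across lines or not:

PAIRS = {'(': ')', '[': ']', '{': '}'}

def match_groupings(text):
    """Return (open, close) index pairs; raise on a mismatch."""
    stack, spans = [], []
    for i, ch in enumerate(text):
        if ch in PAIRS:
            stack.append((ch, i))
        elif ch in PAIRS.values():
            if not stack or PAIRS[stack[-1][0]] != ch:
                raise SyntaxError('unexpected %r at %d' % (ch, i))
            spans.append((stack.pop()[1], i))
    if stack:
        raise SyntaxError('unclosed grouping')
    return spans

# The routine that already checks a multi-line list...
print(match_groupings('[1,\n 2,\n 3]'))               # [(0, 10)]
# ...validates a replacement field spanning lines just the same:
print(match_groupings('{\n    price * quantity\n}'))  # [(0, 23)]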
So I'm confident we can support it, and I expect either of these two
approaches will work for most tools without too much trouble. (There's
also a middle ground, where you create new tokens for the format
string components but combine them as in the second example.)
Cheers,
Steve