
On 8/18/2016 11:05 AM, Philipp A. wrote:
Hi, I originally posted this via Google Groups, which didn’t make it through to the list proper, sorry! Read it here please: https://groups.google.com/forum/#!topic/python-ideas/V1U6DGL5J1s
Hi, Philipp. I'm including your original proposal here, so that it's archived properly. Here's what you'd like to have work:

f'foo{repr('bar\n') * 3}baz' == 'foo"bar\n""bar\n""bar\n"baz'

We've had this discussion before, but I can't find it in the archives; it's possible it happened off-list.

The problem I have with your proposal is that it greatly complicates the implementation, and it makes it harder for humans to reason about the code (for the same reasons). This is not just an ease-of-implementation issue: it's a cognitive-burden issue.

As it currently stands, Python strings of all 'flavors' (raw, unicode, binary, and f-) are parsed the same way: an optional prefix, one or three quote characters, the body of the string, then matching quote characters. This is sufficiently simple that it can be (and often is) implemented with regular expressions. You also need to support combinations of prefixes, like raw f-strings (fr or rf).

With your proposal, it becomes much more difficult to find the end of an f-string, and I do not think that's a reasonable requirement. For example, consider the following:

f'a{func({'a{':1,'3}':[{}, ')', '{']})}b'

A regex cannot deal with the matching braces needed to find the end of the expression: you need to keep track of the nesting level of parens, braces, brackets, and quotes (at least those; I might have left something off). The way this is currently written in Python 3.6a4:

f"a{func({'a{':1,'3}':[{}, ')', '{']})}b"

Here it's trivially easy to find the end of the string, for humans and parsers alike.

Now admittedly, in order to execute or syntax-highlight the existing f-strings, you do need to perform this parsing. But of the 3 parsers that ship with Python (ast.c, tokenize.py, IDLE), only ast.c currently needs to. I don't think tokenize.py ever will, and IDLE might (but it could use the ast module). I think many parsers (e.g. python-mode.el, etc.) will simply consume f-strings without looking inside them and move on, and we shouldn't unnecessarily complicate them.
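To make the "often implemented with regular expressions" point concrete, here is a minimal sketch (not CPython's tokenizer; the prefix pattern is deliberately permissive and triple-quote handling is simplified) of finding the end of a current-style string literal, f-strings included, without looking at the braces inside:

    import re

    # Rough sketch: optional prefix, then a quoted body. Escapes are
    # honored, but nothing inside the quotes needs brace matching.
    STRING_RE = re.compile(r"""
        (?:[fFrRbBuU]{1,2})?               # optional prefix: f, r, b, fr, rf, ...
        (?: '''(?:[^\\]|\\.)*?'''          # triple-single-quoted
          | \"\"\"(?:[^\\]|\\.)*?\"\"\"    # triple-double-quoted
          | '(?:[^'\\\n]|\\.)*'            # single-quoted
          | "(?:[^"\\\n]|\\.)*"            # double-quoted
        )""", re.VERBOSE)

    src = '''f"a{func({'a{':1,'3}':[{}, ')', '{']})}b"'''
    print(STRING_RE.match(src).group(0))   # matches the whole f-string

Under the proposal, the single- and double-quoted branches would instead have to track paren, bracket, brace, and quote nesting, which is beyond what a regex can do.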
My arguments are basically:
1. f-literals are semantically not strings, but expressions.
2. Their escape sequences in the code parts are fundamentally both detrimental and superfluous (they’re only in for convenience, as confirmed by Guido in the quote below).
I disagree that they're detrimental and superfluous. I'd say they're consistent with all other strings.
3. They’re detrimental because syntax highlighters are (by design) unable to handle this part of Python 3.6a4’s grammar. This will cause code to be highlighted as part of a string and therefore overlooked. I’m very sure this will cause bugs.
I disagree.
4. The fact that people see the embedded expressions as somehow “part of the string” is confusing.
My proposal is to redo their grammar: they shouldn’t be parsed as strings and then post-processed, but be their own thing. This also opens the door to potentially extending them with something like JavaScript’s tagged templates.
Without the limitations of the string tokenization code/rules, only the string parts would have escape sequences, and the expression parts would be regular Python code (“holes” in the literal).
Below are the mentioned quote and some replies to the original thread:
Guido van Rossum <guido@python.org> wrote on Wed., Aug. 17, 2016 at 20:11:
The explanation is honestly that the current approach is the most straightforward for the implementation (it's pretty hard to intercept the string literal before escapes have been processed) and nobody cares enough about the edge cases to force the implementation to jump through more hoops.
I really don't think this discussion should be reopened. If you disagree, please start a new thread on python-ideas.
I really think it should. Please look at Python code with f-literals: if they’re highlighted as strings throughout, you won’t be able to spot which parts are code; if they’re highlighted as code, the escaping rules guarantee that most highlighters can’t correctly highlight Python anymore. I think that’s a big issue for readability.
Maybe I'm the odd man out, but I really don't care if my editor ever syntax highlights within f-strings. I don't plan on putting anything more complicated than variable names in my f-strings, and I think PEP 8 should recommend something similar.
Brett Cannon <brett@python.org> wrote on Wed., Aug. 17, 2016 at 20:28:
They are still strings, there is just post-processing on the string itself to do the interpolation.
Sounds hacky to me. I’d rather see a proper parser for them, which of course would make my vision easy.
You're saying that if such a parser existed, it would be easy to use it to parse your version of f-strings? True enough! And as Nick points out, such a thing already exists in the ast module. But if your proposal were accepted, using such an approach would no longer be optional (needed only if you care about the inside of an f-string); it would be required to parse any Python code at all, even if you don't care about the contents of f-strings.
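For illustration, a minimal example of the approach Nick mentions: in 3.6, the ast module exposes the pieces of an f-string as a JoinedStr node with FormattedValue children. (func and x here are made-up names; parsing doesn't need them to exist.)

    import ast

    # Parsing is enough to see inside the f-string; nothing is evaluated.
    tree = ast.parse('f"a{func(x)}b"', mode='eval')
    print(ast.dump(tree.body))
    # 3.6 output, abbreviated:
    # JoinedStr(values=[Str(s='a'), FormattedValue(value=Call(...), ...), Str(s='b')])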
By doing it this way the implementation can use Python itself to do the tokenizing of the string, while if you do the string interpolation beforehand you would then need to do it entirely at the C level which is very messy and painful since you're explicitly avoiding Python's automatic handling of Unicode, etc.
Of course we reuse the tokenization for the string parts. As said, you can view an f-literal as an interleaved sequence of strings and expressions, with an attached format specification.
<f'> starts the f-literal, and string contents follow. The only difference from other strings is <{>, which starts expression tokenization. Once the expression ends, an optional <formatspec> follows, then a <}> switches back to string tokenization. This repeats until (in string-parsing mode) a <'> is encountered, which ends the f-literal.
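For concreteness, here is a rough sketch of that interleaved tokenization, under heavy simplifications (single-quote delimiter only, no nested braces inside expressions, no escape handling; the function name and token shapes are made up for illustration):

    def tokenize_f_literal(src):
        assert src.startswith("f'")
        parts, buf, i = [], [], 2
        while i < len(src):
            ch = src[i]
            if ch == "'":                      # end of the f-literal
                parts.append(('STRING', ''.join(buf)))
                return parts
            if ch == '{':                      # switch to expression mode
                parts.append(('STRING', ''.join(buf)))
                buf = []
                j = src.index('}', i)          # naive: assumes no nesting
                expr, _, spec = src[i + 1:j].partition(':')
                parts.append(('EXPR', expr, spec))
                i = j + 1
                continue
            buf.append(ch)                     # plain string content
            i += 1
        raise SyntaxError('unterminated f-literal')

    print(tokenize_f_literal("f'foo{x:>8}baz'"))
    # [('STRING', 'foo'), ('EXPR', 'x', '>8'), ('STRING', 'baz')]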
You also make it harder to work with Unicode-based variable names (or at least explain it). If you have Unicode in a variable name but you can't use \N{} in the string to help express it you then have to say "normal Unicode support in the string applies everywhere *but* in the string interpolation part".
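(For reference, this is how 3.6 behaves as implemented: \N{...} escapes work in the string parts, while a Unicode identifier inside the braces is ordinary code. A small illustration, with a made-up variable name:)

    café = 42  # hypothetical variable, just for the example
    print(f"\N{BULLET} café = {café}")
    # • café = 42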
I think you’re just proving my point that the way f-literals work now is confusing.
The embedded expressions are just normal Python; the embedded strings, just normal strings. You simply switch between the two using <{> and <[format]}>.
Unicode in variable names works exactly the same as in all other Python code, because it is regular Python code.
Or another reason is you can explain f-strings as "basically str.format_map(**locals(), **globals()), but without having to make the actual method call" (and worrying about clashing keys but I couldn't think of a way of using dict.update() in a single line). But with your desired change it kills this explanation by saying f-strings aren't like this but some magical string that does all of this stuff before normal string normalization occurs.
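(Brett's analogy, adjusted slightly so that it actually runs: str.format_map() takes a single mapping rather than **kwargs, so a merged dict stands in for the scope lookup here. Real f-strings evaluate arbitrary expressions in place, which this can't do.)

    # Clashing keys: locals win, since they're merged in last.
    name, count = 'world', 3
    assert f'{name} x{count}' == '{name} x{count}'.format_map(
        {**globals(), **locals()})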
No, it’s simply that the expression parts (which for normal formatting sit inside the braces of .format(...)) are *interleaved* between the string parts. They’re not part of the string; they’re just regular, plain Python code.
I think that's our disagreement. I do see them as part of the string.
Cheers, and I really hope I’ve made a strong case,
Thanks for your concern and your comments. But you've not swayed me.
philipp
Eric.