f-strings in the grammar
Hi,

I have started a project to move the parsing of f-strings to the parser and the grammar. Apart from some maintenance improvements (we can drop a considerable amount of hand-written code), there are some interesting things we **could** (emphasis on could) get out of this and I wanted to discuss what people think about them.

* The parser will likely allow "\n" characters and backslashes in f-string expressions, which is currently impossible:
f"blah blah {'\n'} blah" File "<stdin>", line 1 f"blah blah {'\n'} blah" ^ SyntaxError: f-string expression part cannot include a backslash
* The parser will allow nesting quote characters. This means that we **could** allow reusing the same quote type in nested expressions like this:

```
f"some text { my_dict["string1"] } more text"
```

* The parser will naturally allow more control over error messages and AST positions.

* The **grammar** of the f-string will be fully specified without ambiguity. Currently, the "grammar" that we have in the docs (https://docs.python.org/3/reference/lexical_analysis.html#formatted-string-literals) is not really a formal grammar, because not only does it mix lexing details with grammar details (the definition of "literal_char"), but it is also not compatible with the current Python lexing scheme (for instance, it recognizes "{{" as its own token, which the language doesn't allow, because something like "{{a:b}:c}" is tokenized as "{", "{", "a", ... not as "{{", "a"). Adding a formal grammar could help syntax highlighters, IDEs, parsers and other tools to make sure they properly recognize everything that there is.

There may be some other advantages that we have not explored yet.

The work is at a point where the main idea works (all the grammar is already there and working), but we need to make sure that all existing errors and specifics are properly ported to the new code, which is still a considerable amount of work, so I wanted to make sure we are on the same page before we decide to invest more time on this (Batuhan is helping me with this and Lyssandros will likely join us). We are doing this work in this branch: https://github.com/we-like-parsers/cpython/blob/fstring-grammar

Tell me what you think.

P.S. If you are interested in helping with this project, please reach out to me. If we decide to go ahead we can use your help! :)

Regards from cloudy London,
Pablo Galindo Salgado
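[For context, a minimal illustration of why tools currently treat f-strings as opaque, runnable on CPython as of this thread: the tokenizer emits an entire f-string as a single STRING token, and everything inside the braces is re-parsed later by hand-written code.]

```python
import io
import tokenize

# On Python <= 3.11 the whole f-string, braces and all, arrives as one
# STRING token; nothing inside {} is visible to the tokenizer.
src = 'f"some text { my_dict[key] } more text"\n'
for tok in tokenize.generate_tokens(io.StringIO(src).readline):
    print(tokenize.tok_name[tok.type], repr(tok.string))
# STRING 'f"some text { my_dict[key] } more text"'
# NEWLINE '\n'
# ENDMARKER ''
```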
On 20 Sep 2021, at 13:18, Pablo Galindo Salgado <pablogsal@gmail.com> wrote:

We are doing this work in this branch: https://github.com/we-like-parsers/cpython/blob/fstring-grammar

That link is broken. Assuming you mean https://github.com/we-like-parsers/cpython/tree/fstring-grammar?

E
On 20.09.21 14:18, Pablo Galindo Salgado wrote:
* The parser will likely allow "\n" characters and backslashes in f-string expressions, which is currently impossible:
What about characters "\x7b", "\x7d", "\x5c", etc.? What about newlines in single quotes? Currently this works:

```
f'''{1 +
2}'''
```

But this does not:

```
f'{1 +
2}'
```
What about characters "\x7b", "\x7d", "\x5c", etc? What about newlines in single quotes? Currently this works:
This is from the current branch:
f"ble { '\x7b' }" 'ble {'
f"{1 + ... 2}" '3'
f'{1 + ... 2}' '3'
On 9/20/2021 8:46 AM, Serhiy Storchaka wrote:
On 20.09.21 14:18, Pablo Galindo Salgado wrote:
* The parser will likely allow "\n" characters and backslashes in f-string expressions, which is currently impossible:
What about characters "\x7b", "\x7d", "\x5c", etc?
What about newlines in single quotes? Currently this works:
```
f'''{1 +
2}'''
```
But this does not:
```
f'{1 +
2}'
```
The latter is an error with or without the 'f' prefix, and I think that this should continue to be the case.

-- Terry Jan Reedy
On 9/20/2021 11:21 AM, Terry Reedy wrote:
The latter is an error with or without the 'f' prefix, and I think that this should continue to be the case.
The thought is that anything that's within braces {} and is a valid expression should be allowed. Basically, the opening brace puts you in "parse expression" mode.

Personally, I'd be okay with this particular change.

Eric
Eric V. Smith writes:
But this does not:
```
f'{1 +
2}'
```
The latter is an error with or without the 'f' prefix, and I think that this should continue to be the case.
The thought is that anything that's within braces {} and is a valid expression should be allowed.
-0 FWIW; some thoughts specific to me, and I don't know how representative they might be of others.

I guess you could argue that the braces are a kind of expression-level parenthesis, but I don't "see" them that way. I see *one* string with eval'able format expressions embedded in it, so that single-quoted strings can't have embedded newlines. I also don't see the braces as expression-level syntax (after all, they already have two different meanings at expression level); I see them as part of f-string syntax. So even with triple-quoted strings, my eyes "want" to see parentheses or line continuation (which already work).

I'm sure I could get used to the syntax. But ... Is this syntax useful? Or is it just a variant of purity trying to escape Pandora's virtualbox? I mean, am I going to see it often enough to get used to it? Or am I going to WTF at it for the rest of my life?
I know I'm strongly -1 on allowing much more than currently exists for f-strings, for basically the same reason Stephen explains. Newlines inside braces, for example, go way too far away from readability. Nested expressions also feel like an attractive nuisance. I use f-strings all the time, but in much the same way a thousand-character regular expression is an abuse (even if perfectly well defined grammatically), really complex f-strings look and feel much the same.
[Stephen J. Turnbull]
Is this syntax useful? Or is it just a variant of purity trying to escape Pandora's virtualbox? I mean, am I going to see it often enough to get used to it? Or am I going to WTF at it for the rest of my life?
I don't know about the line breaks, but in recent weeks I've found myself more than once having to remind myself that inside interpolations, you must use the other type of quote. Things like

```
print(f"{source.removesuffix(".py")}.c: $(srcdir)/{source}")
```

Learning that inside {} you can write any expression is easy (it's a real Aha! moment -- that's the power of f-strings). Remembering that you have to switch up the quote characters there is hard -- it doesn't occur very often, and the reason is obscure. By the time my mental parser has made it to the argument list of removesuffix(), it has already forgotten that it's inside an f-string, and my fingers just reach for my favorite quote character. And the error isn't really helping either:
print(f"{source.removesuffix(".py")}.c: $(srcdir)/{source}") File "<stdin>", line 1 print(f"{source.removesuffix(".py")}.c: $(srcdir)/{source}") ^ SyntaxError: f-string: unmatched '('
--
--Guido van Rossum (python.org/~guido)
Guido van Rossum writes:
I don't know about the line breaks, but in recent weeks I've found myself more than once having to remind myself that inside interpolations, you must use the other type of quote.
My earlier remarks were specifically directed to line breaks. I see the point, but I think the question should be readability, as David points out.

I don't think there's a problem with the opening quote in your example. Even in an ordinary string literal it's obvious to me that the embedded quotation marks are not intended to terminate the string:

```
s = "Here is a singleton " and here is an initial for "something."
```

But how about that last quotation mark? I tried to construct a similarly visually ambiguous f-string where braces "hide" the embedded quotation marks, and couldn't do it without a trailing quote followed immediately by an embedded literal line break. So I'm cautiously sympathetic to this extension, as long as embedded line breaks are not permitted in singly-quoted f-strings.

However, I myself will almost certainly automatically "correct" such quotation marks if they are allowed. So this is unlikely to be a plus or a minus for me.

Steve
On 9/20/2021 7:18 AM, Pablo Galindo Salgado wrote:
there are some interesting things we **could** (emphasis on could) get out of this and I wanted to discuss what people think about them.
* The parser will allow nesting quote characters. This means that we **could** allow reusing the same quote type in nested expressions like this:
f"some text { my_dict["string1"] } more text"
I believe that this will disable regex-based processing, such as syntax highlighters, as in IDLE. I also think that it will sometimes be confusing to human readers.

-- Terry Jan Reedy
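[To make Terry's concern concrete, here is a toy version of the regex approach such highlighters typically use -- a hypothetical pattern, not IDLE's actual rules -- showing how a nested same-type quote ends the string match early:]

```python
import re

# A naive "string literal" pattern: optional prefix, then quote to quote
# with escape handling. Good enough for today's f-strings, since a quote
# of the same type cannot appear inside them.
STRING = re.compile(r'''(?i)[rbuf]{0,2}("(?:[^"\\\n]|\\.)*"|'(?:[^'\\\n]|\\.)*')''')

code = 'f"some text { my_dict["string1"] } more text"'
print(STRING.search(code).group())
# f"some text { my_dict["   <- the match stops at the first nested quote
```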
On 9/20/2021 11:19 AM, Terry Reedy wrote:
On 9/20/2021 7:18 AM, Pablo Galindo Salgado wrote:
there are some interesting things we **could** (emphasis on could) get out of this and I wanted to discuss what people think about them.
* The parser will allow nesting quote characters. This means that we **could** allow reusing the same quote type in nested expressions like this:
f"some text { my_dict["string1"] } more text"
I believe that this will disable regex-based processing, such as syntax highlighters, as in IDLE. I also think that it will sometimes be confusing to human readers.
When I initially wrote f-strings, it was an explicit design goal to be just like existing strings, but with a new prefix. That's why there are all of the machinations in the parser for scanning within f-strings: the parser had already done its duty, so there needed to be a separate stage to decode inside the f-strings. Since they look just like regular strings, most tools could add the lowest possible level of support just by adding 'f' to existing prefixes they support: 'r', 'b', 'u'. The upside is that if you don't care about what's inside an f-string, your work is done.

I definitely share your concern about making f-strings more complicated to parse for tool vendors: basically all editors, alternative implementations, etc.: really anyone who parses Python source code. But maybe we've already crossed this bridge with the PEG parser. Although I realize there's a difference between lexing and parsing. While the PEG parser just makes parsing more complicated, this change would make what was lexing into a more sophisticated parsing problem.

In 2018 or 2019 at PyCon in Cleveland I talked to several tool vendors. It's been so long ago that I don't remember who, but I'm pretty sure it was PyCharm and 2 or 3 other editors. All of them supported making this change, even understanding the complications it would cause them. I don't recall if I talked to anyone who maintains an alternative implementation, but we should probably discuss it with MicroPython, Cython, PyPy, etc., and understand where they stand on it.

In general I'm supportive of this change, because as Pablo points out there are definite benefits. But I think if we do accept it we should understand what sort of burden we're putting on tool and implementation authors. It would probably be a good idea to discuss it at the upcoming dev sprints.

Eric
Thanks a lot, Eric, for your message! I actually share some of these worries myself, and that's why I wanted to have a bigger conversation.

I also wanted to make clear that the change doesn't force us to do *everything*. This means that we can absolutely have some of the improvements but not others (for example, allowing backslashes but not nesting). So it is important to be clear that it is not "all or nothing". We just need to decide what set of things in the design space we want :)
On 9/20/2021 11:48 AM, Eric V. Smith wrote:
When I initially wrote f-strings, it was an explicit design goal to be just like existing strings, but with a new prefix. That's why there are all of the machinations in the parser for scanning within f-strings: the parser had already done its duty, so there needed to be a separate stage to decode inside the f-strings. Since they look just like regular strings, most tools could add the lowest possible level of support just by adding 'f' to existing prefixes they support: 'r', 'b', 'u'.
Which is what I did with IDLE. Of course, 'just add' was complicated by uppercase being allowed and by 'f' being compatible with 'r' but not with 'u' or 'b'.
I definitely share your concern about making f-strings more complicated to parse for tool vendors: basically all editors, alternative implementations, etc.: really anyone who parses Python source code. But maybe we've already crossed this bridge with the PEG parser.
I think we are on the far side of the bridge with contextual keywords. I don't believe the new code for highlighting the new match statement is exactly correct. As I remember, properly classifying '_' in all the examples we created was too difficult, and maybe not possible.
Although I realize there's a difference between lexing and parsing. While the PEG parser just makes parsing more complicated, this change would make what was lexing into a more sophisticated parsing problem.
I have no love for the RE code. I would try ast.parse if I were not sure it would be too slow. I would be happy if a simplified and fast minimal lexer/parser were added for everyone to use. It would not have to make exactly the same distinctions that IDLE currently does.

-- Terry Jan Reedy
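[For what it's worth, a sketch of what the ast-based route Terry mentions looks like today, runnable as-is: the f-string structure is exposed as JoinedStr/FormattedValue nodes, though without precise column positions for the inner expressions.]

```python
import ast

# Each {...} interpolation becomes a FormattedValue node whose .value
# is an ordinary expression subtree; nested format specs appear too.
tree = ast.parse('x = f"value: {1 + 2:>{width}}"')
for node in ast.walk(tree):
    if isinstance(node, ast.FormattedValue):
        print(ast.dump(node.value))
```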
The current restrictions will also confuse some users (e.g. those used to bash, and IIRC JS, where the rules are similar to what Pablo is proposing).
WRT the similar syntax in bash (and similar shells), there are two options:

```
"string `code` string"
"string $(code) string"
```

The latter, $(), allows fully-featured nesting in the way Pablo is suggesting:

```
"string $(code "string2 $(code2) string2" code) string"
```

The former, using backticks, does not allow nesting directly, but it allows extra backslashes inside the backticks to escape the nested ones, like this:

```
"string `code "string2 \`code2\` string2" code` string"
```

This can be nested infinitely using lots of backslashes. Is this worth considering as another option? It doesn't have the disadvantage of complicating lexing (as much), although nesting with backslashes is quite ugly. IMO nesting things in f-strings would be ugly anyway, so I don't think that would matter too much.
F-strings are more like $(...), since the interpolation syntax uses {...} delimiters. So it probably should work that way. JS interpolation works that way too; see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Template_l... . I wouldn't want to do anything to bring `backticks` back into the language.

--
--Guido van Rossum (python.org/~guido)
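[For comparison, the f-string analogue of that fully-featured nesting, as it would look if the proposal is accepted -- a sketch of proposed syntax, not valid at the time of this thread; `code` and `code2` are hypothetical stand-in values:]

```python
# Same-quote nesting, JS-template-literal style: the inner "k" reuses
# the outer string's double quotes.
code2 = "inner"
code = {"k": f"string2 {code2} string2"}
print(f"string {code["k"]} string")
# string string2 inner string2 string
```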
I don't think the Python syntax should be beholden to syntax highlighting tools; eventually some syntax feature that PEG enables will require every parser or highlighter to switch to a similar or more powerful parsing tool.
On Mon, Sep 20, 2021 at 8:58 AM Thomas Grainger <tagrain@gmail.com> wrote:
I don't think the Python syntax should be beholden to syntax highlighting tools; eventually some syntax feature that PEG enables will require every parser or highlighter to switch to a similar or more powerful parsing tool.
But that's not how syntax highlighting works in editors. You typically don't get to choose the parsing tool used for syntax highlighting; you just define the grammar using whatever is provided by the editor (which, in my experience, has always been regex-based). So there's no way to "require" every editor out there to switch to a PEG parser or equivalent to support Python's grammar, because that's asking every editor to change how syntax highlighting is implemented at a lower level.

Having said all that, I think as long as we understand that this is a side-effect then it's fine; syntax highlighting is usually not tied to semantics in an editor, so it shouldn't be a blocker on this. If people care, they simply won't use the same type of quotes in their code (which I bet is what most people will do unless Black says otherwise 😉).

But I also think this means we definitely have to get a parser module for tools together, as this is way more potential breakage than just parentheses for `with` statements, and I don't know if formatting tools can just move to the AST module at that point. 😅
But I also think this means we definitely have to get a parser module
What is, in this context, a "parser" module? Because that will massively change depending on who you ask. We already expose APIs that return AST objects that can be used for all sorts of things, and a tokenizer module that exposes some form of lexing that is relatively close to the one CPython uses internally. The only missing piece would be something that returns a CST with enough information to reconstruct the source, but at this point that is absolutely arbitrary, because nothing in CPython would use that tree. Not only that, but the requirements for such a CST will change quite a lot depending on who you ask, and that heavily impacts the APIs that we would need to offer.

Offering a parser module here can involve quite a high maintenance cost without the certainty that it will be useful to all sets of users. That is also without considering that many tools parsing Python code are not written in Python and will not be able to leverage it.
I just want to say that I am very excited to see where this goes. As the author of a package that tries to recreate compiled f-strings at runtime, I can say they are a hard thing to generate given the current tools within Python.

On Mon, Sep 20, 2021 at 4:23 AM Pablo Galindo Salgado <pablogsal@gmail.com> wrote:
Tell me what you think.
P.S. If you are interested in helping with this project, please reach out to me. If we decide to go ahead we can use your help! :)
I don't know the CPython API very well, but if there is anything I can do to help, I would be happy to assist.

Regards,
Jeremiah
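[One reason such runtime reconstruction is awkward today: the AST keeps the interpolation pieces but not the original spelling, so round-tripping has to re-invent the quoting. A minimal sketch, assuming Python 3.9+ for ast.unparse; exact output quoting may vary by version:]

```python
import ast

# Round-tripping an f-string through the AST loses the original quote
# characters; unparse must pick new ones that avoid clashes.
tree = ast.parse('f"hello {name!r:>{width}}"')
print(ast.unparse(tree))  # e.g. f'hello {name!r:>{width}}'
```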
To bring this back on track, I'll try and answer the questions from your original email.

On 9/20/2021 7:18 AM, Pablo Galindo Salgado wrote:
I have started a project to move the parsing of f-strings to the parser and the grammar. Apart from some maintenance improvements (we can drop a considerable amount of hand-written code), there are some interesting things we **could** (emphasis on could) get out of this and I wanted to discuss what people think about them.
I think this is all awesome. My position is that if we make zero syntactic changes to f-strings, and leave the functionality exactly as it is today, I think we should still move the logic into the parser and grammar, as you suggested. As you say, this would eliminate a lot of code, and in addition likely get us better error messages. As for the things we could possibly add:
* The parser will likely allow "\n" characters and backslashes in f-string expressions, which is currently impossible:

```
f"blah blah {'\n'} blah"
  File "<stdin>", line 1
    f"blah blah {'\n'} blah"
                            ^
SyntaxError: f-string expression part cannot include a backslash
```

I think supporting backslashes in strings inside of f-string expressions (the part inside {}) would be a big win, and should be the first thing we allow. I often have to do this:

```
nl = '\n'
x = f"blah {nl if condition else ' '}"
```

Being able to write this more naturally would be a big win.

I don't recall exactly why, but I disallowed backslashes inside expressions at the last minute before 3.6 was released. It might have been because I was interpreting them in a way that didn't make sense if a "real" parser were inspecting f-strings. The idea, even back then, was to re-allow them when/if we moved f-string parsing into the parser itself. I think it's time.

* The parser will allow nesting quote characters. This means that we **could** allow reusing the same quote type in nested expressions like this:

```
f"some text { my_dict["string1"] } more text"
```

I'm okay with this, with the caveat that I raised in another email: the effect on non-Python tools and alternate Python implementations. To restate that here: as long as we survey some (most?) of the affected parties and they're okay with it (or at least it doesn't cause them a gigantic amount of work), then I'm okay with it. This will of course be subjective. My big concern is tools that today use regexes (or similar) to recognize f-strings and then completely ignore what's inside them. They just want to "skip over" f-strings in the source code, maybe because they're doing some sort of source-to-source transpiling, and they're just going to output the f-strings as-is. It seems to me we're creating a lot of work for such tools. Are there a lot of such tools? I don't know: maybe there are none.

* The parser will naturally allow more control over error messages and AST positions.

This would be a good win.

* The **grammar** of the f-string will be fully specified without ambiguity. Currently, the "grammar" that we have in the docs (https://docs.python.org/3/reference/lexical_analysis.html#formatted-string-literals) is not really a formal grammar, because not only does it mix lexing details with grammar details (the definition of "literal_char"), but it is also not compatible with the current Python lexing scheme (for instance, it recognizes "{{" as its own token, which the language doesn't allow, because something like "{{a:b}:c}" is tokenized as "{", "{", "a", ... not as "{{", "a"). Adding a formal grammar could help syntax highlighters, IDEs, parsers and other tools to make sure they properly recognize everything that there is.

Also a big win.

There may be some other advantages that we have not explored yet.
The work is at a point where the main idea works (all the grammar is already there and working), but we need to make sure that all existing errors and specifics are properly ported to the new code, which is still a considerable amount of work, so I wanted to make sure we are on the same page before we decide to invest more time on this (Batuhan is helping me with this and Lyssandros will likely join us). We are doing this work in this branch: https://github.com/we-like-parsers/cpython/blob/fstring-grammar
Tell me what you think.
P.S. If you are interested in helping with this project, please reach out to me. If we decide to go ahead we can use your help! :)
I'm interested in helping. Thanks for your work on this.

Eric
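[For the record, a sketch of what Eric's nl workaround above would collapse to under the proposal -- not valid syntax at the time of this thread; shown as the intended behavior, with `condition` as a hypothetical stand-in name:]

```python
# With backslashes allowed inside the braces, the helper variable
# disappears entirely.
condition = True
x = f"blah {'\n' if condition else ' '}"
print(repr(x))  # 'blah \n'
```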
On Tue, Sep 21, 2021 at 11:49 AM Eric V. Smith <eric@trueblade.com> wrote:
[Pablo]
* The parser will allow nesting quote characters. This means that we **could** allow reusing the same quote type in nested expressions like this:
f"some text { my_dict["string1"] } more text"
I'm okay with this, with the caveat that I raised in another email: the effect on non-Python tools and alternate Python implementations. To restate that here: as long as we survey some (most?) of the affected parties and they're okay with it (or at least it doesn't cause them a gigantic amount of work), then I'm okay with it. This will of course be subjective. My big concern is tools that today use regexes (or similar) to recognize f-strings and then completely ignore what's inside them. They just want to "skip over" f-strings in the source code, maybe because they're doing some sort of source-to-source transpiling, and they're just going to output the f-strings as-is. It seems to me we're creating a lot of work for such tools. Are there a lot of such tools? I don't know: maybe there are none.
I assume this is primarily an issue for syntax highlighters, which must work under adverse conditions: the code may contain syntax errors nearby and they must update fast when the user is typing. (I recall these were the challenges when I implemented the first syntax coloring for IDLE several decades ago.) If the syntax highlighter shows the wrong colors in an edge case, users can usually live with that.

Something that just colors the entire f-string, including the interpolations, with the "string" color is not optimal anyways; the editor I currently use, VS Code, knows about f-strings and colorizes (and otherwise analyzes) the interpolations as expressions.

I imagine if you have a simple-minded highlighter that just uses a regex that matches string quotes, it will take something like my example and color it "string" until the first nested quote, then be confused for a bit, and then start coloring "string" after the second nested quote, until the end of the f-string. So the confusion is local.

I created a gist with my example. This uses some well-known colorizer written in JavaScript (I presume). It seems to actually already support nested quotes?! https://gist.github.com/gvanrossum/b8ca09175a0d1399a8999f13c7bfa616

And here's a copy-paste from VS Code (it also shows a red underline under the entire f-string, but the copy doesn't show it):

```
def generate(source):
    print("# What comes before")
    print(f"{source.removesuffix(".py")}.c: $(srcdir)/{source}")
    print("\t$(COMMAND)")
```

So these two tools, at least, seem to be doing all right (maybe because they both come from the JavaScript culture, where nested interpolations are well-known).

--
--Guido van Rossum (python.org/~guido)
On 9/21/2021 3:29 PM, Guido van Rossum wrote:
On Tue, Sep 21, 2021 at 11:49 AM Eric V. Smith <eric@trueblade.com> wrote:
[Pablo]
* The parser will allow nesting quote characters. This means that we **could** allow reusing the same quote type in nested expressions like this:
f"some text { my_dict["string1"] } more text"
I'm okay with this, with the caveat that I raised in another email: the effect on non-Python tools and alternate Python implementations. To restate that here: as long as we survey some (most?) of the affected parties and they're okay with it (or at least it doesn't cause them a gigantic amount of work), then I'm okay with it. This will of course be subjective. My big concern is tools that today use regex's (or similar) to recognize f-strings, and then completely ignore what's inside them. They just want to "skip over" f-strings in the source code, maybe because they're doing some sort of source-to-source transpiling, and they're just going to output the f-strings as-is. It seems to me we're creating a lot of work for such tools. Are there a lot of such tools? I don't know: maybe there are none.
I assume this is primarily an issue for syntax highlighters, which must work under adverse conditions: the code may contain syntax errors nearby and they must update fast when the user is typing. (I recall these were the challenges when I implemented the first syntax coloring for IDLE several decades ago.)
If same-quote nesting were limited to 1 deep, REs could handle it. Since nesting is not, and same-quote nesting would not be, they cannot in general.

```
>>> f'''length is {len(3*f"{f'{a}'}")}'''
'length is 3'
```

Still, if this arrives, I would consider a patch to handle the first nesting level.
If the syntax highlighter shows the wrong colors in an edge case, users can usually live with that.
Since IDLE is a gift, not a product, I've decided a feature falling short of perfection is OK.
Something that just colors the entire f-string, including the interpolations, with the "string" color is not optimal anyways;
To me, there is a tradeoff. Thunderbird displays the gmane version of the example below highlighted. I find the broken chunks too jarring.
the editor I currently use, VS Code, knows about f-strings and colorizes (and otherwise analyzes) the interpolations as expressions.
The red underline on the original display is a nice touch. It would definitely help to tie the whole string together. Assuming VS Code handles the double nesting, does it give two underlines for the example above, for the outer and middle strings?
I imagine if you have a simple-minded highlighter that just uses a regex that matches string quotes, it will take something like my example and color it "string" until the first nested quote, then be confused for a bit, and then start coloring "string" after the second nested quote, until the end of the f-string. So the confusion is local.
This is what IDLE does now.
I created a gist with my example. This uses some well-known colorizer written in JavaScript (I presume). It seems to actually already support nested quotes?! https://gist.github.com/gvanrossum/b8ca09175a0d1399a8999f13c7bfa616
And here's a copy-paste from VS Code (it also shows a red underline under the entire f-string, but the copy doesn't show it):
```
def generate(source):
    print("# What comes before")
    print(f"{source.removesuffix(".py")}.c: $(srcdir)/{source}")
    print("\t$(COMMAND)")
```
So these two tools, at least, seem to be doing all right (maybe because they both come from the JavaScript culture, where nested interpolations are well-known).
With only 1 or even 2 types of quotes, reusing them would be more necessary than it is in Python.

-- Terry Jan Reedy
On Sep 21, 2021, at 15:03, Terry Reedy <tjreedy@udel.edu> wrote:
If same-quote nesting were limited to 1 deep, REs could handle it. Since nesting is not, and same-quote nesting would not be, they cannot in general.
```
f'''length is {len(3*f"{f'{a}'}")}'''
```
I tried this in the latest python-mode.el for Emacs, and while it isn’t able to handle it correctly, at least the damage is local. That’s been one of the problems with Emacs syntax highlighting: it usually works quite well these days, but when it gets messed up, the incorrect highlighting can extend deep into the file.

speaking-for-all-3-of-the-remaining-emacs-users-ly y’rs,
-Barry
On 9/21/2021 7:42 PM, Eric V. Smith wrote:
I don't recall exactly why, but I disallowed backslashes inside expressions at the last minute before 3.6 was released. It might have been because I was interpreting them in a way that didn't make sense if a "real" parser were inspecting f-strings. The idea, even back then, was to re-allow them when/if we moved f-string parsing into the parser itself. I think it's time.
Yeah, we were still trying to figure out whether escapes like "\\n" would be evaluated as "\\n" or "\n" in the expression, and decided to decide later. If we can clearly articulate which it is now, then let's go ahead and enable it.
* The parser will allow nesting quote characters. This means that we **could** allow reusing the same quote type in nested expressions like this:
f"some text { my_dict["string1"] } more text" I'm okay with this, with the caveat that I raised in another email: the effect on non-Python tools and alternate Python implementations.
As a fairly regular user, I would be very happy to not have to worry about mixing quotes. It's also not going to break any existing code, so safe enough to enable if we can. Agreed with Eric on the rest.

Cheers,
Steve
On Tue, Sep 21, 2021 at 4:08 PM Steve Dower <steve.dower@python.org> wrote:
On 9/21/2021 7:42 PM, Eric V. Smith wrote:
I don't recall exactly why, but I disallowed backslashes inside expressions at the last minute before 3.6 was released. It might have been because I was interpreting them in a way that didn't make sense if a "real" parser were inspecting f-strings. The idea, even back then, was to re-allow them when/if we moved f-string parsing into the parser itself. I think it's time.
Yeah, we were still trying to figure out whether escapes like "\\n" would be evaluated as "\\n" or "\n" in the expression, and decided to decide later. If we can clearly articulate which it is now, then let's go ahead and enable it.
That would seem easy enough, right?

```
f"spam {'xyz'.replace('y', '\\n')} spam"
```

should be equal to

```
"spam x\\nz spam"
```

and print as

```
spam x\nz spam
```

(i.e. a literal backslash followed by 'n', not a newline). You shouldn't have to double the \ in the interpolated expression just because it's in an f-string.

I presume it was trickier at the time because we were coming from "{xxx}".format(...), where the parser doesn't know that the string is a format string.

--
--Guido van Rossum (python.org/~guido)
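[A quick check of those semantics as they are intended to work once the parser change lands -- a sketch of the proposed behavior, not something the parser accepted at the time of this thread:]

```python
# '\\n' inside the interpolation denotes a literal backslash plus 'n',
# exactly as it would in a standalone expression.
s = f"spam {'xyz'.replace('y', '\\n')} spam"
print(s == "spam x\\nz spam")  # True
print(s)                       # spam x\nz spam
```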
On 9/21/2021 7:15 PM, Guido van Rossum wrote:
On Tue, Sep 21, 2021 at 4:08 PM Steve Dower <steve.dower@python.org> wrote:
On 9/21/2021 7:42 PM, Eric V. Smith wrote:

I don't recall exactly why, but I disallowed backslashes inside expressions at the last minute before 3.6 was released. It might have been because I was interpreting them in a way that didn't make sense if a "real" parser were inspecting f-strings. The idea, even back then, was to re-allow them when/if we moved f-string parsing into the parser itself. I think it's time.
Yeah, we were still trying to figure out whether escapes like "\\n" would be evaluated as "\\n" or "\n" in the expression, and decided to decide later. If we can clearly articulate which it is now, then let's go ahead and enable it.
That would seem easy enough, right?

```
f"spam {'xyz'.replace('y', '\\n')} spam"
```

should be equal to

```
"spam x\\nz spam"
```

and print as

```
spam x\nz spam
```

(i.e. a literal backslash followed by 'n', not a newline).
Yes, I think that's the desired behavior. Before I removed this in 3.6 (during the betas, IIRC), it would have produced an actual newline, because of where the f-string 'parser' was able to insert itself into the process. I/we didn't like that behavior, and it was too late to change it. We could add this now in the bespoke f-string parser, although I don't know off the top of my head how much work it would be. But if we're going to switch to Pablo's parser then I don't think there's any point.
You shouldn't have to double the \ in the interpolated expression just because it's in an f-string.

Right.

I presume it was trickier at the time because we were coming from "{xxx}".format(...), where the parser doesn't know that the string is a format string.
Yes, that was part of it.

Eric
On Mon, 20 Sep 2021, 9:19 pm Pablo Galindo Salgado, <pablogsal@gmail.com> wrote:
Hi,
I have started a project to move the parsing of f-strings to the parser and the grammar. Apart from some maintenance improvements (we can drop a considerable amount of hand-written code), there are some interesting things we **could** (emphasis on could) get out of this and I wanted to discuss what people think about them.
The change seems like a good idea, but the consequences should be summarised in a PEP (either the existing https://www.python.org/dev/peps/pep-0536/ or a replacement for it).

Cheers,
Nick.
participants (15)

- Barry Warsaw
- Brett Cannon
- David Mertz, Ph.D.
- Eric V. Smith
- Erlend Aasland
- Guido van Rossum
- Jeremiah Paige
- Nick Coghlan
- Pablo Galindo Salgado
- Patrick Reader
- Serhiy Storchaka
- Stephen J. Turnbull
- Steve Dower
- Terry Reedy
- Thomas Grainger