Raw strings ending with a backslash

Now that we have a new parser for CPython, can we fix the old gotcha that raw strings cannot end in a backslash? It's an FAQ and has come up again on the bug tracker.
https://docs.python.org/3/faq/design.html#id26
https://github.com/python/cpython/issues/93314
-- Steve
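For readers who have not run into the gotcha, here is a minimal sketch of it. The helper name `compiles` and the variable names are illustrative only; the source snippets are built with `chr(92)` so that the example itself does not depend on backslash escaping.

```python
BACKSLASH = chr(92)  # a single backslash character


def compiles(source: str) -> bool:
    """Return True if `source` is a syntactically valid Python expression."""
    try:
        compile(source, "<example>", "eval")
    except SyntaxError:
        return False
    return True


# Source text r'\' : the backslash-quote pair does not close the string.
ends_in_backslash = "r'" + BACKSLASH + "'"

# Source text r'\\' : an even number of trailing backslashes is accepted.
ends_in_two = "r'" + BACKSLASH * 2 + "'"

# Common workaround: r'C:\dir' '\\' (raw literal joined with a normal one).
workaround = "r'C:" + BACKSLASH + "dir' '" + BACKSLASH * 2 + "'"

print(compiles(ends_in_backslash))   # rejected by the tokenizer
print(len(eval(ends_in_two)))        # both backslashes are kept
print(eval(workaround))              # ends with a single backslash
```

The workaround relies on adjacent string literals being concatenated at compile time, so only the final piece needs ordinary escaping.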

28.05.22 12:22, Steven D'Aprano wrote:
I do not think that we can allow this, and it is not related to the parser. A few years ago I experimented with such a change: https://github.com/python/cpython/pull/15217 You can see that it breaks even some stdlib code, and it will definitely break many third-party packages and examples. Technically we can do this, but the benefit is small in comparison with the cost.
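As an illustration of the kind of code that could depend on the current rule (this pattern is a made-up example, not taken from the PR), a raw-string regex can contain both quote characters because backslash-quote does not terminate the literal:

```python
import re

# In a raw string, backslash-quote keeps BOTH characters and does not
# end the literal, so one single-quoted literal can hold ' and ".
quote_chars = r'[\'"]'

sample = '''x'y"z'''

print(len(quote_chars))                 # 5 characters: [ \ ' " ]
print(re.findall(quote_chars, sample))  # the two quote characters
```

If backslash-quote stopped being special in raw strings, the literal above would end right after the backslash and the rest of the line would no longer parse.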

That PR seems to make \' and \" not special in general, right? I think this is a more limited proposal: only change the behavior when \ is at the end of a string, so the only behavior difference would be never receiving the error "SyntaxError: EOL while scanning string literal". In that case there should be no backwards compatibility issue.
Damian
On Sat, May 28, 2022 at 12:20 PM Serhiy Storchaka <storchaka@gmail.com> wrote:

On Sat, May 28, 2022 at 12:11 MRAB wrote:
Names in Python are case-sensitive, yet the string prefixes are case-/insensitive/. Why?

IIRC we copied this from C for numeric suffixes (0l and 0L are the same; also hex digits, and presumably 0XA == 0xa) and then copied that for string prefixes without thinking about it much. I guess it's too late to change.
--Guido (mobile)
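A quick sketch of the case-insensitivity being discussed, using current Python 3 syntax (the variable names are illustrative only):

```python
# String prefixes may be written in either case.
raw_lower = r"\d+"
raw_upper = R"\d+"

bytes_lower = b"abc"
bytes_upper = B"abc"

# The same holds for numeric literal markers.
hex_pair = (0xa, 0XA)
binary_pair = (0b101, 0B101)
exponent_pair = (1e3, 1E3)

print(raw_lower == raw_upper)
print(bytes_lower == bytes_upper)
print(hex_pair, binary_pair, exponent_pair)
```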

On Sat, May 28, 2022 at 12:55 PM Guido van Rossum <guido@python.org> wrote:
Given that 99.99% of code uses lower case string prefixes we *could* change it, it'd just take a longer deprecation cycle. You'd probably want a few releases where the upper case prefixes become an error in files without a `from __future__ import case_sensitive_quote_prefixes`, rather than jumping straight from a parse-time DeprecationWarning to repurposing the uppercase to have a new meaning. The inertia behind doing that over the course of 5+ years is high, which implies that we'd need a compelling reason to orchestrate it. None has sprung up.
-gps

On 5/28/2022 7:57 AM, Damian Shaw wrote:
How would you know where the end of a string is? I think this is one of those things that's easy for a human to look at and figure out the intent, but not so easy for the lexer without some heuristics and backtracking. If the trailing single quote is removed below, it changes from "backslash in the middle of a string" to "backslash at the end of a string, followed by an arbitrary expression".
r'\' + "foo"'
Eric
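The ambiguity can be checked against the current rules with a short sketch. The names `with_trailing_quote`, `without_trailing_quote`, and `is_valid` are illustrative only, and the source text is assembled with `chr(92)` to keep the escaping readable.

```python
import ast

BACKSLASH = chr(92)  # a single backslash character

# Source text:  r'\' + "foo"'
with_trailing_quote = "r'" + BACKSLASH + "' + " + '"foo"' + "'"

# Source text:  r'\' + "foo"
without_trailing_quote = with_trailing_quote[:-1]


def is_valid(source: str) -> bool:
    """Return True if `source` is a syntactically valid Python expression."""
    try:
        compile(source, "<example>", "eval")
    except SyntaxError:
        return False
    return True


# Today the first form is ONE raw string literal; the + and "foo" are
# simply characters inside it.
value = ast.literal_eval(with_trailing_quote)
print(repr(value), len(value))

# Today the second form never finds its closing quote.
print(is_valid(without_trailing_quote))
```

Under the proposed change the lexer would have to decide, only after reaching the end of the line, whether the first backslash-quote pair closed the string. That is the backtracking referred to above.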

Thank you to everyone who responded. It is now clear to me that this genuinely is a feature, not a bug or limitation of the parser or lexer, and that there is code relying on that behaviour, including in the stdlib, so we shouldn't change it even if we could.
-- Steve

participants (9): Barney Gale, Chris Angelico, Damian Shaw, Eric V. Smith, Gregory P. Smith, Guido van Rossum, MRAB, Serhiy Storchaka, Steven D'Aprano