[Python-ideas] Re: Deprecate misleading escapes in strings

Feb. 16, 2023

On Fri, 17 Feb 2023 at 06:11, Arusekk <arek_koz@o2.pl> wrote:
> ...
> On 16.02.2023 at 17:55, David Mertz, Ph.D. wrote:
> > ...
> > Wow! That would break SO MUCH of the code I've written!  E.g.:
> > translate = {"el": "ἐπιστήμη", "en": "Knowledge", "zh": "知识"}
>
> You did not use any codepoint in the U+0080-U+00FF range here.
> Are you sure the primary suggestion would break such code?
> I only meant deprecate "\xNN" in favor of "\u00NN" in the original idea,
> because it is too confusing against b"\xNN".

Bytes literals are allowed to contain ASCII characters because
bytestrings often do contain textual portions. This could have been
changed in Python 3.0, but it wasn't, because it is *useful* to have
text and byte strings work similarly. The confusion you're describing
is just as strong as:

b"Length: %d" % count
# versus
"Length: %d" % count

which was specifically *added* to byte strings because, again, it is
incredibly useful.
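
As a minimal sketch of that parallel (assuming Python 3.5 or later, where
%-formatting on bytes is available via PEP 461; the value given to `count`
is arbitrary and only for illustration):

```python
count = 42  # illustrative value, not from the original message

as_bytes = b"Length: %d" % count   # bytes %-formatting (PEP 461, Python 3.5+)
as_text = "Length: %d" % count     # the familiar str %-formatting

print(as_bytes)
print(as_text)

# Both results carry the same ASCII content; only the type differs.
assert as_bytes == as_text.encode("ascii")
assert isinstance(as_bytes, bytes) and isinstance(as_text, str)
```

The two expressions read the same and produce matching ASCII content,
which is the kind of deliberate similarity being described here.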

What would actually be gained by breaking text strings in this way,
other than a warm fuzzy feeling that even the first 256 codepoints are
still represented by four-digit numbers? It's not guaranteeing
uniformity of Unicode escapes (since the first 65536 codepoints still
get a shorthand that can't be used for the others), it's not actually
distinguishing them from byte strings (they have a lot of the same
methods and behaviours), and you're breaking a huge amount of
perfectly reasonable code.
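
To make the escape forms concrete, here is a small sketch of how they
behave today. The variable names are only illustrative, and U+00E9 is an
arbitrary code point picked from the U+0080-U+00FF range under discussion:

```python
# In a text string, \xNN, \u00NN and \U000000NN all name the same code point.
text_x = "\xe9"
text_u = "\u00e9"
text_U = "\U000000e9"
assert text_x == text_u == text_U == chr(0xE9)
assert len(text_x) == 1

# In a bytes literal, \xNN is a single byte with that value.
raw = b"\xe9"
assert raw == bytes([0xE9])

# Text and bytes coincide under Latin-1, but not under UTF-8.
assert text_x.encode("latin-1") == raw
assert text_x.encode("utf-8") == b"\xc3\xa9"

# \uNNNN only reaches U+FFFF; beyond that the eight-digit \U form is needed.
astral = "\U0001F600"
assert len(astral) == 1 and ord(astral) == 0x1F600

# str and bytes still share many methods.
assert "abc".upper() == "ABC"
assert b"abc".upper() == b"ABC"
```

Even with "\xNN" removed from text strings, the four-digit and eight-digit
forms would remain distinct, and str and bytes would keep their shared
methods.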

Breaking backward compatibility is a **big deal**. It needs a lot more
justification than you've provided.

ChrisA

Chris Angelico