
On Fri, 17 Feb 2023 at 06:11, Arusekk <arek_koz@o2.pl> wrote:
On 16.02.2023 at 17:55, David Mertz, Ph.D. wrote:
Wow! That would break SO MUCH of the code I've written! E.g.:
translate = {"el": "ἐπιστήμη", "en": "Knowledge", "zh": "知识"}
You did not use any codepoint in the U+0080-U+00FF range here. Are you sure the primary suggestion would break such code?
I only meant to deprecate "\xNN" in favor of "\u00NN" in the original idea, because it is too easily confused with b"\xNN".
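For readers following along, here is a minimal sketch of the escapes under discussion. The code point U+00E9 and the variable names are picked purely for illustration and do not come from the thread.

```python
# In a text string, "\xNN" and "\u00NN" spell the same single code point.
text_x = "\xe9"      # U+00E9, written with the two-digit escape
text_u = "\u00e9"    # the same code point, written with the four-digit escape
same_text = text_x == text_u

# In a bytes literal, "\xNN" is one raw byte, not a code point.
raw = b"\xe9"

# Whether the text and the byte line up depends on the encoding chosen.
as_latin1 = text_x.encode("latin-1")   # one byte, equal to raw
as_utf8 = text_x.encode("utf-8")       # two bytes, not equal to raw

print(same_text, as_latin1 == raw, as_utf8 == raw)
```

The source of the confusion is visible in the last line: the str escape and the bytes escape look identical, but they only describe the same data under Latin-1.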
Bytes literals are allowed to contain ASCII characters because bytestrings often do contain textual portions. This could have been changed in Python 3.0, but it wasn't, because it is *useful* to have text and byte strings work similarly. The confusion you're describing is just as strong as:

b"Length: %d" % count
# versus
"Length: %d" % count

which was specifically *added* to byte strings because, again, it is incredibly useful.

What would actually be gained by breaking text strings in this way, other than a warm fuzzy feeling that even the first 256 codepoints are still represented by four-digit numbers? It's not guaranteeing uniformity of Unicode escapes (since the first 65536 codepoints still get a shorthand that can't be used for the others), it's not actually distinguishing them from byte strings (they have a lot of the same methods and behaviours), and you're breaking a huge amount of perfectly reasonable code.

Breaking backward compatibility is a **big deal**. It needs a lot more justification than you've provided.

ChrisA
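As a small illustration of the two points in the message above, the sketch below runs the %-formatting comparison and shows the four-digit shorthand next to the eight-digit form. The value of count and the sample code points are assumptions chosen for the example; %-formatting on bytes needs Python 3.5 or later.

```python
count = 42  # illustrative value, not from the thread

# %-formatting works the same way on text strings and byte strings.
text_line = "Length: %d" % count
bytes_line = b"Length: %d" % count

# "\uNNNN" is a shorthand that only reaches the first 65536 code points;
# anything above U+FFFF needs the eight-digit "\UNNNNNNNN" form.
short_form = "\u00e9"
long_form = "\U000000e9"
beyond_bmp = "\U0001F600"  # cannot be written as a single "\uNNNN" escape

print(text_line, bytes_line, short_form == long_form, len(beyond_bmp))
```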