On Wed, Nov 3, 2021 at 5:12 PM Stephen J. Turnbull wrote:
Chris Angelico writes:
Huh. Is that level of generality actually still needed? Can Python deprecate all but a small handful of encodings?
I think that's pointless. With few exceptions (GB18030; Big5, which has a couple of code point pairs that encode the same very rare characters; ISO 2022 extensions) you're not going to run into the confusables problem, and AFAIK the only generic BIDI solution is Unicode (the ISO 8859 encodings of Hebrew and Arabic do not have direction markers).
What exactly are you thinking?
You'll never eliminate confusables (even ASCII has some, depending on font). But I was surprised to find that Python would let you use unicode_escape for source code.

    # coding: unicode_escape
    x = '''
    Code example:
    \u0027\u0027\u0027 # format in monospaced on the web site
    print("Did you think this would be executed?")
    \u0027\u0027\u0027 # end monospaced
    Surprise!
    '''
    print("There are %d lines in x." % len(x.split(chr(10))))

With some carefully-crafted comments, a lot of human readers will ignore the magic tokens. It's not uncommon to put example code into triple-quoted strings, and it's also not all that surprising when simplified examples do things that you wouldn't normally want done (like monkeypatching other modules), since it's just an example, after all.

I don't have access to very many editors, but SciTE, VS Code, nano, and the GitHub gist display all syntax-highlighted this as if it were a single large string. Only Idle showed it as code in between, and that's because it actually decoded it using the declared character coding, so the magic lines showed up with actual apostrophes.

Maybe the phrase "a small handful" was a bit too hopeful, but would it be possible to mandate (after, obviously, a deprecation period) that source encodings be ASCII-compatible?

ChrisA
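For anyone who wants to see what Idle was showing, here is a rough sketch that decodes the example by hand and runs the result. It assumes the interpreter applies the declared coding to the raw bytes before tokenizing, and it mimics that step with an explicit bytes.decode() call rather than relying on the cookie itself. The names SOURCE and run_decoded are made up for this illustration, and the line layout of the example is my own.

```python
import contextlib
import io

# The example as raw bytes, the way it would sit on disk. Because of the
# rb prefix, each \u0027 below is six literal characters, not an apostrophe.
SOURCE = rb"""# coding: unicode_escape
x = '''
Code example:
\u0027\u0027\u0027 # format in monospaced on the web site
print("Did you think this would be executed?")
\u0027\u0027\u0027 # end monospaced
Surprise!
'''
print("There are %d lines in x." % len(x.split(chr(10))))
"""


def run_decoded(source, encoding="unicode_escape"):
    """Decode the bytes with the given codec, execute the result,
    and return (decoded_text, captured_stdout).

    Hypothetical helper: the explicit decode stands in for what the
    coding cookie asks the interpreter to do.
    """
    decoded = source.decode(encoding)
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        # A str source is compiled as-is; the cookie line is just a comment.
        exec(compile(decoded, "<demo>", "exec"), {})
    return decoded, buffer.getvalue()


decoded_text, output = run_decoded(SOURCE)
print(decoded_text)   # the escapes now appear as real apostrophes
print(output)         # includes the line the "string" was hiding
```

After decoding, the two magic lines become real triple quotes, so the first one closes the string assigned to x and the print() call between them is ordinary code. An editor that highlights the raw bytes never sees that.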