On Wed, Nov 3, 2021 at 8:01 PM Stephen J. Turnbull <stephenjturnbull@gmail.com> wrote:
Chris Angelico writes:
But I was surprised to find that Python would let you use unicode_escape for source code.
I'm not surprised. Today it's probably not necessary, but I've exchanged a lot of code (not Python, though) with folks whose editors were limited to 8-bit codes or even just ASCII. I didn't often need to discuss non-ASCII code with them (code they needed to run), but it would have been painful to do without some form of codec that encoded Japanese using only ASCII bytes.
Bearing in mind that string literals can always have their own escapes, this feature is really only important to the source code tokens themselves.
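For illustration, here is a small sketch using Python's standard `unicode_escape` codec. It shows the codec by itself, not how the tokenizer applies a coding cookie: a line with non-ASCII identifiers becomes pure ASCII bytes and round-trips back unchanged. The sample line and the variable names are made up for the example.

```python
# Illustrative source line with non-ASCII identifiers and a non-ASCII literal.
line = "名前 = 'café'"

# unicode_escape writes non-ASCII characters as \xNN / \uNNNN escapes,
# so every resulting byte is below 128.
ascii_bytes = line.encode("unicode_escape")
assert all(b < 128 for b in ascii_bytes)

# Decoding with the same codec restores the original text exactly.
assert ascii_bytes.decode("unicode_escape") == line
print(ascii_bytes)
```

A string literal could use `\u` escapes anyway. An identifier like `名前` has no escape syntax of its own, so only a source-level codec like this one lets it travel as ASCII.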
Maybe the phrase "a small handful" was a bit too hopeful, but would it be possible to mandate (after, obviously, a deprecation period) that source encodings be ASCII-compatible?
Not sure what you mean there. In the usual sense of ASCII-compatible (the ASCII bytes always mean the corresponding character in the ASCII encoding), I think there are at least two ASCII-incompatible encodings that would cause a lot of pain if they were prohibited, specifically Shift JIS and Big5. (In those encodings, a byte in the ASCII range frequently appears as the trailing byte of a multibyte character.)
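Python's built-in `shift_jis` and `big5` codecs can make the trailing-byte problem concrete. The characters below are simply well-known examples whose second byte is 0x5C, the ASCII backslash. `has_false_backslash` is a hypothetical helper written for this illustration.

```python
BACKSLASH = 0x5C  # byte value of ASCII "\"


def has_false_backslash(char: str, encoding: str) -> bool:
    """True if `char` contains no backslash, yet its encoded bytes contain 0x5C."""
    return "\\" not in char and BACKSLASH in char.encode(encoding)


samples = [
    ("ソ", "shift_jis"),  # katakana SO
    ("表", "shift_jis"),
    ("許", "big5"),
]

for char, encoding in samples:
    encoded = char.encode(encoding)
    print(char, encoding, encoded.hex(), has_false_backslash(char, encoding))
```

A byte-level scanner looking for backslashes (for example, to find escape sequences in string literals) would misfire on each of these characters.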
Ah, okay, so much for that, then. What about the weaker sense: Characters below 128 are always and only represented by those byte values? So if you find byte value 39, it might not actually be an apostrophe, but if you're looking for an apostrophe, you know for sure that it'll be represented by byte value 39?
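One way to test the weaker property in Python is to check, for every code point below 128, that the codec encodes it as exactly that single byte. This is a sketch. `ascii_maps_to_itself` is a hypothetical name, and it only checks the encoding direction. It says nothing about whether byte 39 always *means* an apostrophe.

```python
def ascii_maps_to_itself(encoding: str) -> bool:
    """True if every code point below 128 encodes to exactly one byte with that value."""
    for code in range(128):
        try:
            encoded = chr(code).encode(encoding)
        except UnicodeEncodeError:
            return False
        if encoded != bytes([code]):
            return False
    return True


for enc in ["utf_8", "shift_jis", "big5", "utf_16_le", "unicode_escape"]:
    print(enc, ascii_maps_to_itself(enc))
```

Under this check, Python's `shift_jis` and `big5` codecs pass. `utf_16_le` fails because each character takes two bytes. `unicode_escape` fails too, because it rewrites characters such as tab and backslash as multi-byte escapes.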
It might make sense to prohibit unicode_escape nowadays -- I think almost all systems now can handle Unicode properly, but I don't think we can go farther than that.
Yes. I'm sure someone will come along and say "but I have to have an all-ASCII source file, directly runnable, with non-ASCII variable names", because XKCD 1172, but I don't have enough sympathy for that obscure situation to want the mess that unicode_escape can give. ChrisA
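Here is a rough sketch of one such mess. It models the decode step directly with `bytes.decode`, on the assumption that a file declared as `unicode_escape` is decoded that way before tokenizing. The codec then consumes backslash escapes meant for string literals. The `compiles` helper and the sample bytes are hypothetical.

```python
def compiles(text: str) -> bool:
    """Return True if `text` is syntactically valid Python."""
    try:
        compile(text, "<sketch>", "exec")
    except SyntaxError:
        return False
    return True


# Raw file bytes: the literal holds the two-character escape backslash + "n".
source_bytes = b's = "a\\nb"\n'

# Decoded as UTF-8, the escape reaches the tokenizer intact.
print(compiles(source_bytes.decode("utf_8")))

# Decoded as unicode_escape, backslash + "n" turns into a real newline first,
# splitting the string literal across two lines.
print(compiles(source_bytes.decode("unicode_escape")))

# To keep the escape, the author has to double every backslash in the file.
escaped = b's = "a\\\\nb"\n'
print(compiles(escaped.decode("unicode_escape")))
```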