Laura Creighton writes:
Steve Turnbull, who lives in Japan and speaks and writes Japanese, is saying that he "cannot see any reason for allowing non-ASCII compatible encodings in CPython".
This makes me wonder.
Is this along the lines of 'even in Japan we do not want such things', or along the lines of 'when in Japan we do want such things, we want so much more, so keep the reference implementation simple and don't try to help us with this seems-like-a-good-idea-but-isn't-in-practice idea', or something else entirely?
I'm saying that, to my knowledge, Japan is the most complicated place there is when it comes to encodings, and even so, nobody here seems to be using UTF-16 as the encoding for program sources (or any other text/* media). Of course, as Steve Dower pointed out, it's in heavy use as an internal text encoding, in OS APIs, in some languages' stdlib APIs (e.g., Java and I suppose .NET), and I guess in single-application file formats (Word), but the programs that use those APIs are written in ASCII-compatible encodings (including Shift JIS and Big5). The Japanese don't need or want UTF-16 in text files, etc.

Besides that, I can also say that PEP 263 didn't legislate the use of ASCII-compatible encodings. For one thing, Shift JIS and Big5 aren't 100% ASCII-compatible, because they use bytes in the ASCII range as trailing bytes of multibyte characters. They're just close enough to ASCII-compatible to mostly "just work", at least on Microsoft OSes provided by OEMs in the relevant countries.

What PEP 263 did do was specify that non-ASCII-compatible encodings are not supported by the PEP 263 mechanism for declaring the encoding of a Python source program. That's because it looks for a "magic number", which is the ASCII-encoded form of "coding:", in the first two lines. It doesn't rule out alternative mechanisms for encoding detection (specifically, use of the UTF-16 "BOM" signature); it just doesn't propose implementing them. IIRC nobody has ever asked for them, but I think the idea is absurd, so I have to admit I may have seen a request and forgotten it instantly.

Bottom line: as long as Python (or the launcher) is able to transcode the source to the internal Unicode format (UCS-2 or UCS-4 in Python 2, depending on build, and widechar or PEP 393 in Python 3) before actually beginning parsing, any on-disk encoding is OK. But I just don't see a use case for UTF-16.
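To make the "magic number" point concrete, here is a simplified sketch of the cookie lookup. The regular expression is adapted from the one given in PEP 263, but `detect_source_encoding` is a hypothetical helper for illustration, not CPython's actual tokenizer code:

```python
import re

# Roughly the pattern PEP 263 specifies for the encoding declaration;
# note it matches against raw BYTES, so the cookie is only findable
# if the file's encoding is ASCII-compatible.
CODING_RE = re.compile(rb"^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)")

def detect_source_encoding(source: bytes, default: str = "utf-8") -> str:
    """Look for a PEP 263 coding cookie in the first two lines."""
    for line in source.splitlines()[:2]:
        m = CODING_RE.match(line)
        if m:
            return m.group(1).decode("ascii")
    return default
```

Run this over a UTF-16-encoded file and the interleaved NUL bytes mean the pattern never matches, which is exactly why the PEP 263 mechanism can't declare a non-ASCII-compatible encoding.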
If I'm wrong, I think this feature should be added to launchers, not to CPython, because it forces the decoder to know which formats other than ASCII-compatible ones are implemented and to try heuristics to guess among them, rather than just obeying the coding cookie.
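For comparison, a launcher-side approach would amount to sniffing a BOM and transcoding the source before the interpreter ever parses it. A minimal sketch, assuming a hypothetical `transcode_source` helper (this is not any actual launcher's code):

```python
import codecs

def transcode_source(raw: bytes) -> bytes:
    """Sniff a UTF-16/UTF-8 BOM and transcode the source to UTF-8.

    Anything without a recognized BOM is passed through untouched,
    on the assumption that it's in an ASCII-compatible encoding and
    the normal PEP 263 cookie mechanism applies.
    """
    if raw.startswith(codecs.BOM_UTF16_LE):
        text = raw[len(codecs.BOM_UTF16_LE):].decode("utf-16-le")
    elif raw.startswith(codecs.BOM_UTF16_BE):
        text = raw[len(codecs.BOM_UTF16_BE):].decode("utf-16-be")
    elif raw.startswith(codecs.BOM_UTF8):
        text = raw[len(codecs.BOM_UTF8):].decode("utf-8")
    else:
        return raw
    return text.encode("utf-8")
```

Since the heuristics live entirely in the launcher, CPython itself never has to know the on-disk encoding was UTF-16.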