[Python-ideas] Re: Adding `open_text()` builtin function. (relating to PEP 597)

24 Jan 2021

Chris Angelico writes:
...
> Can anyone give an example of a current in-use system encoding that
> would have [ASCII bytes in non-ASCII text]?

Shift JIS, Big5.  (Both can have bytes < 128 inside multibyte
characters.)  I don't know if Big5 is still in use as the default
encoding anywhere, but Shift JIS is, although it's decreasing.
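
To make that concrete, here is a minimal sketch using Python's standard
"shift_jis" and "big5" codecs. The helper name ascii_range_bytes and the
two sample characters are my own choices for illustration; any character
whose trail byte falls below 0x80 would do.

```python
def ascii_range_bytes(char: str, encoding: str) -> list[int]:
    """Return the bytes below 0x80 in the encoding of one character."""
    encoded = char.encode(encoding)
    return [b for b in encoded if b < 0x80]


# Katakana "ソ" in Shift JIS and "功" in Big5 are two-byte characters
# whose second byte is 0x5C, the ASCII backslash.
sjis_hits = ascii_range_bytes("ソ", "shift_jis")
big5_hits = ascii_range_bytes("功", "big5")

# UTF-8 never reuses ASCII byte values inside a multibyte character.
utf8_hits = ascii_range_bytes("ソ", "utf-8")

print(sjis_hits, big5_hits, utf8_hits)
```

A scanner that looks for a backslash or a quote byte without decoding
first can therefore split such a character in the middle.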

For both of those, once you encounter a non-ASCII byte you can just
switch over, and none of the previous text will have been mis-decoded.
But that only works if you *know* the language is Japanese
(respectively Chinese).  Remember, no encoding can be distinguished
from ISO 8859-1 (or from several other Latin encodings) simply based on
the bytes found, since ISO 8859-1 uses all 256 byte values.
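
A short sketch of why the bytes alone cannot rule out ISO 8859-1,
assuming Python's "latin-1" codec (which maps every byte value to a
code point). The helper decodes_cleanly and the sample string are
illustrative only.

```python
def decodes_cleanly(data: bytes, encoding: str) -> bool:
    """Report whether data decodes under encoding without an error."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


# Every possible byte value decodes under latin-1.
every_byte = bytes(range(256))
as_latin1 = every_byte.decode("latin-1")

# Shift JIS bytes also decode under latin-1, just to the wrong text.
sjis_bytes = "ソース".encode("shift_jis")
mojibake = sjis_bytes.decode("latin-1")

print(decodes_cleanly(sjis_bytes, "latin-1"))  # no error is raised
print(decodes_cleanly(sjis_bytes, "ascii"))    # non-ASCII bytes are rejected
```

So "decoded without an exception" tells you nothing when one of the
candidates is a Latin encoding; you need outside knowledge of the
language.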
...
> How likely is it that you'd get even one line of text that purports
> to be ASCII?

Program source code in which the higher-level functions (likely to
contain literal strings) come late in the file is frequently
misdetected based on the earlier bytes.
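
Here is a rough sketch of that failure mode. The detector
guess_from_prefix is hypothetical, not any real library's algorithm; it
only stands in for tools that sniff a fixed-size sample from the start
of the file.

```python
def guess_from_prefix(data: bytes, sample_size: int) -> str:
    """Hypothetical detector: inspects only the first sample_size bytes."""
    try:
        data[:sample_size].decode("ascii")
    except UnicodeDecodeError:
        return "unknown"
    return "ascii"


# Low-level code first (pure ASCII), a string literal only near the end.
source = (
    "def low_level(x):\n"
    "    return x + 1\n"
    "\n"
    "def main():\n"
    "    print('結果')\n"
).encode("shift_jis")

guess = guess_from_prefix(source, 32)

# The guess is tested against the whole file only after the fact.
try:
    source.decode(guess)
    full_decode_ok = True
except UnicodeDecodeError:
    full_decode_ok = False

print(guess, full_decode_ok)
```

The sample looks like ASCII, so the guess is "ascii", and the error only
shows up once the literal in main() is reached.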

Steve

Stephen J. Turnbull