On Sun, Jan 24, 2021 at 10:00:47PM +1100, Chris Angelico wrote:
On Sun, Jan 24, 2021 at 9:13 PM Stephen J. Turnbull wrote:
Chris Angelico writes:
Can anyone give an example of a current in-use system encoding that would have [ASCII bytes in non-ASCII text]?
Shift JIS, Big5. (Both can have bytes < 128 inside multibyte characters.) I don't know if Big5 is still in use as the default encoding anywhere, but Shift JIS is, although it's decreasing.
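For illustration only, here is a minimal sketch of that point using the `shift_jis` and `big5` codecs that ship with CPython. The sample characters and the helper name `ascii_range_bytes` are chosen for this example and are not from the thread; each character has a trail byte of 0x5C, the same value as an ASCII backslash.

```python
# Characters whose second byte in a legacy multibyte encoding is 0x5C,
# the same byte value as the ASCII backslash.
samples = [("shift_jis", "表"), ("shift_jis", "ソ"), ("big5", "功")]


def ascii_range_bytes(text, encoding):
    """Return the encoded bytes and the subset of them that are below 128."""
    raw = text.encode(encoding)
    low = bytes(b for b in raw if b < 128)
    return raw, low


for encoding, char in samples:
    raw, low = ascii_range_bytes(char, encoding)
    # Each sample is a single non-ASCII character, yet one of its two
    # bytes falls inside the ASCII range.
    print(encoding, char, raw.hex(), low)
```

A byte-oriented tool that scans such data for a backslash would find one in the middle of a character, even though the decoded text contains no backslash.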
Sorry, let me clarify.
Can anyone give an example of a current system encoding (i.e., one that is likely to be the default currently used by open()) that can have byte values below 128 which do NOT mean what they would mean in ASCII? In other words, is it possible to read in a section of a file, think that it's ASCII, and then find that you decoded it wrongly?
I believe that IBM mainframes such as the Z series still use EBCDIC.

Python for z/OS has EBCDIC/UTF interoperability as a selling point. I think that just means the codec module :-)

https://www.ibm.com/products/open-enterprise-python-zos

-- 
Steve
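As a rough sketch of what an EBCDIC misread looks like, the snippet below uses `cp037`, one of the EBCDIC codecs included with CPython. Whether a given z/OS system actually defaults to that code page is an assumption made here for illustration, and the sample string is made up.

```python
# Punctuation that EBCDIC (code page 037) stores as bytes below 128.
text = ".<(+"

# Encode as an EBCDIC system might store it.
raw = text.encode("cp037")

# Every byte is in the ASCII range, so an ASCII decode succeeds silently,
# but it yields different characters from the ones that were written.
assert all(b < 128 for b in raw)
misread = raw.decode("ascii")

print("written:", text)
print("bytes:  ", raw.hex())
print("misread:", misread)
print("correct:", raw.decode("cp037"))
```

No exception is raised at any point, which is the situation the question describes: the data looks like valid ASCII and is decoded wrongly.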