On 1/24/21 6:00 AM, Chris Angelico wrote:
> Sorry, let me clarify.
> Can anyone give an example of a current system encoding (ie one that is likely to be the default currently used by open()) that can have byte values below 128 which do NOT mean what they would mean in ASCII? In other words, is it possible to read in a section of a file, think that it's ASCII, and then find that you decoded it wrongly?
EBCDIC is one big option. There are also some national character sets that replace a few of the lower 128 characters with characters that language needed. (This was the reason trigraphs were added to C: to provide a way to enter those characters on systems that didn't have them.)
One common example was a Japanese character set that replaced \ with the Yen sign (along with a few other substitutions) and then used some codes above 128 for multi-byte sequences. Users of such systems just got used to using the Yen sign as the path separator.
The EBCDIC cases would likely be well known on those systems, and planned for. A system where only a few of the lower 128 characters have been substituted could produce a bigger surprise.
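
For anyone who wants to check a particular codec, here is a rough sketch that lists which byte values below 128 decode to something other than their ASCII meaning. The helper name ascii_mismatches is just something made up for this illustration, and cp037 is used only as one EBCDIC code page that ships with Python's codecs; it decodes one byte at a time, so it says nothing about how those bytes behave inside multi-byte sequences.

```python
def ascii_mismatches(encoding):
    """Return {byte_value: decoded_text} for bytes 0..127 that do not
    decode to the same character ASCII would give.

    A value of None means the single byte could not be decoded on its own.
    (Hypothetical helper, written only for this illustration.)
    """
    diffs = {}
    for value in range(128):
        raw = bytes([value])
        try:
            decoded = raw.decode(encoding)
        except UnicodeDecodeError:
            decoded = None
        if decoded != chr(value):
            diffs[value] = decoded
    return diffs


if __name__ == "__main__":
    # ASCII-compatible encodings report no differences in the low half.
    for name in ("utf-8", "latin-1", "cp1252"):
        print(name, len(ascii_mismatches(name)))

    # An EBCDIC code page reassigns most of the low half, e.g. byte 0x40
    # is a space there rather than '@'.
    ebcdic = ascii_mismatches("cp037")
    print("cp037", len(ebcdic), repr(ebcdic[0x40]))
```

Passing the name returned by locale.getpreferredencoding(False) instead would show whether the local default is affected.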