On Wed, Mar 11, 2020 at 07:28:06AM +1100, Chris Angelico wrote:
> That's exactly what "ASCII compatible" means. Since ASCII is a 
> seven-bit encoding, an encoding is ASCII-compatible if (a) every ASCII 
> character is represented by the corresponding byte value, and (b) every 
> seven-bit value represents that ASCII character.
Sorry Chris, that explanation left me more confused than I started :-(

Let me have a go...

The ASCII encoding is a mapping between *seven-bit numeric values* and 
128 distinct characters, some of which are human-readable:

    A = 1000001
    B = 1000010
    a = 1100001

and some of which are considered to be "binary" characters:

    NUL = 0000000
    SOH = 0000001
    DEL = 1111111

In practice today, seven bits are inconvenient, so these are always 
padded with a leading 0 bit.

An encoding is compatible with ASCII if, and only if, the following is 
true:

* all 128 of the ASCII characters are handled by the encoding;

* each of those characters is mapped to the same eight-bit value as the 
  ASCII encoding would use (including the leading 0 bit);

* no non-ASCII character is mapped to one of those eight-bit values, 
  nor to anything which could be confused with one of those eight-bit 
  values by a naive application that processes them a byte at a time.

E.g. if an encoding mapped some character ∇ to the 16-bit value:

    01000001 11110000

that would not be considered ASCII-compatible, because the first byte 
would be interpreted as "A" by a naive application.

Most (all?) of the "extended ASCII" eight-bit encodings are ASCII 
compatible, because they use only bytes with a leading 1 for the 
non-ASCII characters. UTF-8 is also ASCII compatible. UTF-16 and UTF-32 
are *not* ASCII compatible.

How did I go?
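If anyone wants to poke at this at the interactive prompt, here is a 
quick illustrative sketch in Python 3. It isn't part of Chris's 
definition, just a demonstration of the points above:

    >>> bin(ord("A")), hex(ord("A"))   # ASCII "A" is seven bits, 1000001
    ('0b1000001', '0x41')
    >>> "A".encode("ascii"), "A".encode("utf-8"), "A".encode("latin-1")
    (b'A', b'A', b'A')                 # ASCII-compatible: same byte for "A"
    >>> "∇".encode("utf-8")            # non-ASCII char: every byte has the high bit set
    b'\xe2\x88\x87'
    >>> "A".encode("utf-16-le")        # not ASCII compatible: extra NUL byte
    b'A\x00'

A naive byte-at-a-time scan of that UTF-8 output can never mistake any 
of its bytes for an ASCII character, which is exactly the property 
described above.

-- 
Steven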