On Wed, Mar 11, 2020 at 07:28:06AM +1100, Chris Angelico wrote:
> That's exactly what "ASCII compatible" means. Since ASCII is a 
> seven-bit encoding, an encoding is ASCII-compatible if (a) every ASCII 
> character is represented by the corresponding byte value, and (b) every 
> seven-bit value represents that ASCII character.
Sorry Chris, that explanation left me more confused than I started :-(

Let me have a go...

The ASCII encoding is a mapping between *seven-bit numeric values* and 
128 distinct characters, some of which are human-readable:

    A = 1000001
    B = 1000010
    a = 1100001

and some of which are considered to be "binary" characters:

    NUL = 0000000
    SOH = 0000001
    DEL = 1111111

In practice today, seven bits are inconvenient, so these are always 
padded with a leading 0 bit.

An encoding is compatible with ASCII if, and only if, the following is 
true:

* all 128 of the ASCII characters are handled by the encoding;

* each of those characters is mapped to the same eight-bit value as the 
  ASCII encoding would use (including the leading 0 bit);

* no non-ASCII character is mapped to one of those eight-bit values, 
  nor to anything which could be confused with one of those eight-bit 
  values by a naive application that processes them a byte at a time.

E.g. if an encoding mapped some character ∇ to the 16-bit value:

    01000001 11110000

that would not be considered ASCII-compatible, because the first byte 
would be interpreted as "A" by a naive application.

Most (all?) of the "extended ASCII" eight-bit encodings are ASCII 
compatible, because they use only bytes with a leading 1 for the 
non-ASCII characters. UTF-8 is also ASCII compatible. UTF-16 and UTF-32 
are *not* ASCII compatible.

How did I go?
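If anyone wants to poke at this at the interactive prompt, here is a 
quick illustrative sketch in Python 3. It isn't part of Chris's 
definition, just a demonstration of the points above:

    >>> bin(ord("A")), hex(ord("A"))   # ASCII "A" is seven bits, 1000001
    ('0b1000001', '0x41')
    >>> "A".encode("ascii"), "A".encode("utf-8"), "A".encode("latin-1")
    (b'A', b'A', b'A')                 # ASCII-compatible: same byte for "A"
    >>> "∇".encode("utf-8")            # non-ASCII char: every byte has the high bit set
    b'\xe2\x88\x87'
    >>> "A".encode("utf-16-le")        # not ASCII compatible: extra NUL byte
    b'A\x00'

A naive byte-at-a-time scan of that UTF-8 output can never mistake any 
of its bytes for an ASCII character, which is exactly the property 
described above.

-- 
Steven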