[I18n-sig] Re: Pre-PEP: Proposed Python Character Model

Martin v. Loewis martin@loewis.home.cs.tu-berlin.de
Tue, 20 Feb 2001 23:11:22 +0100


> A better question is why is the first half of group 0, plane 0,
> row 0 better than the last half?

Well, because it is ASCII, and because ASCII is a subset of most
encodings - so assuming that an octet string is meant as ASCII when
compared to a Unicode object has a high probability of being a good
guess. The same is not true if there are octets >= 128, because the
encodings disagree about what those octets mean.
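
To illustrate the point about guessing, here is a minimal sketch in
present-day Python 3 (not the 2001 interpreter under discussion). The
list of encodings is just an illustrative sample, and decodings() is a
hypothetical helper, not part of any library.

```python
# Sketch: ASCII octets decode the same way under many encodings,
# while an octet >= 128 does not. The encoding list is an arbitrary sample.
ENCODINGS = ["ascii", "latin-1", "cp1252", "utf-8", "koi8-r"]

def decodings(octets):
    """Map each encoding name to the decoded text, or None if decoding fails."""
    results = {}
    for name in ENCODINGS:
        try:
            results[name] = octets.decode(name)
        except UnicodeDecodeError:
            results[name] = None
    return results

ascii_only = decodings(b"abc")   # every encoding agrees on these octets
high_octet = decodings(b"\xa0")  # octet 160: the encodings disagree or fail

print(ascii_only)
print(high_octet)
```

For the pure-ASCII input all sampled encodings yield the same text; for
octet 160 some fail and the rest do not all agree, so there is no safe
guess.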

> 
> >>> unichr(160)==chr(160)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> UnicodeError: ASCII decoding error: ordinal not in range(128)
> 
> The Unicode guys made group 0, plane 0, row 0 Latin-1 for a reason.

Sure: to allow easy conversion between Latin-1 documents and Unicode.
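
What "easy conversion" amounts to can be checked with a short sketch,
again in present-day Python 3 and with variable names chosen only for
illustration: each Latin-1 octet maps to the Unicode code point with
the same number.

```python
# Sketch: decoding Latin-1 maps octet N to code point N, for all 256 octets.
all_octets = bytes(range(256))
text = all_octets.decode("latin-1")

# Each decoded character has the same ordinal as the octet it came from.
identity_mapping = [ord(ch) for ch in text] == list(range(256))

# Encoding back to Latin-1 recovers the original octets unchanged.
round_trip = text.encode("latin-1") == all_octets

print(identity_mapping, round_trip)
```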

> It's not just an accident. I don't think it makes sense for us to
> agree with them "halfway"...

We agree with them all the way. The codec that deals with Latin-1 is
hard-coded in _codecs, whereas the other single-byte encodings require
dictionaries for operation.

Regards,
Martin