[I18n-sig] Re: Pre-PEP: Proposed Python Character Model

Paul Prescod paulp@ActiveState.com
Tue, 20 Feb 2001 13:46:35 -0800


"Martin v. Loewis" wrote:
> 
> > ...
> >
> > It's not an encoding.  It's the subset of Unicode that you can store
> > in an 8-bit character.
> 
> No, it is not *the* subset of Unicode that you can store in an 8-bit
> character. You can store any subset of Unicode with a cardinality <256
> in a single octet.
> 
> Latin-1 is group 0, plane 0, row 0. Why is it any better than any
> other plane or row?

I don't know. You tell me.

>>> "a"==u"a"==chr(97)
1

It looks like we've already decided that group 0, plane 0, row 0 is
special. A better question is: why is the first half of group 0, plane 0,
row 0 better than the last half?

>>> unichr(160)==chr(160)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeError: ASCII decoding error: ordinal not in range(128)

The Unicode guys made group 0, plane 0, row 0 Latin-1 for a reason. It's
not just an accident. I don't think it makes sense for us to agree with
them "halfway"...especially when this half-way agreement causes all
kinds of nasty problems like forcing Python to raise exceptions in
places that are really surprising like equality tests and sort
functions.
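
To make the "halfway" point concrete, here is a minimal sketch in present-day Python 3 syntax (not the Python 2.0 of the sessions above, where str was a byte string). It only illustrates that Latin-1 maps each byte value n to code point n across all of row 0, while ASCII covers just the first half. The helper name ascii_decodes is made up for illustration.

```python
# Python 3 sketch; the interactive sessions above are Python 2.0.
all_bytes = bytes(range(256))

# Latin-1 decoding maps byte value n to code point n for every n in 0..255.
text = all_bytes.decode("latin-1")
assert [ord(c) for c in text] == list(range(256))

# ASCII decoding agrees with Latin-1, but only on the first half (0..127).
assert all_bytes[:128].decode("ascii") == text[:128]


def ascii_decodes(byte_value):
    """Hypothetical helper: True if this single byte is valid ASCII."""
    try:
        bytes([byte_value]).decode("ascii")
        return True
    except UnicodeDecodeError:
        return False


print(ascii_decodes(97))   # True: "a" sits in the first half of row 0
print(ascii_decodes(160))  # False: 160 sits in the last half
```

With an ASCII default, comparing byte 160 against unichr(160) has to decode first and fails; with a Latin-1 default the same comparison would simply decode to code point 160.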
