[I18n-sig] Re: [Python-Dev] unichr

M.-A. Lemburg mal@lemburg.com
Thu, 08 Feb 2001 11:01:38 +0100


Paul Prescod wrote:
> 
> Ka-Ping Yee wrote:
> >
> > ...
> >
> > At the moment, since the default encoding is ASCII, something like
> >
> >     u"abc" + chr(200)
> >
> > would cause an exception because 200 is outside of the ASCII range.
> 
> Yes, this is another mistake in Python's current handling of strings.
> There is absolutely nothing special about the 128-255 range of
> characters. We shouldn't start throwing exceptions until we get to 256.

You are forgetting that the range 128-255 is used by many codepages
to support language-specific characters. chr(0xE0) stands for
different characters in the US than in, e.g., Russia. If we were to
simply let these conversions slip through, people would find garbled
data in their text files.
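To make the codepage point concrete, here is a small sketch in modern Python 3. The single byte that chr(0xE0) produced in Python 2 is written as bytes([0xE0]). Decoding it with Windows-1252 (assumed here as a typical US codepage) and Windows-1251 (assumed as a typical Russian one) yields two different characters, while ASCII refuses it outright. The choice of these two codepages is illustrative.

```python
# Python 3 sketch: one byte value, interpreted under different codepages.
raw = bytes([0xE0])  # the byte chr(0xE0) produced in Python 2

western = raw.decode("cp1252")   # Windows-1252, assumed typical for the US
cyrillic = raw.decode("cp1251")  # Windows-1251, assumed typical for Russia

# Same byte, two unrelated characters: U+00E0 vs U+0430.
print(hex(ord(western)), hex(ord(cyrillic)))

# ASCII only covers 0-127, so strict decoding raises instead of guessing.
try:
    raw.decode("ascii")
except UnicodeDecodeError as exc:
    print("ascii:", exc.reason)
```

Raising on ASCII decode is exactly the "refuse to guess" behaviour argued for above: any silent choice between the two results would be wrong for some users.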

Of course, if a user explicitly sets the default encoding to
Latin-1, then everything will be fine, but for ASCII (which is
the base of most character encodings in use today) there is
little we can do other than raise an exception.
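The difference between the ASCII default and an explicitly chosen Latin-1 default can be sketched with a hypothetical helper, coerce_concat, that mimics the implicit conversion behind u"abc" + chr(200): it decodes the byte string with a given encoding before concatenating. This is a Python 3 approximation under that assumption, not the interpreter's actual coercion code.

```python
def coerce_concat(text: str, data: bytes, encoding: str = "ascii") -> str:
    """Hypothetical helper: decode `data` with `encoding`, then append it.

    Roughly mimics Python 2's implicit str -> unicode coercion, which used
    the default encoding (ASCII unless changed) and raised on failure.
    """
    return text + data.decode(encoding)  # strict mode: raises on bad bytes


byte_200 = bytes([200])  # what chr(200) produced in Python 2

try:
    coerce_concat("abc", byte_200)  # ASCII default: 200 is above 127
except UnicodeDecodeError as exc:
    print("ASCII default:", exc)

# With an explicit Latin-1 choice, byte 200 (0xC8) maps to U+00C8.
print(coerce_concat("abc", byte_200, encoding="latin-1"))
```

The exception is the safe outcome when the encoding is unknown; once the user states Latin-1, the same bytes convert without complaint.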

-- 
Marc-Andre Lemburg
______________________________________________________________________
Company:                                        http://www.egenix.com/
Consulting:                                    http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/