[I18n-sig] Re: Pre-PEP: Proposed Python Character Model

Guido van Rossum guido@digicool.com
Tue, 20 Feb 2001 19:00:55 -0500


> Guido van Rossum wrote:
> > 
> > ...
> > 
> > This has been hashed to death many times before.  We have absolutely
> > no guarantee that the files from which Python strings are read are
> > encoded in Latin-1, but we are pretty sure that they are an ASCII
> > superset (if they represent characters at all). Using the locale
> > module the user can (implicitly) indicate what the character set is,
> > and this may not be Latin-1.  Since s.islower() and other similar
> > functions are locale-sensitive, it would be inconsistent to declare
> > that 8-bit strings are always encoded in Latin-1. 
> 
> So the problem is that s.islower() might in some circumstances not equal
> unicode(s).islower()?
> 
> Is this really a bigger deal than the fact that in some circumstances
> comparisons between 8-bit strings and Unicode strings will cause an
> exception, depending on the contents of the 8-bit string? Or that sorts
> could throw exceptions? Or concatenations can fail?

Yes, it is a bigger deal, because it is a clear indication that
assuming Latin-1 is simply WRONG.
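
To make the point concrete, here is a minimal sketch of how one and the
same byte changes case depending on the character set you assume.  The
assumptions are mine: it uses present-day Python 3 bytes/str as
stand-ins for 8-bit and Unicode strings, and the byte value 0xE0 and the
KOI8-R codec are picked purely for illustration.

```python
# Assumption: Python 3 bytes/str stand in for the 8-bit and Unicode
# strings discussed here; the byte and the codecs are illustrative.
raw = b"\xe0"  # a single byte above the ASCII range

# Interpreted as Latin-1, the byte is U+00E0 (a with grave, lowercase).
as_latin1 = raw.decode("latin-1")

# Interpreted as KOI8-R, the same byte is U+042E (Cyrillic capital YU).
as_koi8r = raw.decode("koi8_r")

print(as_latin1.islower())  # True
print(as_koi8r.islower())   # False
```

A program that silently decoded this byte as Latin-1 would report a
lowercase letter even when the user's data is really KOI8-R, which is
the inconsistency described above.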

> The only arguments I have heard for the need for the builtin function
> "unichr" are based on the danger of concatenation failures in the
> 128-255 range. The price of this consistency is very high IMO!

--Guido van Rossum (home page: http://www.python.org/~guido/)