[I18n-sig] Re: Pre-PEP: Proposed Python Character Model

Paul Prescod paulp@ActiveState.com
Tue, 20 Feb 2001 15:04:17 -0800


Guido van Rossum wrote:
> 
> ...
> 
> This has been hashed to death many times before.  We have absolutely
> no guarantee that the files from which Python strings are read are
> encoded in Latin-1, but we can be pretty sure that they are an ASCII
> superset (if they represent characters at all). Using the locale
> module the user can (implicitly) indicate what the character set is,
> and this may not be Latin-1.  Since s.islower() and other similar
> functions are locale-sensitive, it would be inconsistent to declare
> that 8-bit strings are always encoded in Latin-1. 

So the problem is that s.islower() might in some circumstances not equal
unicode(s).islower()?
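A rough modern illustration of that mismatch, not the 2001 behaviour itself: in Python 3, `bytes.islower()` only looks at ASCII letters (it is not locale-sensitive the way Python 2's `str.islower()` could be), so a byte string and its Latin-1 decoding can disagree. The single-byte sample and the assumption that the bytes "really are" Latin-1 are arbitrary choices for the sketch.

```python
# Python 3 sketch: byte-level islower() vs. the same bytes decoded as Latin-1.
raw = b"\xe9"                  # the byte 0xE9
text = raw.decode("latin-1")   # assumption: the data really is Latin-1, giving "é"

byte_view = raw.islower()      # False: bytes.islower() only counts ASCII letters
text_view = text.islower()     # True: U+00E9 is a lowercase letter
print(byte_view, text_view)    # False True
```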

Is this really a bigger deal than the fact that, in some circumstances,
comparisons between 8-bit strings and Unicode strings will raise an
exception depending on the contents of the 8-bit string? Or that sorts
can throw exceptions? Or that concatenations can fail?
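A minimal sketch of why the failure depends on the contents of the 8-bit string, assuming (as in Python 2) that mixing the two types implicitly decodes the 8-bit operand with the default ASCII codec. `py2_style_concat` is a hypothetical helper written in Python 3 to mimic that coercion; it is not a real library function.

```python
def py2_style_concat(byte_str: bytes, uni: str, implicit_codec: str = "ascii") -> str:
    """Hypothetical helper mimicking Python 2's implicit 8-bit -> Unicode coercion.

    The 8-bit operand is decoded with `implicit_codec` (ASCII by default)
    before concatenation; any byte outside 0-127 makes ASCII decoding fail.
    """
    return byte_str.decode(implicit_codec) + uni

print(py2_style_concat(b"abc", "def"))               # abcdef: pure ASCII is fine

try:
    py2_style_concat(b"caf\xe9", "!")                # same types, different contents
except UnicodeDecodeError as exc:
    print("ASCII coercion failed:", exc.reason)      # ordinal not in range(128)

# If 8-bit strings were defined as Latin-1, the same operation could not fail.
print(py2_style_concat(b"caf\xe9", "!", "latin-1"))  # café!
```

The same implicit decode step is what sat behind the comparison and sorting failures described in the message: whether the operation succeeds is decided by the bytes, not the types.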

The only arguments I have heard for needing the builtin function
"unichr" rest on the danger of concatenation failures in the
128-255 range. The price of this consistency is very high, IMO!
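A small check of the range involved, written in Python 3, where `chr()` plays the role Python 2's `unichr()` played. It shows that exactly bytes 128-255 fail implicit ASCII decoding, and that under Latin-1 every byte value maps to the code point with the same number, so a separate `unichr` would only matter above 255. `ascii_decodable` is a hypothetical helper for the sketch.

```python
def ascii_decodable(i: int) -> bool:
    """Hypothetical helper: does the single byte i survive ASCII decoding?"""
    try:
        bytes([i]).decode("ascii")
        return True
    except UnicodeDecodeError:
        return False

failing = [i for i in range(256) if not ascii_decodable(i)]
print(failing[0], failing[-1], len(failing))  # 128 255 128

# Under Latin-1, byte i decodes to code point i for every i in 0-255,
# so an 8-bit chr() and a Unicode chr() (Python 2's unichr) would agree there.
latin1_matches = all(bytes([i]).decode("latin-1") == chr(i) for i in range(256))
print(latin1_matches)  # True
```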
