[I18n-sig] Re: Pre-PEP: Proposed Python Character Model
Paul Prescod
paulp@ActiveState.com
Tue, 20 Feb 2001 15:04:17 -0800
Guido van Rossum wrote:
>
> ...
>
> This has been hashed to death many times before. We have absolutely
> no guarantee that the files from which Python strings are read are
> encoded in Latin-1, but we are pretty sure that they are an ASCII
> superset (if they represent characters at all). Using the locale
> module the user can (implicitly) indicate what the character set is,
> and this may not be Latin-1. Since s.islower() and other similar
> functions are locale-sensitive, it would be inconsistent to declare
> that 8-bit strings are always encoded in Latin-1.
So the problem is that s.islower() might in some circumstances not equal
unicode(s).islower()?
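
A minimal sketch of that mismatch, run on a current Python 3 interpreter rather than the Python discussed in this thread. The stand-ins are assumptions: `bytes` plays the 8-bit string, and an explicit `decode("latin-1")` plays `unicode(s)` under the proposed Latin-1 rule. Python 3's bytes methods are ASCII-only rather than locale-sensitive, so this shows only one of the ways the two answers can diverge.

```python
# Byte 0xE9 is "é" if the data is read as Latin-1.
s = b"\xe9"

# 8-bit view: Python 3 bytes methods treat only ASCII letters as cased.
as_bytes = s.islower()

# Character view: decode as Latin-1 first, then ask the same question.
as_text = s.decode("latin-1").islower()

print(as_bytes, as_text)
```

The two calls disagree for this byte, which is the inconsistency in question.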
Is this really a bigger deal than the fact that in some circumstances
comparisons between 8-bit strings and Unicode strings will raise an
exception, depending on the contents of the 8-bit string? Or that sorts
could throw exceptions? Or that concatenations can fail?
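
The content-dependent failure can be sketched as follows. This emulates the implicit ASCII coercion on Python 3, where mixing the two types is never implicit. `coerce_concat` is a hypothetical helper invented for illustration, not an actual interpreter function.

```python
def coerce_concat(text, raw):
    """Hypothetical helper: join text with an 8-bit string by decoding
    the bytes as ASCII first, emulating implicit coercion."""
    return text + raw.decode("ascii")

# Pure-ASCII bytes: the operation succeeds.
print(coerce_concat("abc", b"def"))

# A single byte above 127 makes the very same operation fail.
try:
    coerce_concat("abc", b"d\xe9f")
except UnicodeDecodeError as exc:
    print("failed:", exc.reason)
```

Whether the call works depends only on the data, not on the types involved. That is why comparisons and sorts, which coerce in the same way, can fail just as unpredictably.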
The only arguments I have heard for needing the builtin function
"unichr" are based on the danger of concatenation failures in the
128-255 range. The price of this consistency is very high IMO!
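
For reference, the single-byte values at risk under an ASCII default can be enumerated directly. This is a small sketch in Python 3, with a one-byte `bytes` object standing in for the result of an 8-bit `chr(i)` (an assumption of the illustration):

```python
def decodes_as_ascii(value):
    """Return True if the single byte `value` survives ASCII decoding."""
    try:
        bytes([value]).decode("ascii")
        return True
    except UnicodeDecodeError:
        return False

# Collect every byte value that would break a coerced concatenation.
failing = [i for i in range(256) if not decodes_as_ascii(i)]

print(failing[0], failing[-1], len(failing))
```

Every value below the first failing one mixes safely with Unicode strings. Only the upper half of the byte range needs a separate constructor.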