Python Unicode handling wins again -- mostly

Dave Angel davea at davea.name
Fri Nov 29 22:06:21 EST 2013


On Fri, 29 Nov 2013 21:28:47 -0500, Roy Smith <roy at panix.com> wrote:
> In article <mailman.3417.1385777557.18130.python-list at python.org>,
>  Chris Angelico <rosuav at gmail.com> wrote:
> > On Sat, Nov 30, 2013 at 1:08 PM, Roy Smith <roy at panix.com> wrote:
> > > I would certainly expect, x.lower() == x.upper().lower(), to be
> > > True for all values of x over the set of valid unicode codepoints.
> > > Having u"\uFB04".upper() ==> "FFL" breaks that.  I would also
> > > expect len(x) == len(x.upper()) to be True.

> > That's a nice theory, but the Unicode consortium disagrees with
> > you on both points.

And both were already false long before Unicode.  I don’t know the
specifics, but there are many cases where a particular lowercase
character has no uppercase equivalent, and others where the
uppercase equivalent takes multiple characters.
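
To make that concrete, here is a minimal sketch, assuming Python 3.3 or later (where str.upper() applies the full Unicode case mappings). Apart from U+FB04, which Roy mentioned, the sample characters are picked purely for illustration and were not named in this thread.

```python
# Assumes Python 3.3+; the variable names are illustrative only.
lig = "\ufb04"      # LATIN SMALL LIGATURE FFL (Roy's example)
eszett = "\u00df"   # LATIN SMALL LETTER SHARP S, uppercases to two letters
kra = "\u0138"      # LATIN SMALL LETTER KRA, has no uppercase mapping

upper_lig = lig.upper()          # expands to the three letters "FFL"
round_trip = upper_lig.lower()   # "ffl": three letters, not the ligature

# len(x) == len(x.upper()) fails for the ligature.
print(len(lig), len(upper_lig))

# x.lower() == x.upper().lower() fails for the ligature as well.
print(lig.lower() == round_trip)

# One lowercase character whose uppercase takes multiple characters.
print(eszett.upper())

# One lowercase character that upper() leaves unchanged.
print(kra.upper() == kra)
```

If the goal is caseless comparison rather than display, str.casefold() (also added in Python 3.3) is probably closer to what is wanted than chaining upper() and lower().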

-- 
DaveA



