Python Unicode handling wins again -- mostly
Roy Smith
roy at panix.com
Fri Nov 29 21:08:49 EST 2013
In article <529934dc$0$29993$c3e8da3$5496439d at news.astraweb.com>,
Steven D'Aprano <steve+comp.lang.python at pearwood.info> wrote:
> (8) What's the uppercase of "baffle" spelled with an ffl ligature?
>
> Like most other languages, Python 3.2 fails:
>
> py> 'baﬄe'.upper()
> 'BAﬄE'
>
> but Python 3.3 passes:
>
> py> 'baﬄe'.upper()
> 'BAFFLE'
I disagree.
The whole idea of ligatures like fi is purely typographic. The crossbar
on the "f" (at least in some fonts) runs into the dot on the "i".
Likewise, the top curl on an "f" runs into the serif on top of the "l"
(and similarly for ffl).
There is no such thing as a "FFL" ligature, because the upper case
letterforms don't run into each other like the lower case ones do.
Thus, I would argue that it's wrong to say that calling upper() on an
ffl ligature should yield FFL.
I would certainly expect x.lower() == x.upper().lower() to be True for
all values of x over the set of valid Unicode code points. Having
u"\uFB04".upper() ==> "FFL" breaks that. I would also expect len(x) ==
len(x.upper()) to be True.
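To make those two expectations concrete, here is a minimal sketch that checks them, assuming Python 3.3 or later (where upper() applies the full Unicode case mappings). The helper names roundtrip_holds, length_preserved and violations are made up for illustration; they are not part of any library.

```python
# Sketch only: helper names are hypothetical, chosen for this example.
# Assumes Python 3.3+, where str.upper() uses full Unicode case mappings.

LIGATURE_FFL = "\uFB04"  # LATIN SMALL LIGATURE FFL


def roundtrip_holds(x):
    """True if lowercasing x directly matches upper-then-lower."""
    return x.lower() == x.upper().lower()


def length_preserved(x):
    """True if uppercasing x does not change its length."""
    return len(x) == len(x.upper())


def violations(check, limit=0x110000):
    """Return the code points below limit for which check() is False.

    Surrogates (U+D800..U+DFFF) are skipped, since they are not
    characters in their own right.
    """
    return [
        cp
        for cp in range(limit)
        if not (0xD800 <= cp <= 0xDFFF) and not check(chr(cp))
    ]


if __name__ == "__main__":
    print(repr(LIGATURE_FFL.upper()))      # 'FFL'
    print(roundtrip_holds(LIGATURE_FFL))   # False
    print(length_preserved(LIGATURE_FFL))  # False
    # How many single code points break each expectation depends on the
    # Unicode tables shipped with the interpreter, so just print the counts.
    print(len(violations(roundtrip_holds)))
    print(len(violations(length_preserved)))
```

The scan at the end is there because the ffl ligature is not the only code point whose uppercase form is longer than one character, so the counts give a rough idea of how far the current behavior is from what I would expect.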