[Python-ideas] adding a casefold() method to str
Steven D'Aprano
steve at pearwood.info
Sun Jan 8 17:58:17 CET 2012
Benjamin Peterson wrote:
> Hi,
> Casefolding (Unicode Standard 3.13) is a more aggressive version of lowercasing.
> Its purpose is to assist in the implementation of caseless mapping. For example,
> under lowercase "ß" -> "ß" but under casefolding "ß" -> "ss". I propose we add a
> casefold() method. So, case-insensitive matching should really be
> "one.casefold() == two.casefold()"
> rather than "one.lower() == two.lower()".
+1 in principle, but in practice case folding is more complicated than a
single method might imply. The most obvious complication is treatment of
dotted and dotless I.
See, for example:
http://unicode.org/Public/UNIDATA/CaseFolding.txt
http://www.w3.org/International/wiki/Case_folding
http://en.wikipedia.org/wiki/Letter_case#Unicode_case_folding_and_script_identification
So while having proper Unicode case-folding is desirable, I don't know how
simple it is to implement.
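To make the dotted and dotless I problem concrete, here is a small sketch using only the existing str.lower() and str.upper() methods. The helper name naive_equal is made up for illustration. It shows that the default, locale-independent mappings pair I with i, which is exactly what Turkic text does not want.

```python
# Turkic alphabets have two separate letter pairs:
#   I (U+0049) / ı (U+0131)   the dotless pair
#   İ (U+0130) / i (U+0069)   the dotted pair
DOTLESS_LOWER = "\u0131"  # ı
DOTTED_UPPER = "\u0130"   # İ


def naive_equal(a, b):
    """Hypothetical helper: caseless comparison via the default lower()."""
    return a.lower() == b.lower()


# The default mapping treats I and i as a case pair ...
print(naive_equal("I", "i"))            # True
# ... so the Turkic pairing of I with dotless ı is not recognised.
print(naive_equal("I", DOTLESS_LOWER))  # False
# Upper-casing loses the distinction entirely: both map to plain I.
print("i".upper() == DOTLESS_LOWER.upper())  # True
```

Whatever casefold() ends up doing by default, a Turkic-aware comparison needs different mappings for these letters, which is why CaseFolding.txt lists them separately.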
Would it be appropriate for casefold() to take an optional argument as to
which mappings to use? E.g. something like:
    str.casefold()  # defaults to simple folding
    str.casefold(string.SIMPLE | string.TURKIC)
    str.casefold(string.FULL)
or should str.casefold() only apply simple folding, with the other
combinations relegated to a function in a module somewhere?
I count 4 possible functions:
simple casefolding, without Turkic I
full casefolding, without Turkic I
simple casefolding, with Turkic I
full casefolding, with Turkic I
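As a rough sketch of how those four variants fall out of the data, the following builds a folding table from a handful of lines in the CaseFolding.txt format (status C = common, F = full, S = simple, T = Turkic). The function names and the full/turkic keyword arguments are invented for illustration and are not a proposed API. The excerpt is deliberately tiny, so any character missing from it is passed through unchanged.

```python
# A few entries in the format of CaseFolding.txt:
#   <code>; <status>; <mapping>; # <name>
EXCERPT = """\
0041; C; 0061; # LATIN CAPITAL LETTER A
0049; C; 0069; # LATIN CAPITAL LETTER I
0049; T; 0131; # LATIN CAPITAL LETTER I
00DF; F; 0073 0073; # LATIN SMALL LETTER SHARP S
0130; F; 0069 0307; # LATIN CAPITAL LETTER I WITH DOT ABOVE
0130; T; 0069; # LATIN CAPITAL LETTER I WITH DOT ABOVE
1E9E; F; 0073 0073; # LATIN CAPITAL LETTER SHARP S
1E9E; S; 00DF; # LATIN CAPITAL LETTER SHARP S
"""


def build_table(text, full=False, turkic=False):
    """Return a dict mapping each character to its folded string."""
    # Simple folding uses statuses C + S, full folding uses C + F.
    wanted = {"C", "F" if full else "S"}
    table = {}
    turkic_table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop the trailing comment
        if not line:
            continue
        code, status, mapping = [f.strip() for f in line.split(";")[:3]]
        source = chr(int(code, 16))
        folded = "".join(chr(int(cp, 16)) for cp in mapping.split())
        if status in wanted:
            table[source] = folded
        elif status == "T" and turkic:
            turkic_table[source] = folded
    # Turkic mappings, when requested, override the default ones.
    table.update(turkic_table)
    return table


def casefold(s, full=False, turkic=False):
    """Hypothetical helper: fold s using the tiny excerpt above."""
    table = build_table(EXCERPT, full=full, turkic=turkic)
    return "".join(table.get(ch, ch) for ch in s)


print(casefold("I\u00df"))             # iß   (simple, no Turkic)
print(casefold("I\u00df", full=True))  # iss  (full, no Turkic)
print(casefold("I", turkic=True))      # ı    (simple, Turkic)
```

Two boolean switches cover all four cases, which suggests the choice is less about four separate functions and more about how to spell two independent options.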
--
Steven