adding a casefold() method to str - Python-ideas


Benjamin Peterson

Jan. 8, 2012

4:17 p.m.

Hi,

Casefolding (Unicode Standard, section 3.13) is a more aggressive version of lowercasing. Its purpose is to assist in the implementation of caseless matching. For example, under lowercasing "ß" -> "ß", but under casefolding "ß" -> "ss".

I propose we add a casefold() method. So, case-insensitive matching should really be "one.casefold() == two.casefold()" rather than "one.lower() == two.lower()".

Regards,
Benjamin
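To make the "ß" example concrete, here is a minimal sketch comparing the two matching strategies. It assumes Python 3.3 or later, where str.casefold() exists; the helper names caseless_equal and lower_equal are illustrative only.

```python
def caseless_equal(one: str, two: str) -> bool:
    """Caseless comparison using str.casefold() (full default case folding)."""
    return one.casefold() == two.casefold()


def lower_equal(one: str, two: str) -> bool:
    """Comparison by lowercasing only, the approach the proposal argues against."""
    return one.lower() == two.lower()


# Lowercasing leaves "ß" unchanged, but case folding expands it to "ss".
print("ß".lower())     # ß
print("ß".casefold())  # ss

print(lower_equal("Straße", "STRASSE"))     # False
print(caseless_equal("Straße", "STRASSE"))  # True
```

Lowercasing "STRASSE" gives "strasse" while lowercasing "Straße" gives "straße", so only the casefolded comparison treats them as equal.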


Steven D'Aprano

January 2012

4:58 p.m.

Benjamin Peterson wrote:

...

+1 in principle, but in practice case folding is more complicated than a single method might imply. The most obvious complication is the treatment of dotted and dotless I. See, for example:

http://unicode.org/Public/UNIDATA/CaseFolding.txt
http://www.w3.org/International/wiki/Case_folding
http://en.wikipedia.org/wiki/Letter_case#Unicode_case_folding_and_script_ide...

So while having proper Unicode case folding is desirable, I don't know how simple it is to implement. Would it be appropriate for casefold() to take an optional argument specifying which mappings to use? E.g. something like:

    str.casefold()  # defaults to simple folding
    str.casefold(string.SIMPLE & string.TURKIC)
    str.casefold(string.FULL)

or should str.casefold() only apply simple folding, with the other combinations relegated to a function in a module somewhere? I count 4 possible functions:

    simple casefolding, without Turkic I
    full casefolding, without Turkic I
    simple casefolding, with Turkic I
    full casefolding, with Turkic I

--
Steven
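For reference, the str.casefold() that shipped in Python 3.3 applies full default folding without the Turkic mappings, i.e. the second of the four variants above. As a minimal sketch of how a Turkic variant could live in a module function instead, assuming it only needs the two "T" status entries from CaseFolding.txt (U+0049 I -> U+0131 ı, and U+0130 İ -> U+0069 i) applied before full folding, a hypothetical helper casefold_turkic (not a standard-library API) might look like this:

```python
# Hypothetical helper, not part of Python's standard library.
# CaseFolding.txt marks two entries with status "T"; for Turkic languages they
# replace the default mappings for U+0049 and U+0130.
_TURKIC_I = str.maketrans({"I": "\u0131", "\u0130": "i"})


def casefold_turkic(s: str) -> str:
    """Full case folding with the Turkic I mappings applied first."""
    # Apply the two Turkic special cases, then let str.casefold() (full default
    # folding) handle every other character; it leaves U+0131 unchanged.
    return s.translate(_TURKIC_I).casefold()


print("I".casefold())                 # i  (default folding)
print(casefold_turkic("I"))           # ı  (dotless i)
print(casefold_turkic("DİYARBAKIR"))  # diyarbakır
```

Simple folding has no built-in counterpart in Python, so a sketch of the remaining variants would need its own copy of the CaseFolding.txt data.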

Benjamin Peterson

5:47 p.m.

Steven D'Aprano <steve@...> writes:

...

or should str.casefold() only apply simple folding, with the other combinations relegated to a function in a module somewhere?

Yes, I think so. str does not have any other features that depend on locale. Section 3.13 defines "Default Case Folding", which is what the casefold() method should use.
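Default case folding on its own does not reconcile precomposed and decomposed characters. The same section of the Unicode Standard also defines canonical caseless matching, where X matches Y if NFD(casefold(NFD(X))) equals NFD(casefold(NFD(Y))). A minimal sketch using Python's unicodedata module (the helper names are illustrative):

```python
import unicodedata


def nfd(s: str) -> str:
    """Canonical decomposition (Normalization Form D)."""
    return unicodedata.normalize("NFD", s)


def canonical_caseless_equal(x: str, y: str) -> bool:
    """NFD(casefold(NFD(x))) == NFD(casefold(NFD(y)))"""
    # Normalize before and after folding, as the definition requires.
    return nfd(nfd(x).casefold()) == nfd(nfd(y).casefold())


composed = "\u00e9"     # é as a single code point
decomposed = "E\u0301"  # E followed by COMBINING ACUTE ACCENT

print(composed.casefold() == decomposed.casefold())    # False
print(canonical_caseless_equal(composed, decomposed))  # True
```

Plain casefold() comparison fails here only because the two strings use different code point sequences for the same accented letter; the NFD steps remove that difference.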

