Share Code Tips

Steven D'Aprano steve+comp.lang.python at pearwood.info
Sat Jul 20 05:18:07 CEST 2013


On Fri, 19 Jul 2013 18:08:43 -0400, Devyn Collier Johnson wrote:

> As for the case-insensitive if-statements, most code uses Latin letters.
> Making a case-insensitive-international if-statement would be
> interesting. I can tackle that later. For now, I only wanted to take
> care of Latin letters. I hope to figure something out for all
> characters.

As I showed, even for Latin letters, the trick of "if astring.lower() == 
bstring.lower()" doesn't *quite* work, although it can be "close enough" 
for some purposes. For example, some languages treat accents as mere 
guides to pronunciation, so ö == o, while other languages treat them as 
completely different letters. Same with ligatures: in modern English, æ 
should be treated as equal to ae, but in Old English, Danish, Norwegian 
and Icelandic it is a distinct letter.
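For languages that treat accents as pronunciation guides, one common approach (not a universal rule) is to decompose each string with Unicode NFD normalization and drop the combining marks before comparing. The sketch below uses only the standard unicodedata module. The helper names strip_accents and loose_equal are hypothetical. Decomposition does nothing to æ, because it is a letter in its own right and not a base letter plus an accent. In a language where ö is a distinct letter, stripping it conflates two different letters.

```python
import unicodedata

def strip_accents(s):
    """Hypothetical helper: remove combining marks after NFD decomposition.

    Only suitable where accents are mere pronunciation guides; in languages
    where they mark distinct letters this conflates different letters.
    """
    decomposed = unicodedata.normalize("NFD", s)  # e.g. "ö" -> "o" + U+0308
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def loose_equal(a, b):
    """Hypothetical helper: accent- and case-insensitive comparison."""
    return strip_accents(a).lower() == strip_accents(b).lower()

print(loose_equal("Böse", "bose"))   # True
print(strip_accents("Ångström"))     # Angstrom
print(loose_equal("æ", "ae"))        # False: NFD leaves æ intact
```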

Case-insensitive testing may be easier in many non-European languages, 
because they don't have cases.
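A quick way to check this is to compare uncased text with itself after case conversion. For scripts with no case distinction, casefold() returns the string unchanged, so a case-insensitive comparison reduces to plain equality. The sample strings here are arbitrary illustrations.

```python
# Scripts without letter case: case conversion is a no-op.
samples = ["日本語", "中文", "עברית"]
for s in samples:
    # islower()/isupper() are both False: there are no cased characters at all.
    print(s, s.casefold() == s, s.islower(), s.isupper())
```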

A full solution to the problem of localized string matching requires 
expert knowledge for each language, but a 90% solution is pretty simple:

astring.casefold() == bstring.casefold()

or, before Python 3.3 (which added str.casefold()), just use lower(). It's
not a perfect solution, but it works reasonably well if you don't care
about full localization.
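A minimal sketch of that advice uses casefold() where it exists and falls back to lower() otherwise. The function name case_insensitive_equal is hypothetical. German ß shows where the two methods differ: lower() leaves ß alone, but casefold() maps it to "ss".

```python
def case_insensitive_equal(a, b):
    """Hypothetical helper: compare strings ignoring case.

    Uses str.casefold() (Python 3.3+); falls back to str.lower() on older
    versions where casefold() does not exist.
    """
    try:
        return a.casefold() == b.casefold()
    except AttributeError:
        # No casefold() before 3.3: lower() is the closest substitute.
        return a.lower() == b.lower()

print("Straße".lower() == "STRASSE".lower())        # False: lower() keeps ß
print(case_insensitive_equal("Straße", "STRASSE"))  # True: casefold() gives "strasse"
```

Note that casefold() does not normalize, so a precomposed "é" and "e" followed by a combining acute accent still compare unequal. Normalizing both strings with unicodedata.normalize() first covers that case.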



-- 
Steven


