[Tutor] ignoring diacritical signs
eryksun
eryksun at gmail.com
Tue Dec 3 05:15:07 CET 2013
On Mon, Dec 2, 2013 at 3:08 PM, Albert-Jan Roskam <fomcl at yahoo.com> wrote:
>
> What is the difference between lower and casefold?
>
> casefold(...)
> S.casefold() -> str
>
> Return a version of S suitable for caseless comparisons.
>
> >>> "Alala alala".casefold() == "Alala alala".lower()
> True
In 3.3, Unicode case conversion was extended to handle mappings to
multiple characters, and case folding was added:
>>> u'ß'.lower()
'ß'
>>> u'ß'.casefold()
'ss'
http://docs.python.org/3/library/stdtypes.html#str.casefold
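
As a rough sketch of where the difference matters in 3.3+ (the helper
name caseless_equal is my own invention, not something in the standard
library), a caseless comparison could look like this:

```python
def caseless_equal(a, b):
    """Compare two str values ignoring case, using full case folding."""
    return a.casefold() == b.casefold()

# lower() leaves 'ß' alone, so the two spellings still differ.
lower_match = "Straße".lower() == "STRASSE".lower()

# casefold() maps 'ß' to 'ss', so the two spellings compare equal.
fold_match = caseless_equal("Straße", "STRASSE")

print(lower_match, fold_match)
```

For plain ASCII text the two methods give the same result, which is why
the "Alala alala" test above can't tell them apart.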
In 3.x, bytes and bytearray case conversions use lookup tables, for
ASCII only. This also applies to the bytearray type in 2.6/2.7. On the
other hand, 2.x str case conversions are locale aware:
Default C/POSIX locale:
>>> print '\xc4'.decode('latin-1')
Ä
>>> print '\xc4'.lower().decode('latin-1')
Ä
German/Germany locale with Latin-1 codeset:
>>> import locale
>>> locale.setlocale(locale.LC_ALL, 'de_DE.iso-8859-1')
'de_DE.iso-8859-1'
>>> print '\xc4'.lower().decode('latin-1')
ä
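
For comparison, here is a small sketch of the ASCII-only behaviour of
bytes in 3.x mentioned above. The variable names are just for
illustration; the point is that a non-ASCII byte passes through lower()
untouched, so you have to decode to str first to get Unicode-aware
case conversion:

```python
data = "Ä".encode("latin-1")       # the single byte b'\xc4'

# bytes.lower() only maps the ASCII letters A-Z, so b'\xc4' is unchanged.
lowered_bytes = data.lower()

# ASCII letters are converted as usual.
lowered_ascii = b"ABC".lower()

# Decoding first gives a str, whose lower() follows the Unicode rules.
lowered_text = data.decode("latin-1").lower()

print(lowered_bytes, lowered_ascii, lowered_text)
```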