[Tutor] ignoring diacritical signs

eryksun eryksun at gmail.com
Tue Dec 3 05:15:07 CET 2013


On Mon, Dec 2, 2013 at 3:08 PM, Albert-Jan Roskam <fomcl at yahoo.com> wrote:
>
> What is the difference between lower and casefold?
>
> casefold(...)
>     S.casefold() -> str
>
>     Return a version of S suitable for caseless comparisons.
>
> >>> "Alala alala".casefold() == "Alala alala".lower()
> True

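In 3.3, Unicode case conversion was extended to handle mappings to
multiple characters, and str.casefold() was added for case folding.
lower() leaves 'ß' alone because it is already lowercase, whereas
casefold() maps it to 'ss':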

    >>> u'ß'.lower()
    'ß'

    >>> u'ß'.casefold()
    'ss'

http://docs.python.org/3/library/stdtypes.html#str.casefold
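
To make the practical difference concrete, here is a minimal sketch for
Python 3.3+ of a caseless comparison. The helper name caseless_equal is
made up for illustration; only str.lower() and str.casefold() come from
the standard library.

```python
# Python 3.3+ sketch: lower() vs. casefold() for caseless comparison.
def caseless_equal(a, b):
    """Compare two strings ignoring case (hypothetical helper name)."""
    return a.casefold() == b.casefold()

# lower() keeps 'ß', so the two spellings do not compare equal.
lower_match = "STRASSE".lower() == "straße".lower()

# casefold() maps 'ß' to 'ss', so both sides become 'strasse'.
fold_match = caseless_equal("STRASSE", "straße")

print(lower_match, fold_match)
```

For pure ASCII text, as in the "Alala alala" example, the two methods
give the same result; they only diverge for characters like 'ß' that
have a special folding.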

In 3.x, bytes and bytearray case conversions use lookup tables that
cover ASCII only. The same applies to the bytearray type in 2.6/2.7.
By contrast, 2.x str case conversions are locale-aware:

Default C/POSIX locale:

    >>> print '\xc4'.decode('latin-1')
    Ä

    >>> print '\xc4'.lower().decode('latin-1')
    Ä

German/Germany locale with Latin-1 codeset:

    >>> import locale
    >>> locale.setlocale(locale.LC_ALL, 'de_DE.iso-8859-1')
    'de_DE.iso-8859-1'

    >>> print '\xc4'.lower().decode('latin-1')
    ä
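
For comparison, the ASCII-only behavior of bytes and bytearray in 3.x
can be sketched as follows. The variable names are just for
illustration, and the sample bytes are assumed to be Latin-1 encoded.

```python
# Python 3 sketch: bytes/bytearray case conversion touches ASCII letters only.
raw = b'\xc4BC'                      # Latin-1 bytes for 'ÄBC'

# 0xC4 is outside ASCII, so it is left unchanged; 'B' and 'C' are lowered.
lowered = raw.lower()
lowered_array = bytearray(raw).lower()

# Decoding first gives a str, whose lower() follows Unicode rules instead.
text_lowered = raw.decode('latin-1').lower()

print(lowered)
print(text_lowered)
```

Unlike the 2.x str case above, setting the locale makes no difference
here; to lowercase non-ASCII letters, decode to str first.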

