incorrect upper()/lower() of UTF-8
Martin v. Loewis
martin at v.loewis.de
Sat Jun 29 15:00:24 EDT 2002
"Fredrik Lundh" <fredrik at pythonware.com> writes:
> It's not supposed to: the 8-bit string type can hold either
> text strings or binary buffers. The encode method takes
> a text string and returns a binary buffer.
>
> For reliable processing of Unicode data, use Unicode strings.
It might be worth noting that the .upper and .lower methods of
Unicode strings are not locale-aware. Instead, they work on all
characters independently of the locale (*).
Whether this is a good or bad thing probably depends on the
application.
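A small illustration of the difference. It is written for Python 3, where `str` plays the role of the Unicode string type. That is an assumption made for the example: in the Python of this thread the type was `unicode`. The sample text and variable names are arbitrary. Uppercasing the UTF-8 bytes directly leaves the non-ASCII characters alone, because `bytes.upper()` only converts ASCII letters. Decoding to a Unicode string first lets `.upper()` apply the Unicode case mappings, and the result does not depend on the current locale.

```python
# Sketch in Python 3: str is the Unicode string type here.
utf8_bytes = "straße ärger".encode("utf-8")

# bytes.upper() converts only ASCII lowercase letters; the multi-byte
# UTF-8 sequences for "ß" and "ä" pass through unchanged.
wrong = utf8_bytes.upper()

# Decode to text, case-map with the Unicode rules (not the C locale),
# then re-encode if a byte string is needed again.
right = utf8_bytes.decode("utf-8").upper().encode("utf-8")

print(wrong.decode("utf-8"))  # STRAßE äRGER
print(right.decode("utf-8"))  # STRASSE ÄRGER
```

Note that "ß" becomes "SS". Unicode uppercasing can change the length of a string, which a per-byte conversion never does.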
Regards,
Martin
(*) Actually, if Python uses wchar_t for the Unicode type, it also
uses the C library for .upper/.lower; I would consider that a bug.