incorrect upper()/lower() of UTF-8

Martin v. Loewis martin at
Sat Jun 29 15:00:24 EDT 2002

"Fredrik Lundh" <fredrik at> writes:

> It's not supposed to: the 8-bit string type can hold either
> text strings or binary buffers.  The encode method takes
> a text string and returns a binary buffer.
> For reliable processing of Unicode data, use Unicode strings.

It might be worth noting that the .upper and .lower methods of
Unicode strings are not locale-aware. Instead, they work on all
characters, independent of the locale (*).
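
A small sketch of the difference, written for Python 3, where str is
the Unicode type and bytes is the 8-bit type (an assumption on my
part; the variable names are only illustrative). It compares case
conversion on a Unicode string with case conversion on its UTF-8
encoding:

```python
# Sketch for Python 3: str is Unicode text, bytes is the 8-bit buffer.
text = "caf\u00e9 i"            # "café i"; U+00E9 is e with acute accent
utf8 = text.encode("utf-8")     # binary buffer holding the UTF-8 encoding

# Unicode string: every character is converted, whatever the locale is.
upper_text = text.upper()       # "CAF\u00c9 I"

# 8-bit buffer: only the ASCII letters are converted; the two bytes
# that encode U+00E9 are left untouched.
upper_bytes = utf8.upper()      # b"CAF\xc3\xa9 I"

# Decoding the buffer shows that the accented letter was not uppercased.
mixed = upper_bytes.decode("utf-8")   # "CAF\u00e9 I"

print(ascii(upper_text), upper_bytes, ascii(mixed))
```

Note that "i" becomes "I" here no matter which locale is active; a
Turkish locale would expect a dotted capital I instead, which is one
case where locale independence is visible.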

Whether this is a good or bad thing probably depends on the


(*) Actually, if Python uses wchar_t for the Unicode type, it also
uses the C library for .upper/.lower; I would consider that a bug.
