incorrect upper()/lower() of UTF-8

Martin v. Loewis martin at v.loewis.de
Sat Jun 29 15:00:24 EDT 2002


"Fredrik Lundh" <fredrik at pythonware.com> writes:

> It's not supposed to: the 8-bit string type can hold either
> text strings or binary buffers.  The encode method takes
> a text string and returns a binary buffer.
> 
> For reliable processing of Unicode data, use Unicode strings.

It might be worth noting that the .upper and .lower methods of
Unicode strings are not locale-aware. Instead, they treat all
characters the same way, regardless of the current locale (*).

Whether this is a good or bad thing probably depends on the
application.
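
To make the difference concrete, here is a minimal sketch. It assumes
Python 3, where str is the Unicode type and bytes is the 8-bit type
(the discussion above predates that split). The helper name upper_utf8
and the sample strings are made up for illustration.

```python
# Assumption: Python 3 (str = Unicode text, bytes = 8-bit buffer).
# upper_utf8 is a hypothetical helper, not part of any library.

def upper_utf8(raw: bytes) -> bytes:
    """Upper-case UTF-8 data by going through a Unicode string."""
    text = raw.decode("utf-8")         # binary buffer -> text
    return text.upper().encode("utf-8")  # case-map as text, then re-encode


raw = "café".encode("utf-8")

# bytes.upper() only maps ASCII a-z, so the two bytes that encode
# "é" pass through unchanged.
bytewise = raw.upper()

# Decoding first lets the Unicode case mapping handle "é" as well.
textwise = upper_utf8(raw)

# The mapping does not consult the locale: "i" always becomes "I",
# even where a Turkish user would expect a dotted capital I (U+0130).
plain_i = "i".upper()
```

The last line is the kind of case where locale independence may or may
not be what an application wants.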

Regards,
Martin

(*) Actually, if Python uses wchar_t for the Unicode type, it also
uses the C library for .upper/.lower; I would consider that a bug.


